- 18 Dec, 2012 40 commits
-
-
Ed Cashin authored
This change avoids a race that could result in a NULL pointer derference following a WARNing from kobject_add_internal, "don't try to register things with the same name in the same directory." The problem was found with a test that forgets and discovers an aoe device in a loop: while test ! -r /tmp/stop; do aoe-flush -a aoe-discover done The race was between aoedev_flush taking aoedevs out of the devlist, allowing a new discovery of the same AoE target to take place before the driver gets around to calling sysfs_remove_group. Fixing that one revealed another race between do_open and add_disk, and this patch avoids that, too. The fix required some care, because for flushing (forgetting) an aoedev, some of the steps must be performed under lock and some must be able to sleep. Also, for discovering a new aoedev, some steps might sleep. The check for a bad aoedev pointer remains from a time when about half of this patch was done, and it was possible for the bdev->bd_disk->private_data to become corrupted. The check should be removed eventually, but it is not expected to add significant overhead, occurring in the aoeblk_open routine. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
An AoE target can have multiple network ports used for AoE, and in the aoe driver, those are tracked by the aoetgt struct. These changes allow the aoe driver to handle network paths, or aoetgts, that are not working well, compared to the others. Paths that do not get responses despite the retransmission of AoE commands are marked as "tainted", and non-tainted paths are preferred. Meanwhile, the aoe driver attempts to "probe" the tainted path in the background by issuing reads of LBA 0 that are padded out to full (possibly jumbo-frame) size. If the probes get responses, then the path is "redeemed", and its taint is removed. This mechanism has been shown to be helpful in transparently handling and recovering from real-world network "brown outs" in ways that the earlier "shoot the help-needing target in the head" mechanism could not. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The value returned by the static minor device number number allocator is the real minor number, so it must be multiplied by the supported number of partitions per aoedev. Without this fix the support for systems without udev is incomplete, and the few users of aoe on such systems will have surprising results when device nodes names do not match the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Because the minor_get and related functions use the return values for errors, the compiler doesn't know that sysminor will always either 1) be initialized in aoedev_by_aoeaddr by the call to minor_get, or 2) be unused as the "goto out" is executed. This patch avoids the compiler warning. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
For some special-purpose systems where udev isn't present, static allocation of minor numbers is desirable. This update distinguishes different failure scenarios, to help the user understand what went wrong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
There is no need to call the request handler function in the I/O completion routine. The user impact of not doing it is a more "nice" aoe driver that is less susceptible to causing soft lockups. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
A misplaced comment was attached to the nout member of the aoetgt. This change corrects the comment. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The aoe driver will never be waiting for more than aoe_maxout AoE commands from a given remote network port on an AoE target. Increasing the cap increases performance. Users can tighten the setting to reduce the amount of memory used for handling AoE traffic or the network bandwidth used for AoE. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Before the aoe driver was an I/O request handler, it was a make_request-style block driver. Even so, there was a problem where sysfs expected a request queue to exist, so one was provided in commit 7135a71b ("aoe: allocate unused request_queue for sysfs"). During the transition to the request-handler style, a patch was merged that was based on a driver without the noop queue, and the noop queue remained in place after the patch was merged, even though a new functional queue was introduced by the patch, allocated through blk_init_queue. The user impact is a memory leak proportional to the number of AoE targets discovered. This patch removes the memory leak and cleans up vestiges of the old do-nothing queue from the aoeblk_gdalloc function. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Commit f3b8e07af774 ("aoe: commands in retransmit queue use new destination on failure") omits the copying of the coarse-grained time when an AoE command was sent during the failover from one destination MAC address on the AoE target to another. The coarse-grained timing is only used when the system time changes or an unlikely length of time has passed since the sending of the AoE command. Users will not be impacted unless their system clock is very inaccurate or something unusual (e.g., 10 GbE link reset) happens during the period when the aoe driver is handling the failure of a port on the AoE target. Being effected will mean that an AoE target could be considered "down" too eagerly. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
When one remote MAC address isn't working as a destination for AoE commands, the frames used to track information associated with the AoE commands are moved to a new aoetgt (defined by the tuple of {AoE major, AoE minor, target MAC address}). This patch makes sure that the frames on the queue for retransmits that need to be done are updated to use the new destination, so that retransmits will be sent through a working network path. Without this change, packets on the retransmit queue will be needlessly retransmitted to the unresponsive destination MAC, possibly causing premature target failure before there's time for the retransmit timer to run again, decide to retransmit again, and finally update the destination to a working MAC address on the AoE target. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
These changes improve the accuracy of the decision about whether it's time to retransmit an AoE command by using the microsecond-resolution gettimeofday instead of jiffies. Because the system time can jump suddenly, the decision reverts to using jiffies if the high-resolution time difference is relatively large. Otherwise the AoE targets could be considered failed inappropriately. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
With this bugfix in place the calculation of the criterion for "lateness" is performed under lock. Without the lock, there is a chance that one of the non-atomic operations performed on the round trip time statistics could be incomplete, such that an incorrect lateness criterion would be calculated. Without this change, the effect of the bug would be rare unecessary but benign retransmissions. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The /dev/etherd/err character device provides low-level information about normal but sometimes interesting AoE command retransmits and "unexpected responses", i.e., responses for packets that have already been retransmitted. This change adds MAC addresses to the messages about unexpected responses, so that when they occur, it's more easy to determine the network paths to which they belong. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The aoe driver already had some congestion handling, but it was limited in its ability to cope with the kind of congestion that can arise on more complex networks such as those involving paths through multiple ethernet switches. Some of the lessons from TCP's history of development can be applied to improving the congestion control and avoidance on AoE storage networks. These changes use familar concepts from Van Jacobson's "Congestion Avoidance and Control" paper from '88, without adding significant overhead. This patch depends on an upcoming patch that covers the failover case when AoE commands being retransmitted are transferred from one retransmit queue to another. Another upcoming patch increases the timing accuracy. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Make the aoe driver follow expected behavior when the user uses ioctl to get the ATA device identify information, allowing access to model, serial number, etc. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The userland aoetools package includes an "aoe-stat" command that can display a "payload size" column when the aoe driver exports this information. Users can quickly see what amount of user data is transferred inside each AoE command on the network, network headers excluded. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The GPFS filesystem is an example of an aoe user that requires the aoe driver to support I/O request sizes larger than the default. Most users will not need large I/O request sizes, because they would need to be split up into multiple AoE commands anyway. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Users sometimes want to cause the aoe driver to forget a particular previously discovered device when it is no longer online. The aoetools provide an "aoe-flush" command that users run to perform this administrative task. The changes below provide the support needed in the driver. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The ATA over Ethernet config query response contains a "buffer count" field reflecting the AoE target's capacity to buffer incoming AoE commands. By taking the current value of this field into accound, we increase performance throughput or avoid network congestion, when the value has increased or decreased, respectively. Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Dropped transmits are not common, but when they do occur, increasing the transmit queue length often helps. Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Ed Cashin authored
The context feature of sparse is used with the Linux kernel sources to check for imbalanced uses of locks. Document the annotations defined in include/linux/compiler.h that tell sparse what to expect when a lock is held on function entry, exit, or both. Signed-off-by: Ed Cashin <ecashin@coraid.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Christopher Li <sparse@chrisli.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Josh Triplett authored
linux/compiler.h has macros to denote functions that acquire or release locks, but not to denote functions called with a lock held that return with the lock still held. Add a __must_hold macro to cover that case. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Ed Cashin <ecashin@coraid.com> Tested-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Gao feng authored
Since commit 1cdcbec1 ("CRED: Neuter sys_capset()") is_container_init() has no callers. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <jmorris@namei.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
To avoid an explosion of request_module calls on a chain of abusive scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon as maximum recursion depth is hit, the error will fail all the way back up the chain, aborting immediately. This also has the side-effect of stopping the user's shell from attempting to reexecute the top-level file as a shell script. As seen in the dash source: if (cmd != path_bshell && errno == ENOEXEC) { *argv-- = cmd; *argv = cmd = path_bshell; goto repeat; } The above logic was designed for running scripts automatically that lacked the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC, things continue to behave as the shell expects. Additionally, when tracking recursion, the binfmt handlers should not be involved. The recursion being tracked is the depth of calls through search_binary_handler(), so that function should be exclusively responsible for tracking the depth. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Artem Bityutskiy authored
We display a list of supplementary group for each process in /proc/<pid>/status. However, we show only the first 32 groups, not all of them. Although this is rare, but sometimes processes do have more than 32 supplementary groups, and this kernel limitation breaks user-space apps that rely on the group list in /proc/<pid>/status. Number 32 comes from the internal NGROUPS_SMALL macro which defines the length for the internal kernel "small" groups buffer. There is no apparent reason to limit to this value. This patch removes the 32 groups printing limit. The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX, which is currently set to 65536. And this is the maximum count of groups we may possibly print. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
It is currently impossible to examine the state of seccomp for a given process. While attaching with gdb and attempting "call prctl(PR_GET_SECCOMP,...)" will work with some situations, it is not reliable. If the process is in seccomp mode 1, this query will kill the process (prctl not allowed), if the process is in mode 2 with prctl not allowed, it will similarly be killed, and in weird cases, if prctl is filtered to return errno 0, it can look like seccomp is disabled. When reviewing the state of running processes, there should be a way to externally examine the seccomp mode. ("Did this build of Chrome end up using seccomp?" "Did my distro ship ssh with seccomp enabled?") This adds the "Seccomp" line to /proc/$pid/status. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Morris <jmorris@namei.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Cyrill Gorcunov authored
During c/r sessions we've found that there is no way at the moment to fetch some VMA associated flags, such as mlock() and madvise(). This leads us to a problem -- we don't know if we should call for mlock() and/or madvise() after restore on the vma area we're bringing back to life. This patch intorduces a new field into "smaps" output called VmFlags, where all set flags associated with the particular VMA is shown as two letter mnemonics. [ Strictly speaking for c/r we only need mlock/madvise bits but it has been said that providing just a few flags looks somehow inconsistent. So all flags are here now. ] This feature is made available on CONFIG_CHECKPOINT_RESTORE=n kernels, as other applications may start to use these fields. The data is encoded in a somewhat awkward two letters mnemonic form, to encourage userspace to be prepared for fields being added or removed in the future. [a.p.zijlstra@chello.nl: props to use for_each_set_bit] [sfr@canb.auug.org.au: props to use array instead of struct] [akpm@linux-foundation.org: overall redesign and simplification] [akpm@linux-foundation.org: remove unneeded braces per sfr, avoid using bloaty for_each_set_bit()] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Vagin authored
Without this patch it is really hard to interpret a bounding set, if CAP_LAST_CAP is unknown for a current kernel. Non-existant capabilities can not be deleted from a bounding set with help of prctl. E.g.: Here are two examples without/with this patch. CapBnd: ffffffe0fdecffff CapBnd: 00000000fdecffff I suggest to hide non-existent capabilities. Here is two reasons. * It's logically and easier for using. * It helps to checkpoint-restore capabilities of tasks, because tasks can be restored on another kernel, where CAP_LAST_CAP is bigger. Signed-off-by: Andrew Vagin <avagin@openvz.org> Cc: Andrew G. Morgan <morgan@kernel.org> Reviewed-by: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Ptrace jailers want to be sure that the tracee can never escape from the control. However if the tracer dies unexpectedly the tracee continues to run in potentially unsafe mode. Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits it sends SIGKILL to every tracee which has this bit set. Note that the new option is not equal to the last-option << 1. Because currently all options have an event, and the new one starts the eventless group. It uses the random 20 bit, so we have the room for 12 more events, but we can also add the new eventless options below this one. Suggested by Amnon Shiloh. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Chris Evans <scarybeasts@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Eldad Zack authored
Update the documentation for simple_strto* to reflect that it has been obsoleted and advise the usage of kstrto*. Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Joe Perches <joe@perches.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Eldad Zack authored
As Bruce Fields pointed out, kstrto* is currently lacking kerneldoc comments. This patch adds kerneldoc comments to common variants of kstrto*: kstrto(u)l, kstrto(u)ll and kstrto(u)int. Signed-off-by: Eldad Zack <eldad@fogrefinery.com> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Joe Perches <joe@perches.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jarkko Sakkinen authored
keys-ecryptfs.txt was missing from 00-INDEX. Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Dave Reisner authored
Option parsing code expects an unsigned integer for the codepage option, but prefixes and stores this option with "cp" before passing to load_nls(). This makes the displayed option in /proc an invalid one. Strip the prefix when printing so that the displayed option is valid for reuse. Signed-off-by: Dave Reisner <dreisner@archlinux.org> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jan Kara authored
parse_options() is supposed to return value < 0 on error however we returned 0 (success) in a lot of cases. This actually was not a problem in practice because match_token() used by parse_options() is clever and catches most of the problems for us. Signed-off-by: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-