1. 31 Oct, 2005 40 commits
    • [PATCH] coredump_wait() cleanup · 2384f55f
      Oleg Nesterov authored
      This patch deletes pointless code from coredump_wait().
      
      1. It does useless mm->core_waiters inc/dec under mm->mmap_sem,
         but any changes to ->core_waiters have no effect until we drop
         ->mmap_sem.
      
      2. It calls yield() for no apparent reason.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] PF_DEAD cleanup · 7407251a
      Coywolf Qi Hunt authored
      The PF_DEAD setting doesn't belong to exit_notify(), move it to a proper
      place.
      Signed-off-by: Coywolf Qi Hunt <qiyong@fc-cn.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cleanup for kernel/printk.c · 40dc5651
      Jesper Juhl authored
      - Removes some trailing whitespace
      
      - Breaks long lines and makes other small changes to conform to CodingStyle
      
      - Adds explicit printk loglevels in two places.
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ide-cd mini cleanup of casts · 2a91f3e5
      Jesper Juhl authored
      Remove some unneeded casts.
      Avoid an assignment in the case of kmalloc failure.
      Break a few instances of  if (foo) whatever;  into two lines.
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Acked-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Keys: Get rid of warning in kmod.c if keys disabled · 20e1129a
      David Howells authored
      The attached patch gets rid of a "statement without effect" warning when
      CONFIG_KEYS is disabled by making use of the return value of key_get().
      The compiler will optimise all of this away when keys are disabled.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Keys: Add LSM hooks for key management [try #3] · 29db9190
      David Howells authored
      The attached patch adds LSM hooks for key management facilities. The notable
      changes are:
      
       (1) The key struct now supports a security pointer for the use of security
           modules. This will permit key labelling and restrictions on which
           programs may access a key.
      
       (2) Security modules get a chance to note (or abort) the allocation of a key.
      
       (3) The key permission checking can now be enhanced by the security modules;
           the permissions check consults LSM if all other checks bear out.
      
       (4) The key permissions checking functions now return an error code rather
           than a boolean value.
      
       (5) An extra permission has been added to govern the modification of
           attributes (UID, GID, permissions).
      
      Note that there isn't an LSM hook specifically for each keyctl() operation,
      but rather the permissions hook allows control of individual operations based
      on the permission request bits.
      
      Key management access control through LSM is enabled automatically if both
      CONFIG_KEYS and CONFIG_SECURITY are enabled.
      
      This should be applied on top of the patch ensubjected:
      
      	[PATCH] Keys: Possessor permissions should be additive
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-off-by: Chris Wright <chrisw@osdl.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Keys: Export user-defined keyring operations · 2aa349f6
      David Howells authored
      Export user-defined key operations so that those who wish to define their
      own key type based on the user-defined key operations may do so (as has
      been requested).
      
      The header file created has been placed into include/keys/user-type.h, thus
      creating a directory where other key types may also be placed.  Any
      objections to doing this?
      Signed-Off-By: David Howells <dhowells@redhat.com>
      Signed-Off-By: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vm: remove unused/broken page_pte[_prot] macros · 1426d7a8
      Tejun Heo authored
      This patch removes page_pte_prot and page_pte macros from all
      architectures.  Some architectures define both, some only page_pte (broken)
      and others none.  These macros are not used anywhere.
      
      page_pte_prot(page, prot) is identical to mk_pte(page, prot) and
      page_pte(page) is identical to page_pte_prot(page, __pgprot(0)).
      
      * The following architectures define both page_pte_prot and page_pte
      
        arm, arm26, ia64, sh64, sparc, sparc64
      
      * The following architectures define only page_pte (broken)
      
        frv, i386, m32r, mips, sh, x86-64
      
      * All other architectures define neither
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vm: remove redundant assignment from __pagevec_release_nonlru() · c7e9dd4d
      Tejun Heo authored
      This patch removes redundant assignment from __pagevec_release_nonlru().
      pages_to_free.cold is set to pvec->cold by pagevec_init() call right above
      the assignment.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fs: error case fix in __generic_file_aio_read · 39e88ca2
      Tejun Heo authored
      When __generic_file_aio_read() hits an error during reading, it reports the
      error iff nothing has successfully been read yet.  This is the usual
      condition: when an error occurs, if nothing has been read/written, report
      the error code; otherwise, report the number of bytes successfully
      transferred up to that point.
      
      This corner case can be exposed by performing readv(2) with the following
      iov.
      
       iov[0] = len0 @ ptr0
       iov[1] = len1 @ NULL (or any other invalid pointer)
       iov[2] = len2 @ ptr2
      
      When the file is large enough, performing the above readv(2) results in
      
       len0 bytes from file_pos @ ptr0
       len2 bytes from file_pos + len0 @ ptr2
      
      And the return value is len0 + len2.  Test program is attached to this
      mail.
      
      This patch makes __generic_file_aio_read()'s error handling identical to
      other functions.
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/uio.h>
      #include <errno.h>
      #include <string.h>
      
      int main(int argc, char **argv)
      {
      	const char *path;
      	struct stat stbuf;
      	size_t len0, len1;
      	void *buf0, *buf1;
      	struct iovec iov[3];
      	int fd, i;
      	ssize_t ret;
      
      	if (argc < 2) {
      		fprintf(stderr, "Usage: testreadv path (better be a "
      			"small text file)\n");
      		return 1;
      	}
      	path = argv[1];
      
      	if (stat(path, &stbuf) < 0) {
      		perror("stat");
      		return 1;
      	}
      
      	len0 = stbuf.st_size / 2;
      	len1 = stbuf.st_size - len0;
      
      	if (!len0 || !len1) {
      		fprintf(stderr, "Dude, file is too small\n");
      		return 1;
      	}
      
      	if ((fd = open(path, O_RDONLY)) < 0) {
      		perror("open");
      		return 1;
      	}
      
      	/* One extra byte each so the buffers stay NUL-terminated for %s below. */
      	if (!(buf0 = malloc(len0 + 1)) || !(buf1 = malloc(len1 + 1))) {
      		perror("malloc");
      		return 1;
      	}
      
      	memset(buf0, 0, len0 + 1);
      	memset(buf1, 0, len1 + 1);
      
      	iov[0].iov_base = buf0;
      	iov[0].iov_len = len0;
      	iov[1].iov_base = NULL;
      	iov[1].iov_len = len1;
      	iov[2].iov_base = buf1;
      	iov[2].iov_len = len1;
      
      	printf("vector ");
      	for (i = 0; i < 3; i++)
      		printf("%p:%zu ", iov[i].iov_base, iov[i].iov_len);
      	printf("\n");
      
      	ret = readv(fd, iov, 3);
      	if (ret < 0)
      		perror("readv");
      
      	printf("readv returned %zd\nbuf0 = [%s]\nbuf1 = [%s]\n",
      	       ret, (char *)buf0, (char *)buf1);
      
      	return 0;
      }
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ptrace/coredump/exit_group deadlock · 30e0fca6
      Andrea Arcangeli authored
      I could occasionally reproduce a deadlock with a task not killable in T state
      (TASK_STOPPED, not TASK_TRACED) by attaching an NPTL threaded program to
      gdb, by segfaulting the task and triggering a core dump while some other
      task is executing exit_group and while one task is in ptrace_attached
      TASK_STOPPED state (not TASK_TRACED yet).  This originated from a gdb
      bug report (the fact gdb was segfaulting the task wasn't a kernel bug), but
      I just incidentally noticed the gdb bug triggered a real kernel bug as
      well.
      
      Most threads hang in exit_mm because the core dump is still going, the
      core dump hangs because the stopped task doesn't exit, and the stopped task
      can't wake up because it has SIGNAL_GROUP_EXIT set, hence the deadlock.
      
      To me it seems that the problem is that the force_sig_specific(SIGKILL) in
      zap_threads is a noop if the task has PF_PTRACED set (like in this case
      because gdb is attached).  The __ptrace_unlink does nothing because the
      signal->flags is set to SIGNAL_GROUP_EXIT|SIGNAL_STOP_DEQUEUED (verified).
      
      The above info also shows that the stopped task hit a race and got the stop
      signal (presumably by the ptrace_attach, only the attach, state is still
      TASK_STOPPED and gdb hangs waiting for the core before it can set it to
      TASK_TRACED) after one of the threads invoked the core dump (it's the core
      dump that sets signal->flags to SIGNAL_GROUP_EXIT).
      
      So besides the fact that nobody would wake up the task in __ptrace_unlink
      (the state is _not_ TASK_TRACED), there's a secondary problem in the signal
      handling code, where a task should ignore the ptrace-sigstops as long as
      SIGNAL_GROUP_EXIT is set (or the wakeup in __ptrace_unlink path wouldn't be
      enough).
      
      So I attempted to make this patch that seems to fix the problem.  There
      were various ways to fix it, perhaps you prefer a different one, I just
      opted for the one that looked safer to me.
      
      I also removed the clearing of the stopped bits from zap_other_threads
      (zap_other_threads was safe, unlike zap_threads).  I don't like useless
      code, this whole NPTL signal/ptrace thing is already unreadable enough and
      full of corner cases without confusing useless code making it even less
      readable.  And if this code is really needed, then you may want to
      explain why it's not being done in the other paths that set
      SIGNAL_GROUP_EXIT at least.
      
      Even after this patch I still wonder who serializes the read of
      p->ptrace in zap_threads.
      
      Patch is called ptrace-core_dump-exit_group-deadlock-1.
      
      This was the trace I've got:
      
      test          T ffff81003e8118c0     0 14305      1         14311 14309 (NOTLB)
      ffff810058ccdde8 0000000000000082 000001f4000037e1 ffff810000000013
             00000000000000f8 ffff81003e811b00 ffff81003e8118c0 ffff810011362100
             0000000000000012 ffff810017ca4180
      Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff80141677>{finish_stop+87}
             <ffffffff8014367f>{get_signal_to_deliver+1359} <ffffffff8010d3ad>{do_signal+157}
             <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80111575>{sys_ptrace+2293}
             <ffffffff80131810>{default_wake_function+0} <ffffffff80196399>{sys_ioctl+73}
             <ffffffff8010dd27>{sysret_signal+28} <ffffffff8010e00f>{ptregscall_common+103}
      
      test          D ffff810011362100     0 14309      1         14305 14312 (NOTLB)
      ffff810053c81cf8 0000000000000082 0000000000000286 0000000000000001
             0000000000000195 ffff810011362340 ffff810011362100 ffff81002e338040
             ffff810001e0ca80 0000000000000001
      Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173}
             <ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149}
             <ffffffff801381af>{do_exit+479} <ffffffff80138d0c>{do_group_exit+252}
             <ffffffff801436db>{get_signal_to_deliver+1451} <ffffffff8010d3ad>{do_signal+157}
             <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2
      
             <ffffffff8014208a>{force_sig_info+186} <ffffffff804479a0>{do_int3+112}
             <ffffffff8010e308>{retint_signal+61}
      test          D ffff81002e338040     0 14311      1         14716 14305 (NOTLB)
      ffff81005ca8dcf8 0000000000000082 0000000000000286 0000000000000001
             0000000000000120 ffff81002e338280 ffff81002e338040 ffff8100481cb740
             ffff810001e0ca80 0000000000000001
      Call Trace:<ffffffff801317ed>{try_to_wake_up+893} <ffffffff8044677d>{wait_for_completion+173}
             <ffffffff80131810>{default_wake_function+0} <ffffffff80137435>{exit_mm+149}
             <ffffffff801381af>{do_exit+479} <ffffffff80142d0e>{__dequeue_signal+558}
             <ffffffff80138d0c>{do_group_exit+252} <ffffffff801436db>{get_signal_to_deliver+1451}
             <ffffffff8010d3ad>{do_signal+157} <ffffffff8013deee>{ptrace_check_attach+222}
             <ffffffff80140850>{specific_send_sig_info+208} <ffffffff8014208a>{force_sig_info+186}
             <ffffffff804479a0>{do_int3+112} <ffffffff8010e308>{retint_signal+61}
      
      test          D ffff810017ca4180     0 14312      1         14309 13882 (NOTLB)
      ffff81005d15fcb8 0000000000000082 ffff81005d15fc58 ffffffff80130816
             0000000000000897 ffff810017ca43c0 ffff810017ca4180 ffff81003e8118c0
             0000000000000082 ffffffff801317ed
      Call Trace:<ffffffff80130816>{activate_task+150} <ffffffff801317ed>{try_to_wake_up+893}
             <ffffffff8044677d>{wait_for_completion+173} <ffffffff80131810>{default_wake_function+0}
             <ffffffff8018cdc3>{do_coredump+819} <ffffffff80445f52>{thread_return+82}
             <ffffffff801436d4>{get_signal_to_deliver+1444} <ffffffff8010d3ad>{do_signal+157}
             <ffffffff8013deee>{ptrace_check_attach+222} <ffffffff80140850>{specific_send_sig_info+2
      
             <ffffffff804472e5>{_spin_unlock_irqrestore+5} <ffffffff8014208a>{force_sig_info+186}
             <ffffffff804476ff>{do_general_protection+159} <ffffffff8010e308>{retint_signal+61}
      Signed-off-by: Andrea Arcangeli <andrea@suse.de>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpusets: automatic numa mempolicy rebinding · 68860ec1
      Paul Jackson authored
      This patch automatically updates a task's NUMA mempolicy when its cpuset
      memory placement changes.  It does so within the context of the task,
      without any need to support low level external mempolicy manipulation.
      
      If a system is not using cpusets, or if running on a system with just the
      root (all-encompassing) cpuset, then this remap is a no-op.  Only when a
      task is moved between cpusets, or a cpuset's memory placement is changed,
      does the following apply.  Otherwise, the main routine below,
      rebind_policy(), is not even called.
      
      When mixing cpusets, scheduler affinity, and NUMA mempolicies, the
      essential role of cpusets is to place jobs (several related tasks) on a set
      of CPUs and Memory Nodes; the essential role of sched_setaffinity is to
      manage a job's processor placement within its allowed cpuset; and the
      essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a job's
      memory placement within its allowed cpuset.
      
      However, CPU affinity and NUMA memory placement are managed within the
      kernel using absolute system-wide numbering, not cpuset-relative numbering.
      
      This is OK until a job is migrated to a different cpuset, or what's the
      same, a job's cpuset is moved to different CPUs and Memory Nodes.
      
      Then the CPU affinity and NUMA memory placement of the tasks in the job
      need to be updated, to preserve their cpuset-relative position.  This can
      be done for CPU affinity using sched_setaffinity() from user code, as one
      task can modify another's CPU affinity.  This cannot be done from an
      external task for NUMA memory placement, as that can only be modified in
      the context of the task using it.
      
      However, it is easy enough to remap a task's NUMA mempolicy automatically
      when a task is migrated, using the existing cpuset mechanism to trigger a
      refresh of a task's memory placement after its cpuset has changed.  All that
      is needed is the old and new nodemask, and notice to the task that it needs
      to rebind its mempolicy.  The task's mems_allowed has the old mask, the
      task's cpuset has the new mask, and the existing
      cpuset_update_current_mems_allowed() mechanism provides the notice.  The
      bitmap/cpumask/nodemask remap operators provide the cpuset-relative
      calculations.
      
      This patch leaves open a couple of issues:
      
       1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies:
      
          These mempolicies may reference nodes outside of those allowed to
          the current task by its cpuset.  Tasks are migrated as part of jobs,
          which reside on what might be several cpusets in a subtree.  When such
          a job is migrated, all NUMA memory policy references to nodes within
          that cpuset subtree should be translated, and references to any nodes
          outside that subtree should be left untouched.  A future patch will
          provide the cpuset mechanism needed to mark such subtrees.  With that
          patch, we will be able to correctly migrate these other memory policies
          across a job migration.
      
       2) Updating cpuset, affinity and memory policies in user space:
      
          This is harder.  Any placement state stored in user space using
          system-wide numbering will be invalidated across a migration.  More
          work will be required to provide user code with a migration-safe means
          to manage its cpuset-relative placement, while preserving the current
          APIs that pass system-wide numbers, not cpuset-relative numbers, across
          the kernel-user boundary.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpusets: bitmap and mask remap operators · fb5eeeee
      Paul Jackson authored
      In the forthcoming task migration support, a key calculation will be
      mapping cpu and node numbers from the old set to the new set while
      preserving cpuset-relative offset.
      
      For example, if a task and its pages on nodes 8-11 are being migrated to
      nodes 24-27, then pages on node 9 (the 2nd node in the old set) should be
      moved to node 25 (the 2nd node in the new set.)
      
      As with other bitmap operations, the proper way to code this is to provide
      the underlying calculation in lib/bitmap.c, and then to provide the usual
      cpumask and nodemask wrappers.
      
      This patch provides that.  These operations are termed 'remap' operations.
      Both remapping a single bit and remapping a set of bits are supported.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpusets: confine pdflush to its cpuset · 28a42b9e
      Paul Jackson authored
      This patch keeps pdflush daemons on the same cpuset as their parent, the
      kthread daemon.
      
      Some large NUMA configurations put as much as they can of kernel threads
      and other classic Unix load in what's called a bootcpuset, keeping the rest
      of the system free for dedicated jobs.
      
      This effort is thwarted by pdflush, which dynamically destroys and
      recreates pdflush daemons depending on load.
      
      It's easy enough to force the originally created pdflush daemons into the
      bootcpuset, at system boottime.  But the pdflush threads created later were
      allowed to run freely across the system, due to the necessary line in their
      startup kthread():
      
              set_cpus_allowed(current, CPU_MASK_ALL);
      
      By simply coding pdflush to start its threads with the cpus_allowed
      restrictions of its cpuset (inherited from kthread, its parent) we can
      ensure that dynamically created pdflush threads are also kept in the
      bootcpuset.
      
      On systems w/o cpusets, or w/o a bootcpuset implementation, the following
      will have no effect, leaving pdflush to run on any CPU, as before.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpusets: simple rename · 18a19cb3
      Paul Jackson authored
      Add support for renaming cpusets.  Only allow simple rename of cpuset
      directories in place.  Don't allow moving cpusets elsewhere in hierarchy or
      renaming the special cpuset files in each cpuset directory.
      
      The usefulness of this simple rename became apparent when developing task
      migration facilities.  It allows building a second cpuset hierarchy using
      new names and containing new CPUs and Memory Nodes, moving tasks from the
      old to the new cpusets, removing the old cpusets, and then renaming the new
      cpusets to be just like the old names, so that any knowledge that the tasks
      had of their cpuset names will still be valid.
      
      Leaf node cpusets can be migrated to other CPUs or Memory Nodes by just
      updating their 'cpus' and 'mems' files, but because no cpuset can contain
      CPUs or Nodes not in its parent cpuset, one cannot do this in a cpuset
      hierarchy without first expanding all the non-leaf cpusets to contain the
      union of both the old and new CPUs and Nodes, which would obfuscate the
      one-to-one migration of a task from one cpuset to another required to
      correctly migrate the physical page frames currently allocated to that
      task.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpusets: dual semaphore locking overhaul · 053199ed
      Paul Jackson authored
      Overhaul cpuset locking.  Replace single semaphore with two semaphores.
      
      The suggestion to use two locks was made by Roman Zippel.
      
      Both locks are global.  Code that wants to modify cpusets must first
      acquire the exclusive manage_sem, which allows them read-only access to
      cpusets, and holds off other would-be modifiers.  Before making actual
      changes, the second semaphore, callback_sem must be acquired as well.  Code
      that needs only to query cpusets must acquire callback_sem, which is also a
      global exclusive lock.
      
      The earlier problems with double tripping are avoided, because it is
      allowed for holders of manage_sem to nest the second callback_sem lock, and
      only callback_sem is needed by code called from within __alloc_pages(),
      where the double tripping had been possible.
      
      This is not quite the same as a normal read/write semaphore, because
      obtaining read-only access with intent to change must hold off other such
      attempts, while allowing read-only access w/o such intention.  Changing
      cpusets involves several related checks and changes, which must be done
      while allowing read-only queries (to avoid the double trip), but while
      ensuring nothing changes (holding off other would-be modifiers.)
      
      This overhaul of cpuset locking also makes careful use of task_lock() to
      guard access to the task->cpuset pointer, closing a couple of race
      conditions noticed while reading this code (thanks, Roman).  I've never
      seen these races fail in any use or test.
      
      See further the comments in the code.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpusets: remove depth counted locking hack · 5aa15b5f
      Paul Jackson authored
      Remove a rather hackish depth counter on cpuset locking.  The depth counter
      was avoiding a possible double trip on the global cpuset_sem semaphore.  It
      worked, but now an improved version of cpuset locking is available, to come
      in the next patch, using two global semaphores.
      
      This patch reverses "cpuset semaphore depth check deadlock fix"
      
      The kernel still works, even after this patch, except for some rare and
      difficult to reproduce race conditions when aggressively creating and
      destroying cpusets marked with the notify_on_release option, on very large
      systems.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset cleanup · f35f31d7
      Paul Jackson authored
      Remove one more useless line from cpuset_common_file_read().
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: fix of error path in proc_get_inode() · e9543659
      Kirill Korotaev authored
      This patch fixes an incorrect error path in proc_get_inode(), taken when the
      module reference can't be obtained because the module is being unloaded.
      When try_module_get() fails, this function puts de(!) and still returns an
      inode that points at the de without holding a reference to it.
      
      There are still unresolved known bugs in proc yet to be fixed:
      - the proc_dir_entry tree is managed without any serialization
      - create_proc_entry() doesn't set up de->owner at all,
         so setting it later manually is not atomic.
      - it looks like almost all modules do not care whether
         their de->owner is set...
      Signed-Off-By: Denis Lunev <den@sw.ru>
      Signed-Off-By: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fuse: clean up dead code related to nfs exporting · f12ec440
      Miklos Szeredi authored
      Remove last remains of NFS exportability support.
      
      The code is actually buggy (as reported by Akshat Aranya), since 'alias'
      will be leaked if it's non-null and alias->d_flags has DCACHE_DISCONNECTED.
      
      This is not an active bug, since there will never be any disconnected
      dentries.  But it's better to get rid of the unnecessary complexity anyway.
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] add_timer() of a pending timer is illegal · 15d2bace
      Andrew Morton authored
      In the recent timer rework we lost the check for an add_timer() of an
      already-pending timer.  That check was useful for networking, so put it back.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] block cleanups: Fix iosched module refcount leak · 2ca7d93b
      Nate Diller authored
      If the requested I/O scheduler is already in place, elevator_switch simply
      leaves the queue alone, and returns.  However, it forgets to call
      elevator_put, so
      
      'echo [current_sched] > /sys/block/[dev]/queue/scheduler'
      
      will leak a reference, causing the current_sched module to be permanently
      pinned in memory.
      Signed-off-by: Nate Diller <nate@namesys.com>
      Acked-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Typo fix: dot after newline in printk strings · 3fa63c7d
      Jean Delvare authored
      Typo fix: dots appearing after a newline in printk strings.
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] unify sys_ptrace prototype · dfb7dac3
      Christoph Hellwig authored
      Make sure we always return, as all syscalls should.  Also move the common
      prototype to <linux/syscalls.h>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] adjust parisc sys_ptrace prototype · 7024a9b8
      Christoph Hellwig authored
      Make the pid argument a long, as on every other architecture.  Despite pid_t
      being a 32-bit type even on 64-bit parisc, this is not an ABI change due to
      the parisc calling conventions.  And even if it were, it wouldn't matter too
      much because 64-bit userspace on parisc is in an embryonic stage.
      Acked-by: Matthew Wilcox <matthew@wil.cx>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7024a9b8
    • Oleg Nesterov's avatar
      [PATCH] posix-timers: use schedule_timeout() in common_nsleep() · 4eb9af2a
      Oleg Nesterov authored
      common_nsleep() reimplements schedule_timeout_interruptible() for no
      apparent reason.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4eb9af2a
    • Jean Delvare's avatar
      [PATCH] Typo fix: explictly -> explicitly · 33430dc5
      Jean Delvare authored
      (akpm: I don't do typo patches, but one of these is in a printk string)
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      33430dc5
    • Nate Diller's avatar
      [PATCH] block cleanups: Add kconfig default iosched submenu · 131dda7f
      Nate Diller authored
      Add a kconfig submenu to select the default I/O scheduler, in case
      anticipatory is not compiled in or another default is preferred.  Also,
      since no-op is always available, we should use it whenever the selected
      default is not.
      Signed-off-by: Nate Diller <nate@namesys.com>
      Acked-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      131dda7f
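The fallback rule described above (use the configured default if it is available, otherwise no-op) can be sketched in a few lines of user-space C. This is an illustration under stated assumptions, not the kernel implementation: `pick_default_sched` and the list of compiled-in names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the configured default scheduler if it is in the list of
 * compiled-in schedulers; otherwise fall back to "noop", which this
 * sketch assumes is always available.
 */
const char *pick_default_sched(const char *const *compiled_in, size_t n,
                               const char *configured)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(compiled_in[i], configured) == 0)
            return compiled_in[i];
    }
    return "noop";
}
```

With a build that only contains "noop" and "deadline", asking for "anticipatory" would yield "noop" in this sketch.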
    • Vadim Lobanov's avatar
      [PATCH] Unify sys_tkill() and sys_tgkill() · 6dd69f10
      Vadim Lobanov authored
      The majority of the sys_tkill() and sys_tgkill() function code is
      duplicated between the two of them.  This patch pulls the duplication out
      into a separate function -- do_tkill() -- and lets sys_tkill() and
      sys_tgkill() be simple wrappers around it.  This should make it easier to
      maintain in light of future changes.
      Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6dd69f10
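The shape of this refactoring can be sketched in user space as one shared helper plus two thin wrappers. Everything below is a simplified model under stated assumptions: the `_sketch` names, the static task table, and the convention that a non-positive tgid means "do not check the thread group" are all illustrative, and real signal delivery involves locking and permission checks that are omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task record. */
struct task_sketch {
    int pid;
    int tgid;
    int last_sig;   /* last signal "delivered" to this task */
};

static struct task_sketch tasks[] = {
    { 101, 100, 0 },
    { 102, 100, 0 },
    { 201, 200, 0 },
};

/* Shared helper: tgid <= 0 means the thread group is not checked. */
int do_tkill_sketch(int tgid, int pid, int sig)
{
    for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++) {
        struct task_sketch *t = &tasks[i];

        if (t->pid == pid && (tgid <= 0 || t->tgid == tgid)) {
            t->last_sig = sig;
            return 0;
        }
    }
    return -ESRCH;  /* no matching task */
}

/* Wrapper 1: signal a thread by pid only. */
int tkill_sketch(int pid, int sig)
{
    if (pid <= 0)
        return -EINVAL;
    return do_tkill_sketch(0, pid, sig);
}

/* Wrapper 2: signal a thread, also requiring a matching thread group. */
int tgkill_sketch(int tgid, int pid, int sig)
{
    if (pid <= 0 || tgid <= 0)
        return -EINVAL;
    return do_tkill_sketch(tgid, pid, sig);
}
```

Each wrapper keeps only its own argument validation, so a later change to the lookup or delivery logic needs to be made in one place.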
    • Oleg Nesterov's avatar
      [PATCH] kill sigqueue->lock · 19a4fcb5
      Oleg Nesterov authored
      This lock is used in sigqueue_free(), but it is always equal to
      current->sighand->siglock, so we don't need to keep it in the struct
      sigqueue.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      19a4fcb5
    • Oleg Nesterov's avatar
      [PATCH] fix de_thread vs it_real_fn() deadlock · 932aeafb
      Oleg Nesterov authored
      de_thread() calls del_timer_sync(->real_timer) under ->sighand->siglock.
      This can deadlock: it_real_fn() sends a signal and needs this lock too.
      
      Also, delete the unneeded ->real_timer.data assignment.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      932aeafb
    • Eric Dumazet's avatar
      [PATCH] reduce sizeof(struct file) · 2f512016
      Eric Dumazet authored
      Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
      in a memory location that is no longer used at call_rcu(&f->f_rcuhead,
      file_free_rcu) time, to reduce the size of this critical kernel object.
      
      The trick I used is to move f_rcuhead and f_list into a union called f_u.
      
      The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
      becomes f_u.f_list.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2f512016
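The space saving from the union trick can be checked with a small stand-alone sketch. The types below are simplified stand-ins, not the kernel definitions: `list_head_sketch`, `rcu_head_sketch`, `file_before` and `file_after` are hypothetical, and the union member names simply follow the commit message. The assumption is that the list linkage and the RCU head are never needed at the same time, so they can share storage.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list and RCU head types. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

struct rcu_head_sketch {
    struct rcu_head_sketch *next;
    void (*func)(struct rcu_head_sketch *head);
};

/* Before: both members occupy their own storage. */
struct file_before {
    struct list_head_sketch f_list;
    struct rcu_head_sketch  f_rcuhead;
    long                    f_count;
};

/* After: the two members overlap inside a union called f_u. */
struct file_after {
    union {
        struct list_head_sketch f_list;
        struct rcu_head_sketch  fu_rcuhead;
    } f_u;
    long f_count;
};

/* Bytes saved per object by overlapping the two members. */
size_t bytes_saved(void)
{
    return sizeof(struct file_before) - sizeof(struct file_after);
}
```

On common platforms both stand-in types are two pointers wide, so the union saves two pointers per object in this model; the saving in the real struct file depends on its actual layout.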
    • Randy Dunlap's avatar
      [PATCH] clarify menuconfig /(search) help text · 503af334
      Randy Dunlap authored
      Add explicit text about
      - where menuconfig '/' (search) searches for strings,
      - that substrings are allowed, and
      - that regular expressions are supported.
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      503af334
    • Jesper Juhl's avatar
      [PATCH] Whitespace and CodingStyle cleanup for lib/idr.c · e15ae2dd
      Jesper Juhl authored
      Cleanup trailing whitespace, blank lines, CodingStyle issues etc, for
      lib/idr.c
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e15ae2dd
    • Jesper Juhl's avatar
      [PATCH] lib/string.c cleanup: remove pointless explicit casts · 850b9247
      Jesper Juhl authored
      The first two hunks of the patch really belong in patch 1, but I missed
      them on the first pass, and instead of redoing all 3 patches I stuck them in
      this one.
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      850b9247
    • Jesper Juhl's avatar
      [PATCH] lib/string.c cleanup: remove pointless register keyword · cc75fb71
      Jesper Juhl authored
      Removes a few pointless register keywords.  register is merely a compiler
      hint that access to the variable should be optimized, but gcc (3.3.6 in my
      case) generates the exact same code with and without the keyword, and even
      if gcc did something different with register present I think it is doubtful
      we would want to optimize access to these variables - especially since this
      is generic library code and there are supposed to be optimized versions in
      asm/ for anything that really matters speed wise.
      
      (akpm: iirc, keyword register is a gcc no-op unless using -O0)
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cc75fb71
    • Jesper Juhl's avatar
      [PATCH] lib/string.c cleanup: whitespace and CodingStyle cleanups · 51a0f0f6
      Jesper Juhl authored
      Removes some blank lines, removes some trailing whitespace, adds spaces
      after commas and a few similar changes.
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      51a0f0f6
    • Amos Waterland's avatar
      [PATCH] protect ide_cdrom_capacity by ifdef · d97b3214
      Amos Waterland authored
      The only call to ide_cdrom_capacity is in code protected by
      CONFIG_PROC_FS, so when that is not enabled, the compiler complains:
      
       drivers/ide/ide-cd.c:3259: warning: `ide_cdrom_capacity' defined but not used
      
      Here is a patch that fixes that.  It provides some space savings for
      embedded systems that are not using procfs, as well:
      
           text    data     bss     dec     hex filename
       -  33540    6504    1032   41076    a074 drivers/ide/ide-cd.o
       +  33468    6480    1032   40980    a014 drivers/ide/ide-cd.o
      Signed-off-by: Amos Waterland <apw@us.ibm.com>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d97b3214
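The pattern behind this fix is to put a static helper under the same preprocessor guard as its only caller, so that configurations without the caller neither warn about an unused function nor carry its code. A minimal sketch follows; `HAVE_REPORTING`, `capacity_in_sectors`, `report_capacity` and the factor of 4 are hypothetical stand-ins rather than the ide-cd code.

```c
#include <assert.h>

/* Hypothetical feature switch standing in for CONFIG_PROC_FS. */
#define HAVE_REPORTING 1

#if HAVE_REPORTING
/*
 * Only the reporting path below calls this helper, so it sits under the
 * same guard. Without the guard, a build with HAVE_REPORTING set to 0
 * would trigger a "defined but not used" warning for a static function.
 */
static unsigned long capacity_in_sectors(unsigned long frames,
                                         unsigned long sectors_per_frame)
{
    return frames * sectors_per_frame;
}
#endif

/* Returns the capacity in sectors, or -1 when reporting is compiled out. */
long report_capacity(unsigned long frames)
{
#if HAVE_REPORTING
    return (long)capacity_in_sectors(frames, 4);
#else
    (void)frames;   /* parameter intentionally unused in this configuration */
    return -1;
#endif
}
```

Setting `HAVE_REPORTING` to 0 drops the helper from the object file entirely, which is where the small text and data savings reported in the commit come from.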
    • Miklos Szeredi's avatar
      [PATCH] open: cleanup in lookup_flags() · 42e50a5a
      Miklos Szeredi authored
      lookup_flags() is only called from the non-create case, so it needn't check
      for O_CREAT|O_EXCL.
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      42e50a5a
    • Eric W. Biederman's avatar
      [PATCH] Don't uselessly export task_struct to userspace in core dumps · a9289728
      Eric W. Biederman authored
      task_struct is an internal structure to the kernel with a lot of good
      information that is probably interesting in core dumps.  However, there is
      no way for user space to know what format that information is in, making it
      useless.
      
      I grepped the GDB 6.3 source code, and NT_TASKSTRUCT, while defined, is not
      used anywhere else.  So I would be surprised if anyone notices it is
      missing.
      
      In addition exporting kernel pointers to all the interesting kernel data
      structures sounds like the very definition of an information leak.  I
      haven't a clue what someone with evil intentions could do with that
      information, but in any attack against the kernel it looks like this is the
      perfect tool for aiming that attack.
      
      So since NT_TASKSTRUCT is useless as currently defined and is potentially
      dangerous, let's just not export it.
      
      (akpm: Daniel Jacobowitz <dan@debian.org> "would be amazed" if anything was
      using NT_TASKSTRUCT).
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a9289728