1. 14 Sep, 2010 3 commits
    • Roland McGrath's avatar
      x86-64, compat: Retruncate rax after ia32 syscall entry tracing · eefdca04
      Roland McGrath authored
      In commit d4d67150, we reopened an old hole for a 64-bit ptracer touching a
      32-bit tracee in system call entry.  A %rax value set via ptrace at the
      entry tracing stop gets used whole as a 32-bit syscall number, while we
      only check the low 32 bits for validity.
      
      Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
      in addition to testing the full 64 bits as has already been added.
      Reported-by: default avatarBen Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      eefdca04
    • H. Peter Anvin's avatar
      x86-64, compat: Test %rax for the syscall number, not %eax · 36d001c7
      H. Peter Anvin authored
      On 64 bits, we always, by necessity, jump through the system call
      table via %rax.  For 32-bit system calls, in theory the system call
      number is stored in %eax, and the code was testing %eax for a valid
      system call number.  At one point we loaded the stored value back from
      the stack to enforce zero-extension, but that was removed in checkin
      d4d67150.  An actual 32-bit process
      will not be able to introduce a non-zero-extended number, but it can
      happen via ptrace.
      
      Instead of re-introducing the zero-extension, test what we are
      actually going to use, i.e. %rax.  This only adds a handful of REX
      prefixes to the code.
      Reported-by: default avatarBen Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      36d001c7
    • H. Peter Anvin's avatar
      compat: Make compat_alloc_user_space() incorporate the access_ok() · c41d68a5
      H. Peter Anvin authored
      compat_alloc_user_space() expects the caller to independently call
      access_ok() to verify the returned area.  A missing call could
      introduce problems on some architectures.
      
      This patch incorporates the access_ok() check into
      compat_alloc_user_space() and also adds a sanity check on the length.
      The existing compat_alloc_user_space() implementations are renamed
      arch_compat_alloc_user_space() and are used as part of the
      implementation of the new global function.
      
      This patch assumes NULL will cause __get_user()/__put_user() to either
      fail or access userspace on all architectures.  This should be
      followed by checking the return value of compat_access_user_space()
      for NULL in the callers, at which time the access_ok() in the callers
      can also be removed.
      Reported-by: default avatarBen Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Acked-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: default avatarChris Metcalf <cmetcalf@tilera.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: <stable@kernel.org>
      c41d68a5
  2. 13 Sep, 2010 14 commits
  3. 12 Sep, 2010 1 commit
  4. 11 Sep, 2010 13 commits
  5. 10 Sep, 2010 9 commits
    • mark gross's avatar
      PM QoS: Correct pr_debug() misuse and improve parameter checks · 0109c2c4
      mark gross authored
      Correct some pr_debug() misuse and add a stronger parameter check to
      pm_qos_write() for the ASCII hex value case.  Thanks to Dan Carpenter
      for pointing out the problem!
      Signed-off-by: default avatarmark gross <markgross@thegnar.org>
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      0109c2c4
    • Dave Chinner's avatar
      xfs: log IO completion workqueue is a high priority queue · 51749e47
      Dave Chinner authored
      The workqueue implementation in 2.6.36-rcX has changed, resulting
      in the workqueues no longer having dedicated threads for work
      processing. This has caused severe livelocks under heavy parallel
      create workloads because the log IO completions have been getting
      held up behind metadata IO completions.  Hence log commits would
      stall, memory allocation would stall because pages could not be
      cleaned, and lock contention on the AIL during inode IO completion
      processing was being seen to slow everything down even further.
      
      By making the log Io completion workqueue a high priority workqueue,
      they are queued ahead of all data/metadata IO completions and
      processed before the data/metadata completions. Hence the log never
      gets stalled, and operations needed to clean memory can continue as
      quickly as possible. This avoids the livelock conditions and allos
      the system to keep running under heavy load as per normal.
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAlex Elder <aelder@sgi.com>
      51749e47
    • Roland McGrath's avatar
      execve: make responsive to SIGKILL with large arguments · 9aea5a65
      Roland McGrath authored
      An execve with a very large total of argument/environment strings
      can take a really long time in the execve system call.  It runs
      uninterruptibly to count and copy all the strings.  This change
      makes it abort the exec quickly if sent a SIGKILL.
      
      Note that this is the conservative change, to interrupt only for
      SIGKILL, by using fatal_signal_pending().  It would be perfectly
      correct semantics to let any signal interrupt the string-copying in
      execve, i.e. use signal_pending() instead of fatal_signal_pending().
      We'll save that change for later, since it could have user-visible
      consequences, such as having a timer set too quickly make it so that
      an execve can never complete, though it always happened to work before.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Reviewed-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9aea5a65
    • Roland McGrath's avatar
      execve: improve interactivity with large arguments · 7993bc1f
      Roland McGrath authored
      This adds a preemption point during the copying of the argument and
      environment strings for execve, in copy_strings().  There is already
      a preemption point in the count() loop, so this doesn't add any new
      points in the abstract sense.
      
      When the total argument+environment strings are very large, the time
      spent copying them can be much more than a normal user time slice.
      So this change improves the interactivity of the rest of the system
      when one process is doing an execve with very large arguments.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Reviewed-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7993bc1f
    • Roland McGrath's avatar
      setup_arg_pages: diagnose excessive argument size · 1b528181
      Roland McGrath authored
      The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not
      check the size of the argument/environment area on the stack.
      When it is unworkably large, shift_arg_pages() hits its BUG_ON.
      This is exploitable with a very large RLIMIT_STACK limit, to
      create a crash pretty easily.
      
      Check that the initial stack is not too large to make it possible
      to map in any executable.  We're not checking that the actual
      executable (or intepreter, for binfmt_elf) will fit.  So those
      mappings might clobber part of the initial stack mapping.  But
      that is just userland lossage that userland made happen, not a
      kernel problem.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Reviewed-by: default avatarKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1b528181
    • Linus Torvalds's avatar
      Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm · be6200aa
      Linus Torvalds authored
      * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: Perform hardware_enable in CPU_STARTING callback
        KVM: i8259: fix migration
        KVM: fix i8259 oops when no vcpus are online
        KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
      be6200aa
    • Linus Torvalds's avatar
      Merge branch 'perf-fixes-for-linus' of... · f2955b49
      Linus Torvalds authored
      Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread
        perf symbols: Fix multiple initialization of symbol system
        perf: Fix CPU hotplug
        perf, trace: Fix module leak
        tracing/kprobe: Fix handling of C-unlike argument names
        tracing/kprobes: Fix handling of argument names
        perf probe: Fix handling of arguments names
        perf probe: Fix return probe support
        tracing/kprobe: Fix a memory leak in error case
        tracing: Do not allow llseek to set_ftrace_filter
      f2955b49
    • David Howells's avatar
      KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyring · 3d96406c
      David Howells authored
      Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
      of the parent process's session keyring whether or not the parent has a session
      keyring [CVE-2010-2960].
      
      This results in the following oops:
      
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
        IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443
        ...
        Call Trace:
         [<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443
         [<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0
         [<ffffffff811af98c>] sys_keyctl+0xb4/0xb8
         [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
      
      if the parent process has no session keyring.
      
      If the system is using pam_keyinit then it mostly protected against this as all
      processes derived from a login will have inherited the session keyring created
      by pam_keyinit during the log in procedure.
      
      To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.
      Reported-by: default avatarTavis Ormandy <taviso@cmpxchg8b.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarTavis Ormandy <taviso@cmpxchg8b.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3d96406c
    • David Howells's avatar
      KEYS: Fix RCU no-lock warning in keyctl_session_to_parent() · 9d1ac65a
      David Howells authored
      There's an protected access to the parent process's credentials in the middle
      of keyctl_session_to_parent().  This results in the following RCU warning:
      
        ===================================================
        [ INFO: suspicious rcu_dereference_check() usage. ]
        ---------------------------------------------------
        security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!
      
        other info that might help us debug this:
      
        rcu_scheduler_active = 1, debug_locks = 0
        1 lock held by keyctl-session-/2137:
         #0:  (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236
      
        stack backtrace:
        Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
        Call Trace:
         [<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3
         [<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236
         [<ffffffff811af77e>] sys_keyctl+0xb4/0xb6
         [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b
      
      The code should take the RCU read lock to make sure the parents credentials
      don't go away, even though it's holding a spinlock and has IRQ disabled.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9d1ac65a