  1. 09 Sep, 2013 2 commits
    • aio: rcu_read_lock protection for new rcu_dereference calls · d9b2c871
      Artem Savkov authored
      Patch "aio: fix rcu sparse warnings introduced by ioctx table lookup patch"
      (77d30b14 in linux-next.git) introduced a couple of new rcu_dereference()
      calls that are not protected by rcu_read_lock() and result in the following
      warnings during syscall fuzzing (trinity):
      
      [  471.646379] ===============================
      [  471.649727] [ INFO: suspicious RCU usage. ]
      [  471.653919] 3.11.0-next-20130906+ #496 Not tainted
      [  471.657792] -------------------------------
      [  471.661235] fs/aio.c:503 suspicious rcu_dereference_check() usage!
      [  471.665968]
      [  471.665968] other info that might help us debug this:
      [  471.665968]
      [  471.672141]
      [  471.672141] rcu_scheduler_active = 1, debug_locks = 1
      [  471.677549] 1 lock held by trinity-child0/3774:
      [  471.681675]  #0:  (&(&mm->ioctx_lock)->rlock){+.+...}, at: [<c119ba1a>] SyS_io_setup+0x63a/0xc70
      [  471.688721]
      [  471.688721] stack backtrace:
      [  471.692488] CPU: 1 PID: 3774 Comm: trinity-child0 Not tainted 3.11.0-next-20130906+ #496
      [  471.698437] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  471.703151]  00000000 00000000 c58bbf30 c18a814b de2234c0 c58bbf58 c10a4ec6 c1b0d824
      [  471.709544]  c1b0f60e 00000001 00000001 c1af61b0 00000000 cb670ac0 c3aca000 c58bbfac
      [  471.716251]  c119bc7c 00000002 00000001 00000000 c119b8dd 00000000 c10cf684 c58bbfb4
      [  471.722902] Call Trace:
      [  471.724859]  [<c18a814b>] dump_stack+0x4b/0x66
      [  471.728772]  [<c10a4ec6>] lockdep_rcu_suspicious+0xc6/0x100
      [  471.733716]  [<c119bc7c>] SyS_io_setup+0x89c/0xc70
      [  471.737806]  [<c119b8dd>] ? SyS_io_setup+0x4fd/0xc70
      [  471.741689]  [<c10cf684>] ? __audit_syscall_entry+0x94/0xe0
      [  471.746080]  [<c18b1fcc>] syscall_call+0x7/0xb
      [  471.749723]  [<c1080000>] ? task_fork_fair+0x240/0x260
      Signed-off-by: Artem Savkov <artem.savkov@gmail.com>
      Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      d9b2c871
    • aio: fix race in ring buffer page lookup introduced by page migration support · d6c355c7
      Benjamin LaHaise authored
      Prior to the introduction of page migration support in "fs/aio: Add support
      to aio ring pages migration" / 36bc08cc, the ring buffer pages were mapped
      via get_user_pages() while mmap_sem was held for write.  This avoided
      possible races with userland issuing a concurrent munmap() or mremap().
      The page migration patch, however, switched to using mm_populate() to prime
      the page mapping.  mm_populate() cannot be called with mmap_sem held.
      
      Instead of dropping the mmap_sem, revert to the old behaviour and simply
      drop the use of mm_populate(), since get_user_pages() will cause the pages
      to get mapped anyway.  Thanks to Al Viro for spotting this issue.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      d6c355c7
  2. 30 Aug, 2013 2 commits
  3. 07 Aug, 2013 1 commit
  4. 06 Aug, 2013 1 commit
  5. 05 Aug, 2013 1 commit
    • aio: fix error handling and rcu usage in "convert the ioctx list to table lookup v3" · da90382c
      Benjamin LaHaise authored
      In the patch "aio: convert the ioctx list to table lookup v3", incorrect
      handling in the ioctx_alloc() error path was introduced that led to an
      ioctx being added via ioctx_add_table() and then freed when the
      ioctx_alloc() call returned -EAGAIN due to hitting the aio_max_nr limit.
      Fix this by only calling ioctx_add_table() as the last step in ioctx_alloc().
      
      Also, several unnecessary rcu_dereference() calls were added that led to
      RCU warnings in places where mm->ioctx_table was already protected by a
      spin lock.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      da90382c
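      A minimal userspace sketch of the ordering described above, under a much simplified model. `struct ctx`, `ctx_alloc()`, `table_add()`, `nr_in_use` and `max_nr` are hypothetical stand-ins, not the kernel's code. The only point illustrated is that the context is published in the lookup table after every check that can fail, so a failed allocation never leaves a freed pointer in the table.

      ```c
      #include <errno.h>
      #include <stddef.h>
      #include <stdlib.h>

      #define TABLE_SIZE 8

      struct ctx {
          unsigned id;        /* slot index in the table */
          unsigned nr_events; /* events accounted to this context */
      };

      static struct ctx *table[TABLE_SIZE]; /* hypothetical stand-in for the ioctx table */
      static unsigned nr_in_use;            /* hypothetical stand-in for the global count */
      static unsigned max_nr = 16;          /* hypothetical stand-in for the global limit */

      /* Publish c in the first free slot; returns 0 or -ENOMEM. */
      static int table_add(struct ctx *c)
      {
          for (unsigned i = 0; i < TABLE_SIZE; i++) {
              if (!table[i]) {
                  c->id = i;
                  table[i] = c;
                  return 0;
              }
          }
          return -ENOMEM;
      }

      /* On success stores the new context in *out and returns 0. */
      static int ctx_alloc(unsigned nr_events, struct ctx **out)
      {
          struct ctx *c = calloc(1, sizeof(*c));
          int err;

          if (!c)
              return -ENOMEM;
          c->nr_events = nr_events;

          /* Limit check first: c is still private, so freeing it is safe. */
          if (nr_in_use + nr_events > max_nr) {
              free(c);
              return -EAGAIN;
          }
          nr_in_use += nr_events;

          /* Publishing is the last step; undo the accounting if it fails. */
          err = table_add(c);
          if (err) {
              nr_in_use -= nr_events;
              free(c);
              return err;
          }

          *out = c;
          return 0;
      }
      ```

      With `max_nr` at 16, a first `ctx_alloc(10, &a)` succeeds and a second request for 10 events returns -EAGAIN without touching the table.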
  6. 31 Jul, 2013 1 commit
  7. 30 Jul, 2013 11 commits
    • aio: convert the ioctx list to table lookup v3 · db446a08
      Benjamin LaHaise authored
      On Wed, Jun 12, 2013 at 11:14:40AM -0700, Kent Overstreet wrote:
      > On Mon, Apr 15, 2013 at 02:40:55PM +0300, Octavian Purdila wrote:
      > > When using a large number of threads performing AIO operations the
      > > IOCTX list may get a significant number of entries which will cause
      > > significant overhead. For example, when running this fio script:
      > >
      > > rw=randrw; size=256k ;directory=/mnt/fio; ioengine=libaio; iodepth=1
      > > blocksize=1024; numjobs=512; thread; loops=100
      > >
      > > on an EXT2 filesystem mounted on top of a ramdisk we can observe up to
      > > 30% CPU time spent by lookup_ioctx:
      > >
      > >  32.51%  [guest.kernel]  [g] lookup_ioctx
      > >   9.19%  [guest.kernel]  [g] __lock_acquire.isra.28
      > >   4.40%  [guest.kernel]  [g] lock_release
      > >   4.19%  [guest.kernel]  [g] sched_clock_local
      > >   3.86%  [guest.kernel]  [g] local_clock
      > >   3.68%  [guest.kernel]  [g] native_sched_clock
      > >   3.08%  [guest.kernel]  [g] sched_clock_cpu
      > >   2.64%  [guest.kernel]  [g] lock_release_holdtime.part.11
      > >   2.60%  [guest.kernel]  [g] memcpy
      > >   2.33%  [guest.kernel]  [g] lock_acquired
      > >   2.25%  [guest.kernel]  [g] lock_acquire
      > >   1.84%  [guest.kernel]  [g] do_io_submit
      > >
      > > This patch converts the ioctx list to a radix tree. For a performance
      > > comparison the above FIO script was run on a 2-socket, 8-core
      > > machine. These are the results (average and %rsd of 10 runs) for the
      > > original list based implementation and for the radix tree based
      > > implementation:
      > >
      > > cores         1         2         4         8         16        32
      > > list       109376 ms  69119 ms  35682 ms  22671 ms  19724 ms  16408 ms
      > > %rsd         0.69%      1.15%     1.17%     1.21%     1.71%     1.43%
      > > radix       73651 ms  41748 ms  23028 ms  16766 ms  15232 ms   13787 ms
      > > %rsd         1.19%      0.98%     0.69%     1.13%    0.72%      0.75%
      > > % of radix
      > > relative    66.12%     65.59%    66.63%    72.31%   77.26%     83.66%
      > > to list
      > >
      > > To consider the impact of the patch on the typical case of having
      > > only one ctx per process the following FIO script was run:
      > >
      > > rw=randrw; size=100m ;directory=/mnt/fio; ioengine=libaio; iodepth=1
      > > blocksize=1024; numjobs=1; thread; loops=100
      > >
      > > on the same system and the results are the following:
      > >
      > > list        58892 ms
      > > %rsd         0.91%
      > > radix       59404 ms
      > > %rsd         0.81%
      > > % of radix
      > > relative    100.87%
      > > to list
      >
      > So, I was just doing some benchmarking/profiling to get ready to send
      > out the aio patches I've got for 3.11 - and it looks like your patch is
      > causing a ~1.5% throughput regression in my testing :/
      ... <snip>
      
      I've got an alternate approach for fixing this wart in lookup_ioctx()...
      Instead of using an rbtree, just use the reserved id in the ring buffer
      header to index an array pointing to the ioctx.  It's not finished yet, and
      it needs to be tidied up, but is most of the way there.
      
      		-ben
      --
      "Thought is the essence of where you are now."
      --
      kmo> And, a rework of Ben's code, but this was entirely his idea
      kmo>		-Kent
      
      bcrl> And fix the code to use the right mm_struct in kill_ioctx(), actually
      free memory.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      db446a08
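      The idea of indexing an array with the id stored in the ring buffer header can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel implementation: `struct ring_header`, `struct ctx`, `table_insert()` and `table_lookup()` are hypothetical names, and the handle is modelled as the address of the ring header. Because the id lives in memory the caller can write to, the lookup treats it as untrusted and cross-checks the entry it finds.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      #define NR_SLOTS 4

      /* Header at the start of the ring; writable by the caller. */
      struct ring_header {
          unsigned id; /* slot index chosen at insert time */
      };

      struct ctx {
          uintptr_t user_id; /* handle given to the caller: address of the ring */
      };

      struct ctx_table {
          struct ctx *slot[NR_SLOTS];
      };

      /* Put c in the first free slot and record the index in the ring header. */
      static int table_insert(struct ctx_table *t, struct ctx *c,
                              struct ring_header *ring)
      {
          for (unsigned i = 0; i < NR_SLOTS; i++) {
              if (!t->slot[i]) {
                  ring->id = i;
                  c->user_id = (uintptr_t)ring;
                  t->slot[i] = c;
                  return 0;
              }
          }
          return -1;
      }

      /* Constant-time lookup: read the id from the header, then verify it. */
      static struct ctx *table_lookup(const struct ctx_table *t, uintptr_t user_id)
      {
          const struct ring_header *ring = (const struct ring_header *)user_id;
          unsigned id = ring->id; /* untrusted value */
          struct ctx *c;

          if (id >= NR_SLOTS)
              return NULL;
          c = t->slot[id];
          if (c && c->user_id == user_id)
              return c;
          return NULL;
      }
      ```

      The lookup cost no longer depends on how many contexts exist, which is the property the list walk lacked. A corrupted id either falls outside the table or lands on an entry whose handle does not match, and both cases return NULL.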
    • aio: double aio_max_nr in calculations · 4cd81c3d
      Benjamin LaHaise authored
      With the changes to use percpu counters for aio event ring size calculation,
      existing increases to aio_max_nr are now insufficient to allow for the
      allocation of enough events.  Double the value used for aio_max_nr to account
      for the doubling introduced by the percpu slack.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      4cd81c3d
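      The arithmetic behind this change can be sketched as below. The helper names are hypothetical and the exact expressions in fs/aio.c may differ; the sketch only shows why a limit that was sufficient before the ring size was doubled has to be doubled as well.

      ```c
      #include <stdbool.h>

      /* Ring sizing doubles the caller's request to leave percpu slack. */
      static unsigned long ring_events(unsigned long requested)
      {
          return requested * 2;
      }

      /* Limit check with the system-wide limit doubled to match. */
      static bool fits_doubled_limit(unsigned long in_use, unsigned long requested,
                                     unsigned long max_nr)
      {
          unsigned long nr = ring_events(requested);

          if (in_use + nr < in_use) /* unsigned overflow */
              return false;
          return in_use + nr <= max_nr * 2UL;
      }

      /* The same check against the undoubled limit, for comparison. */
      static bool fits_plain_limit(unsigned long in_use, unsigned long requested,
                                   unsigned long max_nr)
      {
          unsigned long nr = ring_events(requested);

          if (in_use + nr < in_use)
              return false;
          return in_use + nr <= max_nr;
      }
      ```

      A request for exactly `max_nr` events passes `fits_doubled_limit()` but fails `fits_plain_limit()`, which is the situation the commit message describes.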
    • aio: Kill ki_dtor · d29c445b
      Kent Overstreet authored
      sock_aio_dtor() is dead code - and stuff that does need to do cleanup
      can simply do it before calling aio_complete().
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      d29c445b
    • aio: Kill ki_users · 57282d8f
      Kent Overstreet authored
      The kiocb refcount is only needed for cancellation - to ensure a kiocb
      isn't freed while a ki_cancel callback is running. But if we restrict
      ki_cancel callbacks to not block (which they currently don't), we can
      simply drop the refcount.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      57282d8f
    • aio: Kill unneeded kiocb members · 8bc92afc
      Kent Overstreet authored
      The old aio retry infrastructure needed to save the various arguments to
      aio operations. But with the retry infrastructure gone, we can trim
      struct kiocb quite a bit.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      8bc92afc
    • aio: Kill aio_rw_vect_retry() · 73a7075e
      Kent Overstreet authored
      This code doesn't serve any purpose anymore, since the aio retry
      infrastructure has been removed.
      
      This change should be safe because aio_read/write are also used for
      synchronous IO, and called from do_sync_read()/do_sync_write() - and
      there's no looping done in the sync case (the read and write syscalls).
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      73a7075e
    • aio: Don't use ctx->tail unnecessarily · 5ffac122
      Kent Overstreet authored
      aio_complete() (arguably) needs to keep its own trusted copy of the tail
      pointer, but io_getevents() doesn't have to use it - it's already using
      the head pointer from the ring buffer.
      
      So convert it to use the tail from the ring buffer so it touches fewer
      cachelines and doesn't contend with the cacheline aio_complete() needs.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      5ffac122
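      As a rough illustration of a reader that relies only on the head and tail stored in the ring header, the sketch below counts pending events in a circular buffer. `struct ring_hdr` and `events_pending()` are hypothetical names, and the real reader copies events out in chunks rather than just counting them.

      ```c
      struct ring_hdr {
          unsigned head; /* next event the reader will consume */
          unsigned tail; /* next slot the completion side will fill */
          unsigned nr;   /* number of slots in the ring */
      };

      /* Number of events between head and tail, allowing for wrap-around. */
      static unsigned events_pending(const struct ring_hdr *r)
      {
          unsigned head = r->head;
          unsigned tail = r->tail;

          if (tail >= head)
              return tail - head;
          return r->nr - head + tail;
      }
      ```

      Both values come from the same header, so the reader in this model never touches the completion side's private tail copy.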
    • aio: io_cancel() no longer returns the io_event · bec68faa
      Kent Overstreet authored
      Originally, io_cancel() was documented to return the io_event if
      cancellation succeeded - the io_event wouldn't be delivered via the ring
      buffer like it normally would.
      
      But this isn't what the implementation was actually doing; the only
      driver implementing cancellation, the usb gadget code, never returned an
      io_event in its cancel function. And aio_complete() was recently changed
      to no longer suppress event delivery if the kiocb had been cancelled.
      
      This gets rid of the unused io_event argument to kiocb_cancel() and
      kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if
      kiocb->ki_cancel() returned success.
      
      Also tweak the refcounting in kiocb_cancel() to make more sense.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      bec68faa
    • aio: percpu ioctx refcount · 723be6e3
      Kent Overstreet authored
      This just converts the ioctx refcount to the new generic dynamic percpu
      refcount code.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      723be6e3
    • aio: percpu reqs_available · e1bdd5f2
      Kent Overstreet authored
      See the previous patch ("aio: reqs_active -> reqs_available") for why we
      want to do this - this basically implements a per cpu allocator for
      reqs_available that doesn't actually allocate anything.
      
      Note that we need to increase the size of the ringbuffer we allocate,
      since a single thread won't necessarily be able to use all the
      reqs_available slots - some (up to about half) might be on other per cpu
      lists, unavailable for the current thread.
      
      We size the ringbuffer based on the nr_events userspace passed to
      io_setup(), so this is a slight behaviour change - but nr_events wasn't
      being used as a hard limit before (it was already being rounded up to the
      next page), so this doesn't change the actual semantics.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      e1bdd5f2
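      The "per cpu allocator that doesn't actually allocate anything" can be modelled in a few lines of single-threaded C. This is a sketch under assumptions: `struct pool`, `get_slot()`, `put_slot()` and the batch size are hypothetical, CPUs are simulated by an array index, and a real implementation would update the shared counter atomically. Slots move between a shared counter and per-cpu caches in batches, so the common path touches only the local cache.

      ```c
      #include <stdbool.h>

      #define NR_CPUS 4

      struct pool {
          unsigned global_avail;       /* shared counter of free slots */
          unsigned batch;              /* slots moved per refill or drain */
          unsigned cpu_avail[NR_CPUS]; /* free slots cached on each cpu */
      };

      /* Take one slot on the given cpu, refilling its cache if it is empty. */
      static bool get_slot(struct pool *p, unsigned cpu)
      {
          if (p->cpu_avail[cpu] == 0) {
              if (p->global_avail < p->batch)
                  return false; /* free slots may still sit on other cpus */
              p->global_avail -= p->batch;
              p->cpu_avail[cpu] += p->batch;
          }
          p->cpu_avail[cpu]--;
          return true;
      }

      /* Return one slot; drain back to the shared counter above two batches. */
      static void put_slot(struct pool *p, unsigned cpu)
      {
          p->cpu_avail[cpu]++;
          while (p->cpu_avail[cpu] >= 2 * p->batch) {
              p->cpu_avail[cpu] -= p->batch;
              p->global_avail += p->batch;
          }
      }
      ```

      With 8 slots and a batch of 2, one get and put on each of four cpus leaves the shared counter at zero and two slots cached per cpu. A thread on cpu 0 can then take only 2 slots although all 8 are free, which is the stranding effect that motivates allocating a larger ring.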
    • aio: reqs_active -> reqs_available · 34e83fc6
      Kent Overstreet authored
      The number of outstanding kiocbs is one of the few shared things left that
      has to be touched for every kiocb - it'd be nice to make it percpu.
      
      We can make it per cpu by treating it like an allocation problem: we have
      a maximum number of kiocbs that can be outstanding (i.e.  slots) - then we
      just allocate and free slots, and we know how to write per cpu allocators.
      
      So as prep work for that, we convert reqs_active to reqs_available.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      34e83fc6
  8. 17 Jul, 2013 1 commit
  9. 16 Jul, 2013 2 commits
  10. 15 Jul, 2013 1 commit
  11. 14 Jul, 2013 15 commits
    • Linux 3.11-rc1 · ad81f054
      Linus Torvalds authored
      ad81f054
    • Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux · 54be8200
      Linus Torvalds authored
      Pull slab update from Pekka Enberg:
       "Highlights:
      
        - Fix for boot-time problems on some architectures due to
          init_lock_keys() not respecting kmalloc_caches boundaries
          (Christoph Lameter)
      
        - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim)
      
        - Fix for excessive slab freelist draining (Wanpeng Li)
      
        - SLUB and SLOB cleanups and fixes (various people)"
      
      I ended up editing the branch, and this avoids two commits at the end
      that were immediately reverted, and I instead just applied the oneliner
      fix in between myself.
      
      * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
        slub: Check for page NULL before doing the node_match check
        mm/slab: Give s_next and s_stop slab-specific names
        slob: Check for NULL pointer before calling ctor()
        slub: Make cpu partial slab support configurable
        slab: add kmalloc() to kernel API documentation
        slab: fix init_lock_keys
        slob: use DIV_ROUND_UP where possible
        slub: do not put a slab to cpu partial list when cpu_partial is 0
        mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo
        mm/slub: Drop unnecessary nr_partials
        mm/slab: Fix /proc/slabinfo unwriteable for slab
        mm/slab: Sharing s_next and s_stop between slab and slub
        mm/slab: Fix drain freelist excessively
        slob: Rework #ifdeffery in slab.h
        mm, slab: moved kmem_cache_alloc_node comment to correct place
      54be8200
    • slub: Check for page NULL before doing the node_match check · c25f195e
      Steven Rostedt authored
      In the -rt kernel (mrg), we hit the following dump:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
      PGD a2d39067 PUD b1641067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      CPU 3
      Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
      RIP: 0010:[<ffffffff811573f1>]  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
      RSP: 0018:ffff8800a9b17d70  EFLAGS: 00010213
      RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
      RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
      RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
      R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
      R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
      FS:  00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
      Stack:
       ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
       0000000001200011 0000000001200011 0000000000000000 0000000000000000
       00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
      Call Trace:
       [<ffffffff81202e08>] ? current_has_perm+0x68/0x80
       [<ffffffff81041cbd>] copy_process+0xdd/0x15b0
       [<ffffffff810a2125>] ? rt_up_read+0x25/0x30
       [<ffffffff8104369a>] do_fork+0x5a/0x360
       [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
       [<ffffffff8100b068>] sys_clone+0x28/0x30
       [<ffffffff81527423>] stub_clone+0x13/0x20
       [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
      Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
      RIP  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
       RSP <ffff8800a9b17d70>
      CR2: 0000000000000000
      ---[ end trace 0000000000000002 ]---
      
      Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
      with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
      disable migration. But the SLUB code is relatively lockless, and the
      spin_locks there are raw_spin_locks (not converted to mutexes), thus I
      believe this bug can happen in mainline without -rt features. The -rt
      patch is just good at triggering mainline bugs ;-)
      
      Anyway, looking at where this crashed, it seems that the page variable
      can be NULL when passed to the node_match() function (which does not
      check if it is NULL). When this happens we get the above panic.
      
      As page is only used in slab_alloc() to check if the node matches, if
      it's NULL I'm assuming that we can say it doesn't and call the
      __slab_alloc() code. Is this a correct assumption?
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c25f195e
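      The shape of the check discussed above can be sketched in userspace as follows. `struct page` here is a hypothetical one-field stand-in, and `need_slow_path()` is an illustrative name rather than a kernel function; the sketch assumes, as the message does, that a NULL page can simply be treated as a node mismatch.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      #define NUMA_NO_NODE (-1)

      /* Hypothetical stand-in: only the node id matters for this sketch. */
      struct page {
          int nid;
      };

      /* Dereferences page without checking it, like the helper described above. */
      static bool node_match(const struct page *page, int node)
      {
          return node == NUMA_NO_NODE || page->nid == node;
      }

      /* True when the fast path cannot be used and the slow path must run. */
      static bool need_slow_path(const void *object, const struct page *page,
                                 int node)
      {
          /* Short-circuit evaluation keeps node_match() away from a NULL page. */
          return !object || !page || !node_match(page, node);
      }
      ```

      Placing the `!page` test before the call means `node_match()` itself stays unchanged and still assumes a valid pointer.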
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 41d9884c
      Linus Torvalds authored
      Pull more vfs stuff from Al Viro:
       "O_TMPFILE ABI changes, Oleg's fput() series, misc cleanups, including
        making simple_lookup() usable for filesystems with non-NULL s_d_op,
        which allows us to get rid of quite a bit of ugliness"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        sunrpc: now we can just set ->s_d_op
        cgroup: we can use simple_lookup() now
        efivarfs: we can use simple_lookup() now
        make simple_lookup() usable for filesystems that set ->s_d_op
        configfs: don't open-code d_alloc_name()
        __rpc_lookup_create_exclusive: pass string instead of qstr
        rpc_create_*_dir: don't bother with qstr
        llist: llist_add() can use llist_add_batch()
        llist: fix/simplify llist_add() and llist_add_batch()
        fput: turn "list_head delayed_fput_list" into llist_head
        fs/file_table.c:fput(): add comment
        Safer ABI for O_TMPFILE
      41d9884c
    • sunrpc: now we can just set ->s_d_op · dae3794f
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      dae3794f
    • cgroup: we can use simple_lookup() now · 786e1448
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      786e1448
    • efivarfs: we can use simple_lookup() now · 6e8cd2cb
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6e8cd2cb
    • configfs: don't open-code d_alloc_name() · ec193cf5
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ec193cf5
    • __rpc_lookup_create_exclusive: pass string instead of qstr · d3db90b0
      Al Viro authored
      ... and use d_hash_and_lookup() instead of open-coding it, for fsck sake...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d3db90b0
    • rpc_create_*_dir: don't bother with qstr · a95e691f
      Al Viro authored
      just pass the name
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a95e691f
    • Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86 · 63345b47
      Linus Torvalds authored
      Pull x86 platform driver updates from Matthew Garrett:
       "Nothing overly exciting here - a couple of new drivers that don't do a
        great deal, along with some miscellaneous fixes and a couple of small
        feature enablement patches"
      
      * 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
        x86 platform drivers: fix gpio leak
        toshiba_acpi: Add dependency on SERIO_I8042
        asus-nb-wmi: set wapf=4 for ASUSTeK COMPUTER INC. 1015E/U
        Add trivial driver to disable Intel Smart Connect
        Add support driver for Intel Rapid Start Technology
        hp-wmi: add supports for POST code error
        asus-wmi: control wlan-led only if wapf == 4
        drivers/platform/x86/intel_ips: Convert to module_pci_driver
        asus-nb-wmi: ignore ALS notification key code
        asus-wmi: append newline to messages
        x86: asus-laptop: fix invalid point access
        x86: msi-laptop: fix memleak
        amilo-rfkill: Add dependency on SERIO_I8042
        dell-laptop: fix error return code in dell_init()
        hp-wmi: Enable hotkeys on some systems
      63345b47
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 18fb38e2
      Linus Torvalds authored
      Pull second round of input updates from Dmitry Torokhov:
       "An update to Elantech driver to support hardware v7, fix to the new
        cyttsp4 driver to use proper addressing, ads7846 device tree support
        and nspire-keypad got a small cleanup."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: nspire-keypad - replace magic offset with define
        Input: elantech - fix for newer hardware versions (v7)
        Input: cyttsp4 - use 16bit address for I2C/SPI communication
        Input: ads7846 - add device tree bindings
        Input: ads7846 - make sure we do not change platform data
      18fb38e2
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · be9c6d91
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Just a bunch of small fixes and tidy ups:
      
         1) Finish the "busy_poll" renames, from Eliezer Tamir.
      
         2) Fix RCU stalls in IFB driver, from Ding Tianhong.
      
         3) Linearize buffers properly in tun/macvtap zerocopy code.
      
         4) Don't crash on rmmod in vxlan, from Pravin B Shelar.
      
         5) Spinlock used before init in alx driver, from Maarten Lankhorst.
      
         6) A sparse warning fix in bnx2x broke TSO checksums, fix from Dmitry
            Kravkov.
      
         7) Dummy and ifb driver load failure paths can oops, fixes from Tan
            Xiaojun and Ding Tianhong.
      
         8) Correct MTU calculations in IP tunnels, from Alexander Duyck.
      
         9) Account all TCP retransmits in SNMP stats properly, from Yuchung
            Cheng.
      
        10) atl1e and via-rhine do not handle DMA mapping failures properly,
            from Neil Horman.
      
        11) Various equal-cost multipath route fixes in ipv6 from Hannes
            Frederic Sowa"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
        ipv6: only static routes qualify for equal cost multipathing
        via-rhine: fix dma mapping errors
        atl1e: fix dma mapping warnings
        tcp: account all retransmit failures
        usb/net/r815x: fix cast to restricted __le32
        usb/net/r8152: fix integer overflow in expression
        net: access page->private by using page_private
        net: strict_strtoul is obsolete, use kstrtoul instead
        drivers/net/ieee802154: don't use devm_pinctrl_get_select_default() in probe
        drivers/net/ethernet/cadence: don't use devm_pinctrl_get_select_default() in probe
        drivers/net/can/c_can: don't use devm_pinctrl_get_select_default() in probe
        net/usb: add relative mii functions for r815x
        net/tipc: use %*phC to dump small buffers in hex form
        qlcnic: Adding Maintainers.
        gre: Fix MTU sizing check for gretap tunnels
        pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
        pkt_sched: sch_qfq: improve efficiency of make_eligible
        gso: Update tunnel segmentation to support Tx checksum offload
        inet: fix spacing in assignment
        ifb: fix oops when loading the ifb failed
        ...
      be9c6d91
    • Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 03ce3ca4
      Linus Torvalds authored
      Pull final round of SCSI updates from James Bottomley:
       "This is the remaining set of SCSI patches for the merge window.  It's
        mostly driver updates (scsi_debug, qla2xxx, storvsc, mpt3sas).  There
        are also several bug fixes in fcoe, libfc, and megaraid_sas.  We also
        have a couple of core changes to try to make device destruction more
        deterministic"
      
      * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (46 commits)
        [SCSI] scsi constants: command, sense key + additional sense strings
        fcoe: Reduce number of sparse warnings
        fcoe: Stop fc_rport_priv structure leak
        libfcoe: Fix meaningless log statement
        libfc: Differentiate echange timer cancellation debug statements
        libfc: Remove extra space in fc_exch_timer_cancel definition
        fcoe: fix the link error status block sparse warnings
        fcoe: Fix smatch warning in fcoe_fdmi_info function
        libfc: Reject PLOGI from nodes with incompatible role
        [SCSI] enable destruction of blocked devices which fail LUN scanning
        [SCSI] Fix race between starved list and device removal
        [SCSI] megaraid_sas: fix a bug for 64 bit arches
        [SCSI] scsi_debug: reduce duplication between prot_verify_read and prot_verify_write
        [SCSI] scsi_debug: simplify offset calculation for dif_storep
        [SCSI] scsi_debug: invalidate protection info for unmapped region
        [SCSI] scsi_debug: fix NULL pointer dereference with parameters dif=0 dix=1
        [SCSI] scsi_debug: fix incorrectly nested kmap_atomic()
        [SCSI] scsi_debug: fix invalid address passed to kunmap_atomic()
        [SCSI] mpt3sas: Bump driver version to v02.100.00.00
        [SCSI] mpt3sas: when async scanning is enabled then while scanning, devices are removed but their transport layer entries are not removed
        ...
      03ce3ca4
  12. 13 Jul, 2013 2 commits