1. 13 Aug, 2018 18 commits
  2. 12 Aug, 2018 4 commits
    • Linus Torvalds's avatar
      Linux 4.18 · 94710cac
      Linus Torvalds authored
      94710cac
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 921195d3
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Eight fixes.
      
        The most important one is the mpt3sas fix which makes the driver work
        again on big endian systems. The rest are mostly minor error path or
        checker issues and the vmw_scsi one fixes a performance problem"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: vmw_pvscsi: Return DID_RESET for status SAM_STAT_COMMAND_TERMINATED
        scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
        scsi: mpt3sas: Swap I/O memory read value back to cpu endianness
        scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO
        scsi: fcoe: drop frames in ELS LOGO error path
        scsi: fcoe: fix use-after-free in fcoe_ctlr_els_send
        scsi: qedi: Fix a potential buffer overflow
        scsi: qla2xxx: Fix memory leak for allocating abort IOCB
      921195d3
    • Linus Torvalds's avatar
      init: rename and re-order boot_cpu_state_init() · b5b1404d
      Linus Torvalds authored
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: default avatarMian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5b1404d
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · d6dd6431
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "A bunch of race fixes, mostly around lazy pathwalk.
      
        All of it is -stable fodder, a large part going back to 2013"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        make sure that __dentry_kill() always invalidates d_seq, unhashed or not
        fix __legitimize_mnt()/mntput() race
        fix mntput/mntput race
        root dentries need RCU-delayed freeing
      d6dd6431
  3. 11 Aug, 2018 9 commits
  4. 10 Aug, 2018 2 commits
  5. 09 Aug, 2018 7 commits
    • Al Viro's avatar
      make sure that __dentry_kill() always invalidates d_seq, unhashed or not · 4c0d7cd5
      Al Viro authored
      RCU pathwalk relies upon the assumption that anything that changes
      ->d_inode of a dentry will invalidate its ->d_seq.  That's almost
      true - the one exception is that the final dput() of already unhashed
      dentry does *not* touch ->d_seq at all.  Unhashing does, though,
      so for anything we'd found by RCU dcache lookup we are fine.
      Unfortunately, we can *start* with an unhashed dentry or jump into
      it.
      
      We could try and be careful in the (few) places where that could
      happen.  Or we could just make the final dput() invalidate the damn
      thing, unhashed or not.  The latter is much simpler and easier to
      backport, so let's do it that way.
      Reported-by: default avatar"Dae R. Jeong" <threeearcat@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      4c0d7cd5
    • Al Viro's avatar
      fix __legitimize_mnt()/mntput() race · 119e1ef8
      Al Viro authored
      __legitimize_mnt() has two problems - one is that in case of success
      the check of mount_lock is not ordered wrt preceding increment of
      refcount, making it possible to have successful __legitimize_mnt()
      on one CPU just before the otherwise final mntpu() on another,
      with __legitimize_mnt() not seeing mntput() taking the lock and
      mntput() not seeing the increment done by __legitimize_mnt().
      Solved by a pair of barriers.
      
      Another is that failure of __legitimize_mnt() on the second
      read_seqretry() leaves us with reference that'll need to be
      dropped by caller; however, if that races with final mntput()
      we can end up with caller dropping rcu_read_lock() and doing
      mntput() to release that reference - with the first mntput()
      having freed the damn thing just as rcu_read_lock() had been
      dropped.  Solution: in "do mntput() yourself" failure case
      grab mount_lock, check if MNT_DOOMED has been set by racing
      final mntput() that has missed our increment and if it has -
      undo the increment and treat that as "failure, caller doesn't
      need to drop anything" case.
      
      It's not easy to hit - the final mntput() has to come right
      after the first read_seqretry() in __legitimize_mnt() *and*
      manage to miss the increment done by __legitimize_mnt() before
      the second read_seqretry() in there.  The things that are almost
      impossible to hit on bare hardware are not impossible on SMP
      KVM, though...
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      119e1ef8
    • Al Viro's avatar
      fix mntput/mntput race · 9ea0a46c
      Al Viro authored
      mntput_no_expire() does the calculation of total refcount under mount_lock;
      unfortunately, the decrement (as well as all increments) are done outside
      of it, leading to false positives in the "are we dropping the last reference"
      test.  Consider the following situation:
      	* mnt is a lazy-umounted mount, kept alive by two opened files.  One
      of those files gets closed.  Total refcount of mnt is 2.  On CPU 42
      mntput(mnt) (called from __fput()) drops one reference, decrementing component
      	* After it has looked at component #0, the process on CPU 0 does
      mntget(), incrementing component #0, gets preempted and gets to run again -
      on CPU 69.  There it does mntput(), which drops the reference (component #69)
      and proceeds to spin on mount_lock.
      	* On CPU 42 our first mntput() finishes counting.  It observes the
      decrement of component #69, but not the increment of component #0.  As the
      result, the total it gets is not 1 as it should've been - it's 0.  At which
      point we decide that vfsmount needs to be killed and proceed to free it and
      shut the filesystem down.  However, there's still another opened file
      on that filesystem, with reference to (now freed) vfsmount, etc. and we are
      screwed.
      
      It's not a wide race, but it can be reproduced with artificial slowdown of
      the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.
      
      Fix consists of moving the refcount decrement under mount_lock; the tricky
      part is that we want (and can) keep the fast case (i.e. mount that still
      has non-NULL ->mnt_ns) entirely out of mount_lock.  All places that zero
      mnt->mnt_ns are dropping some reference to mnt and they call synchronize_rcu()
      before that mntput().  IOW, if mntput() observes (under rcu_read_lock())
      a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to
      be dropped.
      Reported-by: default avatarJann Horn <jannh@google.com>
      Tested-by: default avatarJann Horn <jannh@google.com>
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      9ea0a46c
    • Daniel Borkmann's avatar
      Merge branch 'bpf-fix-cpu-and-devmap-teardown' · 9c954201
      Daniel Borkmann authored
      Jesper Dangaard Brouer says:
      
      ====================
      Removing entries from cpumap and devmap, goes through a number of
      syncronization steps to make sure no new xdp_frames can be enqueued.
      But there is a small chance, that xdp_frames remains which have not
      been flushed/processed yet.  Flushing these during teardown, happens
      from RCU context and not as usual under RX NAPI context.
      
      The optimization introduced in commt 389ab7f0 ("xdp: introduce
      xdp_return_frame_rx_napi"), missed that the flush operation can also
      be called from RCU context.  Thus, we cannot always use the
      xdp_return_frame_rx_napi call, which take advantage of the protection
      provided by XDP RX running under NAPI protection.
      
      The samples/bpf xdp_redirect_cpu have a --stress-mode, that is
      adjusted to easier reproduce (verified by Red Hat QA).
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      9c954201
    • Jesper Dangaard Brouer's avatar
      xdp: fix bug in devmap teardown code path · 1bf9116d
      Jesper Dangaard Brouer authored
      Like cpumap teardown, the devmap teardown code also flush remaining
      xdp_frames, via bq_xmit_all() in case map entry is removed.  The code
      can call xdp_return_frame_rx_napi, from the the wrong context, in-case
      ndo_xdp_xmit() fails.
      
      Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
      Fixes: 735fc405 ("xdp: change ndo_xdp_xmit API to support bulking")
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      1bf9116d
    • Jesper Dangaard Brouer's avatar
      samples/bpf: xdp_redirect_cpu adjustment to reproduce teardown race easier · 37d7ff25
      Jesper Dangaard Brouer authored
      The teardown race in cpumap is really hard to reproduce.  These changes
      makes it easier to reproduce, for QA.
      
      The --stress-mode now have a case of a very small queue size of 8, that helps
      to trigger teardown flush to encounter a full queue, which results in calling
      xdp_return_frame API, in a non-NAPI protect context.
      
      Also increase MAX_CPUS, as my QA department have larger machines than me.
      Tested-by: default avatarJean-Tsung Hsiao <jhsiao@redhat.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      37d7ff25
    • Jesper Dangaard Brouer's avatar
      xdp: fix bug in cpumap teardown code path · ad0ab027
      Jesper Dangaard Brouer authored
      When removing a cpumap entry, a number of syncronization steps happen.
      Eventually the teardown code __cpu_map_entry_free is invoked from/via
      call_rcu.
      
      The teardown code __cpu_map_entry_free() flushes remaining xdp_frames,
      by invoking bq_flush_to_queue, which calls xdp_return_frame_rx_napi().
      The issues is that the teardown code is not running in the RX NAPI
      code path.  Thus, it is not allowed to invoke the NAPI variant of
      xdp_return_frame.
      
      This bug was found and triggered by using the --stress-mode option to
      the samples/bpf program xdp_redirect_cpu.  It is hard to trigger,
      because the ptr_ring have to be full and cpumap bulk queue max
      contains 8 packets, and a remote CPU is racing to empty the ptr_ring
      queue.
      
      Fixes: 389ab7f0 ("xdp: introduce xdp_return_frame_rx_napi")
      Tested-by: default avatarJean-Tsung Hsiao <jhsiao@redhat.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      ad0ab027