1. 02 Apr, 2020 32 commits
  2. 20 Mar, 2020 8 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.9.217 · 10a20903
      Greg Kroah-Hartman authored
      10a20903
    • Matteo Croce's avatar
      ipv4: ensure rcu_read_lock() in cipso_v4_error() · 0bde22da
      Matteo Croce authored
      commit 3e72dfdf upstream.
      
      Similarly to commit c543cb4a ("ipv4: ensure rcu_read_lock() in
      ipv4_link_failure()"), __ip_options_compile() must be called under rcu
      protection.
      
      Fixes: 3da1ed7a ("net: avoid use IPCB in cipso_v4_error")
      Suggested-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarMatteo Croce <mcroce@redhat.com>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bde22da
    • Jann Horn's avatar
      mm: slub: add missing TID bump in kmem_cache_alloc_bulk() · ff58bb34
      Jann Horn authored
      commit fd4d9c7d upstream.
      
      When kmem_cache_alloc_bulk() attempts to allocate N objects from a percpu
      freelist of length M, and N > M > 0, it will first remove the M elements
      from the percpu freelist, then call ___slab_alloc() to allocate the next
      element and repopulate the percpu freelist. ___slab_alloc() can re-enable
      IRQs via allocate_slab(), so the TID must be bumped before ___slab_alloc()
      to properly commit the freelist head change.
      
      Fix it by unconditionally bumping c->tid when entering the slowpath.
      
      Cc: stable@vger.kernel.org
      Fixes: ebe909e0 ("slub: improve bulk alloc strategy")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff58bb34
    • Kees Cook's avatar
      ARM: 8958/1: rename missed uaccess .fixup section · ed14ef08
      Kees Cook authored
      commit f87b1c49 upstream.
      
      When the uaccess .fixup section was renamed to .text.fixup, one case was
      missed. Under ld.bfd, the orphaned section was moved close to .text
      (since they share the "ax" bits), so things would work normally on
      uaccess faults. Under ld.lld, the orphaned section was placed outside
      the .text section, making it unreachable.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/282
      Link: https://bugs.chromium.org/p/chromium/issues/detail?id=1020633#c44
      Link: https://lore.kernel.org/r/nycvar.YSQ.7.76.1912032147340.17114@knanqh.ubzr
      Link: https://lore.kernel.org/lkml/202002071754.F5F073F1D@keescook/
      
      Fixes: c4a84ae3 ("ARM: 8322/1: keep .text and .fixup regions closer together")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reviewed-by: default avatarArd Biesheuvel <ardb@kernel.org>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed14ef08
    • Florian Fainelli's avatar
      ARM: 8957/1: VDSO: Match ARMv8 timer in cntvct_functional() · 3a4c51d0
      Florian Fainelli authored
      commit 45939ce2 upstream.
      
      It is possible for a system with an ARMv8 timer to run a 32-bit kernel.
      When this happens we will unconditionally have the vDSO code remove the
      __vdso_gettimeofday and __vdso_clock_gettime symbols because
      cntvct_functional() returns false since it does not match that
      compatibility string.
      
      Fixes: ecf99a43 ("ARM: 8331/1: VDSO initialization, mapping, and synchronization")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a4c51d0
    • Qian Cai's avatar
      jbd2: fix data races at struct journal_head · e06aeb9f
      Qian Cai authored
      [ Upstream commit 6c5d9112 ]
      
      journal_head::b_transaction and journal_head::b_next_transaction could
      be accessed concurrently as noticed by KCSAN,
      
       LTP: starting fsync04
       /dev/zero: Can't open blockdev
       EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
       EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
       ==================================================================
       BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
      
       write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
        __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
        __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
        jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
        (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
        kjournald2+0x13b/0x450 [jbd2]
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
        jbd2_write_access_granted+0x1b2/0x250 [jbd2]
        jbd2_write_access_granted at fs/jbd2/transaction.c:1155
        jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
        __ext4_journal_get_write_access+0x50/0x90 [ext4]
        ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
        ext4_mb_new_blocks+0x54f/0xca0 [ext4]
        ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
        ext4_map_blocks+0x3b4/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block+0x3b/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       5 locks held by fsync04/25724:
        #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
        #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
        #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
        #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
       irq event stamp: 1407125
       hardirqs last  enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
      
      The plain reads are outside of jh->b_state_lock critical section which result
      in data races. Fix them by adding pairs of READ|WRITE_ONCE().
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pwSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e06aeb9f
    • Linus Torvalds's avatar
      signal: avoid double atomic counter increments for user accounting · 4306259f
      Linus Torvalds authored
      [ Upstream commit fda31c50 ]
      
      When queueing a signal, we increment both the users count of pending
      signals (for RLIMIT_SIGPENDING tracking) and we increment the refcount
      of the user struct itself (because we keep a reference to the user in
      the signal structure in order to correctly account for it when freeing).
      
      That turns out to be fairly expensive, because both of them are atomic
      updates, and particularly under extreme signal handling pressure on big
      machines, you can get a lot of cache contention on the user struct.
      That can then cause horrid cacheline ping-pong when you do these
      multiple accesses.
      
      So change the reference counting to only pin the user for the _first_
      pending signal, and to unpin it when the last pending signal is
      dequeued.  That means that when a user sees a lot of concurrent signal
      queuing - which is the only situation when this matters - the only
      atomic access needed is generally the 'sigpending' count update.
      
      This was noticed because of a particularly odd timing artifact on a
      dual-socket 96C/192T Cascade Lake platform: when you get into bad
      contention, on that machine for some reason seems to be much worse when
      the contention happens in the upper 32-byte half of the cacheline.
      
      As a result, the kernel test robot will-it-scale 'signal1' benchmark had
      an odd performance regression simply due to random alignment of the
      'struct user_struct' (and pointed to a completely unrelated and
      apparently nonsensical commit for the regression).
      
      Avoiding the double increments (and decrements on the dequeueing side,
      of course) makes for much less contention and hugely improved
      performance on that will-it-scale microbenchmark.
      
      Quoting Feng Tang:
      
       "It makes a big difference, that the performance score is tripled! bump
        from original 17000 to 54000. Also the gap between 5.0-rc6 and
        5.0-rc6+Jiri's patch is reduced to around 2%"
      
      [ The "2% gap" is the odd cacheline placement difference on that
        platform: under the extreme contention case, the effect of which half
        of the cacheline was hot was 5%, so with the reduced contention the
        odd timing artifact is reduced too ]
      
      It does help in the non-contended case too, but is not nearly as
      noticeable.
      Reported-and-tested-by: default avatarFeng Tang <feng.tang@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Philip Li <philip.li@intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4306259f
    • Madhuparna Bhowmik's avatar
      mac80211: rx: avoid RCU list traversal under mutex · d2a49017
      Madhuparna Bhowmik authored
      [ Upstream commit 253216ff ]
      
      local->sta_mtx is held in __ieee80211_check_fast_rx_iface().
      No need to use list_for_each_entry_rcu() as it also requires
      a cond argument to avoid false lockdep warnings when not used in
      RCU read-side section (with CONFIG_PROVE_RCU_LIST).
      Therefore use list_for_each_entry();
      Signed-off-by: default avatarMadhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
      Link: https://lore.kernel.org/r/20200223143302.15390-1-madhuparnabhowmik10@gmail.comSigned-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d2a49017