1. 08 Nov, 2019 1 commit
  2. 07 Nov, 2019 1 commit
    • Tejun Heo's avatar
      blkcg: make blkcg_print_stat() print stats only for online blkgs · b0814361
      Tejun Heo authored
      blkcg_print_stat() iterates blkgs under RCU and doesn't test whether
      the blkg is online.  This can call into pd_stat_fn() on a pd which is
      still being initialized leading to an oops.
      
      The heaviest operation - recursively summing up rwstat counters - is
      already done while holding the queue_lock.  Expand queue_lock to cover
      the other operations and skip the blkg if it isn't online yet.  The
      online state is protected by both blkcg and queue locks, so this
      guarantees that only online blkgs are processed.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Fixes: 903d23f0 ("blk-cgroup: allow controllers to output their own stats")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b0814361
  3. 05 Nov, 2019 4 commits
    • Jens Axboe's avatar
      Merge branch 'nvme-5.4-rc7' of git://git.infradead.org/nvme into for-linus · 0473976c
      Jens Axboe authored
      Pull NVMe fixes from Keith:
      
      "We have a few late nvme fixes for a couple device removal kernel
       crashes, and a compat fix for a new ioctl introduced during this merge
       window."
      
      * 'nvme-5.4-rc7' of git://git.infradead.org/nvme:
        nvme: change nvme_passthru_cmd64 to explicitly mark rsvd
        nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths
        nvme-rdma: fix a segmentation fault during module unload
      0473976c
    • Charles Machalow's avatar
      nvme: change nvme_passthru_cmd64 to explicitly mark rsvd · 0d6eeb1f
      Charles Machalow authored
      Changing nvme_passthru_cmd64 to add a field: rsvd2. This field is an explicit
      marker for the padding space added on certain platforms as a result of the
      enlargement of the result field from 32 bit to 64 bits in size, and
      fixes differences in struct size when using compat ioctl for 32-bit
      binaries on 64-bit architecture.
      
      Fixes: 65e68edc ("nvme: allow 64-bit results in passthru commands")
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarCharles Machalow <csm10495@gmail.com>
      [changelog]
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      0d6eeb1f
    • Anton Eidelman's avatar
      nvme-multipath: fix crash in nvme_mpath_clear_ctrl_paths · 763303a8
      Anton Eidelman authored
      nvme_mpath_clear_ctrl_paths() iterates through
      the ctrl->namespaces list while holding ctrl->scan_lock.
      This does not seem to be the correct way of protecting
      from concurrent list modification.
      
      Specifically, nvme_scan_work() sorts ctrl->namespaces
      AFTER unlocking scan_lock.
      
      This may result in the following (rare) crash in ctrl disconnect
      during scan_work:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000050
          Oops: 0000 [#1] SMP PTI
          CPU: 0 PID: 3995 Comm: nvme 5.3.5-050305-generic
          RIP: 0010:nvme_mpath_clear_current_path+0xe/0x90 [nvme_core]
          ...
          Call Trace:
           nvme_mpath_clear_ctrl_paths+0x3c/0x70 [nvme_core]
           nvme_remove_namespaces+0x35/0xe0 [nvme_core]
           nvme_do_delete_ctrl+0x47/0x90 [nvme_core]
           nvme_sysfs_delete+0x49/0x60 [nvme_core]
           dev_attr_store+0x17/0x30
           sysfs_kf_write+0x3e/0x50
           kernfs_fop_write+0x11e/0x1a0
           __vfs_write+0x1b/0x40
           vfs_write+0xb9/0x1a0
           ksys_write+0x67/0xe0
           __x64_sys_write+0x1a/0x20
           do_syscall_64+0x5a/0x130
           entry_SYSCALL_64_after_hwframe+0x44/0xa9
          RIP: 0033:0x7f8d02bfb154
      
      Fix:
      After taking scan_lock in nvme_mpath_clear_ctrl_paths()
      down_read(&ctrl->namespaces_rwsem) as well to make list traversal safe.
      This will not cause deadlocks because taking scan_lock never happens
      while holding the namespaces_rwsem.
      Moreover, scan work downs namespaces_rwsem in the same order.
      
      Alternative: sort ctrl->namespaces in nvme_scan_work()
      while still holding the scan_lock.
      This would leave nvme_mpath_clear_ctrl_paths() without correct protection
      against ctrl->namespaces modification by anyone other than scan_work.
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAnton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      763303a8
    • Max Gurtovoy's avatar
      nvme-rdma: fix a segmentation fault during module unload · 9ad9e8d6
      Max Gurtovoy authored
      In case there are controllers that are not associated with any RDMA
      device (e.g. during unsuccessful reconnection) and the user will unload
      the module, these controllers will not be freed and will access already
      freed memory. The same logic appears in other fabric drivers as well.
      
      Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset")
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      9ad9e8d6
  4. 31 Oct, 2019 1 commit
  5. 30 Oct, 2019 1 commit
    • Jens Axboe's avatar
      io_uring: ensure we clear io_kiocb->result before each issue · 6873e0bd
      Jens Axboe authored
      We use io_kiocb->result == -EAGAIN as a way to know if we need to
      re-submit a polled request, as -EAGAIN reporting happens out-of-line
      for IO submission failures. This field is cleared when we originally
      allocate the request, but it isn't reset when we retry the submission
      from async context. This can cause issues where we think something
      needs a re-issue, but we're really just reading stale data.
      
      Reset ->result whenever we re-prep a request for polled submission.
      
      Cc: stable@vger.kernel.org
      Fixes: 9e645e11 ("io_uring: add support for sqe links")
      Reported-by: default avatarBijan Mottahedeh <bijan.mottahedeh@oracle.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      6873e0bd
  6. 29 Oct, 2019 3 commits
    • Anton Ivanov's avatar
      um-ubd: Entrust re-queue to the upper layers · d848074b
      Anton Ivanov authored
      Fixes crashes due to ubd requeue logic conflicting with the block-mq
      logic. Crash is reproducible in 5.0 - 5.3.
      
      Fixes: 53766def ("um: Clean-up command processing in UML UBD driver")
      Cc: stable@vger.kernel.org # v5.0+
      Signed-off-by: default avatarAnton Ivanov <anton.ivanov@cambridgegreys.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d848074b
    • Anton Eidelman's avatar
      nvme-multipath: remove unused groups_only mode in ana log · 86cccfbf
      Anton Eidelman authored
      groups_only mode in nvme_read_ana_log() is no longer used: remove it.
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarAnton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      86cccfbf
    • Anton Eidelman's avatar
      nvme-multipath: fix possible io hang after ctrl reconnect · af8fd042
      Anton Eidelman authored
      The following scenario results in an IO hang:
      1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
         NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
      2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
         This fails because ctrl disconnects.
         Therefore nvme_update_ns_ana_state() is not called
         and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
      3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
         nvme_read_ana_log(ctrl, groups_only=true).
         However, nvme_update_ana_state() does not update namespaces
         because nr_nsids = 0 (due to groups_only mode).
      4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.
      
      Result:
      The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
      Consequently ctrl will never be considered a viable path by __nvme_find_path().
      IO will hang if ctrl is the only or the last path to the namespace.
      
      More generally, while ctrl is reconnecting, its ANA state may change.
      And because nvme_mpath_init() requests ANA log in groups_only mode,
      these changes are not propagated to the existing ctrl namespaces.
      This may result in a mal-function or an IO hang.
      
      Solution:
      nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
      This will not harm the new ctrl case (no namespaces present),
      and will make sure the ANA state of namespaces gets updated after reconnect.
      
      Note: Another option would be for nvme_mpath_init() to invoke
      nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.
      Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: default avatarAnton Eidelman <anton@lightbitslabs.com>
      Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      af8fd042
  7. 28 Oct, 2019 2 commits
  8. 27 Oct, 2019 7 commits
    • Linus Torvalds's avatar
      Linux 5.4-rc5 · d6d5df1d
      Linus Torvalds authored
      d6d5df1d
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 153a971f
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Two fixes for the VMWare guest support:
      
         - Unbreak VMWare platform detection which got wreckaged by converting
           an integer constant to a string constant.
      
         - Fix the clang build of the VMWAre hypercall by explicitely
           specifying the ouput register for INL instead of using the short
           form"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
        x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
      153a971f
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b776b54
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "A small set of fixes for time(keeping):
      
         - Add a missing include to prevent compiler warnings.
      
         - Make the VDSO implementation of clock_getres() POSIX compliant
           again. A recent change dropped the NULL pointer guard which is
           required as NULL is a valid pointer value for this function.
      
         - Fix two function documentation typos"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        posix-cpu-timers: Fix two trivial comments
        timers/sched_clock: Include local timekeeping.h for missing declarations
        lib/vdso: Make clock_getres() POSIX compliant again
      2b776b54
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a8a31fdc
      Linus Torvalds authored
      Pull perf fixes from Thomas Gleixner:
       "A set of perf fixes:
      
        kernel:
      
         - Unbreak the tracking of auxiliary buffer allocations which got
           imbalanced causing recource limit failures.
      
         - Fix the fallout of splitting of ToPA entries which missed to shift
           the base entry PA correctly.
      
         - Use the correct context to lookup the AUX event when unmapping the
           associated AUX buffer so the event can be stopped and the buffer
           reference dropped.
      
        tools:
      
         - Fix buildiid-cache mode setting in copyfile_mode_ns() when copying
           /proc/kcore
      
         - Fix freeing id arrays in the event list so the correct event is
           closed.
      
         - Sync sched.h anc kvm.h headers with the kernel sources.
      
         - Link jvmti against tools/lib/ctype.o to have weak strlcpy().
      
         - Fix multiple memory and file descriptor leaks, found by coverity in
           perf annotate.
      
         - Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
           by a static analysis tool"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/aux: Fix AUX output stopping
        perf/aux: Fix tracking of auxiliary trace buffer allocation
        perf/x86/intel/pt: Fix base for single entry topa
        perf kmem: Fix memory leak in compact_gfp_flags()
        tools headers UAPI: Sync sched.h with the kernel
        tools headers kvm: Sync kvm.h headers with the kernel sources
        tools headers kvm: Sync kvm headers with the kernel sources
        tools headers kvm: Sync kvm headers with the kernel sources
        perf c2c: Fix memory leak in build_cl_output()
        perf tools: Fix mode setting in copyfile_mode_ns()
        perf annotate: Fix multiple memory and file descriptor leaks
        perf tools: Fix resource leak of closedir() on the error paths
        perf evlist: Fix fix for freed id arrays
        perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
      a8a31fdc
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1e1ac1cb
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Two fixes for interrupt controller drivers:
      
         - Skip IRQ_M_EXT entries in the device tree when initializing the
           RISCV PLIC controller to avoid a double init attempt.
      
         - Use the correct ITS list when issuing the VMOVP synchronization
           command so the operation works only on the ITS instances which are
           associated to a VM"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/sifive-plic: Skip contexts except supervisor in plic_init()
        irqchip/gic-v3-its: Use the exact ITSList for VMOVP
      1e1ac1cb
    • Linus Torvalds's avatar
      Merge tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · c9a2e4a8
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Seven cifs/smb3 fixes, including three for stable"
      
      * tag '5.4-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix cifsInodeInfo lock_sem deadlock when reconnect occurs
        CIFS: Fix use after free of file info structures
        CIFS: Fix retry mid list corruption on reconnects
        cifs: Fix missed free operations
        CIFS: avoid using MID 0xFFFF
        cifs: clarify comment about timestamp granularity for old servers
        cifs: Handle -EINPROGRESS only when noblockcnt is set
      c9a2e4a8
    • Linus Torvalds's avatar
      Merge tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 6995a6a5
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "Several minor fixes and cleanups for v5.4-rc5:
      
         - Three build fixes for various SPARSEMEM-related kernel
           configurations
      
         - Two cleanup patches for the kernel bug and breakpoint trap handler
           code"
      
      * tag 'riscv/for-v5.4-rc5-b' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: cleanup do_trap_break
        riscv: cleanup <asm/bug.h>
        riscv: Fix undefined reference to vmemmap_populate_basepages
        riscv: Fix implicit declaration of 'page_to_section'
        riscv: fix fs/proc/kcore.c compilation with sparsemem enabled
      6995a6a5
  9. 26 Oct, 2019 13 commits
  10. 25 Oct, 2019 7 commits
    • Christoph Hellwig's avatar
      riscv: cleanup do_trap_break · e8f44c50
      Christoph Hellwig authored
      If we always compile the get_break_insn_length inline function we can
      remove the ifdefs and let dead code elimination take care of the warn
      branch that is now unreadable because the report_bug stub always
      returns BUG_TRAP_TYPE_BUG.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarAnup Patel <anup@brainfault.org>
      Signed-off-by: default avatarPaul Walmsley <paul.walmsley@sifive.com>
      e8f44c50
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · b4b61b22
      Linus Torvalds authored
      Pull input fix from Dmitry Torokhov:
       "A fix for st1232 driver to properly report coordinates for 2nd and
        subsequent fingers when more than one is on the surface"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: st1232 - fix reporting multitouch coordinates
      b4b61b22
    • Mike Christie's avatar
      nbd: verify socket is supported during setup · cf1b2326
      Mike Christie authored
      nbd requires socket families to support the shutdown method so the nbd
      recv workqueue can be woken up from its sock_recvmsg call. If the socket
      does not support the callout we will leave recv works running or get hangs
      later when the device or module is removed.
      
      This adds a check during socket connection/reconnection to make sure the
      socket being passed in supports the needed callout.
      
      Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
      Fixes: e9e006f5 ("nbd: fix max number of supported devs")
      Tested-by: default avatarRichard W.M. Jones <rjones@redhat.com>
      Signed-off-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      cf1b2326
    • Mark Brown's avatar
      ata: libahci_platform: Fix regulator_get_optional() misuse · 962399bb
      Mark Brown authored
      This driver is using regulator_get_optional() to handle all the supplies
      that it handles, and only ever enables and disables all supplies en masse
      without ever doing any other configuration of the device to handle missing
      power. These are clear signs that the API is being misused - it should only
      be used for supplies that may be physically absent from the system and in
      these cases the hardware usually needs different configuration if the
      supply is missing. Instead use normal regualtor_get(), if the supply is
      not described in DT then the framework will substitute a dummy regulator in
      so no special handling is needed by the consumer driver.
      
      In the case of the PHY regulator the handling in the driver is a hack to
      deal with integrated PHYs; the supplies are only optional in the sense
      that that there's some confusion in the code about where they're bound to.
      From a code point of view they function exactly as normal supplies so can
      be treated as such. It'd probably be better to model this by instantiating
      a PHY object for integrated PHYs.
      Reviewed-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      962399bb
    • Josef Bacik's avatar
      nbd: handle racing with error'ed out commands · 7ce23e8e
      Josef Bacik authored
      We hit the following warning in production
      
      print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
      ------------[ cut here ]------------
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
      Workqueue: knbd-recv recv_work [nbd]
      RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
      Call Trace:
       blk_mq_free_request+0xb7/0xf0
       blk_mq_complete_request+0x62/0xf0
       recv_work+0x29/0xa1 [nbd]
       process_one_work+0x1f5/0x3f0
       worker_thread+0x2d/0x3d0
       ? rescuer_thread+0x340/0x340
       kthread+0x111/0x130
       ? kthread_create_on_node+0x60/0x60
       ret_from_fork+0x1f/0x30
      ---[ end trace b079c3c67f98bb7c ]---
      
      This was preceded by us timing out everything and shutting down the
      sockets for the device.  The problem is we had a request in the queue at
      the same time, so we completed the request twice.  This can actually
      happen in a lot of cases, we fail to get a ref on our config, we only
      have one connection and just error out the command, etc.
      
      Fix this by checking cmd->status in nbd_read_stat.  We only change this
      under the cmd->lock, so we are safe to check this here and see if we've
      already error'ed this command out, which would indicate that we've
      completed it as well.
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7ce23e8e
    • Josef Bacik's avatar
      nbd: protect cmd->status with cmd->lock · de6346ec
      Josef Bacik authored
      We already do this for the most part, except in timeout and clear_req.
      For the timeout case we take the lock after we grab a ref on the config,
      but that isn't really necessary because we're safe to touch the cmd at
      this point, so just move the order around.
      
      For the clear_req cause this is initiated by the user, so again is safe.
      Reviewed-by: default avatarMike Christie <mchristi@redhat.com>
      Signed-off-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      de6346ec
    • Linus Torvalds's avatar
      Merge tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · 9e2dd2ca
      Linus Torvalds authored
      Pull modules fixes from Jessica Yu:
      
       - Revert __ksymtab_$namespace.$symbol naming scheme back to
         __ksymtab_$symbol, as it was causing issues with depmod.
      
         Instead, have modpost extract a symbol's namespace from __kstrtabns
         and __ksymtab_strings.
      
       - Fix `make nsdeps` for out of tree kernel builds (make O=...) caused
         by unescaped '/'.
      
         Use a different sed delimiter to avoid this problem.
      
      * tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        scripts/nsdeps: use alternative sed delimiter
        symbol namespaces: revert to previous __ksymtab name scheme
        modpost: make updating the symbol namespace explicit
        modpost: delegate updating namespaces to separate function
      9e2dd2ca