1. 30 Nov, 2018 1 commit
  2. 27 Nov, 2018 3 commits
    • nvme-pci: fix surprise removal · 751a0cc0
      Igor Konopko authored
      When a PCIe NVMe device is not present, nvme_dev_remove_admin() calls
      blk_cleanup_queue() on the admin queue, which frees the hctx for that
      queue.  Moments later, on the same path, nvme_kill_queues() calls
      blk_mq_unquiesce_queue() on the admin queue and tries to access its
      hctx, which leads to the following oops:
      
      Oops: 0000 [#1] SMP PTI
      RIP: 0010:sbitmap_any_bit_set+0xb/0x40
      Call Trace:
       blk_mq_run_hw_queue+0xd5/0x150
       blk_mq_run_hw_queues+0x3a/0x50
       nvme_kill_queues+0x26/0x50
       nvme_remove_namespaces+0xb2/0xc0
       nvme_remove+0x60/0x140
       pci_device_remove+0x3b/0xb0
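The sketch below is a minimal illustration of the use-after-free pattern described above. It assumes a hypothetical `struct queue` whose `hctx` pointer is freed by cleanup, and a `dying` flag that cleanup sets first. A common guard for this class of bug is to skip touching the hardware context once the queue is marked dying. The sketch shows the bug class only; it is not necessarily how this commit fixes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a request queue and its hardware context. */
struct hw_ctx { int pending; };

struct queue {
    bool dying;           /* set once cleanup has started */
    struct hw_ctx *hctx;  /* freed by queue_cleanup() */
};

/* Cleanup marks the queue dying before freeing its hardware context. */
static void queue_cleanup(struct queue *q)
{
    q->dying = true;
    free(q->hctx);
    q->hctx = NULL;
}

/* Unquiesce touches the hctx only while the queue is still alive.
 * Returns true if the hardware context was actually run. */
static bool queue_unquiesce(struct queue *q)
{
    if (q->dying)
        return false;     /* hctx already freed: do not dereference it */
    q->hctx->pending = 0; /* "run" the hardware queue */
    return true;
}
```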
      
      Fixes: cb4bfda6 ("nvme-pci: fix hot removal during error handling")
      Signed-off-by: Igor Konopko <igor.j.konopko@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() · dfa74422
      Ewan D. Milne authored
      __nvme_fc_init_request() invokes memset() on the nvme_fcp_op_w_sgl structure, which
      NULLs out the nvme_req(req)->ctrl field previously set by nvme_fc_init_request().
      This field apparently was not referenced until commit faf4a44fff ("nvme: support traffic
      based keep-alive"), which now results in a crash in nvme_complete_rq():
      
      [ 8386.897130] RIP: 0010:panic+0x220/0x26c
      [ 8386.901406] Code: 83 3d 6f ee 72 01 00 74 05 e8 e8 54 02 00 48 c7 c6 40 fd 5b b4 48 c7 c7 d8 8d c6 b3 31e
      [ 8386.922359] RSP: 0018:ffff99650019fc40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [ 8386.930804] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
      [ 8386.938764] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8e325f8168b0
      [ 8386.946725] RBP: ffff99650019fcb0 R08: 0000000000000000 R09: 00000000000004f8
      [ 8386.954687] R10: 0000000000000000 R11: ffff99650019f9b8 R12: ffffffffb3c55f3c
      [ 8386.962648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
      [ 8386.970613]  oops_end+0xd1/0xe0
      [ 8386.974116]  no_context+0x1b2/0x3c0
      [ 8386.978006]  do_page_fault+0x32/0x140
      [ 8386.982090]  page_fault+0x1e/0x30
      [ 8386.985786] RIP: 0010:nvme_complete_rq+0x65/0x1d0 [nvme_core]
      [ 8386.992195] Code: 41 bc 03 00 00 00 74 16 0f 86 c3 00 00 00 66 3d 83 00 41 bc 06 00 00 00 0f 85 e7 00 000
      [ 8387.013147] RSP: 0018:ffff99650019fe18 EFLAGS: 00010246
      [ 8387.018973] RAX: 0000000000000000 RBX: ffff8e322ae51280 RCX: 0000000000000001
      [ 8387.026935] RDX: 0000000000000400 RSI: 0000000000000001 RDI: ffff8e322ae51280
      [ 8387.034897] RBP: ffff8e322ae51280 R08: 0000000000000000 R09: ffffffffb2f0b890
      [ 8387.042859] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [ 8387.050821] R13: 0000000000000100 R14: 0000000000000004 R15: ffff8e2b0446d990
      [ 8387.058782]  ? swiotlb_unmap_page+0x40/0x40
      [ 8387.063448]  nvme_fc_complete_rq+0x2d/0x70 [nvme_fc]
      [ 8387.068986]  blk_done_softirq+0xa1/0xd0
      [ 8387.073264]  __do_softirq+0xd6/0x2a9
      [ 8387.077251]  run_ksoftirqd+0x26/0x40
      [ 8387.081238]  smpboot_thread_fn+0x10e/0x160
      [ 8387.085807]  kthread+0xf8/0x130
      [ 8387.089309]  ? sort_range+0x20/0x20
      [ 8387.093198]  ? kthread_stop+0x110/0x110
      [ 8387.097475]  ret_from_fork+0x35/0x40
      [ 8387.101462] ---[ end trace 7106b0adf5e422f8 ]---
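The ordering problem can be reduced to a few lines. This sketch assumes a hypothetical `struct op` in which a `ctrl` pointer lives inside the region that the init routine clears with `memset()`. The pointer stands in for `nvme_req(rq)->ctrl`. The names are illustrative only; this is not the nvme-fc code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified request layout: the ctrl back-pointer lives
 * inside the structure that the transport init routine memset()s. */
struct ctrl { int id; };

struct op {
    struct ctrl *ctrl;   /* stands in for nvme_req(rq)->ctrl */
    char sgl[64];        /* other per-request state */
};

/* Stand-in for __nvme_fc_init_request(): wipes the whole op. */
static void init_op(struct op *op)
{
    memset(op, 0, sizeof(*op));
}

/* Buggy order: ctrl is set first, then wiped by init_op(). */
static void init_request_buggy(struct op *op, struct ctrl *c)
{
    op->ctrl = c;
    init_op(op);
}

/* Fixed order: run the memset() first, then set ctrl. */
static void init_request_fixed(struct op *op, struct ctrl *c)
{
    init_op(op);
    op->ctrl = c;
}
```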
      
      Fixes: faf4a44fff ("nvme: support traffic based keep-alive")
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: Free ctrl device name on init failure · d6a2b953
      Keith Busch authored
      Free the kobject name that was allocated for the controller device on
      failure rather than its parent.
      
      Fixes: d22524a4 ("nvme: switch controller refcounting to use struct device")
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 21 Nov, 2018 1 commit
  4. 15 Nov, 2018 1 commit
    • nvme-fc: resolve io failures during connect · 4cff280a
      James Smart authored
      If an io error occurs on an io issued while connecting, recovery
      of the io fails because the state checking turns the error handler
      into a no-op.

      Create an err_work work item that is scheduled upon an io error while
      connecting. The work thread terminates all io on all queues and marks
      the queues as not connected. The termination of the io returns
      to the caller, which then backs out of the connection attempt
      and, if possible, reschedules it.
      
      The changes:
      - In case several commands hit the error handler, a state flag is
        kept so that the error work is only scheduled once, on the first
        error. Subsequent errors can be ignored.
      - The calling sequence to stop keep-alive and terminate the queues
        and their io is lifted from the reset routine into a small service
        routine used by both reset and err_work.
      - During debugging, we found that the teardown path can reference
        an uninitialized pointer, resulting in a NULL pointer oops: the
        aen_ops weren't initialized yet. Validate their initialization
        before calling the teardown routine.
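The sketch below illustrates the schedule-once idea from the first bullet. It assumes a hypothetical `struct fc_ctrl` holding a C11 atomic flag. The counter stands in for queueing the work item, and none of this is the driver's actual data structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical controller state: a flag records that error recovery has
 * already been scheduled; the counter stands in for queue_work(). */
struct fc_ctrl {
    atomic_bool err_work_active;
    int err_work_scheduled;   /* how many times the work was queued */
};

/* Called for every io error seen while connecting. Only the caller that
 * flips the flag from false to true schedules the work; later errors see
 * the flag already set and are ignored. */
static void ioerr_while_connecting(struct fc_ctrl *ctrl)
{
    if (atomic_exchange(&ctrl->err_work_active, true))
        return;                  /* recovery already pending */
    ctrl->err_work_scheduled++;  /* stands in for queueing err_work */
}

/* Run when the (hypothetical) work item has terminated all io; clearing
 * the flag lets a later connect attempt schedule recovery again. */
static void err_work_done(struct fc_ctrl *ctrl)
{
    atomic_store(&ctrl->err_work_active, false);
}
```

Using `atomic_exchange()` means that concurrent error handlers race on a single read-modify-write. Exactly one of them observes `false` and schedules the work.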
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  5. 14 Nov, 2018 2 commits
    • SCSI: fix queue cleanup race before queue initialization is done · 8dc765d4
      Ming Lei authored
      Commit c2856ae2 ("blk-mq: quiesce queue before freeing queue") already
      fixed this race; however, the synchronize_rcu() implied by
      blk_mq_quiesce_queue() can slow down LUN probing considerably, which
      caused a performance regression.

      Commit 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      then quiesced the queue only when queue initialization is done, to avoid
      unnecessary synchronize_rcu() calls, because it is common to probe
      many nonexistent LUNs.

      However, it turns out that quiescing the queue only when queue
      initialization is done isn't safe. When a SCSI command completes, the
      submitter of the command can be woken up immediately and the SCSI
      device may then be removed while the queue run in scsi_end_request()
      is still in progress, which can cause a kernel panic.

      In the Red Hat QE lab, there have been several reports of this kind of
      kernel panic triggered during boot.

      This patch addresses the issue by holding a reference on the queue
      usage counter while the request is freed and during the queue run
      that follows.
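The userspace sketch below illustrates the idea under a few assumptions. It uses a hypothetical `struct queue` whose atomic `usage` counter stands in for the block layer's queue usage counter. It also uses a simulated submitter that tries to tear the queue down as soon as it is woken. Because the reference is held across both the free and the queue run, that teardown attempt has to back off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical queue: `usage` stands in for the queue usage counter.
 * Teardown may only proceed once no user holds a reference. */
struct queue {
    atomic_int usage;
    bool cleanup_deferred;   /* records that teardown had to wait */
};

static void queue_enter(struct queue *q) { atomic_fetch_add(&q->usage, 1); }
static void queue_exit(struct queue *q)  { atomic_fetch_sub(&q->usage, 1); }

/* Teardown attempt: refuses while any reference is still held. */
static bool try_cleanup(struct queue *q)
{
    if (atomic_load(&q->usage) > 0) {
        q->cleanup_deferred = true;
        return false;
    }
    return true;
}

/* Simulates the woken submitter immediately trying to remove the device. */
static void submitter_wakes_and_removes(struct queue *q)
{
    (void)try_cleanup(q);
}

/* End-of-request path: hold a reference across freeing the request (which
 * wakes the submitter) and the queue run that follows it. */
static void end_request(struct queue *q)
{
    queue_enter(q);
    submitter_wakes_and_removes(q);  /* freeing the request wakes the submitter */
    /* ... run the queue here: q cannot have been torn down yet ... */
    queue_exit(q);
}
```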
      
      Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: jianchao.wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix 32 bit overflow in __blkdev_issue_discard() · 4800bf7b
      Dave Chinner authored
      A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to
      fall into an endless loop in the discard code. The test is creating
      a device that is exactly 2^32 sectors in size to test mkfs boundary
      conditions around the 32 bit sector overflow region.
      
      mkfs issues a discard for the entire device size by default, and
      hence this throws a sector count of 2^32 into
      blkdev_issue_discard(). It takes the number of sectors to discard as
      a sector_t - a 64 bit value.
      
      Commit ba5d7385 ("block: cleanup __blkdev_issue_discard")
      takes this sector count and casts it to a 32 bit value before
      comparing it against the maximum discard size the device
      allows. This truncates away the upper 32 bits, so if the lower 32
      bits of the sector count are zero, it starts issuing discards of
      length 0. This causes the code to fall into an endless loop, issuing
      zero-length discards over and over again on the same sector.
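The truncation is easy to reproduce in plain C. This sketch uses the hypothetical helpers `chunk_len_truncating()` and `chunk_len()`; it is not the actual `__blkdev_issue_discard()` code. The loop also has a zero-length guard, which the real code did not have, so the buggy variant reports failure instead of spinning forever.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;   /* 64-bit, as described above */

/* Buggy pattern: casting the 64-bit count to 32 bits before taking the
 * minimum drops the upper half, so 2^32 sectors becomes 0. */
static sector_t chunk_len_truncating(sector_t nr_sects, unsigned int max_sects)
{
    unsigned int n = (unsigned int)nr_sects;   /* truncation happens here */
    return n < max_sects ? n : max_sects;
}

/* Safe pattern: compare in 64 bits; the result is bounded by max_sects. */
static sector_t chunk_len(sector_t nr_sects, unsigned int max_sects)
{
    return nr_sects < max_sects ? nr_sects : max_sects;
}

/* Returns the number of discard commands needed, or -1 if a zero-length
 * chunk would make the loop spin forever. */
static long count_discards(sector_t nr_sects, unsigned int max_sects,
                           sector_t (*chunk)(sector_t, unsigned int))
{
    long cmds = 0;
    while (nr_sects) {
        sector_t len = chunk(nr_sects, max_sects);
        if (len == 0)
            return -1;   /* guard added for the sketch only */
        nr_sects -= len;
        cmds++;
    }
    return cmds;
}
```

With a 2^32-sector device and a 65536-sector discard limit, the safe helper needs 65536 commands. The truncating helper returns a zero-length chunk on the first iteration.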
      
      Fixes: ba5d7385 ("block: cleanup __blkdev_issue_discard")
      Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>

      Killed pointless WARN_ON().
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 12 Nov, 2018 3 commits
  7. 10 Nov, 2018 1 commit
    • floppy: fix race condition in __floppy_read_block_0() · de7b75d8
      Jens Axboe authored
      LKP recently reported a hang at bootup in the floppy code:
      
      [  245.678853] INFO: task mount:580 blocked for more than 120 seconds.
      [  245.679906]       Tainted: G                T 4.19.0-rc6-00172-ga9f38e1d #1
      [  245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  245.682181] mount           D 6372   580      1 0x00000004
      [  245.683023] Call Trace:
      [  245.683425]  __schedule+0x2df/0x570
      [  245.683975]  schedule+0x2d/0x80
      [  245.684476]  schedule_timeout+0x19d/0x330
      [  245.685090]  ? wait_for_common+0xa5/0x170
      [  245.685735]  wait_for_common+0xac/0x170
      [  245.686339]  ? do_sched_yield+0x90/0x90
      [  245.686935]  wait_for_completion+0x12/0x20
      [  245.687571]  __floppy_read_block_0+0xfb/0x150
      [  245.688244]  ? floppy_resume+0x40/0x40
      [  245.688844]  floppy_revalidate+0x20f/0x240
      [  245.689486]  check_disk_change+0x43/0x60
      [  245.690087]  floppy_open+0x1ea/0x360
      [  245.690653]  __blkdev_get+0xb4/0x4d0
      [  245.691212]  ? blkdev_get+0x1db/0x370
      [  245.691777]  blkdev_get+0x1f3/0x370
      [  245.692351]  ? path_put+0x15/0x20
      [  245.692871]  ? lookup_bdev+0x4b/0x90
      [  245.693539]  blkdev_get_by_path+0x3d/0x80
      [  245.694165]  mount_bdev+0x2a/0x190
      [  245.694695]  squashfs_mount+0x10/0x20
      [  245.695271]  ? squashfs_alloc_inode+0x30/0x30
      [  245.695960]  mount_fs+0xf/0x90
      [  245.696451]  vfs_kern_mount+0x43/0x130
      [  245.697036]  do_mount+0x187/0xc40
      [  245.697563]  ? memdup_user+0x28/0x50
      [  245.698124]  ksys_mount+0x60/0xc0
      [  245.698639]  sys_mount+0x19/0x20
      [  245.699167]  do_int80_syscall_32+0x61/0x130
      [  245.699813]  entry_INT80_32+0xc7/0xc7
      
      showing that we never complete that read request. The reason is that
      the completion setup is racy - it initializes the completion event
      AFTER submitting the IO, which means that the IO could complete
      before/during the init. If it does, we are passing garbage to
      complete() and we may sleep forever waiting for the event to
      occur.
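The race can be shown deterministically by assuming the worst case: an I/O that completes synchronously inside the submit call. The `sim_*` helpers below are hypothetical, simplified stand-ins for the kernel's completion API. They use a single `done` flag and do no actual waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a kernel completion: just a "done" flag. */
struct sim_completion { bool done; };

static void sim_init_completion(struct sim_completion *c) { c->done = false; }
static void sim_complete(struct sim_completion *c)        { c->done = true; }

/* Hypothetical submission that, in the worst case, finishes at once and
 * signals completion before the submit call even returns. */
static void sim_submit_io(struct sim_completion *c)
{
    sim_complete(c);
}

/* Racy order: the completion is initialized after submission, wiping out
 * a completion that already happened; a real waiter would sleep forever. */
static bool read_block0_racy(void)
{
    struct sim_completion c;
    sim_submit_io(&c);
    sim_init_completion(&c);
    return c.done;   /* false: waiting here would never return */
}

/* Fixed order: initialize before the I/O can possibly complete. */
static bool read_block0_fixed(void)
{
    struct sim_completion c;
    sim_init_completion(&c);
    sim_submit_io(&c);
    return c.done;   /* true */
}
```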
      
      Fixes: 7b7b68bb ("floppy: bail out in open() if drive is not responding to block0 read")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 09 Nov, 2018 10 commits
  9. 08 Nov, 2018 12 commits
  10. 07 Nov, 2018 6 commits
    • block: Clear kernel memory before copying to user · f3587d76
      Keith Busch authored
      If the kernel allocates a bounce buffer for user read data, this memory
      needs to be cleared before copying it to the user, otherwise it may leak
      kernel memory to user space.
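The sketch below illustrates the rule under one assumption: a short read in which the device fills only part of a hypothetical bounce buffer, after which the whole buffer is copied out. Here `memcpy()` stands in for `copy_to_user()`. Without the `memset()`, the unfilled tail would carry whatever the allocator left there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bounce-buffer allocation for a read into user memory.
 * malloc() returns memory with unspecified contents; zeroing it up front
 * ensures bytes the device never writes are 0 rather than stale data. */
static unsigned char *alloc_read_bounce(size_t len)
{
    unsigned char *buf = malloc(len);
    if (buf)
        memset(buf, 0, len);   /* clear before it can reach the user */
    return buf;
}

/* Simulated short read: the device fills only the first `filled` bytes,
 * then the whole buffer is copied to the caller's destination. */
static void short_read_to_user(unsigned char *dst, size_t len, size_t filled)
{
    unsigned char *bounce = alloc_read_bounce(len);
    if (!bounce)
        return;
    memset(bounce, 0xAB, filled);   /* data the device actually returned */
    memcpy(dst, bounce, len);       /* stands in for copy_to_user() */
    free(bounce);
}
```

In userspace, `calloc()` would give the same guarantee. The explicit `memset()` simply makes the clearing step visible.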
      
      Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • MAINTAINERS: Fix remaining pointers to obsolete libata.git · e31d36b0
      Geert Uytterhoeven authored
      libata.git no longer exists.  Replace the remaining pointers to it by
      pointers to the block tree, which is where all libata development
      happens now.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • ubd: fix missing lock around request issue · 6961cd4d
      Jens Axboe authored
      We need to hold the device lock (and disable interrupts) while
      writing new commands, or we could be interrupted while that
      is happening and read invalid requests in the completion path.
      
      Fixes: 4e6da0fe ("um: Convert ubd driver to blk-mq")
      Tested-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Documentation: ABI: led-trigger-pattern: Fix typos · 406e7f98
      Geert Uytterhoeven authored
        - Spelling s/brigntess/brightness/,
        - Double "use".
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
    • leds: trigger: Fix sleeping function called from invalid context · 3a40cfe8
      Baolin Wang authored
      The issue below occurs because mutex_lock() is called in interrupt
      context. The mutex protects the pattern trigger data, but before users
      change that data (pattern values or repeat value), we always cancel the
      timer first to stop the previous pattern. That means there is no race
      in pattern_trig_timer_function(), so we can drop the mutex lock in
      pattern_trig_timer_function() to avoid this issue.

      Moreover, we can move the timer cancellation under mutex protection,
      since there is no deadlock risk once the mutex lock is removed from
      pattern_trig_timer_function().
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted
      4.20.0-rc1-koelsch-00841-ga338c8181013c1a9 #171
      Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
      [<c020f19c>] (unwind_backtrace) from [<c020aecc>] (show_stack+0x10/0x14)
      [<c020aecc>] (show_stack) from [<c07affb8>] (dump_stack+0x7c/0x9c)
      [<c07affb8>] (dump_stack) from [<c02417d4>] (___might_sleep+0xf4/0x158)
      [<c02417d4>] (___might_sleep) from [<c07c92c4>] (mutex_lock+0x18/0x60)
      [<c07c92c4>] (mutex_lock) from [<c067b28c>] (pattern_trig_timer_function+0x1c/0x11c)
      [<c067b28c>] (pattern_trig_timer_function) from [<c027f6fc>] (call_timer_fn+0x1c/0x90)
      [<c027f6fc>] (call_timer_fn) from [<c027f944>] (expire_timers+0x94/0xa4)
      [<c027f944>] (expire_timers) from [<c027fc98>] (run_timer_softirq+0x108/0x15c)
      [<c027fc98>] (run_timer_softirq) from [<c02021cc>] (__do_softirq+0x1d4/0x258)
      [<c02021cc>] (__do_softirq) from [<c0224d24>] (irq_exit+0x64/0xc4)
      [<c0224d24>] (irq_exit) from [<c0268dd0>] (__handle_domain_irq+0x80/0xb4)
      [<c0268dd0>] (__handle_domain_irq) from [<c045e1b0>] (gic_handle_irq+0x58/0x90)
      [<c045e1b0>] (gic_handle_irq) from [<c02019f8>] (__irq_svc+0x58/0x74)
      Exception stack(0xeb483f60 to 0xeb483fa8)
      3f60: 00000000 00000000 eb9afaa0 c0217e80 00000000 ffffe000 00000000 c0e06408
      3f80: 00000002 c0e0647c c0c6a5f0 00000000 c0e04900 eb483fb0 c0207ea8 c0207e98
      3fa0: 60020013 ffffffff
      [<c02019f8>] (__irq_svc) from [<c0207e98>] (arch_cpu_idle+0x1c/0x38)
      [<c0207e98>] (arch_cpu_idle) from [<c0247ca8>] (do_idle+0x138/0x268)
      [<c0247ca8>] (do_idle) from [<c0248050>] (cpu_startup_entry+0x18/0x1c)
      [<c0248050>] (cpu_startup_entry) from [<402022ec>] (0x402022ec)
      
      Fixes: 5fd752b6 ("leds: core: Introduce LED pattern trigger")
      Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
      Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
    • block: respect virtual boundary mask in bvecs · df376b2e
      Johannes Thumshirn authored
      With drivers that set a virtual boundary constraint, we are
      seeing a lot of bio splitting and smaller I/Os being submitted to the
      driver.

      This happens because the bio gap detection code does not account for
      cases where PAGE_SIZE - 1 is bigger than queue_virt_boundary() and thus
      splits the bio unnecessarily.
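The sketch below is a simplified model of the gap check. It assumes a hypothetical `struct seg`, loosely modeled on `struct bio_vec`, and passes an explicit boundary mask in place of `queue_virt_boundary()`. Take a 512-byte boundary (mask 0x1ff) and a segment starting at offset 512. That segment is aligned, yet the over-strict check still reports a gap and would force a split.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified segment descriptor, loosely modeled on struct bio_vec. */
struct seg { unsigned int offset, len; };

/* Over-strict check: any nonzero offset in the next segment counts as a
 * gap, regardless of the device's boundary mask. */
static bool gap_strict(const struct seg *prev, unsigned int next_offset,
                       unsigned long mask)
{
    return next_offset || ((prev->offset + prev->len) & mask);
}

/* Mask-aware check: only offsets misaligned with respect to the virtual
 * boundary mask create a gap. */
static bool gap_masked(const struct seg *prev, unsigned int next_offset,
                       unsigned long mask)
{
    return (next_offset & mask) || ((prev->offset + prev->len) & mask);
}
```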
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>