1. 13 Mar, 2023 1 commit
    • md: Free resources in __md_stop · 3e453522
      Xiao Ni authored
      If md_run() fails after ->active_io is initialized, percpu_ref_exit()
      is called in the error path. Later, md_free_disk() calls
      percpu_ref_exit() again, which leads to a panic because of a NULL
      pointer dereference. The same bug can be triggered by any other resource
      that is initialized, freed in the error path, and then freed again in
      md_free_disk().
      
      BUG: kernel NULL pointer dereference, address: 0000000000000038
      Oops: 0000 [#1] PREEMPT SMP
      Workqueue: md_misc mddev_delayed_delete
      RIP: 0010:free_percpu+0x110/0x630
      Call Trace:
       <TASK>
       __percpu_ref_exit+0x44/0x70
       percpu_ref_exit+0x16/0x90
       md_free_disk+0x2f/0x80
       disk_release+0x101/0x180
       device_release+0x84/0x110
       kobject_put+0x12a/0x380
       kobject_put+0x160/0x380
       mddev_delayed_delete+0x19/0x30
       process_one_work+0x269/0x680
       worker_thread+0x266/0x640
       kthread+0x151/0x1b0
       ret_from_fork+0x1f/0x30
      
      To create a raid device, md raid calls do_md_run->md_run and dm raid
      calls md_run directly. This memory is allocated in md_run. To stop a raid
      device, md raid calls do_md_stop->__md_stop and dm raid calls
      md_stop->__md_stop. So these memory resources can be freed in __md_stop.
      
      Fixes: 72adae23 ("md: Change active_io to percpu")
      Reported-and-tested-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
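The ownership rule described in this commit (allocate in md_run, release only in __md_stop) can be sketched in user-space C. Every name below (`struct fake_mddev`, `fake_md_run`, `fake_md_stop`, `fake_md_free_disk`) is a hypothetical stand-in and not the kernel code; the sketch only illustrates why a single, idempotent release point avoids the double free.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mddev; not the kernel structure. */
struct fake_mddev {
	int *active_io;   /* stands in for the percpu reference */
	int free_calls;   /* counts how often the resource was released */
};

/* Single release point, analogous in spirit to __md_stop(). */
void fake_md_stop(struct fake_mddev *m)
{
	if (!m->active_io)
		return;           /* already released: nothing to do */
	free(m->active_io);
	m->active_io = NULL;
	m->free_calls++;
}

/* Allocate, then optionally fail late; the error path reuses the stop path. */
int fake_md_run(struct fake_mddev *m, bool fail_late)
{
	m->active_io = calloc(1, sizeof(*m->active_io));
	if (!m->active_io)
		return -ENOMEM;
	if (fail_late) {
		fake_md_stop(m);
		return -EINVAL;
	}
	return 0;
}

/* Final disk release: does not touch active_io, only checks the invariant. */
void fake_md_free_disk(struct fake_mddev *m)
{
	assert(m->free_calls <= 1);
}
```

With this shape, a failed `fake_md_run()` followed by `fake_md_free_disk()` releases the resource exactly once, and calling `fake_md_stop()` twice is harmless.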
  2. 03 Mar, 2023 1 commit
  3. 01 Mar, 2023 1 commit
    • Merge tag 'nvme-6.3-2022-03-01' of git://git.infradead.org/nvme into for-6.3/block · 326ac2c5
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for Linux 6.3
      
       - don't access released socket during error recovery (Akinobu Mita)
       - bring back auto-removal of deleted namespaces during sequential scan
         (Christoph Hellwig)
       - fix an error code in nvme_auth_process_dhchap_challenge
         (Dan Carpenter)
       - show well known discovery name (Daniel Wagner)
       - add a missing endianness conversion in effects masking (Keith Busch)"
      
      * tag 'nvme-6.3-2022-03-01' of git://git.infradead.org/nvme:
        nvme-fabrics: show well known discovery name
        nvme-tcp: don't access released socket during error recovery
        nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge()
        nvme: bring back auto-removal of deleted namespaces during sequential scan
        nvme: fix sparse warning on effects masking
  4. 28 Feb, 2023 5 commits
    • nvme-fabrics: show well known discovery name · 26a57cb3
      Daniel Wagner authored
      The kernel always logs the unique subsystem name for a discovery
      controller, even when user space asked for the well-known one.
      
      This has led to confusion because the nvme-cli logs and the kernel
      logs didn't match.
      
      First, nvme-cli connects to the well-known discovery controller to
      figure out whether it supports TP8013. If so, nvme-cli disconnects and
      connects to the unique discovery controller. Currently, the kernel shows
      that user space connected twice to the unique one.
      
      To avoid further confusion, show the well-known discovery controller if
      user space asked for it:
      
        $ nvme connect-all -v -t tcp -a 192.168.0.1
        nvme0: nqn.2014-08.org.nvmexpress.discovery connected
        nvme0: nqn.2014-08.org.nvmexpress.discovery disconnected
        nvme0: nqn.discovery connected
      
        kernel log:
        nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.1:8009
        nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
        nvme nvme0: new ctrl: NQN "nqn.discovery", addr 192.168.0.1:8009
      
      Fixes: e5ea42fa ("nvme: display correct subsystem NQN")
      Signed-off-by: Daniel Wagner <dwagner@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-tcp: don't access released socket during error recovery · 76d54bf2
      Akinobu Mita authored
      While the error recovery work is temporarily failing reconnect attempts,
      running the 'nvme list' command causes a kernel NULL pointer dereference
      by calling getsockname() with a released socket.
      
      During error recovery work, the nvme tcp socket is released and a new one
      created, so it is not safe to access the socket without a proper check.
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Fixes: 02c57a82 ("nvme-tcp: print actual source IP address through sysfs "address" attr")
      Reviewed-by: Martin Belanger <martin.belanger@dell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
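The "proper check" this commit asks for can be illustrated with a small user-space C sketch. The types and the function (`struct fake_sock`, `struct fake_queue`, `fake_get_address`) are hypothetical and not the nvme-tcp driver's code. The sketch is single-threaded and omits the serialization that real code would need between the reader and the error recovery work; it only shows the "is the queue live and is the socket present" test before the socket is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical miniature types; the real driver structures differ. */
struct fake_sock  { const char *local_addr; };
struct fake_queue { bool live; struct fake_sock *sock; };

/*
 * Format the source address into buf, or leave buf empty when the
 * socket has been released (for example during error recovery).
 * Returns the number of characters written, 0 if nothing was printed.
 */
int fake_get_address(const struct fake_queue *q, char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	if (!q->live || !q->sock)
		return 0;         /* socket not usable: skip it */
	return snprintf(buf, size, "src_addr=%s", q->sock->local_addr);
}
```

A caller such as a sysfs "address" reader would then print only the parts that are available instead of dereferencing a released socket.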
    • nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() · 51d24f70
      Dan Carpenter authored
      This function was transitioned from returning NVMe status codes to
      returning traditional kernel error codes.  However, this particular
      return now accidentally returns positive error codes like ENOMEM instead
      of negative -ENOMEM.
      
      Fixes: b0ef1b11 ("nvme-auth: don't use NVMe status codes")
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
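The sign convention behind this fix can be shown in a few lines of C. The function names (`fake_process_challenge`, `is_error`) are hypothetical; only the errno constants are real. Callers of kernel-style functions conventionally test `ret < 0`, so a positive `ENOMEM` would be mistaken for success.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper that follows the kernel convention: 0 or -errno. */
int fake_process_challenge(size_t len, void **out)
{
	assert(out != NULL);
	if (len == 0)
		return -EINVAL;
	*out = malloc(len);
	if (!*out)
		return -ENOMEM;   /* negative, so "ret < 0" catches it */
	return 0;
}

/* The usual caller-side check; a positive error code slips past it. */
int is_error(int ret)
{
	return ret < 0;
}
```

Here `is_error(-ENOMEM)` is true while `is_error(ENOMEM)` is false, which is exactly how a positive return value ends up treated as success.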
    • nvme: bring back auto-removal of deleted namespaces during sequential scan · 0dd6fff2
      Christoph Hellwig authored
      Bring back the check of the Identify Namespace return value for the
      legacy NVMe 1.0-style sequential scanning.  While NVMe 1.0 does not
      support namespace management, there are "modern" cloud solutions like
      Google Cloud Platform that claim the obsolete 1.0 compliance for no
      good reason while supporting proprietary sideband namespace management.
      
      Fixes: 1a893c2b ("nvme: refactor namespace probing")
      Reported-by: Nils Hanke <nh@edgeless.systems>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Nils Hanke <nh@edgeless.systems>
    • blk-iocost: Pass gendisk to ioc_refresh_params · e33b9365
      Breno Leitao authored
      The current kernel (d2980d8d) crashes
      when blk_iocost_init() runs for the `nvme1` disk.
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000050
      	#PF: supervisor read access in kernel mode
      	#PF: error_code(0x0000) - not-present page
      
      	blk_iocost_init (include/asm-generic/qspinlock.h:128
      			 include/linux/spinlock.h:203
      			 include/linux/spinlock_api_smp.h:158
      			 include/linux/spinlock.h:400
      			 block/blk-iocost.c:2884)
      	ioc_qos_write (block/blk-iocost.c:3198)
      	? kretprobe_perf_func (kernel/trace/trace_kprobe.c:1566)
      	? kernfs_fop_write_iter (include/linux/slab.h:584 fs/kernfs/file.c:311)
      	? __kmem_cache_alloc_node (mm/slab.h:? mm/slub.c:3452 mm/slub.c:3491)
      	? _copy_from_iter (arch/x86/include/asm/uaccess_64.h:46
      			   arch/x86/include/asm/uaccess_64.h:52
      			   lib/iov_iter.c:183 lib/iov_iter.c:628)
      	? kretprobe_dispatcher (kernel/trace/trace_kprobe.c:1693)
      	cgroup_file_write (kernel/cgroup/cgroup.c:4061)
      	kernfs_fop_write_iter (fs/kernfs/file.c:334)
      	vfs_write (include/linux/fs.h:1849 fs/read_write.c:491
      		   fs/read_write.c:584)
      	ksys_write (fs/read_write.c:637)
      	do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      	entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
      This happens because ioc_refresh_params() is called before ioc->rqos
      is properly initialized; that initialization only happens later, on the
      callee side.
      
      ioc_refresh_params() -> ioc_autop_idx() tries to access
      ioc->rqos.disk->queue but ioc->rqos.disk is NULL, causing the BUG above.
      
      Create a function, ioc_refresh_params_disk(), that is similar to
      ioc_refresh_params() but takes the "struct gendisk" as an explicit
      argument. This function is called when ioc->rqos.disk is not yet
      initialized.
      
      Fixes: ce57b558 ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful")
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20230228111654.1778120-1-leitao@debian.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
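The idea of passing the disk explicitly, instead of reading it from a structure that is not yet initialized, can be sketched in user-space C. All types and functions below (`struct fake_ioc`, `fake_refresh_params`, `fake_refresh_params_disk`, and so on) are hypothetical miniatures, not the blk-iocost code. Unlike the kernel path in the trace above, the sketch returns an error for a missing disk rather than crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; the real kernel structures differ. */
struct fake_queue { int rotational; };
struct fake_disk  { struct fake_queue *queue; };
struct fake_rqos  { struct fake_disk *disk; };
struct fake_ioc   { struct fake_rqos rqos; int autop_idx; };

/* Variant that takes the disk explicitly; usable before rqos.disk is set. */
int fake_refresh_params_disk(struct fake_ioc *ioc, struct fake_disk *disk)
{
	assert(ioc != NULL);
	if (!disk || !disk->queue)
		return -1;        /* nothing to read parameters from */
	ioc->autop_idx = disk->queue->rotational ? 0 : 1;
	return 0;
}

/* Original-style entry point: only meaningful once rqos.disk is set. */
int fake_refresh_params(struct fake_ioc *ioc)
{
	return fake_refresh_params_disk(ioc, ioc->rqos.disk);
}
```

During initialization the caller already holds the disk pointer, so it can use the explicit variant; once `rqos.disk` has been set, the shorter wrapper is sufficient.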
  5. 27 Feb, 2023 1 commit
  6. 24 Feb, 2023 2 commits
    • block: be a bit more careful in checking for NULL bdev while polling · 310726c3
      Jens Axboe authored
      Wei reports a crash with an application using polled IO:
      
      PGD 14265e067 P4D 14265e067 PUD 47ec50067 PMD 0
      Oops: 0000 [#1] SMP
      CPU: 0 PID: 21915 Comm: iocore_0 Kdump: loaded Tainted: G S                5.12.0-0_fbk12_clang_7346_g1bb6f2e7058f #1
      Hardware name: Wiwynn Delta Lake MP T8/Delta Lake-Class2, BIOS Y3DLM08 04/10/2022
      RIP: 0010:bio_poll+0x25/0x200
      Code: 0f 1f 44 00 00 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 48 8b 47 08 <48> 8b 80 70 02 00 00 4c 8b 70 50 8b 6f 34 31 db 83 fd ff 75 25 65
      RSP: 0018:ffffc90005fafdf8 EFLAGS: 00010292
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 74b43cd65dd66600
      RDX: 0000000000000003 RSI: ffffc90005fafe78 RDI: ffff8884b614e140
      RBP: ffff88849964df78 R08: 0000000000000000 R09: 0000000000000008
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff88849964df00
      R13: ffffc90005fafe78 R14: ffff888137d3c378 R15: 0000000000000001
      FS:  00007fd195000640(0000) GS:ffff88903f400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000270 CR3: 0000000466121001 CR4: 00000000007706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       iocb_bio_iopoll+0x1d/0x30
       io_do_iopoll+0xac/0x250
       __se_sys_io_uring_enter+0x3c5/0x5a0
       ? __x64_sys_write+0x89/0xd0
       do_syscall_64+0x2d/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x94f225d
      Code: 24 cc 00 00 00 41 8b 84 24 d0 00 00 00 c1 e0 04 83 e0 10 41 09 c2 8b 33 8b 53 04 4c 8b 43 18 4c 63 4b 0c b8 aa 01 00 00 0f 05 <85> c0 0f 88 85 00 00 00 29 03 45 84 f6 0f 84 88 00 00 00 41 f6 c7
      RSP: 002b:00007fd194ffcd88 EFLAGS: 00000202 ORIG_RAX: 00000000000001aa
      RAX: ffffffffffffffda RBX: 00007fd194ffcdc0 RCX: 00000000094f225d
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
      RBP: 00007fd194ffcdb0 R08: 0000000000000000 R09: 0000000000000008
      R10: 0000000000000001 R11: 0000000000000202 R12: 00007fd269d68030
      R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
      
      which is due to bio->bi_bdev being NULL. This can happen if we have two
      tasks doing polled IO, and task B ends up completing IO from task A if
      they are sharing a poll queue. If task B completes the IO and puts the
      bio into our cache, then it can allocate that bio again before task A
      is done polling for it. As that would necessitate a preempt between the
      two tasks, it's enough to just be a bit more careful in checking for
      whether or not bio->bi_bdev is NULL.
      
      Reported-and-tested-by: Wei Zhang <wzhang@meta.com>
      Cc: stable@vger.kernel.org
      Fixes: be4d234d ("bio: add allocation cache abstraction")
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: clear bio->bi_bdev when putting a bio back in the cache · 11eb695f
      Jens Axboe authored
      This isn't strictly needed in terms of correctness, but it does allow
      polling to know if the bio has been put already by a different task
      and hence avoid polling something that we don't need to.
      
      Cc: stable@vger.kernel.org
      Fixes: be4d234d ("bio: add allocation cache abstraction")
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
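The two polling commits in this group work together: putting a bio back in the cache clears bi_bdev, and the poll path checks for a NULL bi_bdev before using it. A minimal user-space C sketch of that pairing follows. The types and functions (`struct fake_bio`, `fake_bio_put_to_cache`, `fake_bio_poll`) are hypothetical, and the sketch is single-threaded, so it leaves out the memory-ordering care that concurrent kernel code needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; not the kernel's struct bio. */
struct fake_bdev { int polls; };
struct fake_bio  { struct fake_bdev *bi_bdev; };

/* Returning a bio to the cache clears bi_bdev so pollers can tell. */
void fake_bio_put_to_cache(struct fake_bio *bio)
{
	assert(bio != NULL);
	bio->bi_bdev = NULL;
}

/* Poll: read the pointer once and bail out if the bio was already put. */
int fake_bio_poll(struct fake_bio *bio)
{
	struct fake_bdev *bdev = bio->bi_bdev;

	if (!bdev)
		return 0;         /* nothing left to poll for */
	bdev->polls++;
	return 1;
}
```

Reading the pointer into a local variable and testing that copy means the check and the later use refer to the same value, even if the field is cleared in between.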
  7. 23 Feb, 2023 1 commit
  8. 21 Feb, 2023 2 commits
  9. 17 Feb, 2023 6 commits
  10. 16 Feb, 2023 6 commits
  11. 15 Feb, 2023 1 commit
  12. 14 Feb, 2023 7 commits
  13. 13 Feb, 2023 1 commit
  14. 10 Feb, 2023 3 commits
  15. 09 Feb, 2023 2 commits