1. 16 Oct, 2021 6 commits
  2. 14 Oct, 2021 2 commits
  3. 12 Oct, 2021 1 commit
  4. 07 Oct, 2021 1 commit
  5. 04 Oct, 2021 1 commit
  6. 02 Oct, 2021 1 commit
  7. 30 Sep, 2021 1 commit
  8. 28 Sep, 2021 1 commit
  9. 27 Sep, 2021 1 commit
  10. 24 Sep, 2021 4 commits
    • block: hold ->invalidate_lock in blkdev_fallocate · f278eb3d
      Ming Lei authored
      While running ->fallocate(), blkdev_fallocate() should hold
      mapping->invalidate_lock to prevent the page cache from being accessed;
      otherwise stale data may be read from the page cache.
      
      Without this patch, blktests block/009 fails intermittently. With this
      patch, block/009 always passes.
      
      Also as Jan pointed out, no pages can be created in the discarded area
      while you are holding the invalidate_lock, so remove the 2nd
      truncate_bdev_range().
      
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f278eb3d
    • blktrace: Fix uaf in blk_trace access after removing by sysfs · 5afedf67
      Zhihao Cheng authored
      There is a use-after-free problem triggered by the following process:
      
            P1(sda)				P2(sdb)
      			echo 0 > /sys/block/sdb/trace/enable
      			  blk_trace_remove_queue
      			    synchronize_rcu
      			    blk_trace_free
      			      relay_close
      rcu_read_lock
      __blk_add_trace
        trace_note_tsk
        (Iterate running_trace_list)
      			        relay_close_buf
      				  relay_destroy_buf
      				    kfree(buf)
          trace_note(sdb's bt)
            relay_reserve
              buf->offset <- NULL pointer dereference (use-after-free) !!!
      rcu_read_unlock
      
      [  502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010
      [  502.715260] #PF: supervisor read access in kernel mode
      [  502.715903] #PF: error_code(0x0000) - not-present page
      [  502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0
      [  502.717252] Oops: 0000 [#1] SMP
      [  502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360
      [  502.732872] Call Trace:
      [  502.733193]  __blk_add_trace.cold+0x137/0x1a3
      [  502.733734]  blk_add_trace_rq+0x7b/0xd0
      [  502.734207]  blk_add_trace_rq_issue+0x54/0xa0
      [  502.734755]  blk_mq_start_request+0xde/0x1b0
      [  502.735287]  scsi_queue_rq+0x528/0x1140
      ...
      [  502.742704]  sg_new_write.isra.0+0x16e/0x3e0
      [  502.747501]  sg_ioctl+0x466/0x1100
      
      Reproduce method:
        ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
        ioctl(/dev/sda, BLKTRACESTART)
        ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
        ioctl(/dev/sdb, BLKTRACESTART)
      
        echo 0 > /sys/block/sdb/trace/enable &
        // Add delay(mdelay/msleep) before kernel enters blk_trace_free()
      
        ioctl$SG_IO(/dev/sda, SG_IO, ...)
        // Enters trace_note_tsk() after blk_trace_free() returned
        // Use mdelay in rcu region rather than msleep(which may schedule out)
      
      If the blk_trace is in the Blktrace_running state, remove it from
      running_trace_list before blk_trace_free() is called via sysfs.
      
      Fixes: c71a8961 ("blktrace: add ftrace plugin")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5afedf67
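The ordering described in the blktrace fix above (unlink from the running list first, free afterwards) can be sketched in userspace C. This is only an illustration under stated assumptions: `sketch_trace`, `sketch_running_list`, `sketch_start`, `sketch_unlink`, `sketch_remove` and `sketch_list_len` are hypothetical names, locking is left out, and the RCU grace period the kernel waits for is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device trace object. */
struct sketch_trace {
    bool running;               /* models the "running" trace state */
    struct sketch_trace *next;  /* link on the global running list */
};

/* Hypothetical stand-in for the list that readers iterate. */
static struct sketch_trace *sketch_running_list;

/* Mark a trace as running and publish it on the list. */
static void sketch_start(struct sketch_trace *bt)
{
    bt->running = true;
    bt->next = sketch_running_list;
    sketch_running_list = bt;
}

/* Unlink a trace so that list walkers can no longer reach it. */
static void sketch_unlink(struct sketch_trace *bt)
{
    struct sketch_trace **pp = &sketch_running_list;

    while (*pp && *pp != bt)
        pp = &(*pp)->next;
    if (*pp)
        *pp = bt->next;
    bt->next = NULL;
    bt->running = false;
}

/* Removal path: unlink a running trace BEFORE freeing it. */
static void sketch_remove(struct sketch_trace *bt)
{
    if (bt->running)
        sketch_unlink(bt);
    /* The kernel would wait for in-flight readers here; not modeled. */
    free(bt);
}

/* Count the entries currently reachable from the running list. */
static size_t sketch_list_len(void)
{
    size_t n = 0;

    for (struct sketch_trace *p = sketch_running_list; p; p = p->next)
        n++;
    return n;
}
```

Because `sketch_remove()` unlinks first, a later walk over `sketch_running_list` can never visit freed memory, which is the property the fix restores for the sysfs removal path.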
    • block: don't call rq_qos_ops->done_bio if the bio isn't tracked · a647a524
      Ming Lei authored
      The rq_qos framework only applies to request-based drivers, so:
      
      1) rq_qos_done_bio() need not be called for bio-based drivers
      
      2) rq_qos_done_bio() need not be called for a bio that isn't tracked,
      such as bios ended from error-handling code.
      
      In bio_endio() in particular:
      
      1) the request queue is referenced via bio->bi_bdev->bd_disk->queue,
      which may be gone since the request queue refcount may not be held in
      the above two cases
      
      2) q->rq_qos may be freed in blk_cleanup_queue() when calling into
      __rq_qos_done_bio()
      
      Fix the potential kernel panic by not calling rq_qos_ops->done_bio if
      the bio isn't tracked. This is safe because both ioc_rqos_done_bio()
      and blkcg_iolatency_done_bio() are no-ops if the bio isn't tracked.
      Reported-by: Yu Kuai <yukuai3@huawei.com>
      Cc: tj@kernel.org
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a647a524
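The idea in the rq_qos commit above, skipping the completion hook for untracked bios so the queue is never dereferenced for them, can be sketched as follows. The types and names (`sketch_bio`, `sketch_queue`, `sketch_qos_done`, `sketch_bio_endio`) are hypothetical simplifications for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a bio. */
struct sketch_bio {
    bool tracked;   /* models "this bio is tracked by rq_qos" */
};

/* Hypothetical, simplified stand-in for a request queue. */
struct sketch_queue {
    int done_calls; /* counts how often the completion hook ran */
};

/* Models the rq_qos completion hook; it touches queue state. */
static void sketch_qos_done(struct sketch_queue *q)
{
    q->done_calls++;
}

/*
 * Models the completion path: the hook runs only for tracked bios,
 * so an untracked bio never causes the queue to be dereferenced.
 */
static void sketch_bio_endio(struct sketch_bio *bio, struct sketch_queue *q)
{
    if (bio->tracked)
        sketch_qos_done(q);
}
```

In this sketch an untracked bio can be completed even when the queue pointer is no longer valid, since the pointer is never read on that path.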
    • Merge tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme into block-5.15 · 5cad8756
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for Linux 5.15:
      
       - keep ctrl->namespaces ordered (me)
       - fix incorrect h2cdata pdu offset accounting in nvme-tcp
         (Sagi Grimberg)
       - handle updated hw_queues in nvme-fc more carefully (Daniel Wagner,
         James Smart)"
      
      * tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme:
        nvme: keep ctrl->namespaces ordered
        nvme-tcp: fix incorrect h2cdata pdu offset accounting
        nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
        nvme-fc: avoid race between time out and tear down
        nvme-fc: update hardware queues before using them
      5cad8756
  11. 22 Sep, 2021 2 commits
  12. 21 Sep, 2021 5 commits
  13. 15 Sep, 2021 6 commits
    • blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd · 858560b2
      Li Jinlin authored
      KASAN reports a use-after-free when running a fuzz test:
      
      [693354.104835] ==================================================================
      [693354.105094] BUG: KASAN: use-after-free in bfq_io_set_weight_legacy+0xd3/0x160
      [693354.105336] Read of size 4 at addr ffff888be0a35664 by task sh/1453338
      
      [693354.105607] CPU: 41 PID: 1453338 Comm: sh Kdump: loaded Not tainted 4.18.0-147
      [693354.105610] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.81 07/02/2018
      [693354.105612] Call Trace:
      [693354.105621]  dump_stack+0xf1/0x19b
      [693354.105626]  ? show_regs_print_info+0x5/0x5
      [693354.105634]  ? printk+0x9c/0xc3
      [693354.105638]  ? cpumask_weight+0x1f/0x1f
      [693354.105648]  print_address_description+0x70/0x360
      [693354.105654]  kasan_report+0x1b2/0x330
      [693354.105659]  ? bfq_io_set_weight_legacy+0xd3/0x160
      [693354.105665]  ? bfq_io_set_weight_legacy+0xd3/0x160
      [693354.105670]  bfq_io_set_weight_legacy+0xd3/0x160
      [693354.105675]  ? bfq_cpd_init+0x20/0x20
      [693354.105683]  cgroup_file_write+0x3aa/0x510
      [693354.105693]  ? ___slab_alloc+0x507/0x540
      [693354.105698]  ? cgroup_file_poll+0x60/0x60
      [693354.105702]  ? 0xffffffff89600000
      [693354.105708]  ? usercopy_abort+0x90/0x90
      [693354.105716]  ? mutex_lock+0xef/0x180
      [693354.105726]  kernfs_fop_write+0x1ab/0x280
      [693354.105732]  ? cgroup_file_poll+0x60/0x60
      [693354.105738]  vfs_write+0xe7/0x230
      [693354.105744]  ksys_write+0xb0/0x140
      [693354.105749]  ? __ia32_sys_read+0x50/0x50
      [693354.105760]  do_syscall_64+0x112/0x370
      [693354.105766]  ? syscall_return_slowpath+0x260/0x260
      [693354.105772]  ? do_page_fault+0x9b/0x270
      [693354.105779]  ? prepare_exit_to_usermode+0xf9/0x1a0
      [693354.105784]  ? enter_from_user_mode+0x30/0x30
      [693354.105793]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      [693354.105875] Allocated by task 1453337:
      [693354.106001]  kasan_kmalloc+0xa0/0xd0
      [693354.106006]  kmem_cache_alloc_node_trace+0x108/0x220
      [693354.106010]  bfq_pd_alloc+0x96/0x120
      [693354.106015]  blkcg_activate_policy+0x1b7/0x2b0
      [693354.106020]  bfq_create_group_hierarchy+0x1e/0x80
      [693354.106026]  bfq_init_queue+0x678/0x8c0
      [693354.106031]  blk_mq_init_sched+0x1f8/0x460
      [693354.106037]  elevator_switch_mq+0xe1/0x240
      [693354.106041]  elevator_switch+0x25/0x40
      [693354.106045]  elv_iosched_store+0x1a1/0x230
      [693354.106049]  queue_attr_store+0x78/0xb0
      [693354.106053]  kernfs_fop_write+0x1ab/0x280
      [693354.106056]  vfs_write+0xe7/0x230
      [693354.106060]  ksys_write+0xb0/0x140
      [693354.106064]  do_syscall_64+0x112/0x370
      [693354.106069]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      [693354.106114] Freed by task 1453336:
      [693354.106225]  __kasan_slab_free+0x130/0x180
      [693354.106229]  kfree+0x90/0x1b0
      [693354.106233]  blkcg_deactivate_policy+0x12c/0x220
      [693354.106238]  bfq_exit_queue+0xf5/0x110
      [693354.106241]  blk_mq_exit_sched+0x104/0x130
      [693354.106245]  __elevator_exit+0x45/0x60
      [693354.106249]  elevator_switch_mq+0xd6/0x240
      [693354.106253]  elevator_switch+0x25/0x40
      [693354.106257]  elv_iosched_store+0x1a1/0x230
      [693354.106261]  queue_attr_store+0x78/0xb0
      [693354.106264]  kernfs_fop_write+0x1ab/0x280
      [693354.106268]  vfs_write+0xe7/0x230
      [693354.106271]  ksys_write+0xb0/0x140
      [693354.106275]  do_syscall_64+0x112/0x370
      [693354.106280]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      [693354.106329] The buggy address belongs to the object at ffff888be0a35580
                       which belongs to the cache kmalloc-1k of size 1024
      [693354.106736] The buggy address is located 228 bytes inside of
                       1024-byte region [ffff888be0a35580, ffff888be0a35980)
      [693354.107114] The buggy address belongs to the page:
      [693354.107273] page:ffffea002f828c00 count:1 mapcount:0 mapping:ffff888107c17080 index:0x0 compound_mapcount: 0
      [693354.107606] flags: 0x17ffffc0008100(slab|head)
      [693354.107760] raw: 0017ffffc0008100 ffffea002fcbc808 ffffea0030bd3a08 ffff888107c17080
      [693354.108020] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
      [693354.108278] page dumped because: kasan: bad access detected
      
      [693354.108511] Memory state around the buggy address:
      [693354.108671]  ffff888be0a35500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [693354.116396]  ffff888be0a35580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [693354.124473] >ffff888be0a35600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [693354.132421]                                                        ^
      [693354.140284]  ffff888be0a35680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [693354.147912]  ffff888be0a35700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [693354.155281] ==================================================================
      
      blkgs are protected by both the queue lock and the blkcg lock, and
      holding either should stabilize them. However, the path that destroys
      blkg policy data in blkcg_activate_policy()/blkcg_deactivate_policy()
      is protected only by the queue lock. Another task can therefore
      obtain the blkg policy data before it is destroyed and use it
      afterwards, which results in a use-after-free.
      
      CPU0                             CPU1
      blkcg_deactivate_policy
        spin_lock_irq(&q->queue_lock)
                                       bfq_io_set_weight_legacy
                                         spin_lock_irq(&blkcg->lock)
                                         blkg_to_bfqg(blkg)
                                           pd_to_bfqg(blkg->pd[pol->plid])
                                           ^^^^^^blkg->pd[pol->plid] != NULL
                                                 bfqg != NULL
        pol->pd_free_fn(blkg->pd[pol->plid])
          pd_to_bfqg(blkg->pd[pol->plid])
          bfqg_put(bfqg)
            kfree(bfqg)
        blkg->pd[pol->plid] = NULL
        spin_unlock_irq(&q->queue_lock);
                                         bfq_group_set_weight(bfqg, val, 0)
                                           bfqg->entity.new_weight
                                           ^^^^^^trigger uaf here
                                         spin_unlock_irq(&blkcg->lock);
      
      Fix by grabbing the matching blkcg lock before trying to
      destroy blkg policy data.
      Suggested-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20210914042605.3260596-1-lijinlin3@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      858560b2
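The locking rule behind the blk-cgroup fix above can be sketched in userspace C with pthread mutexes standing in for the kernel spinlocks. This is a minimal sketch, assuming simplified hypothetical types (`sketch_pd`, `sketch_blkg`) and functions (`sketch_deactivate`, `sketch_set_weight`); it shows only that the teardown side takes the same lock the user side holds, so the user either sees valid policy data or sees NULL.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-blkg policy data. */
struct sketch_pd {
    int weight;
};

/* Hypothetical stand-in for a blkg guarded by two locks. */
struct sketch_blkg {
    pthread_mutex_t *queue_lock;  /* models q->queue_lock */
    pthread_mutex_t *blkcg_lock;  /* models blkcg->lock */
    struct sketch_pd *pd;         /* models blkg->pd[pol->plid] */
};

/* Teardown side: hold BOTH locks while freeing and clearing pd. */
static void sketch_deactivate(struct sketch_blkg *blkg)
{
    pthread_mutex_lock(blkg->queue_lock);
    pthread_mutex_lock(blkg->blkcg_lock);
    free(blkg->pd);
    blkg->pd = NULL;
    pthread_mutex_unlock(blkg->blkcg_lock);
    pthread_mutex_unlock(blkg->queue_lock);
}

/*
 * User side: holds only the blkcg lock. Since teardown now takes that
 * lock too, pd cannot be freed between the check and the write.
 */
static int sketch_set_weight(struct sketch_blkg *blkg, int val)
{
    int ret = -1;

    pthread_mutex_lock(blkg->blkcg_lock);
    if (blkg->pd) {
        blkg->pd->weight = val;
        ret = 0;
    }
    pthread_mutex_unlock(blkg->blkcg_lock);
    return ret;
}
```

The lock order used here (queue lock, then blkcg lock) is a choice made for the sketch; what matters for the illustration is that both paths share at least one lock.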
    • blkcg: fix memory leak in blk_iolatency_init · 6f5ddde4
      Yanfei Xu authored
      BUG: memory leak
      unreferenced object 0xffff888129acdb80 (size 96):
        comm "syz-executor.1", pid 12661, jiffies 4294962682 (age 15.220s)
        hex dump (first 32 bytes):
          20 47 c9 85 ff ff ff ff 20 d4 8e 29 81 88 ff ff   G...... ..)....
          01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff82264ec8>] kmalloc include/linux/slab.h:591 [inline]
          [<ffffffff82264ec8>] kzalloc include/linux/slab.h:721 [inline]
          [<ffffffff82264ec8>] blk_iolatency_init+0x28/0x190 block/blk-iolatency.c:724
          [<ffffffff8225b8c4>] blkcg_init_queue+0xb4/0x1c0 block/blk-cgroup.c:1185
          [<ffffffff822253da>] blk_alloc_queue+0x22a/0x2e0 block/blk-core.c:566
          [<ffffffff8223b175>] blk_mq_init_queue_data block/blk-mq.c:3100 [inline]
          [<ffffffff8223b175>] __blk_mq_alloc_disk+0x25/0xd0 block/blk-mq.c:3124
          [<ffffffff826a9303>] loop_add+0x1c3/0x360 drivers/block/loop.c:2344
          [<ffffffff826a966e>] loop_control_get_free drivers/block/loop.c:2501 [inline]
          [<ffffffff826a966e>] loop_control_ioctl+0x17e/0x2e0 drivers/block/loop.c:2516
          [<ffffffff81597eec>] vfs_ioctl fs/ioctl.c:51 [inline]
          [<ffffffff81597eec>] __do_sys_ioctl fs/ioctl.c:874 [inline]
          [<ffffffff81597eec>] __se_sys_ioctl fs/ioctl.c:860 [inline]
          [<ffffffff81597eec>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
          [<ffffffff843fa745>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<ffffffff843fa745>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
          [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      If blk_throtl_init() fails during queue initialization,
      blkcg_iolatency_exit() is not invoked for cleanup, which leads to a
      memory leak. Swapping the blk_throtl_init() and blk_iolatency_init()
      calls solves this.
      
      Reported-by: syzbot+01321b15cc98e6bf96d6@syzkaller.appspotmail.com
      Fixes: 19688d7f (block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls)
      Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20210915072426.4022924-1-yanfei.xu@windriver.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6f5ddde4
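Why the call order matters in the blk_iolatency_init commit above can be shown with a small userspace sketch. It assumes a simplified model in which the throttle step has a direct undo function while the iolatency allocation is only released by a later teardown that this error path never reaches. All names (`sketch_queue`, `sketch_iolat_init`, `sketch_throtl_init`, `sketch_throtl_exit`, `sketch_init_leaky`, `sketch_init_fixed`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queue with two init steps. */
struct sketch_queue {
    void *iolat;        /* models the iolatency allocation */
    void *throtl;       /* models the throttling allocation */
    bool throtl_fails;  /* knob: force the throttle step to fail */
};

static int sketch_iolat_init(struct sketch_queue *q)
{
    q->iolat = malloc(96);
    return q->iolat ? 0 : -1;
}

static int sketch_throtl_init(struct sketch_queue *q)
{
    if (q->throtl_fails)
        return -1;
    q->throtl = malloc(64);
    return q->throtl ? 0 : -1;
}

static void sketch_throtl_exit(struct sketch_queue *q)
{
    free(q->throtl);
    q->throtl = NULL;
}

/* Leaky order: if the throttle step fails, q->iolat is never released. */
static int sketch_init_leaky(struct sketch_queue *q)
{
    int ret = sketch_iolat_init(q);

    if (ret)
        return ret;
    return sketch_throtl_init(q);
}

/* Swapped order: the step with a direct undo runs first. */
static int sketch_init_fixed(struct sketch_queue *q)
{
    int ret = sketch_throtl_init(q);

    if (ret)
        return ret;             /* nothing else is allocated yet */
    ret = sketch_iolat_init(q);
    if (ret)
        sketch_throtl_exit(q);  /* undo the earlier step */
    return ret;
}
```

With the throttle step forced to fail, `sketch_init_leaky()` returns an error while `q->iolat` is still allocated, whereas `sketch_init_fixed()` returns with nothing left to clean up.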
    • Merge tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme into block-5.15 · 65ed1e69
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for Linux 5.15
      
       - fix ANA state updates when a namespace is not present (Anton Eidelman)
       - nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show
         (Dan Carpenter)
       - avoid race in shutdown namespace removal (Daniel Wagner)
       - fix io_work priority inversion in nvme-tcp (Keith Busch)
       - destroy cm id before destroy qp to avoid use after free (Ruozhu Li)"
      
      * tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme:
        nvme-tcp: fix io_work priority inversion
        nvme-rdma: destroy cm id before destroy qp to avoid use after free
        nvme-multipath: fix ANA state updates when a namespace is not present
        nvme: avoid race in shutdown namespace removal
        nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()
      65ed1e69
    • nvme: remove the call to nvme_update_disk_info in nvme_ns_remove · 9da4c727
      Christoph Hellwig authored
      There is no need to explicitly unregister the integrity profile when
      deleting the gendisk.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Link: https://lore.kernel.org/r/20210914070657.87677-4-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9da4c727
    • block: flush the integrity workqueue in blk_integrity_unregister · 3df49967
      Lihong Kou authored
      When the integrity profile is unregistered there can still be integrity
      reads queued up which could see a NULL verify_fn as shown by the race
      window below:
      
      CPU0                                    CPU1
        process_one_work                      nvme_validate_ns
          bio_integrity_verify_fn               nvme_update_ns_info
                                                  nvme_update_disk_info
                                                    blk_integrity_unregister
                                                    ---set queue->integrity as 0
            bio_integrity_process
            --access bi->profile->verify_fn (bi is a pointer to queue->integrity)
      
      Before calling blk_integrity_unregister in nvme_update_disk_info, we must
      make sure that there is no work item left in kintegrityd_wq. Call
      blk_flush_integrity to flush the workqueue, which resolves the bug.
      Signed-off-by: Lihong Kou <koulihong@huawei.com>
      [hch: split up and shortened the changelog]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Link: https://lore.kernel.org/r/20210914070657.87677-3-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3df49967
    • block: check if a profile is actually registered in blk_integrity_unregister · 783a40a1
      Christoph Hellwig authored
      While clearing the profile itself is harmless, we really should not clear
      the stable writes flag if it wasn't set due to a registered integrity
      profile.
      Reported-by: Lihong Kou <koulihong@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Link: https://lore.kernel.org/r/20210914070657.87677-2-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      783a40a1
  14. 14 Sep, 2021 3 commits
  15. 13 Sep, 2021 3 commits
  16. 12 Sep, 2021 2 commits
    • Linux 5.15-rc1 · 6880fa6c
      Linus Torvalds authored
      6880fa6c
    • Merge tag 'perf-tools-for-v5.15-2021-09-11' of... · b5b65f13
      Linus Torvalds authored
      Merge tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tools updates from Arnaldo Carvalho de Melo:
      
       - Add missing fields and remove some duplicate fields when printing a
         perf_event_attr.
      
       - Fix hybrid config terms list corruption.
      
       - Update kernel header copies, some resulted in new kernel features
         being automagically added to 'perf trace' syscall/tracepoint argument
         id->string translators.
      
       - Add a file generated during the documentation build to .gitignore.
      
       - Add an option to build without libbfd, as some distros, like Debian,
         consider its ABI unstable.
      
       - Add support to print a textual representation of IBS raw sample data
         in 'perf report'.
      
       - Fix bpf 'perf test' sample mismatch reporting
      
       - Fix passing arguments to stackcollapse report in a 'perf script'
         python script.
      
       - Allow build-id with trailing zeros.
      
       - Look for ImageBase in PE file to compute .text offset.
      
      * tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (25 commits)
        tools headers UAPI: Update tools's copy of drm.h headers
        tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
        tools headers UAPI: Sync linux/fs.h with the kernel sources
        tools headers UAPI: Sync linux/in.h copy with the kernel sources
        perf tools: Add an option to build without libbfd
        perf tools: Allow build-id with trailing zeros
        perf tools: Fix hybrid config terms list corruption
        perf tools: Factor out copy_config_terms() and free_config_terms()
        perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
        perf tools: Ignore Documentation dependency file
        perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
        tools include UAPI: Update linux/mount.h copy
        perf beauty: Cover more flags in the  move_mount syscall argument beautifier
        tools headers UAPI: Sync linux/prctl.h with the kernel sources
        tools include UAPI: Sync sound/asound.h copy with the kernel sources
        tools headers UAPI: Sync linux/kvm.h with the kernel sources
        tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
        perf report: Add support to print a textual representation of IBS raw sample data
        perf report: Add tools/arch/x86/include/asm/amd-ibs.h
        perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
        ...
      b5b65f13