1. 25 Oct, 2021 4 commits
    • sbitmap: silence data race warning · 9f8b93a7
      Jens Axboe authored
      KCSAN complains about the sbitmap hint update:
      
      ==================================================================
      BUG: KCSAN: data-race in sbitmap_queue_clear / sbitmap_queue_clear
      
      write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 1:
       sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
       blk_mq_put_tag+0x82/0x90
       __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
       blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
       __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
       blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
       lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
       blk_complete_reqs block/blk-mq.c:584 [inline]
       blk_done_softirq+0x69/0x90 block/blk-mq.c:589
       __do_softirq+0x12c/0x26e kernel/softirq.c:558
       run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
       smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
       kthread+0x262/0x280 kernel/kthread.c:319
       ret_from_fork+0x1f/0x30
      
      write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 0:
       sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
       blk_mq_put_tag+0x82/0x90
       __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
       blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
       __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
       blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
       lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
       blk_complete_reqs block/blk-mq.c:584 [inline]
       blk_done_softirq+0x69/0x90 block/blk-mq.c:589
       __do_softirq+0x12c/0x26e kernel/softirq.c:558
       run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
       smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
       kthread+0x262/0x280 kernel/kthread.c:319
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000035 -> 0x00000044
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.15.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      ==================================================================
      
      which is a data race, but not an important one. This is just updating the
      percpu alloc hint, and the reader of that hint doesn't ever require it to
      be valid.
      
      Just annotate it with data_race() to silence this one.
      
      Reported-by: syzbot+4f8bfd804b4a1f95b8f6@syzkaller.appspotmail.com
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
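A minimal userspace sketch of the annotation described above, not the actual patch. The `data_race()` macro here is only a stand-in for the kernel macro of the same name from `<linux/compiler.h>`, and `hint_map`, `hint_update()` and `hint_read()` are hypothetical names chosen for illustration. It shows why a racy hint is tolerable: the reader range-checks whatever value it observes.

```c
#include <stddef.h>

/*
 * Userspace stand-in (assumption): in the kernel, data_race() tells KCSAN
 * that a race on the enclosed expression is intentional. Here it simply
 * evaluates the expression.
 */
#define data_race(expr) (expr)

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count for the sketch */

struct hint_map {
	unsigned int depth;				/* number of valid tags */
	unsigned int alloc_hint[NR_CPUS_SKETCH];	/* per-CPU advisory hint */
};

/* Record where the next allocation on this CPU might start looking. */
static void hint_update(struct hint_map *m, int cpu, unsigned int tag)
{
	if (tag < m->depth)
		data_race(m->alloc_hint[cpu] = tag);
}

/* The hint is advisory only: an out-of-range value falls back to 0. */
static unsigned int hint_read(const struct hint_map *m, int cpu)
{
	unsigned int hint = data_race(m->alloc_hint[cpu]);

	return hint < m->depth ? hint : 0;
}
```

Because `hint_read()` never trusts the stored value, a lost or reordered update can at worst make the next search start from a less useful position.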
    • blk-cgroup: synchronize blkg creation against policy deactivation · 0c9d338c
      Yu Kuai authored
      Our test reports a null pointer dereference:
      
      [  168.534653] ==================================================================
      [  168.535614] Disabling lock debugging due to kernel taint
      [  168.536346] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [  168.537274] #PF: supervisor read access in kernel mode
      [  168.537964] #PF: error_code(0x0000) - not-present page
      [  168.538667] PGD 0 P4D 0
      [  168.539025] Oops: 0000 [#1] PREEMPT SMP KASAN
      [  168.539656] CPU: 13 PID: 759 Comm: bash Tainted: G    B             5.15.0-rc2-next-202100
      [  168.540954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_0738364
      [  168.542736] RIP: 0010:bfq_pd_init+0x88/0x1e0
      [  168.543318] Code: 98 00 00 00 e8 c9 e4 5b ff 4c 8b 65 00 49 8d 7c 24 08 e8 bb e4 5b ff 4d0
      [  168.545803] RSP: 0018:ffff88817095f9c0 EFLAGS: 00010002
      [  168.546497] RAX: 0000000000000001 RBX: ffff888101a1c000 RCX: 0000000000000000
      [  168.547438] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff888106553428
      [  168.548402] RBP: ffff888106553400 R08: ffffffff961bcaf4 R09: 0000000000000001
      [  168.549365] R10: ffffffffa2e16c27 R11: fffffbfff45c2d84 R12: 0000000000000000
      [  168.550291] R13: ffff888101a1c098 R14: ffff88810c7a08c8 R15: ffffffffa55541a0
      [  168.551221] FS:  00007fac75227700(0000) GS:ffff88839ba80000(0000) knlGS:0000000000000000
      [  168.552278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  168.553040] CR2: 0000000000000008 CR3: 0000000165ce7000 CR4: 00000000000006e0
      [  168.554000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  168.554929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  168.555888] Call Trace:
      [  168.556221]  <TASK>
      [  168.556510]  blkg_create+0x1c0/0x8c0
      [  168.556989]  blkg_conf_prep+0x574/0x650
      [  168.557502]  ? stack_trace_save+0x99/0xd0
      [  168.558033]  ? blkcg_conf_open_bdev+0x1b0/0x1b0
      [  168.558629]  tg_set_conf.constprop.0+0xb9/0x280
      [  168.559231]  ? kasan_set_track+0x29/0x40
      [  168.559758]  ? kasan_set_free_info+0x30/0x60
      [  168.560344]  ? tg_set_limit+0xae0/0xae0
      [  168.560853]  ? do_sys_openat2+0x33b/0x640
      [  168.561383]  ? do_sys_open+0xa2/0x100
      [  168.561877]  ? __x64_sys_open+0x4e/0x60
      [  168.562383]  ? __kasan_check_write+0x20/0x30
      [  168.562951]  ? copyin+0x48/0x70
      [  168.563390]  ? _copy_from_iter+0x234/0x9e0
      [  168.563948]  tg_set_conf_u64+0x17/0x20
      [  168.564467]  cgroup_file_write+0x1ad/0x380
      [  168.565014]  ? cgroup_file_poll+0x80/0x80
      [  168.565568]  ? __mutex_lock_slowpath+0x30/0x30
      [  168.566165]  ? pgd_free+0x100/0x160
      [  168.566649]  kernfs_fop_write_iter+0x21d/0x340
      [  168.567246]  ? cgroup_file_poll+0x80/0x80
      [  168.567796]  new_sync_write+0x29f/0x3c0
      [  168.568314]  ? new_sync_read+0x410/0x410
      [  168.568840]  ? __handle_mm_fault+0x1c97/0x2d80
      [  168.569425]  ? copy_page_range+0x2b10/0x2b10
      [  168.570007]  ? _raw_read_lock_bh+0xa0/0xa0
      [  168.570622]  vfs_write+0x46e/0x630
      [  168.571091]  ksys_write+0xcd/0x1e0
      [  168.571563]  ? __x64_sys_read+0x60/0x60
      [  168.572081]  ? __kasan_check_write+0x20/0x30
      [  168.572659]  ? do_user_addr_fault+0x446/0xff0
      [  168.573264]  __x64_sys_write+0x46/0x60
      [  168.573774]  do_syscall_64+0x35/0x80
      [  168.574264]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  168.574960] RIP: 0033:0x7fac74915130
      [  168.575456] Code: 73 01 c3 48 8b 0d 58 ed 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 444
      [  168.577969] RSP: 002b:00007ffc3080e288 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  168.578986] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fac74915130
      [  168.579937] RDX: 0000000000000009 RSI: 000056007669f080 RDI: 0000000000000001
      [  168.580884] RBP: 000056007669f080 R08: 000000000000000a R09: 00007fac75227700
      [  168.581841] R10: 000056007655c8f0 R11: 0000000000000246 R12: 0000000000000009
      [  168.582796] R13: 0000000000000001 R14: 00007fac74be55e0 R15: 00007fac74be08c0
      [  168.583757]  </TASK>
      [  168.584063] Modules linked in:
      [  168.584494] CR2: 0000000000000008
      [  168.584964] ---[ end trace 2475611ad0f77a1a ]---
      
      This is because blkg_alloc() is called from blkg_conf_prep() without
      holding 'q->queue_lock', and elevator is exited before blkg_create():
      
      thread 1                            thread 2
      blkg_conf_prep
       spin_lock_irq(&q->queue_lock);
       blkg_lookup_check -> return NULL
       spin_unlock_irq(&q->queue_lock);
      
       blkg_alloc
        blkcg_policy_enabled -> true
        pd = ->pd_alloc_fn
        blkg->pd[i] = pd
                                         blk_mq_exit_sched
                                          bfq_exit_queue
                                           blkcg_deactivate_policy
                                            spin_lock_irq(&q->queue_lock);
                                            __clear_bit(pol->plid, q->blkcg_pols);
                                            spin_unlock_irq(&q->queue_lock);
                                          q->elevator = NULL;
        spin_lock_irq(&q->queue_lock);
         blkg_create
          if (blkg->pd[i])
           ->pd_init_fn -> q->elevator is NULL
        spin_unlock_irq(&q->queue_lock);
      
      Because blkcg_deactivate_policy() requires queue to be frozen, we can
      grab q_usage_counter to synchronize blkg_conf_prep() against
      blkcg_deactivate_policy().
      
      Fixes: e21b7a0b ("block, bfq: add full hierarchical scheduling and cgroups support")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/r/20211020014036.2141723-1-yukuai3@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
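The synchronization idea in the commit message above can be modelled with a small single-threaded sketch. This is not the kernel code: `queue_sketch` and the `*_sketch()` helpers are hypothetical, `usage` stands in for `q_usage_counter`, and where the kernel would wait for the counter to drain before freezing, this model simply reports failure.

```c
#include <stdbool.h>

struct queue_sketch {
	int usage;		/* models q_usage_counter */
	bool frozen;		/* true while the queue is frozen */
	bool policy_enabled;	/* models the policy bit in q->blkcg_pols */
};

/* Taken by the configuration path before allocating and creating a blkg. */
static bool queue_enter_sketch(struct queue_sketch *q)
{
	if (q->frozen)
		return false;
	q->usage++;
	return true;
}

static void queue_exit_sketch(struct queue_sketch *q)
{
	q->usage--;
}

/*
 * Policy deactivation needs a frozen queue, and a queue cannot be frozen
 * while any user still holds a reference.
 */
static bool deactivate_policy_sketch(struct queue_sketch *q)
{
	if (q->usage > 0)
		return false;	/* the kernel would wait here instead */
	q->frozen = true;
	q->policy_enabled = false;
	q->frozen = false;
	return true;
}
```

While the configuration path holds its reference, deactivation cannot proceed, so the policy state checked at allocation time is still the state seen at creation time.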
    • block: refactor bio_iov_bvec_set() · fa5fa8ec
      Pavel Begunkov authored
      Combine bio_iov_bvec_set() and bio_iov_bvec_set_append() and let the
      caller do iov_iter_advance(). Also get rid of __bio_iov_bvec_set(),
      which was duplicated in the final binary, and replace a weird
      iov_iter_truncate() of a temporary iter copy with min(), which better
      reflects the intention.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/bcf1ac36fce769a514e19475f3623cd86a1d8b72.1635006010.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
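To illustrate the last point of the message above, here is a hedged userspace sketch contrasting the two ways of clamping a transfer size. `iter_sketch` and the functions below are hypothetical stand-ins, not the kernel's `struct iov_iter` API; only the shape of the change is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for an iterator that tracks remaining bytes. */
struct iter_sketch {
	size_t count;
};

/* Models a truncate helper: cap the remaining byte count at max. */
static void iter_truncate_sketch(struct iter_sketch *i, size_t max)
{
	if (i->count > max)
		i->count = max;
}

/* Older shape: copy the iterator, truncate the copy, read its count. */
static size_t size_via_copy(const struct iter_sketch *it, size_t max)
{
	struct iter_sketch tmp = *it;

	iter_truncate_sketch(&tmp, max);
	return tmp.count;
}

/* Newer shape: state the intent directly as a minimum of two sizes. */
static size_t size_via_min(const struct iter_sketch *it, size_t max)
{
	return it->count < max ? it->count : max;
}
```

Both functions return the same value and leave the caller's iterator untouched; the second avoids the temporary copy and reads as what it is, a clamp.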
    • block: add single bio async direct IO helper · 54a88eb8
      Pavel Begunkov authored
      As with __blkdev_direct_IO_simple(), we can implement direct IO more
      efficiently if there is only one bio. Add __blkdev_direct_IO_async() and
      blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS with
      nullblk to 4.7+.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/f0ae4109b7a6934adede490f84d188d53b97051b.1635006010.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 23 Oct, 2021 1 commit
    • sched: make task_struct->plug always defined · 599593a8
      Jens Axboe authored
      If CONFIG_BLOCK isn't set, then it's an empty struct anyway. Just make
      it generally available, so we don't break the compile:
      
      kernel/sched/core.c: In function ‘sched_submit_work’:
      kernel/sched/core.c:6346:35: error: ‘struct task_struct’ has no member named ‘plug’
       6346 |                 blk_flush_plug(tsk->plug, true);
            |                                   ^~
      kernel/sched/core.c: In function ‘io_schedule_prepare’:
      kernel/sched/core.c:8357:20: error: ‘struct task_struct’ has no member named ‘plug’
       8357 |         if (current->plug)
            |                    ^~
      kernel/sched/core.c:8358:39: error: ‘struct task_struct’ has no member named ‘plug’
       8358 |                 blk_flush_plug(current->plug, true);
            |                                       ^~
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Fixes: 008f75a2 ("block: cleanup the flush plug helpers")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
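The build fix above follows a common pattern: keep the member unconditionally and let the type and helper degrade to stubs when the feature is compiled out. A minimal sketch, assuming hypothetical names (`SKETCH_CONFIG_BLOCK`, `plug_sketch`, `task_sketch`, `flush_plug_sketch()`, `submit_work_sketch()`) rather than the real kernel definitions:

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef SKETCH_CONFIG_BLOCK
struct plug_sketch {
	int pending;	/* queued requests waiting to be flushed */
};

static void flush_plug_sketch(struct plug_sketch *plug, bool from_schedule)
{
	(void)from_schedule;
	plug->pending = 0;
}
#else
/*
 * Feature compiled out. The commit message describes an empty struct;
 * a one-byte placeholder keeps this sketch within ISO C.
 */
struct plug_sketch {
	char unused;
};

static void flush_plug_sketch(struct plug_sketch *plug, bool from_schedule)
{
	(void)plug;
	(void)from_schedule;
}
#endif

struct task_sketch {
	struct plug_sketch *plug;	/* always present, no #ifdef needed */
};

/* Callers compile in both configurations. Returns 1 if a flush ran. */
static int submit_work_sketch(struct task_sketch *tsk)
{
	if (tsk->plug) {
		flush_plug_sketch(tsk->plug, true);
		return 1;
	}
	return 0;
}
```

The cost is one pointer per task in the compiled-out configuration, in exchange for call sites that need no conditional compilation.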
  3. 22 Oct, 2021 2 commits
  4. 21 Oct, 2021 15 commits
  5. 20 Oct, 2021 16 commits
  6. 19 Oct, 2021 2 commits