1. 24 Mar, 2020 11 commits
  2. 21 Mar, 2020 5 commits
    • block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline · 4d38a87f
      Paolo Valente authored
      In bfq_pd_offline(), the function bfq_flush_idle_tree() is invoked to
      flush the rb tree that contains all idle entities belonging to the pd
      (cgroup) being destroyed. In particular, bfq_flush_idle_tree() is
      invoked before bfq_reparent_active_queues(). Yet the latter may happen
      to add some entities to the idle tree. It happens if, in some of the
      calls to bfq_bfqq_move() performed by bfq_reparent_active_queues(),
      the queue to move is empty and gets expired.
      
      This commit simply reverses the invocation order between
      bfq_flush_idle_tree() and bfq_reparent_active_queues().
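
The effect of the ordering can be illustrated with a small userspace model. This is only a sketch, not kernel code: `toy_group` and every `toy_*` function are hypothetical stand-ins, and the model assumes only what the message states, namely that reparenting can park the entities of empty, expired queues in the idle tree.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical, not the kernel's data structures). */
struct toy_group {
    int idle_entities;       /* entities currently in the idle tree */
    int empty_active_queues; /* active queues that are empty and expire when moved */
};

/* Stand-in for bfq_reparent_active_queues(): moving an empty queue
 * expires it, which adds its entity to the idle tree. */
static void toy_reparent_active_queues(struct toy_group *g)
{
    assert(g != NULL);
    g->idle_entities += g->empty_active_queues;
    g->empty_active_queues = 0;
}

/* Stand-in for bfq_flush_idle_tree(): empties the idle tree. */
static void toy_flush_idle_tree(struct toy_group *g)
{
    assert(g != NULL);
    g->idle_entities = 0;
}

/* Old order: flush first, then reparent. Returns leftover idle entities. */
static int toy_offline_old_order(struct toy_group *g)
{
    toy_flush_idle_tree(g);
    toy_reparent_active_queues(g);
    return g->idle_entities;
}

/* New order: reparent first, then flush. Returns leftover idle entities. */
static int toy_offline_new_order(struct toy_group *g)
{
    toy_reparent_active_queues(g);
    toy_flush_idle_tree(g);
    return g->idle_entities;
}
```

In this model, the old order leaves entities behind in the idle tree of the group being destroyed, while the new order leaves none.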
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: make reparent_leaf_entity actually work only on leaf entities · 576682fa
      Paolo Valente authored
      bfq_reparent_leaf_entity() reparents the input leaf entity (a leaf
      entity represents just a bfq_queue in an entity tree). Yet, the input
      entity is guaranteed to always be a leaf entity only in two-level
      entity trees. In this respect, because of the error fixed by
      commit 14afc593 ("block, bfq: fix overwrite of bfq_group pointer
      in bfq_find_set_group()"), all (wrongly collapsed) entity trees happened
      to actually have only two levels. After the latter commit, this does not
      hold any longer.
      
      This commit fixes this problem by modifying
      bfq_reparent_leaf_entity(), so that it searches an active leaf entity
      down the path that stems from the input entity. Such a leaf entity is
      guaranteed to exist when bfq_reparent_leaf_entity() is invoked.
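
The search described above can be sketched as a walk down a tree until a leaf is reached. This is a simplified userspace illustration, not the kernel implementation: `toy_entity`, its `active_child` field, and `toy_find_active_leaf()` are hypothetical names, and the model assumes each group entity exposes one active child to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy entity (hypothetical): a group has an active child, a leaf
 * stands for a single queue and has none. */
struct toy_entity {
    bool is_group;
    struct toy_entity *active_child;
};

/* Walk down from the input entity until a leaf is reached. The asserts
 * model the guarantee stated in the message that such a leaf exists. */
static struct toy_entity *toy_find_active_leaf(struct toy_entity *entity)
{
    assert(entity != NULL);
    while (entity->is_group) {
        assert(entity->active_child != NULL);
        entity = entity->active_child;
    }
    return entity;
}
```

In a two-level tree the loop body never runs and the input entity is returned unchanged, which matches the behavior the old code relied on.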
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup · c8997736
      Paolo Valente authored
      A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
      goal of this put is to release a process reference to a bfq_queue. But
      releasing a process reference may also require some extra operations,
      which is why such releases are handled through
      bfq_release_process_ref(). So, turn the invocation of bfq_put_queue()
      into an invocation of bfq_release_process_ref().
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: move forward the getting of an extra ref in bfq_bfqq_move · fd1bb3ae
      Paolo Valente authored
      Commit ecedd3d7 ("block, bfq: get extra ref to prevent a queue
      from being freed during a group move") gets an extra reference to a
      bfq_queue before possibly deactivating it (temporarily), in
      bfq_bfqq_move(). This prevents the bfq_queue from disappearing before
      being reactivated in its new group.
      
      Yet, the bfq_queue may also be expired (i.e., its service may be
      stopped) before the bfq_queue is deactivated. An expiration may
      also lead to a premature freeing. This commit fixes this issue by
      getting the extra reference, already introduced by
      commit ecedd3d7 ("block, bfq: get extra ref to prevent a queue from
      being freed during a group move"), earlier in bfq_bfqq_move().
      
      Reported-by: cki-project@redhat.com
      Tested-by: cki-project@redhat.com
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix use-after-free in bfq_idle_slice_timer_body · 2f95fa5c
      Zhiqiang Liu authored
      In bfq_idle_slice_timer(), bfqq = bfqd->in_service_queue is read
      outside the bfqd->lock critical section. The bfqq, which is not
      NULL in bfq_idle_slice_timer(), may be freed after being passed
      to bfq_idle_slice_timer_body(), so freed memory may be accessed.
      
      In addition, since the bfqq may be subject to a race, we should
      first check whether the bfqq is in service before operating on it
      in bfq_idle_slice_timer_body(). If the bfqq is not in service, it
      has been expired through __bfq_bfqq_expire(), and its wait_request
      flag has been cleared in __bfq_bfqd_reset_in_service(). So there is
      no need to clear again the wait_request flag of a bfqq that is not
      in service.
      
      KASAN log is given as follows:
      [13058.354613] ==================================================================
      [13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
      [13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
      [13058.354646]
      [13058.354655] CPU: 96 PID: 19767 Comm: fork13
      [13058.354661] Call trace:
      [13058.354667]  dump_backtrace+0x0/0x310
      [13058.354672]  show_stack+0x28/0x38
      [13058.354681]  dump_stack+0xd8/0x108
      [13058.354687]  print_address_description+0x68/0x2d0
      [13058.354690]  kasan_report+0x124/0x2e0
      [13058.354697]  __asan_load8+0x88/0xb0
      [13058.354702]  bfq_idle_slice_timer+0xac/0x290
      [13058.354707]  __hrtimer_run_queues+0x298/0x8b8
      [13058.354710]  hrtimer_interrupt+0x1b8/0x678
      [13058.354716]  arch_timer_handler_phys+0x4c/0x78
      [13058.354722]  handle_percpu_devid_irq+0xf0/0x558
      [13058.354731]  generic_handle_irq+0x50/0x70
      [13058.354735]  __handle_domain_irq+0x94/0x110
      [13058.354739]  gic_handle_irq+0x8c/0x1b0
      [13058.354742]  el1_irq+0xb8/0x140
      [13058.354748]  do_wp_page+0x260/0xe28
      [13058.354752]  __handle_mm_fault+0x8ec/0x9b0
      [13058.354756]  handle_mm_fault+0x280/0x460
      [13058.354762]  do_page_fault+0x3ec/0x890
      [13058.354765]  do_mem_abort+0xc0/0x1b0
      [13058.354768]  el0_da+0x24/0x28
      [13058.354770]
      [13058.354773] Allocated by task 19731:
      [13058.354780]  kasan_kmalloc+0xe0/0x190
      [13058.354784]  kasan_slab_alloc+0x14/0x20
      [13058.354788]  kmem_cache_alloc_node+0x130/0x440
      [13058.354793]  bfq_get_queue+0x138/0x858
      [13058.354797]  bfq_get_bfqq_handle_split+0xd4/0x328
      [13058.354801]  bfq_init_rq+0x1f4/0x1180
      [13058.354806]  bfq_insert_requests+0x264/0x1c98
      [13058.354811]  blk_mq_sched_insert_requests+0x1c4/0x488
      [13058.354818]  blk_mq_flush_plug_list+0x2d4/0x6e0
      [13058.354826]  blk_flush_plug_list+0x230/0x548
      [13058.354830]  blk_finish_plug+0x60/0x80
      [13058.354838]  read_pages+0xec/0x2c0
      [13058.354842]  __do_page_cache_readahead+0x374/0x438
      [13058.354846]  ondemand_readahead+0x24c/0x6b0
      [13058.354851]  page_cache_sync_readahead+0x17c/0x2f8
      [13058.354858]  generic_file_buffered_read+0x588/0xc58
      [13058.354862]  generic_file_read_iter+0x1b4/0x278
      [13058.354965]  ext4_file_read_iter+0xa8/0x1d8 [ext4]
      [13058.354972]  __vfs_read+0x238/0x320
      [13058.354976]  vfs_read+0xbc/0x1c0
      [13058.354980]  ksys_read+0xdc/0x1b8
      [13058.354984]  __arm64_sys_read+0x50/0x60
      [13058.354990]  el0_svc_common+0xb4/0x1d8
      [13058.354994]  el0_svc_handler+0x50/0xa8
      [13058.354998]  el0_svc+0x8/0xc
      [13058.354999]
      [13058.355001] Freed by task 19731:
      [13058.355007]  __kasan_slab_free+0x120/0x228
      [13058.355010]  kasan_slab_free+0x10/0x18
      [13058.355014]  kmem_cache_free+0x288/0x3f0
      [13058.355018]  bfq_put_queue+0x134/0x208
      [13058.355022]  bfq_exit_icq_bfqq+0x164/0x348
      [13058.355026]  bfq_exit_icq+0x28/0x40
      [13058.355030]  ioc_exit_icq+0xa0/0x150
      [13058.355035]  put_io_context_active+0x250/0x438
      [13058.355038]  exit_io_context+0xd0/0x138
      [13058.355045]  do_exit+0x734/0xc58
      [13058.355050]  do_group_exit+0x78/0x220
      [13058.355054]  __wake_up_parent+0x0/0x50
      [13058.355058]  el0_svc_common+0xb4/0x1d8
      [13058.355062]  el0_svc_handler+0x50/0xa8
      [13058.355066]  el0_svc+0x8/0xc
      [13058.355067]
      [13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70 which belongs to the cache bfq_queue of size 464
      [13058.355075] The buggy address is located 264 bytes inside of 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
      [13058.355077] The buggy address belongs to the page:
      [13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
      [13058.366175] flags: 0x2ffffe0000008100(slab|head)
      [13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
      [13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
      [13058.370789] page dumped because: kasan: bad access detected
      [13058.370791]
      [13058.370792] Memory state around the buggy address:
      [13058.370797]  ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
      [13058.370801]  ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370808]                                                                 ^
      [13058.370811]  ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370815]  ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [13058.370817] ==================================================================
      [13058.370820] Disabling lock debugging due to kernel taint
      
      Here, we pass the bfqd directly to bfq_idle_slice_timer_body().
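
The check described above can be sketched in userspace as follows. This is a hedged illustration, not the actual patch: `toy_sched`, `toy_queue`, the boolean `locked` flag standing in for the scheduler lock, and all `toy_*` functions are hypothetical. The point it shows is that the possibly stale queue pointer is only compared, never dereferenced, until it is confirmed under the lock to still be the in-service queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_queue {
    bool wait_request;
};

struct toy_sched {
    bool locked;                        /* toy stand-in for a spinlock */
    struct toy_queue *in_service_queue; /* protected by the lock */
};

static void toy_lock(struct toy_sched *sd)
{
    assert(!sd->locked);
    sd->locked = true;
}

static void toy_unlock(struct toy_sched *sd)
{
    assert(sd->locked);
    sd->locked = false;
}

/* Timer body: q was sampled outside the lock and may be stale.
 * Returns true if the queue was still in service and was handled. */
static bool toy_idle_timer_body(struct toy_sched *sd, struct toy_queue *q)
{
    bool handled = false;

    toy_lock(sd);
    if (q == sd->in_service_queue) { /* pointer comparison only */
        q->wait_request = false;     /* safe: q is still in service */
        handled = true;
    }
    toy_unlock(sd);
    return handled;
}
```

A queue that is no longer in service is left untouched, which mirrors the reasoning that its wait_request flag has already been cleared on expiration.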
      --
      V2->V3: rewrite the comment as suggested by Paolo Valente
      V1->V2: add one comment, and add Fixes and Reported-by tag.
      
      Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Acked-by: Paolo Valente <paolo.valente@linaro.org>
      Reported-by: Wang Wang <wangwang2@huawei.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: Feilong Lin <linfeilong@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 18 Mar, 2020 7 commits
  4. 12 Mar, 2020 11 commits
  5. 10 Mar, 2020 6 commits
    • null_blk: Add support for init_hctx() fault injection · 596444e7
      Bart Van Assche authored
      This makes it possible to test the error path in blk_mq_realloc_hw_ctxs()
      and also several error paths in null_blk.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Handle null_add_dev() failures properly · 9b03b713
      Bart Van Assche authored
      If null_add_dev() fails then null_del_dev() is called with a NULL argument.
      Make null_del_dev() handle this scenario correctly. This patch fixes the
      following KASAN complaint:
      
      null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
      Read of size 8 at addr 0000000000000000 by task find/1062
      
      Call Trace:
       dump_stack+0xa5/0xe6
       __kasan_report.cold+0x65/0x99
       kasan_report+0x16/0x20
       __asan_load8+0x58/0x90
       null_del_dev+0x28/0x280 [null_blk]
       nullb_group_drop_item+0x7e/0xa0 [null_blk]
       client_drop_item+0x53/0x80 [configfs]
       configfs_rmdir+0x395/0x4e0 [configfs]
       vfs_rmdir+0xb6/0x220
       do_rmdir+0x238/0x2c0
       __x64_sys_unlinkat+0x75/0x90
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fix the null_add_dev() error path · 2004bfde
      Bart Van Assche authored
      If null_add_dev() fails, clear dev->nullb.
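
The pattern behind this fix can be sketched in plain C. This is an illustrative model only, not the driver code: `toy_device`, `toy_nullb`, `toy_add_dev()` and the `fail_late` switch are hypothetical. It shows an error path that frees an object and also clears the back-pointer to it, so that later code cannot follow a dangling pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_nullb {
    int id;
};

struct toy_device {
    struct toy_nullb *nullb; /* NULL while no instance exists */
};

/* Returns 0 on success and -1 on failure. fail_late simulates an error
 * that occurs after dev->nullb has already been set. */
static int toy_add_dev(struct toy_device *dev, bool fail_late)
{
    struct toy_nullb *nullb;

    assert(dev != NULL);
    nullb = malloc(sizeof(*nullb));
    if (!nullb)
        return -1;
    nullb->id = 0;
    dev->nullb = nullb;

    if (fail_late) {
        free(nullb);
        dev->nullb = NULL; /* clear the pointer so it cannot dangle */
        return -1;
    }
    return 0;
}
```

Without the assignment in the error path, `dev->nullb` would keep pointing at freed memory, which is the kind of access the KASAN report below complains about.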
      
      This patch fixes the following KASAN complaint:
      
      BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
      Read of size 8 at addr ffff88803280fc30 by task check/8409
      
      Call Trace:
       dump_stack+0xa5/0xe6
       print_address_description.constprop.0+0x26/0x260
       __kasan_report.cold+0x7b/0x99
       kasan_report+0x16/0x20
       __asan_load8+0x58/0x90
       nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7ff370926317
      Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317
      RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001
      RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001
      R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002
      R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0
      
      Allocated by task 8409:
       save_stack+0x23/0x90
       __kasan_kmalloc.constprop.0+0xcf/0xe0
       kasan_kmalloc+0xd/0x10
       kmem_cache_alloc_node_trace+0x129/0x4c0
       null_add_dev+0x24a/0xe90 [null_blk]
       nullb_device_power_store+0x1b6/0x270 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 8409:
       save_stack+0x23/0x90
       __kasan_slab_free+0x112/0x160
       kasan_slab_free+0x12/0x20
       kfree+0xdf/0x250
       null_add_dev+0xaf3/0xe90 [null_blk]
       nullb_device_power_store+0x1b6/0x270 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 2984c868 ("nullb: factor disk parameters")
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fix changing the number of hardware queues · 78b10be2
      Bart Van Assche authored
      Instead of initializing null_blk hardware queues explicitly after the
      request queue has been created, provide .init_hctx() and .exit_hctx()
      callback functions. The latter functions are not only called during
      request queue allocation but also when the number of hardware queues
      changes. Allocate nr_cpu_ids queues during initialization to support
      increasing the number of hardware queues above the initial hardware
      queue count.
      
      This change fixes increasing the number of hardware queues above the
      initial number of hardware queues and also keeps nullb->nr_queues in
      sync with the number of hardware queues.
      
      Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed' · b9853b4d
      Bart Van Assche authored
      Although it is not clear to me why UBSAN complains when 'memory_backed'
      is set, this patch suppresses the UBSAN complaint that is triggered when
      setting that configfs attribute.
      
      UBSAN: Undefined behaviour in drivers/block/null_blk_main.c:327:1
      load of value 16 is not a valid value for type '_Bool'
      CPU: 2 PID: 8396 Comm: check Not tainted 5.6.0-rc1-dbg+ #14
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Call Trace:
       dump_stack+0xa5/0xe6
       ubsan_epilogue+0x9/0x26
       __ubsan_handle_load_invalid_value+0x6d/0x76
       nullb_device_memory_backed_store.cold+0x2c/0x38 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: Fix a recently introduced regression in blk_mq_realloc_hw_ctxs() · d0930bb8
      Bart Van Assche authored
      q->nr_hw_queues must only be updated once it is known that
      blk_mq_realloc_hw_ctxs() has succeeded. Otherwise it can happen that
      reallocation fails and that q->nr_hw_queues is larger than the number of
      allocated hardware queues. This patch fixes the following crash if
      increasing the number of hardware queues fails:
      
      BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x775/0x810
      Write of size 8 at addr 0000000000000118 by task check/977
      
      CPU: 3 PID: 977 Comm: check Not tainted 5.6.0-rc1-dbg+ #8
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Call Trace:
       dump_stack+0xa5/0xe6
       __kasan_report.cold+0x65/0x99
       kasan_report+0x16/0x20
       check_memory_region+0x140/0x1b0
       memset+0x28/0x40
       blk_mq_map_swqueue+0x775/0x810
       blk_mq_update_nr_hw_queues+0x468/0x710
       nullb_device_submit_queues_store+0xf7/0x1a0 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
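
The rule stated at the top of this message, that the queue count is updated only once reallocation is known to have succeeded, can be sketched as a small userspace model. It is an assumption-laden illustration rather than the blk-mq code: `toy_request_queue`, `toy_realloc_hw_ctxs()` and the `alloc_limit` parameter (which simulates allocation failure) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_request_queue {
    int nr_hw_queues; /* count that the rest of the code trusts */
    int nr_allocated; /* hardware contexts that actually exist */
};

/* Try to resize to `wanted` contexts. At most `alloc_limit` contexts can
 * exist; growing past that simulates an allocation failure. */
static bool toy_realloc_hw_ctxs(struct toy_request_queue *q, int wanted,
                                int alloc_limit)
{
    int allocated = q->nr_allocated;

    while (allocated < wanted) {
        if (allocated >= alloc_limit)
            return false; /* failure: both counters left untouched */
        allocated++;
    }

    /* Success is known only here, so the counts are committed here. */
    q->nr_allocated = wanted;
    q->nr_hw_queues = wanted;
    assert(q->nr_hw_queues <= q->nr_allocated);
    return true;
}
```

Committing the count only at the end keeps the invariant that `nr_hw_queues` never exceeds the number of contexts that exist, even when a resize fails halfway.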
      
      Fixes: ac0d6b92 ("block: Reduce the amount of memory required per request queue")
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Keith Busch <kbusch@kernel.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>