1. 25 Mar, 2020 6 commits
  2. 24 Mar, 2020 21 commits
  3. 21 Mar, 2020 5 commits
    • Paolo Valente's avatar
      block, bfq: invoke flush_idle_tree after reparent_active_queues in pd_offline · 4d38a87f
      Paolo Valente authored
      In bfq_pd_offline(), the function bfq_flush_idle_tree() is invoked to
      flush the rb tree that contains all idle entities belonging to the pd
      (cgroup) being destroyed. In particular, bfq_flush_idle_tree() is
      invoked before bfq_reparent_active_queues(). Yet the latter may happen
      to add some entities to the idle tree. It happens if, in some of the
      calls to bfq_bfqq_move() performed by bfq_reparent_active_queues(),
      the queue to move is empty and gets expired.
      
      This commit simply reverses the invocation order between
      bfq_flush_idle_tree() and bfq_reparent_active_queues().
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      4d38a87f
    • Paolo Valente's avatar
      block, bfq: make reparent_leaf_entity actually work only on leaf entities · 576682fa
      Paolo Valente authored
      bfq_reparent_leaf_entity() reparents the input leaf entity (a leaf
      entity represents just a bfq_queue in an entity tree). Yet, the input
      entity is guaranteed to always be a leaf entity only in two-level
      entity trees. In this respect, because of the error fixed by
      commit 14afc593 ("block, bfq: fix overwrite of bfq_group pointer
      in bfq_find_set_group()"), all (wrongly collapsed) entity trees happened
      to actually have only two levels. After the latter commit, this does not
      hold any longer.
      
      This commit fixes this problem by modifying
      bfq_reparent_leaf_entity(), so that it searches an active leaf entity
      down the path that stems from the input entity. Such a leaf entity is
      guaranteed to exist when bfq_reparent_leaf_entity() is invoked.
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      576682fa
    • Paolo Valente's avatar
      block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup · c8997736
      Paolo Valente authored
      A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
      goal of this put is to release a process reference to a bfq_queue. But
      process-reference releases may trigger also some extra operation, and,
      to this goal, are handled through bfq_release_process_ref(). So, turn
      the invocation of bfq_put_queue() into an invocation of
      bfq_release_process_ref().
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c8997736
    • Paolo Valente's avatar
      block, bfq: move forward the getting of an extra ref in bfq_bfqq_move · fd1bb3ae
      Paolo Valente authored
      Commit ecedd3d7 ("block, bfq: get extra ref to prevent a queue
      from being freed during a group move") gets an extra reference to a
      bfq_queue before possibly deactivating it (temporarily), in
      bfq_bfqq_move(). This prevents the bfq_queue from disappearing before
      being reactivated in its new group.
      
      Yet, the bfq_queue may also be expired (i.e., its service may be
      stopped) before the bfq_queue is deactivated. And also an expiration
      may lead to a premature freeing. This commit fixes this issue by
      simply moving forward the getting of the extra reference already
      introduced by commit ecedd3d7 ("block, bfq: get extra ref to
      prevent a queue from being freed during a group move").
      
      Reported-by: cki-project@redhat.com
      Tested-by: cki-project@redhat.com
      Signed-off-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      fd1bb3ae
    • Zhiqiang Liu's avatar
      block, bfq: fix use-after-free in bfq_idle_slice_timer_body · 2f95fa5c
      Zhiqiang Liu authored
      In bfq_idle_slice_timer func, bfqq = bfqd->in_service_queue is
      not in bfqd-lock critical section. The bfqq, which is not
      equal to NULL in bfq_idle_slice_timer, may be freed after passing
      to bfq_idle_slice_timer_body. So we will access the freed memory.
      
      In addition, considering the bfqq may be in race, we should
      firstly check whether bfqq is in service before doing something
      on it in bfq_idle_slice_timer_body func. If the bfqq in race is
      not in service, it means the bfqq has been expired through
      __bfq_bfqq_expire func, and wait_request flags has been cleared in
      __bfq_bfqd_reset_in_service func. So we do not need to re-clear the
      wait_request of bfqq which is not in service.
      
      KASAN log is given as follows:
      [13058.354613] ==================================================================
      [13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
      [13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
      [13058.354646]
      [13058.354655] CPU: 96 PID: 19767 Comm: fork13
      [13058.354661] Call trace:
      [13058.354667]  dump_backtrace+0x0/0x310
      [13058.354672]  show_stack+0x28/0x38
      [13058.354681]  dump_stack+0xd8/0x108
      [13058.354687]  print_address_description+0x68/0x2d0
      [13058.354690]  kasan_report+0x124/0x2e0
      [13058.354697]  __asan_load8+0x88/0xb0
      [13058.354702]  bfq_idle_slice_timer+0xac/0x290
      [13058.354707]  __hrtimer_run_queues+0x298/0x8b8
      [13058.354710]  hrtimer_interrupt+0x1b8/0x678
      [13058.354716]  arch_timer_handler_phys+0x4c/0x78
      [13058.354722]  handle_percpu_devid_irq+0xf0/0x558
      [13058.354731]  generic_handle_irq+0x50/0x70
      [13058.354735]  __handle_domain_irq+0x94/0x110
      [13058.354739]  gic_handle_irq+0x8c/0x1b0
      [13058.354742]  el1_irq+0xb8/0x140
      [13058.354748]  do_wp_page+0x260/0xe28
      [13058.354752]  __handle_mm_fault+0x8ec/0x9b0
      [13058.354756]  handle_mm_fault+0x280/0x460
      [13058.354762]  do_page_fault+0x3ec/0x890
      [13058.354765]  do_mem_abort+0xc0/0x1b0
      [13058.354768]  el0_da+0x24/0x28
      [13058.354770]
      [13058.354773] Allocated by task 19731:
      [13058.354780]  kasan_kmalloc+0xe0/0x190
      [13058.354784]  kasan_slab_alloc+0x14/0x20
      [13058.354788]  kmem_cache_alloc_node+0x130/0x440
      [13058.354793]  bfq_get_queue+0x138/0x858
      [13058.354797]  bfq_get_bfqq_handle_split+0xd4/0x328
      [13058.354801]  bfq_init_rq+0x1f4/0x1180
      [13058.354806]  bfq_insert_requests+0x264/0x1c98
      [13058.354811]  blk_mq_sched_insert_requests+0x1c4/0x488
      [13058.354818]  blk_mq_flush_plug_list+0x2d4/0x6e0
      [13058.354826]  blk_flush_plug_list+0x230/0x548
      [13058.354830]  blk_finish_plug+0x60/0x80
      [13058.354838]  read_pages+0xec/0x2c0
      [13058.354842]  __do_page_cache_readahead+0x374/0x438
      [13058.354846]  ondemand_readahead+0x24c/0x6b0
      [13058.354851]  page_cache_sync_readahead+0x17c/0x2f8
      [13058.354858]  generic_file_buffered_read+0x588/0xc58
      [13058.354862]  generic_file_read_iter+0x1b4/0x278
      [13058.354965]  ext4_file_read_iter+0xa8/0x1d8 [ext4]
      [13058.354972]  __vfs_read+0x238/0x320
      [13058.354976]  vfs_read+0xbc/0x1c0
      [13058.354980]  ksys_read+0xdc/0x1b8
      [13058.354984]  __arm64_sys_read+0x50/0x60
      [13058.354990]  el0_svc_common+0xb4/0x1d8
      [13058.354994]  el0_svc_handler+0x50/0xa8
      [13058.354998]  el0_svc+0x8/0xc
      [13058.354999]
      [13058.355001] Freed by task 19731:
      [13058.355007]  __kasan_slab_free+0x120/0x228
      [13058.355010]  kasan_slab_free+0x10/0x18
      [13058.355014]  kmem_cache_free+0x288/0x3f0
      [13058.355018]  bfq_put_queue+0x134/0x208
      [13058.355022]  bfq_exit_icq_bfqq+0x164/0x348
      [13058.355026]  bfq_exit_icq+0x28/0x40
      [13058.355030]  ioc_exit_icq+0xa0/0x150
      [13058.355035]  put_io_context_active+0x250/0x438
      [13058.355038]  exit_io_context+0xd0/0x138
      [13058.355045]  do_exit+0x734/0xc58
      [13058.355050]  do_group_exit+0x78/0x220
      [13058.355054]  __wake_up_parent+0x0/0x50
      [13058.355058]  el0_svc_common+0xb4/0x1d8
      [13058.355062]  el0_svc_handler+0x50/0xa8
      [13058.355066]  el0_svc+0x8/0xc
      [13058.355067]
      [13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
      [13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
      [13058.355077] The buggy address belongs to the page:
      [13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
      [13058.366175] flags: 0x2ffffe0000008100(slab|head)
      [13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
      [13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
      [13058.370789] page dumped because: kasan: bad access detected
      [13058.370791]
      [13058.370792] Memory state around the buggy address:
      [13058.370797]  ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
      [13058.370801]  ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370808]                                                                 ^
      [13058.370811]  ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370815]  ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [13058.370817] ==================================================================
      [13058.370820] Disabling lock debugging due to kernel taint
      
      Here, we directly pass the bfqd to bfq_idle_slice_timer_body func.
      --
      V2->V3: rewrite the comment as suggested by Paolo Valente
      V1->V2: add one comment, and add Fixes and Reported-by tag.
      
      Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Acked-by: default avatarPaolo Valente <paolo.valente@linaro.org>
      Reported-by: default avatarWang Wang <wangwang2@huawei.com>
      Signed-off-by: default avatarZhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: default avatarFeilong Lin <linfeilong@huawei.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2f95fa5c
  4. 18 Mar, 2020 7 commits
  5. 12 Mar, 2020 1 commit
    • Alexey Dobriyan's avatar
      block, zoned: fix integer overflow with BLKRESETZONE et al · 11bde986
      Alexey Dobriyan authored
      Check for overflow in addition before checking for end-of-block-device.
      
      Steps to reproduce:
      
      	#define _GNU_SOURCE 1
      	#include <sys/ioctl.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <fcntl.h>
      
      	typedef unsigned long long __u64;
      
      	struct blk_zone_range {
      	        __u64 sector;
      	        __u64 nr_sectors;
      	};
      
      	#define BLKRESETZONE    _IOW(0x12, 131, struct blk_zone_range)
      
      	int main(void)
      	{
      	        int fd = open("/dev/nullb0", O_RDWR|O_DIRECT);
      	        struct blk_zone_range zr = {4096, 0xfffffffffffff000ULL};
      	        ioctl(fd, BLKRESETZONE, &zr);
      	        return 0;
      	}
      
      BUG: KASAN: null-ptr-deref in submit_bio_wait+0x74/0xe0
      Write of size 8 at addr 0000000000000040 by task a.out/1590
      
      CPU: 8 PID: 1590 Comm: a.out Not tainted 5.6.0-rc1-00019-g359c92c0 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
      Call Trace:
       dump_stack+0x76/0xa0
       __kasan_report.cold+0x5/0x3e
       kasan_report+0xe/0x20
       submit_bio_wait+0x74/0xe0
       blkdev_zone_mgmt+0x26f/0x2a0
       blkdev_zone_mgmt_ioctl+0x14b/0x1b0
       blkdev_ioctl+0xb28/0xe60
       block_ioctl+0x69/0x80
       ksys_ioctl+0x3af/0xa50
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarAlexey Dobriyan (SK hynix) <adobriyan@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      11bde986