1. 16 Feb, 2023 4 commits
    • brd: mark as nowait compatible · 67205f80
      Jens Axboe authored
      By default, non-mq drivers do not support nowait. This causes io_uring
      to use a slower path as the driver cannot be trusted not to block. brd
      can safely set the nowait flag, as worst case all it does is a NOIO
      allocation.
      
      For io_uring, this makes a substantial difference. Before:
      
      submitter=0, tid=453, file=/dev/ram0, node=-1
      polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
      Engine=io_uring, sq_ring=128, cq_ring=128
      IOPS=440.03K, BW=1718MiB/s, IOS/call=32/31
      IOPS=428.96K, BW=1675MiB/s, IOS/call=32/32
      IOPS=442.59K, BW=1728MiB/s, IOS/call=32/31
      IOPS=419.65K, BW=1639MiB/s, IOS/call=32/32
      IOPS=426.82K, BW=1667MiB/s, IOS/call=32/31
      
      and after:
      
      submitter=0, tid=354, file=/dev/ram0, node=-1
      polled=0, fixedbufs=1/0, register_files=1, buffered=0, QD=128
      Engine=io_uring, sq_ring=128, cq_ring=128
      IOPS=3.37M, BW=13.15GiB/s, IOS/call=32/31
      IOPS=3.45M, BW=13.46GiB/s, IOS/call=32/31
      IOPS=3.43M, BW=13.42GiB/s, IOS/call=32/32
      IOPS=3.43M, BW=13.39GiB/s, IOS/call=32/31
      IOPS=3.43M, BW=13.38GiB/s, IOS/call=32/31
      
      or about an 8x difference. Now that brd is prepared to deal with
      REQ_NOWAIT reads/writes, mark it as supporting that.
      
      Cc: stable@vger.kernel.org # 5.10+
      Link: https://lore.kernel.org/linux-block/20230203103005.31290-1-p.raghav@samsung.com/
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
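
The "about 8x" figure can be checked directly from the two benchmark runs quoted in this commit message. The sketch below averages the five IOPS samples from each run and divides the means; the function names are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* IOPS samples copied from the benchmark output, in thousands of IOPS. */
static const double before_kiops[] = { 440.03, 428.96, 442.59, 419.65, 426.82 };
static const double after_kiops[]  = { 3370.0, 3450.0, 3430.0, 3430.0, 3430.0 };

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Ratio of the mean IOPS after the change to the mean IOPS before it. */
double nowait_speedup(void)
{
    return mean(after_kiops, 5) / mean(before_kiops, 5);
}
```

The means come out at roughly 431.6K and 3.42M IOPS, a ratio just under 8, which is consistent with the claim in the message.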
    • brd: check for REQ_NOWAIT and set correct page allocation mask · 6ded703c
      Jens Axboe authored
      If REQ_NOWAIT is set, then do a non-blocking allocation if the operation
      is a write and we need to insert a new page. Currently REQ_NOWAIT cannot
      be set as the queue isn't marked as supporting nowait; this change is in
      preparation for allowing that.
      
      radix_tree_preload() warns on attempting to call it with an allocation
      mask that doesn't allow blocking. While that warning could arguably
      be removed, we need to handle radix insertion failures anyway as they
      are more likely if we cannot block to get memory.
      
      Remove legacy BUG_ON()'s and turn them into proper errors instead, one
      for the allocation failure and one for finding a page that doesn't
      match the correct index.
      
      Cc: stable@vger.kernel.org # 5.10+
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
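
A minimal user-space sketch of the idea in this commit message: pick a non-blocking allocation mode when the request carries the nowait flag, and report allocation failure as an error instead of hitting a BUG_ON(). All names and flag values here are hypothetical stand-ins, not the kernel's. The mapping of a failed non-blocking allocation to -EAGAIN is an assumption about how a submitter would be told to retry; the commit message itself does not state it.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in flag value for illustration only; not the kernel's REQ_NOWAIT. */
#define SKETCH_REQ_NOWAIT (1u << 0)

enum sketch_alloc_mode {
    SKETCH_ALLOC_MAY_BLOCK, /* plays the role of a NOIO allocation */
    SKETCH_ALLOC_NOWAIT     /* plays the role of a non-blocking allocation */
};

/* Choose the allocation mode from the request flags. */
enum sketch_alloc_mode sketch_alloc_mode_for(unsigned int opf)
{
    return (opf & SKETCH_REQ_NOWAIT) ? SKETCH_ALLOC_NOWAIT
                                     : SKETCH_ALLOC_MAY_BLOCK;
}

/*
 * Hypothetical insert step. free_pages models how many pages can be handed
 * out without blocking; a blocking allocation is assumed to always succeed.
 */
int sketch_insert_page(unsigned int opf, int free_pages)
{
    assert(free_pages >= 0);
    if (sketch_alloc_mode_for(opf) == SKETCH_ALLOC_NOWAIT && free_pages == 0)
        return -ENOMEM; /* returned as an error rather than crashing */
    return 0;
}

/* Assumed mapping from the insert result to the status a submitter sees. */
int sketch_complete_status(unsigned int opf, int err)
{
    if (err == -ENOMEM && (opf & SKETCH_REQ_NOWAIT))
        return -EAGAIN; /* would block: retry from a context that may sleep */
    return err ? -EIO : 0;
}
```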
    • brd: return 0/-error from brd_insert_page() · db0ccc44
      Jens Axboe authored
      It currently returns a page, but callers just check for NULL/page to
      gauge success. Clean this up and return the appropriate error directly
      instead.
      
      Cc: stable@vger.kernel.org # 5.10+
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
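
The return convention described in this commit message can be illustrated with a small user-space sketch. The store, its size constants, and the function names are hypothetical and only model the pattern: the insert helper returns 0 or a negative errno, and the caller propagates that code instead of testing a pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SKETCH_NR_SLOTS  8
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical page store: one lazily allocated buffer per index. */
struct sketch_store {
    void *slot[SKETCH_NR_SLOTS];
};

/* Returns 0 if a page is present at idx afterwards, or a negative errno. */
int store_insert_page(struct sketch_store *s, unsigned int idx)
{
    assert(s != NULL);
    if (idx >= SKETCH_NR_SLOTS)
        return -EINVAL;
    if (s->slot[idx])
        return 0; /* already populated: success, nothing to do */
    s->slot[idx] = calloc(1, SKETCH_PAGE_SIZE);
    if (!s->slot[idx])
        return -ENOMEM;
    return 0;
}

/* The caller passes the error code through unchanged. */
int store_write_byte(struct sketch_store *s, unsigned int idx, unsigned char v)
{
    int ret = store_insert_page(s, idx);

    if (ret)
        return ret;
    ((unsigned char *)s->slot[idx])[0] = v;
    return 0;
}
```

With this shape the caller can tell different failures apart, for example an out-of-range index from an allocation failure, which a NULL/non-NULL check cannot express.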
    • block: sync mixed merged request's failfast with 1st bio's · 3ce6a115
      Ming Lei authored
      We support mixed merge for requests/bios with different failfast
      settings. When a request fails, each time we only handle the portion
      with the same failfast setting, so bios with failfast can be failed
      immediately, and bios without failfast can be retried.
      
      The idea is pretty good, but the current implementation has several
      defects:
      
      1) initially an RA bio doesn't set failfast; however, the bio merge code
      doesn't consider this point, and just checks its failfast setting when
      deciding if mixed merge is required. Fix this issue by adding a
      bio_failfast() helper.
      
      2) when merging bio to request front, if this request is mixed
      merged, we have to sync request's failfast setting with 1st bio's
      failfast. Fix it by calling blk_update_mixed_merge().
      
      3) when merging bio to request back, if this request is mixed
      merged, we have to mark the bio as failfast, because blk_update_request
      simply updates request failfast with 1st bio's failfast. Fix
      it by calling blk_update_mixed_merge().
      
      This fixes a normal EXT4 READ IO failure: it was observed that the
      normal READ IO is merged with RA IO, and the mixed merged request has
      a different failfast setting from the 1st bio's, so the normal READ IO
      never gets retried.
      
      Cc: Tejun Heo <tj@kernel.org>
      Fixes: 80a761fd ("block: implement mixed merge of different failfast requests")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20230209125527.667004-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
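
The three fixes listed in this commit message can be modeled in user space roughly as follows. This is a sketch under stated assumptions, not the patch itself: the struct layouts and flag values are stand-ins, and the `sk_` names are hypothetical counterparts of the bio_failfast() and blk_update_mixed_merge() helpers named in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag values for illustration; the kernel's real values differ. */
#define SK_REQ_FAILFAST_DEV       (1u << 0)
#define SK_REQ_FAILFAST_TRANSPORT (1u << 1)
#define SK_REQ_FAILFAST_DRIVER    (1u << 2)
#define SK_REQ_FAILFAST_MASK      (SK_REQ_FAILFAST_DEV | \
                                   SK_REQ_FAILFAST_TRANSPORT | \
                                   SK_REQ_FAILFAST_DRIVER)
#define SK_REQ_RAHEAD             (1u << 3)

struct sk_bio     { unsigned int opf; };
struct sk_request { unsigned int cmd_flags; bool mixed_merge; };

/* Fix 1: a readahead bio counts as failfast even before the bits are set. */
unsigned int sk_bio_failfast(const struct sk_bio *bio)
{
    if (bio->opf & SK_REQ_RAHEAD)
        return SK_REQ_FAILFAST_MASK;
    return bio->opf & SK_REQ_FAILFAST_MASK;
}

/* Mixed merge is required when the effective failfast settings differ. */
bool sk_needs_mixed_merge(const struct sk_request *rq, const struct sk_bio *bio)
{
    return (rq->cmd_flags & SK_REQ_FAILFAST_MASK) != sk_bio_failfast(bio);
}

/* Fixes 2 and 3: keep the request and the merged bio consistent. */
void sk_update_mixed_merge(struct sk_request *rq, struct sk_bio *bio,
                           bool front_merge)
{
    assert(rq && bio);
    if (!rq->mixed_merge)
        return;
    /* Make a readahead bio carry its failfast bits explicitly. */
    if (bio->opf & SK_REQ_RAHEAD)
        bio->opf |= SK_REQ_FAILFAST_MASK;
    /* On a front merge the new bio becomes the 1st bio: sync the request. */
    if (front_merge) {
        rq->cmd_flags &= ~SK_REQ_FAILFAST_MASK;
        rq->cmd_flags |= bio->opf & SK_REQ_FAILFAST_MASK;
    }
}
```

In this model, front-merging a normal READ bio into a mixed merged request clears the request's failfast bits, so a failure of that leading portion is eligible for retry rather than being failed immediately.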
  2. 15 Feb, 2023 1 commit
  3. 14 Feb, 2023 7 commits
  4. 13 Feb, 2023 1 commit
  5. 10 Feb, 2023 3 commits
  6. 09 Feb, 2023 4 commits
  7. 08 Feb, 2023 3 commits
    • Merge branch 'md-next' of... · a872818f
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.3/block
      
      Pull MD fix from Song:
      
      "This commit fixes a rare crash during the takeover process."
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md: account io_acct_set usage with active_io
    • md: account io_acct_set usage with active_io · 76fed014
      Xiao Ni authored
      io_acct_set was enabled for raid0/raid5 io accounting. bios that contain
      md_io_acct are allocated in the i/o path. There isn't a good method to
      monitor if these bios are all finished and freed. In the takeover process,
      io_acct_set (which is used for bios with md_io_acct) needs to be freed.
      However, if some bios finish after io_acct_set is freed, it may trigger
      the following panic:
      
      [ 6973.767999] RIP: 0010:mempool_free+0x52/0x80
      [ 6973.786098] Call Trace:
      [ 6973.786549]  md_end_io_acct+0x31/0x40
      [ 6973.787227]  blk_update_request+0x224/0x380
      [ 6973.787994]  blk_mq_end_request+0x1a/0x130
      [ 6973.788739]  blk_complete_reqs+0x35/0x50
      [ 6973.789456]  __do_softirq+0xd7/0x2c8
      [ 6973.790114]  ? sort_range+0x20/0x20
      [ 6973.790763]  run_ksoftirqd+0x2a/0x40
      [ 6973.791400]  smpboot_thread_fn+0xb5/0x150
      [ 6973.792114]  kthread+0x10b/0x130
      [ 6973.792724]  ? set_kthread_struct+0x50/0x50
      [ 6973.793491]  ret_from_fork+0x1f/0x40
      
      Fix this by increasing and decreasing active_io for each bio with
      md_io_acct so that mddev_suspend() will wait until all bios from
      io_acct_set finish before freeing io_acct_set.
      
      Reported-by: Fine Fan <ffan@redhat.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
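
The accounting scheme in this commit message amounts to a usage count that gates freeing the pool. The single-threaded sketch below models only that invariant. The struct and function names are hypothetical, a plain int stands in for whatever concurrency-safe counter the kernel uses, and the real suspend path waits rather than returning false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a device with an accounting bio pool. */
struct sk_mddev {
    int  active_io;      /* bios in flight that still reference the pool */
    bool pool_allocated; /* models io_acct_set being present */
};

/* Called when a bio using the pool is submitted. */
void sk_acct_bio_start(struct sk_mddev *m)
{
    assert(m->pool_allocated);
    m->active_io++;
}

/* Called from the bio's completion path, before it is returned to the pool. */
void sk_acct_bio_end(struct sk_mddev *m)
{
    assert(m->pool_allocated); /* the crash in the trace violates this */
    assert(m->active_io > 0);
    m->active_io--;
}

/* Suspend is complete only once every accounted bio has finished. */
bool sk_suspend_done(const struct sk_mddev *m)
{
    return m->active_io == 0;
}

/* Takeover step: free the pool only when drained; otherwise keep waiting. */
bool sk_try_free_pool(struct sk_mddev *m)
{
    if (!sk_suspend_done(m))
        return false;
    m->pool_allocated = false;
    return true;
}
```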
    • block: ublk: improve handling device deletion · 0abe39de
      Ming Lei authored
      Inside ublk_ctrl_del_dev(), when the device is removed, we wait
      until the device number is freed while holding the global lock
      ublk_ctl_mutex. This isn't friendly from the user's viewpoint:
      
      1) if device is in-use, the current delete command hangs in
      ublk_ctrl_del_dev(), and user can't break from the handling
      because wait_event() is used
      
      2) global lock is held, so any new device can't be added and
      other old devices can't be removed.
      
      Improve the deletion handling in the following way, as suggested by
      Nadav:
      
      1) wait without holding the global lock
      
      2) replace wait_event() with wait_event_interruptible()
      
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Suggested-by: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20230207150700.545530-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 07 Feb, 2023 5 commits
  9. 06 Feb, 2023 12 commits