1. 16 Jun, 2024 5 commits
    • bdev: make blockdev_mnt static · d9c23321
      Jiapeng Chong authored
      blockdev_mnt is not used outside the file bdev.c, so define it as
      static.
      
      block/bdev.c:377:17: warning: symbol 'blockdev_mnt' was not declared. Should it be static?
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      jpg: Remove closes bugzilla link
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: John Garry <john.g.garry@oracle.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Fixes: 8f3a608827d1 ("bdev: open block device as files")
      Tested-by: John Garry <john.g.garry@oracle.com>
      Link: https://lore.kernel.org/r/20240614090345.655716-2-john.g.garry@oracle.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • dm: Remove unused macro DM_ZONE_INVALID_WP_OFST · eaa3706f
      Damien Le Moal authored
      With the switch to using the zone append emulation of the block layer
      zone write plugging, the macro DM_ZONE_INVALID_WP_OFST is no longer used
      in dm-zone.c. Remove its definition.
      
      Fixes: f211268e ("dm: Use the block layer zone append emulation")
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
      Reviewed-by: Niklas Cassel <cassel@kernel.org>
      Link: https://lore.kernel.org/r/20240611023639.89277-5-dlemoal@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • dm: Improve zone resource limits handling · 73a74af0
      Damien Le Moal authored
      The generic stacking of limits implemented in the block layer cannot
      correctly handle stacking of zone resource limits (max open zones and
      max active zones) because these limits are for an entire device but the
      stacking may be for a portion of that device (e.g. a dm-linear target
      that does not cover an entire block device). As a result, when DM
      devices are created on top of zoned block devices, the DM device never
      advertises any zone resource limits, which is only correct if all
      underlying target devices also have no zone resource limits.
      If at least one target device has resource limits, the user may see
      either performance issues (if the max open zone limit of the device is
      exceeded) or write I/O errors if the max active zone limit of one of
      the underlying target devices is exceeded.
      
      While it is very difficult to correctly and reliably stack zone resource
      limits in general, cases where targets are not sharing zone resources of
      the same device can be dealt with relatively easily. Such a situation
      occurs when a target maps all sequential zones of a zoned block device:
      for such mapping, other targets mapping other parts of the same zoned
      block device can only contain conventional zones and thus will not
      require any zone resource to correctly handle write operations.
      
      For a mapped device constructed with such targets, which includes mapped
      devices constructed with targets mapping entire zoned block devices, the
      zone resource limits can be reliably determined using the non-zero
      minimum of the zone resource limits of all targets.
      
      For mapped devices that include targets partially mapping the set of
      sequential write required zones of zoned block devices, instead of
      advertising no zone resource limits, it is also better to set the mapped
      device limits to the non-zero minimum of the limits of all targets. In
      this case the limits for a target depend on the number of sequential
      zones being mapped: if this number of zones is larger than the limits,
      then the limits of the device apply and can be used. If on the other
      hand the target maps a number of zones smaller than the limits, then no
      limit is needed and we can assume that the target has no limits (limits
      set to 0).
      
      This commit improves zone resource limits handling as described above
      by modifying dm_set_zones_restrictions() to iterate the targets of a
      mapped device to evaluate the max open and max active zone limits. This
      relies on an internal "stacking" of the limits of the target devices
      combined with a direct counting of the number of sequential zones
      mapped by the targets.
      1) For a target mapping an entire zoned block device, the limits for the
         target are set to the limits of the device.
      2) For a target partially mapping a zoned block device, the number of
         mapped sequential zones is used to determine the limits: if the
         target maps more sequential write required zones than the device
         limits, then the limits of the device are used as-is. If the number
         of mapped sequential zones is lower than the limits, then we assume
         that the target has no limits (limits set to 0).
      As this evaluation is done for each target, the zone resource limits
      for the mapped device are evaluated as the non-zero minimum of the
      limits of all the targets.
      
      For configurations resulting in unreliable limits, i.e. a table
      containing a target partially mapping a zoned device, a warning message
      is issued.
      
      The counting of mapped sequential zones for the target is done using the
      new function dm_device_count_zones() which performs a report zones on
      the entire block device with the callback dm_device_count_zones_cb().
      This count of mapped sequential zones is also used to determine if the
      mapped device contains only conventional zones. This allows simplifying
      dm_set_zones_restrictions() to not do a report zones just for this.
      For mapped devices mapping only conventional zones, as before, the
      mapped device is changed to a regular device by setting its zoned limit
      to false and clearing all its zone related limits.
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
      Reviewed-by: Niklas Cassel <cassel@kernel.org>
      Link: https://lore.kernel.org/r/20240611023639.89277-4-dlemoal@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
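
The per-target evaluation and the "non-zero minimum" described in commit 73a74af0 can be illustrated with a small userspace sketch. This is not the code of dm_set_zones_restrictions(): the types `zone_limits` and `target_info` and the helpers `min_not_zero_uint()`, `target_limit()` and `stack_zone_limits()` are hypothetical names introduced here only to show the arithmetic, assuming a limit value of 0 means "no limit".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device zone resource limits; 0 means "no limit". */
struct zone_limits {
	unsigned int max_open_zones;
	unsigned int max_active_zones;
};

/* Hypothetical description of one target of a mapped device. */
struct target_info {
	struct zone_limits dev;           /* limits of the underlying device */
	unsigned int nr_mapped_seq_zones; /* sequential zones the target maps */
	bool maps_whole_device;           /* target covers the entire device */
};

/* Minimum of two limits, ignoring a value of 0 ("no limit"). */
static unsigned int min_not_zero_uint(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Limit contributed by one target:
 * 1) whole-device mapping: the device limit is used as-is;
 * 2) partial mapping: the device limit applies only if the target maps
 *    more sequential zones than the limit, otherwise 0 (no limit).
 */
static unsigned int target_limit(const struct target_info *t,
				 unsigned int dev_limit)
{
	if (t->maps_whole_device)
		return dev_limit;
	return t->nr_mapped_seq_zones > dev_limit ? dev_limit : 0;
}

/* Mapped device limits: non-zero minimum over all targets. */
static struct zone_limits stack_zone_limits(const struct target_info *t,
					    size_t nr_targets)
{
	struct zone_limits out = { 0, 0 };

	for (size_t i = 0; i < nr_targets; i++) {
		out.max_open_zones = min_not_zero_uint(out.max_open_zones,
			target_limit(&t[i], t[i].dev.max_open_zones));
		out.max_active_zones = min_not_zero_uint(out.max_active_zones,
			target_limit(&t[i], t[i].dev.max_active_zones));
	}
	return out;
}
```

In this sketch, a target that maps only 16 sequential zones of a device limited to 32 open zones contributes 0, so it does not lower the limits that the other targets contribute.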
    • dm: Call dm_revalidate_zones() after setting the queue limits · 7f91ccd8
      Damien Le Moal authored
      dm_revalidate_zones() is called from dm_set_zones_restrictions() when the
      mapped device queue limits are not yet set. However,
      dm_revalidate_zones() calls blk_revalidate_disk_zones() and this
      function consults and modifies the mapped device queue limits. Thus,
      currently, blk_revalidate_disk_zones() operates on limits that are not
      yet initialized.
      
      Fix this by moving the call to dm_revalidate_zones() out of
      dm_set_zones_restrictions() and into dm_table_set_restrictions() after
      executing queue_limits_set().
      
      To further clean up dm_set_zones_restrictions(), the message about the
      type of zone append (native or emulated) is also moved inside
      dm_revalidate_zones().
      
      Fixes: 1c0e7202 ("dm: use queue_limits_set")
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
      Reviewed-by: Niklas Cassel <cassel@kernel.org>
      Link: https://lore.kernel.org/r/20240611023639.89277-3-dlemoal@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Improve checks on zone resource limits · e21d12c7
      Damien Le Moal authored
      Make sure that the zone resource limits of a zoned block device are
      correct by checking that:
      (a) If the device has a max active zones limit, the max open zones
          limit does not exceed the max active zones limit.
      (b) If the device has zone resource limits, the limit values are
          lower than the number of sequential zones of the device. If they
          are not, assume that the zoned device has no limits by setting
          the limits to 0.
      
      For (a), a check is added to blk_validate_zoned_limits() and an error
      is returned if the max open zones limit exceeds the value of the max
      active zones limit (if there is one).
      
      For (b), given that we need the number of sequential zones of the zoned
      device, this check is added to disk_update_zone_resources(). This is
      safe to do as that function is executed with the disk queue frozen and
      the check is executed after queue_limits_start_update(), which takes
      the queue limits lock. Of note is that the early return in this function
      for zoned devices that do not use zone write plugging (e.g. DM devices
      using native zone append) is moved to after the new check and adjustment
      of the zone resource limits so that the check applies to any zoned
      device.
      Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Niklas Cassel <cassel@kernel.org>
      Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
      Link: https://lore.kernel.org/r/20240611023639.89277-2-dlemoal@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
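
Checks (a) and (b) from commit e21d12c7 can be sketched in plain C as below. This is a userspace illustration, not the code added to blk_validate_zoned_limits() or disk_update_zone_resources(): the type `zone_limits` and the functions `validate_zone_limits()` and `adjust_zone_limits()` are hypothetical names, and a limit value of 0 is assumed to mean "no limit".

```c
#include <errno.h>

/* Hypothetical zone resource limits; 0 means "no limit". */
struct zone_limits {
	unsigned int max_open_zones;
	unsigned int max_active_zones;
};

/*
 * Check (a): if there is a max active zones limit, the max open zones
 * limit must not exceed it.
 */
static int validate_zone_limits(const struct zone_limits *lim)
{
	if (lim->max_active_zones &&
	    lim->max_open_zones > lim->max_active_zones)
		return -EINVAL;
	return 0;
}

/*
 * Check (b): a limit that is not lower than the number of sequential
 * zones can never be reached, so treat it as "no limit".
 */
static void adjust_zone_limits(struct zone_limits *lim,
			       unsigned int nr_seq_zones)
{
	if (lim->max_open_zones >= nr_seq_zones)
		lim->max_open_zones = 0;
	if (lim->max_active_zones >= nr_seq_zones)
		lim->max_active_zones = 0;
}
```

Check (b) needs the sequential zone count, which is why the commit message places it in a function that runs after the zones have been reported rather than in the generic limits validation.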
  2. 15 Jun, 2024 1 commit
  3. 14 Jun, 2024 30 commits
  4. 12 Jun, 2024 4 commits
    • Merge tag 'md-6.11-20240612' of... · c2670cf7
      Jens Axboe authored
      Merge tag 'md-6.11-20240612' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.11/block
      
      Pull MD updates from Song:
      
      "The major changes in this PR are:
      
       - sync_action fix and refactoring, by Yu Kuai;
       - Various small fixes by Christoph Hellwig, Li Nan, and Ofir Gal."
      
      * tag 'md-6.11-20240612' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/raid5: avoid BUG_ON() while continue reshape after reassembling
        md: pass in max_sectors for pers->sync_request()
        md: factor out helpers for different sync_action in md_do_sync()
        md: replace last_sync_action with new enum type
        md: use new helpers in md_do_sync()
        md: don't fail action_store() if sync_thread is not registered
        md: remove parameter check_seq for stop_sync_thread()
        md: replace sysfs api sync_action with new helpers
        md: factor out helper to start reshape from action_store()
        md: add new helpers for sync_action
        md: add a new enum type sync_action
        md: rearrange recovery_flags
        md/md-bitmap: fix writing non bitmap pages
        md/raid1: don't free conf on raid0_run failure
        md/raid0: don't free conf on raid0_run failure
        md: make md_flush_request() more readable
        md: fix deadlock between mddev_suspend and flush bio
        md: change the return value type of md_write_start to void
        md: do not delete safemode_timer in mddev_suspend
    • md/raid5: avoid BUG_ON() while continue reshape after reassembling · 305a5170
      Yu Kuai authored
      Currently, mdadm supports --revert-reshape to abort the reshape while
      reassembling, as exercised by the test 07revert-grow. However, the
      following BUG_ON() can be triggered by the test:
      
      kernel BUG at drivers/md/raid5.c:6278!
      invalid opcode: 0000 [#1] PREEMPT SMP PTI
      irq event stamp: 158985
      CPU: 6 PID: 891 Comm: md0_reshape Not tainted 6.9.0-03335-g7592a0b0049a #94
      RIP: 0010:reshape_request+0x3f1/0xe60
      Call Trace:
       <TASK>
       raid5_sync_request+0x43d/0x550
       md_do_sync+0xb7a/0x2110
       md_thread+0x294/0x2b0
       kthread+0x147/0x1c0
       ret_from_fork+0x59/0x70
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      The root cause is that --revert-reshape updates raid_disks from 5 to 4
      while the reshape position is still set. After reassembling the array,
      the reshape position is read from the super block, and during reshape
      the check of 'writepos', which is calculated from the old reshape
      position, fails.
      
      Fix this panic the easy way first, by converting the BUG_ON() to
      WARN_ON(), and stop the reshape if the checks fail.
      
      Note that mdadm must fix --revert-reshape as well, and md/raid should
      probably enhance metadata validation too. However, this means
      reassembly will fail and there must be user tools to fix the wrong
      metadata.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20240611132251.1967786-13-yukuai1@huaweicloud.com
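
The general pattern behind commit 305a5170, replacing a fatal BUG_ON() with a WARN_ON() whose result is used to bail out, can be sketched in userspace C. Everything below is illustrative: `warn_on()` and the `WARN_ON()` macro are stand-ins written for this example (the kernel macro also evaluates to the tested condition), and `reshape_step()` with its `writepos < safepos` check is a hypothetical simplification, not the actual logic of reshape_request().

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): log, then return the condition. */
static bool warn_on(bool cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", expr);
	return cond;
}
#define WARN_ON(cond) warn_on((cond), #cond)

/*
 * Hypothetical reshape step. Returns the number of sectors handled, or 0
 * to tell the caller to stop the reshape.
 */
static unsigned long reshape_step(unsigned long writepos,
				  unsigned long safepos,
				  unsigned long step)
{
	/*
	 * With BUG_ON() here, inconsistent positions read from on-disk
	 * metadata would bring the whole system down. Warning and
	 * returning early keeps the failure local to this array.
	 */
	if (WARN_ON(writepos < safepos))
		return 0;

	return step;
}
```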
    • md: pass in max_sectors for pers->sync_request() · bc49694a
      Yu Kuai authored
      For different sync_action values, sync_thread uses different
      max_sectors; see md_sync_max_sectors() for details. Currently both
      md_do_sync() and pers->sync_request() have to compute the same
      max_sectors in each iteration. Hence pass max_sectors in to
      pers->sync_request() to avoid redundant code.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20240611132251.1967786-12-yukuai1@huaweicloud.com
    • md: factor out helpers for different sync_action in md_do_sync() · bbf20762
      Yu Kuai authored
      Make the code cleaner by replacing the if/else-if chains with switch
      statements, so that it is more obvious what is done for each
      sync_action. There are no functional changes.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Link: https://lore.kernel.org/r/20240611132251.1967786-11-yukuai1@huaweicloud.com
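
The two refactorings in bc49694a and bbf20762 (one switch per sync_action, and a single place that decides max_sectors so the value can be passed down) can be pictured with the following sketch. It is only an illustration: the enum values, the `array_sizes` struct, the `sync_max_sectors()` helper, and the mapping from action to sector count are assumptions made for this example and are not copied from md_sync_max_sectors().

```c
/* Illustrative subset of sync actions; names are assumptions. */
enum sync_action {
	ACTION_RESYNC,
	ACTION_RECOVER,
	ACTION_CHECK,
	ACTION_REPAIR,
	ACTION_RESHAPE,
	ACTION_IDLE,
};

/* Hypothetical sizes an array might track, in sectors. */
struct array_sizes {
	unsigned long long resync_max_sectors;
	unsigned long long dev_sectors;
	unsigned long long reshape_sectors;
};

/*
 * One switch replaces a chain of if/else-if tests, so each action's
 * behavior is visible in one place. The caller computes the value once
 * and passes it on instead of having every layer recompute it.
 */
static unsigned long long sync_max_sectors(const struct array_sizes *s,
					   enum sync_action action)
{
	switch (action) {
	case ACTION_RESYNC:
	case ACTION_CHECK:
	case ACTION_REPAIR:
		return s->resync_max_sectors;
	case ACTION_RESHAPE:
		return s->reshape_sectors;
	case ACTION_RECOVER:
		return s->dev_sectors;
	default:
		/* Idle or unknown action: nothing to sync. */
		return 0;
	}
}
```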