1. 11 Apr, 2023 4 commits
  2. 04 Apr, 2023 7 commits
    • Mike Snitzer's avatar
      dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path · 6b79a428
      Mike Snitzer authored
      Otherwise the journal_io_cache will leak if dm_register_target() fails.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      6b79a428
    • Mike Snitzer's avatar
      dm clone: call kmem_cache_destroy() in dm_clone_init() error path · 6827af4a
      Mike Snitzer authored
      Otherwise the _hydration_cache will leak if dm_register_target() fails.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      6827af4a
    • Mikulas Patocka's avatar
      dm error: add discard support · b6bcb844
      Mikulas Patocka authored
      Add io_err_io_hints() and set discard limits so that the zero target
      advertises support for discards.
      
      The error target will return -EIO for discards.
      
      This is useful when the user combines dm-error with other
      discard-supporting targets in the same table; without dm-error
      support, discards would be disabled for the whole combined device.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Tested-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      b6bcb844
    • Mikulas Patocka's avatar
      dm zero: add discard support · 00065f92
      Mikulas Patocka authored
      Add zero_io_hints() and set discard limits so that the zero target
      advertises support for discards.
      
      The zero target will ignore discards.
      
      This is useful when the user combines dm-zero with other
      discard-supporting targets in the same table; without dm-zero support,
      discards would be disabled for the whole combined device.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Tested-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      00065f92
    • Mikulas Patocka's avatar
      dm table: allow targets without devices to set ->io_hints · 85c938e8
      Mikulas Patocka authored
      In dm_calculate_queue_limits, add call to ->io_hints hook if the
      target doesn't provide ->iterate_devices.
      
      This is needed so the "error" and "zero" targets may support
      discards. The 2 following commits will add their respective discard
      support.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Tested-by: default avatarMilan Broz <gmazyland@gmail.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      85c938e8
    • Michael Weiß's avatar
      dm verity: emit audit events on verification failure and more · 074c4466
      Michael Weiß authored
      dm-verity signals integrity violations by returning I/O errors
      to user space. To identify integrity violations by a controlling
      instance, the kernel audit subsystem can be used to emit audit
      events to user space. Analogous to dm-integrity, we also use the
      dm-audit submodule allowing to emit audit events on verification
      failures of metadata and data blocks as well as if max corrupted
      errors are reached.
      
      The construction and destruction of verity device mappings are
      also relevant for auditing a system. Thus, those events are also
      logged as audit events.
      
      Tested by starting a container with the container manager (cmld) of
      GyroidOS which uses a dm-verity protected rootfs image root.img mapped
      to /dev/mapper/<uuid>-root. One block was manipulated in the
      underlying image file and repeated reads of the verity device were
      performed again until the max corrupted errors is reached, e.g.:
      
        dd if=/dev/urandom of=root.img bs=512 count=1 seek=1000
        for i in range {1..101}; do \
          dd if=/dev/mapper/<uuid>-root of=/dev/null bs=4096 \
             count=1 skip=1000 \
        done
      
      The resulting audit log looks as follows:
      
        type=DM_CTRL msg=audit(1677618791.876:962):
          module=verity op=ctr ppid=4876 pid=29102 auid=0 uid=0 gid=0
          euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=44
          comm="cmld" exe="/usr/sbin/cml/cmld" subj=unconfined
          dev=254:3 error_msg='success' res=1
      
        type=DM_EVENT msg=audit(1677619463.786:1074): module=verity
          op=verify-data dev=7:0 sector=1000 res=0
        ...
        type=DM_EVENT msg=audit(1677619596.727:1162): module=verity
          op=verify-data dev=7:0 sector=1000 res=0
      
        type=DM_EVENT msg=audit(1677619596.731:1163): module=verity
          op=max-corrupted-errors dev=254:3 sector=? res=0
      Signed-off-by: default avatarMichael Weiß <michael.weiss@aisec.fraunhofer.de>
      Acked-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      074c4466
    • Yeongjin Gil's avatar
      dm verity: fix error handling for check_at_most_once on FEC · e8c5d45f
      Yeongjin Gil authored
      In verity_end_io(), if bi_status is not BLK_STS_OK, it can be return
      directly. But if FEC configured, it is desired to correct the data page
      through verity_verify_io. And the return value will be converted to
      blk_status and passed to verity_finish_io().
      
      BTW, when a bit is set in v->validated_blocks, verity_verify_io() skips
      verification regardless of I/O error for the corresponding bio. In this
      case, the I/O error could not be returned properly, and as a result,
      there is a problem that abnormal data could be read for the
      corresponding block.
      
      To fix this problem, when an I/O error occurs, do not skip verification
      even if the bit related is set in v->validated_blocks.
      
      Fixes: 843f38d3 ("dm verity: add 'check_at_most_once' option to only validate hashes once")
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarSungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: default avatarYeongjin Gil <youngjin.gil@samsung.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      e8c5d45f
  3. 30 Mar, 2023 23 commits
    • Joe Thornber's avatar
      dm: improve hash_locks sizing and hash function · 363b7fd7
      Joe Thornber authored
      Both bufio and bio-prison-v1 use the identical model for splitting
      their respective locks and rbtrees. Improve dm_num_hash_locks() to
      distribute across more rbtrees to improve overall performance -- but
      the maximum number of locks/rbtrees is still 64.
      
      Also factor out a common hash function named dm_hash_locks_index(),
      the magic numbers used were determined to be best using this program:
       https://gist.github.com/jthornber/e05c47daa7b500c56dc339269c5467fcSigned-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      363b7fd7
    • Mike Snitzer's avatar
      dm bio prison v1: intelligently size dm_bio_prison's prison_regions · b6279f82
      Mike Snitzer authored
      Size the dm_bio_prison's number of prison_region structs using
      dm_num_hash_locks().
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      b6279f82
    • Mike Snitzer's avatar
      dm bio prison v1: prepare to intelligently size dm_bio_prison's prison_regions · c6273411
      Mike Snitzer authored
      Add num_locks member to dm_bio_prison struct and use it rather than
      the NR_LOCKS magic value (64).
      
      Next commit will size the dm_bio_prison's prison_regions according to
      dm_num_hash_locks().
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      c6273411
    • Mike Snitzer's avatar
      dm bufio: intelligently size dm_buffer_cache's buffer_trees · 1e84c4b7
      Mike Snitzer authored
      Size the dm_buffer_cache's number of buffer_tree structs using
      dm_num_hash_locks().
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      1e84c4b7
    • Mike Snitzer's avatar
      dm bufio: prepare to intelligently size dm_buffer_cache's buffer_trees · 36c18b86
      Mike Snitzer authored
      Add num_locks member to dm_buffer_cache struct and use it rather than
      the NR_LOCKS magic value (64).
      
      Next commit will size the dm_buffer_cache's buffer_trees according to
      dm_num_hash_locks().
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      36c18b86
    • Mike Snitzer's avatar
      dm: add dm_num_hash_locks() · 0bac3f2f
      Mike Snitzer authored
      Simple helper to use when DM core code needs to appropriately size,
      based on num_online_cpus(), its data structures that split locks.
      
      dm_num_hash_locks() rounds up num_online_cpus() to next power of 2
      but caps return at DM_HASH_LOCKS_MAX (64).
      
      This heuristic may evolve as warranted, but as-is it will serve as a
      more informed basis for sizing the sharded lock structs in dm-bufio's
      dm_buffer_cache (buffer_trees) and dm-bio-prison-v1's dm_bio_prison
      (prison_regions).
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      0bac3f2f
    • Mike Snitzer's avatar
      dm bio prison v1: add dm_cell_key_has_valid_range · 3f8d3f54
      Mike Snitzer authored
      Don't have bio_detain() BUG_ON if a dm_cell_key is beyond
      BIO_PRISON_MAX_RANGE or spans a boundary.
      
      Update dm-thin.c:build_key() to use dm_cell_key_has_valid_range() which
      will do this checking without using BUG_ON. Also update
      process_discard_bio() to check the discard bio that DM core passes in
      (having first imposed max_discard_granularity based splitting).
      
      dm_cell_key_has_valid_range() will merely WARN_ON_ONCE if it returns
      false because if it does: it is programmer error that should be caught
      with proper testing. So relax the BUG_ONs to be WARN_ON_ONCE.
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      3f8d3f54
    • Joe Thornber's avatar
      dm bio prison v1: improve concurrent IO performance · e2dd8aca
      Joe Thornber authored
      Split the bio prison into multiple regions, with a separate rbtree and
      associated lock for each region.
      
      To get fast bio prison locking and not damage the performance of
      discards too much the bio-prison now stipulates that discards should
      not cross a BIO_PRISON_MAX_RANGE boundary.
      
      Because the range of a key (block_end - block_begin) must not exceed
      BIO_PRISON_MAX_RANGE: break_up_discard_bio() now ensures the data
      range reflected in PHYSICAL key doesn't exceed BIO_PRISON_MAX_RANGE.
      And splitting the thin target's discards (handled with VIRTUAL key) is
      achieved by updating dm-thin.c to set limits->max_discard_sectors in
      terms of BIO_PRISON_MAX_RANGE _and_ setting the thin and thin-pool
      targets' max_discard_granularity to true.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      e2dd8aca
    • Mike Snitzer's avatar
      dm: split discards further if target sets max_discard_granularity · 06961c48
      Mike Snitzer authored
      The block core (bio_split_discard) will already split discards based
      on the 'discard_granularity' and 'max_discard_sectors' queue_limits.
      But the DM thin target also needs to ensure that it doesn't receive a
      discard that spans a 'max_discard_sectors' boundary.
      
      Introduce a dm_target 'max_discard_granularity' flag that if set will
      cause DM core to split discard bios relative to 'max_discard_sectors'.
      This treats 'discard_granularity' as a "min_discard_granularity" and
      'max_discard_sectors' as a "max_discard_granularity".
      Requested-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      06961c48
    • Joe Thornber's avatar
      dm thin: speed up cell_defer_no_holder() · bb46c561
      Joe Thornber authored
      Reduce the time that a spinlock is held in cell_defer_no_holder().
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      bb46c561
    • Mikulas Patocka's avatar
      dm bufio: use multi-page bio vector · 56c5de44
      Mikulas Patocka authored
      The kernel supports multi page bio vector entries, so we can use them
      in dm-bufio as an optimization.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      56c5de44
    • Mikulas Patocka's avatar
      dm bufio: use waitqueue_active in __free_buffer_wake · f5f93541
      Mikulas Patocka authored
      Save one spinlock by using waitqueue_active. We hold the bufio lock at
      this place, so no one can add entries to the waitqueue at this point.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      f5f93541
    • Mike Snitzer's avatar
      dm bufio: move dm_bufio_client members to avoid spanning cachelines · 530f683d
      Mike Snitzer authored
      Movement also consolidates holes in dm_bufio_client struct. But the
      overall size of the struct isn't changed.
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      530f683d
    • Joe Thornber's avatar
      dm bufio: add lock_history optimization for cache iterators · 79118806
      Joe Thornber authored
      Sometimes it is beneficial to repeatedly get and drop locks as part of
      an iteration.  Introduce lock_history struct to help avoid redundant
      drop and gets of the same lock.
      
      Optimizes cache_iterate, cache_mark_many and cache_evict.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      79118806
    • Joe Thornber's avatar
      dm bufio: improve concurrent IO performance · 450e8dee
      Joe Thornber authored
      When multiple threads perform IO to a thin device, the underlying
      dm_bufio object can become a bottleneck; slowing down access to btree
      nodes that store the thin metadata. Prior to this commit, each bufio
      instance had a single mutex that was taken for every bufio operation.
      
      This commit concentrates on improving the common case where: a user of
      dm_bufio wishes to access, but not modify, a buffer which is already
      within the dm_bufio cache.
      
      Implementation::
      
        The code has been refactored; pulling out an 'lru' abstraction and a
        'buffer cache' abstraction (see 2 previous commits). This commit
        updates higher level bufio code (that performs allocation of buffers,
        IO and eviction/cache sizing) to leverage both abstractions. It also
        deals with the delicate locking requirements of both abstractions to
        provide finer grained locking. The result is significantly better
        concurrent IO performance.
      
        Before this commit, bufio has a global lru list it used to evict the
        oldest, clean buffers from _all_ clients. With the new locking we
        don’t want different ways to access the same buffer, so instead
        do_global_cleanup() loops around the clients asking them to free
        buffers older than a certain time.
      
        This commit also converts many old BUG_ONs to WARN_ON_ONCE, see the
        lru_evict and cache_evict code in particular.  They will return
        ER_DONT_EVICT if a given buffer somehow meets the invariants that
        should _never_ happen. [Aside from revising this commit's header and
        fixing coding style and whitespace nits: this switching to
        WARN_ON_ONCE is Mike Snitzer's lone contribution to this commit]
      
      Testing::
      
        Some of the low level functions have been unit tested using dm-unit:
          https://github.com/jthornber/dm-unit/blob/main/src/tests/bufio.rs
      
        Higher level concurrency and IO is tested via a test only target
        found here:
          https://github.com/jthornber/linux/blob/2023-03-24-thin-concurrency-9/drivers/md/dm-bufio-test.c
      
        The associated userland side of these tests is here:
          https://github.com/jthornber/dmtest-python/blob/main/src/dmtest/bufio/bufio_tests.py
      
        In addition the full dmtest suite of tests (dm-thin, dm-cache, etc)
        has been run (~450 tests).
      
      Performance::
      
        Most bufio operations have unchanged performance. But if multiple
        threads are attempting to get buffers concurrently, and these
        buffers are already in the cache then there's a big speed up. Eg,
        one test has 16 'hotspot' threads simulating btree lookups while
        another thread dirties the whole device. In this case the hotspot
        threads acquire the buffers about 25 times faster.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      450e8dee
    • Joe Thornber's avatar
      dm bufio: add dm_buffer_cache abstraction · 2cd7a6d4
      Joe Thornber authored
      The buffer cache is responsible for managing the holder count,
      tracking clean/dirty state, and choosing buffers via predicates.
      Higher level code is responsible for allocation of buffers, IO and
      eviction/cache sizing.
      
      The buffer cache has thread safe methods for acquiring a reference
      to an existing buffer. All other methods in buffer cache are _not_
      threadsafe, and only contain enough locking to guarantee the safe
      methods.
      
      Rather than a single mutex, sharded rw_semaphores are used to allow
      concurrent threads to 'get' buffers. Each rw_semaphore protects its
      own rbtree of buffer entries.
      
      Code that uses this new dm_buffer_cache abstraction will be introduced
      in a following commit.
      
      This commit moves the dm_buffer struct in preparation for finer grained
      dm_buffer changes, in the next commit, to be more easily seen. It also
      introduces temporary dm_buffer struct members to allow compilation of
      this intermediate commit (they will be elided in the next commit).
      
      This commit will cause "defined but not used" compiler warnings that
      will be resolved by the next commit.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      2cd7a6d4
    • Joe Thornber's avatar
      dm bufio: add LRU abstraction · be845bab
      Joe Thornber authored
      A CLOCK algorithm is used in this LRU abstraction.  This avoids
      relinking list nodes, which would require a write lock protecting it.
      
      None of the LRU methods are threadsafe; locking must be done at a
      higher level.
      
      Code that uses this new LRU will be introduced in the next 2 commits.
      
      As such, this commit will cause "defined but not used" compiler warnings
      that will be resolved by the next 2 commits.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      be845bab
    • Mike Snitzer's avatar
      dm bufio: don't bug for clear developer oversight · b75a80f4
      Mike Snitzer authored
      Reasonable to relax to WARN_ON because these are easily avoided but do
      offer some assurance future coding mistakes won't occur (if changes
      tested properly).
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      b75a80f4
    • Mike Snitzer's avatar
      dm bufio: never crash if dm_bufio_in_request() · 05112287
      Mike Snitzer authored
      All these instances are entirely avoidable given that they speak to
      coding mistakes that result in inappropriate use. Proper testing during
      development will catch any such coding bug so its best to relax all of
      these from BUG_ON to WARN_ON_ONCE.
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      05112287
    • Mike Snitzer's avatar
      dm bufio: use WARN_ON in dm_bufio_client_destroy and dm_bufio_exit · 555977dd
      Mike Snitzer authored
      Using BUG_ON when tearing down is excessive. Relax these to WARN_ONs.
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      555977dd
    • Joe Thornber's avatar
      dm bufio: remove unused dm_bufio_release_move interface · 96a2ff2a
      Joe Thornber authored
      Was used by multi-snapshot DM target that never went upstream.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Acked-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      96a2ff2a
    • Mike Snitzer's avatar
      dm: fix __send_duplicate_bios() to always allow for splitting IO · 666eed46
      Mike Snitzer authored
      Commit 7dd76d1f ("dm: improve bio splitting and associated IO
      accounting") only called setup_split_accounting() from
      __send_duplicate_bios() if a single bio were being issued. But the case
      where duplicate bios are issued must call it too.
      
      Otherwise the bio won't be split and resubmitted (via recursion through
      block core back to DM) to submit the later portions of a bio (which may
      map to an entirely different target).
      
      For example, when discarding an entire DM striped device with the
      following DM table:
       vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
       vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
      
      Before (broken, discards the first striped target's devices twice):
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528
      
      After (works as expected):
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
       device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
       device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528
      
      Fixes: 7dd76d1f ("dm: improve bio splitting and associated IO accounting")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarOrange Kao <orange@aiven.io>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      666eed46
    • Mike Snitzer's avatar
      dm: fix improper splitting for abnormal bios · f7b58a69
      Mike Snitzer authored
      "Abnormal" bios include discards, write zeroes and secure erase. By no
      longer passing the calculated 'len' pointer, commit 7dd06a25 ("dm:
      allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
      senseless approach to disallowing dm_accept_partial_bio() from working
      for duplicate bios processed using __send_duplicate_bios().
      
      It inadvertently and incorrectly stopped the use of 'len' when
      initializing a target's io (in alloc_tio). As such the resulting tio
      could address more area of a device than it should.
      
      For example, when discarding an entire DM striped device with the
      following DM table:
       vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
       vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
      
      Before this fix:
      
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
       blkdiscard: attempt to access beyond end of device
       loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
      
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
       blkdiscard: attempt to access beyond end of device
       loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
      
      After this fix;
      
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
      
      Fixes: 7dd06a25 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarOrange Kao <orange@aiven.io>
      Signed-off-by: default avatarMike Snitzer <snitzer@kernel.org>
      f7b58a69
  4. 26 Mar, 2023 6 commits