    drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire") · f31e583a
    Lars Ellenberg authored
    Also re-enable partial zero-out plus aligned discard.
    
    With the introduction of REQ_OP_WRITE_ZEROES,
    we started to use that for both WRITE_ZEROES and DISCARDS,
    hoping that WRITE_ZEROES would "do what we want",
    UNMAP if possible, zero-out the rest.
    
    The example scenario is some LVM "thin" backend.
    
    While an un-allocated block on dm-thin reads as zeroes, on a dm-thin
    with "skip_block_zeroing=true", after a partial block write allocated
    that block, that same block may well map "undefined old garbage" from
    the backends on LBAs that have not yet been written to.
    
    If we cannot distinguish between zero-out and discard on the receiving
    side, then to keep "undefined old garbage" from popping up randomly at
    later times on supposedly zero-initialized blocks, we would need to map
    all discards to zero-out on the receiving side.  But that would
    potentially do a full alloc on thinly provisioned backends, even when
    the expectation was to unmap/trim/discard/de-allocate.
    
    We need to distinguish on the protocol level whether we must guarantee
    zeroes (and thus use zero-out, potentially doing the mentioned full alloc),
    or whether we want to put the emphasis on discard and only do a "best
    effort zeroing": discard the blocks aligned to discard-granularity, and
    zero only the potentially unaligned head and tail clippings, to at least
    *try* to avoid "false positives" in a later online-verify, hoping that
    someone set skip_block_zeroing=false.
    
    For some discussion regarding this on dm-devel, see also
    https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html
    https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html
    
    For backward compatibility, P_TRIM means zero-out, unless the
    DRBD_FF_WZEROES feature flag is agreed upon during handshake.
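The backward-compatibility rule can be sketched as a small decision
function. This is a userspace model under stated assumptions, not the
receiver code itself: the enum names are hypothetical stand-ins for the
P_TRIM and P_ZEROES packet types, and the bit value used for the feature
flag is a placeholder chosen for illustration.

```c
#include <stdint.h>

/* Placeholder bit for the DRBD_FF_WZEROES feature flag; the value is an
 * assumption of this sketch. */
#define EXAMPLE_FF_WZEROES 8u

/* Hypothetical stand-ins for the two packet types discussed above. */
enum example_packet { EXAMPLE_P_TRIM, EXAMPLE_P_ZEROES };

/* What the receiving side should do with the request. */
enum example_action {
	EXAMPLE_DISCARD_BEST_EFFORT,	/* emphasis on de-allocation */
	EXAMPLE_ZERO_OUT		/* zeroes are guaranteed */
};

static enum example_action example_receiver_action(enum example_packet cmd,
						   uint32_t agreed_features)
{
	/* P_ZEROES always asks for guaranteed zeroes. */
	if (cmd == EXAMPLE_P_ZEROES)
		return EXAMPLE_ZERO_OUT;

	/* P_TRIM from a peer that did not agree on WZEROES: play safe,
	 * an older sender may have meant zero-out. */
	if (!(agreed_features & EXAMPLE_FF_WZEROES))
		return EXAMPLE_ZERO_OUT;

	/* Both sides know P_ZEROES, so P_TRIM really means discard. */
	return EXAMPLE_DISCARD_BEST_EFFORT;
}
```

The point of keying this on the negotiated features rather than on the
packet type alone is that an old sender never has to change: its P_TRIM
keeps the zero-out meaning it always had.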
    
    To have upper layers even try to submit WRITE ZEROES requests,
    we need to announce "efficient zeroout" independently.
    
    We need to fix up max_write_zeroes_sectors after blk_queue_stack_limits():
    if we can handle "zeroes" efficiently on the protocol,
    we want to do that, even if our backend does not announce
    max_write_zeroes_sectors itself.
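The effect of that fixup can be modelled in a few lines of userspace C.
This is a simplified sketch, not the driver code: stacking is reduced to
taking the minimum of two limits, "can handle zeroes efficiently on the
protocol" is assumed to mean that the WZEROES feature was agreed upon, the
flag's bit value is a placeholder, and all function names are hypothetical.
What happens to the limit when the feature was not agreed upon is left
unchanged here, which is also an assumption.

```c
#include <stdint.h>

/* Placeholder bit for the DRBD_FF_WZEROES feature flag (assumed value). */
#define EXAMPLE_FF_WZEROES 8u

/* Simplified model of limit stacking: the stacked device can do no more
 * than the smaller of its own limit and its backend's limit. */
static unsigned int example_stack_limit(unsigned int top, unsigned int bottom)
{
	return top < bottom ? top : bottom;
}

/* Apply the fixup after stacking: if the protocol can carry "zeroes"
 * requests, announce the protocol's own maximum, even when the backend
 * announced 0. Otherwise keep whatever stacking produced. */
static unsigned int example_fixup_write_zeroes(unsigned int stacked,
					       uint32_t agreed_features,
					       unsigned int protocol_max)
{
	if (agreed_features & EXAMPLE_FF_WZEROES)
		return protocol_max;
	return stacked;
}
```

With a backend that announces 0, stacking alone would leave the limit at 0
and upper layers would never submit WRITE ZEROES; the fixup restores a
non-zero limit whenever the peer can take such requests on the wire.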
    
    Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
drbd_main.c 112 KB