    drbd: fix potential distributed deadlock during verify or resync · 0e49d7b0
    Lars Ellenberg authored
    If max-buffers and socket buffer sizes are "too small" for the chosen
    resync rate, this could potentially lead to a distributed deadlock,
    which may or may not resolve itself via the "ko-count" and request
    timeout mechanism, or could be resolved by forced disconnect.
    
    One option to deal with this is proper configuration:
    use larger max-buffers and socket buffer settings,
    or reduce the resync rate.
    
    But even with a bad configuration we should not deadlock;
    we should recover "gracefully".
    
    The issue is avoided by using only up to max-buffers/2 for resync
    requests, and by using max-buffers not as a hard limit for data buffer
    allocations, but as a throttle threshold only.
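
    The accounting idea can be sketched roughly as follows. This is only an
    illustration under stated assumptions, not the code in drbd_worker.c:
    struct buf_acct, resync_may_start(), alloc_should_throttle() and
    try_alloc() are hypothetical names, and the "back off once, then
    proceed" policy is an assumed stand-in for the real wait logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical accounting state; not the actual DRBD structures. */
struct buf_acct {
    unsigned int max_buffers;   /* configured max-buffers, in pages */
    unsigned int pages_in_use;  /* pages currently held by data buffers */
    unsigned int resync_pages;  /* subset of those held by resync/verify */
};

enum alloc_result { ALLOC_OK, ALLOC_RETRY_LATER };

/* Resync and verify requests may occupy at most half of max-buffers. */
static bool resync_may_start(const struct buf_acct *a, unsigned int nr_pages)
{
    assert(a->max_buffers > 0);
    return a->resync_pages + nr_pages <= a->max_buffers / 2;
}

/* max-buffers is a throttle threshold here, not a hard allocation limit. */
static bool alloc_should_throttle(const struct buf_acct *a)
{
    return a->pages_in_use >= a->max_buffers;
}

/*
 * Assumed policy: a caller above the threshold is asked to back off once
 * (waited_already == false); after it has waited, the allocation goes
 * through even though the threshold is exceeded, so it cannot block forever.
 */
static enum alloc_result try_alloc(struct buf_acct *a, unsigned int nr_pages,
                                   bool is_resync, bool waited_already)
{
    if (is_resync && !resync_may_start(a, nr_pages))
        return ALLOC_RETRY_LATER;   /* resync share is exhausted */
    if (alloc_should_throttle(a) && !waited_already)
        return ALLOC_RETRY_LATER;   /* soft limit: slow down first */

    a->pages_in_use += nr_pages;
    if (is_resync)
        a->resync_pages += nr_pages;
    return ALLOC_OK;
}
```

    In this sketch only the resync share is strictly capped; any other
    allocation is merely delayed when max-buffers is reached, so one node
    never waits indefinitely for buffers that only its peer can release.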
    Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
    Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
drbd_worker.c 56.8 KB