    drbd: fix for possible deadlock on IO error during resync · e9e6f3ec
    Lars Ellenberg authored
    Scenario:
    
    Something (say, flush-147:0) is in drbd_al_begin_io,
    holding a local_cnt, waiting for the resync to make progress.
    
    Disk fails, worker in after_state_ch does drbd_rs_cancel_all,
    then waits for local_cnt to drop to zero.
    
    flush-147:0 is woken by drbd_rs_cancel_all, needs to write an AL
    transaction, and queues that on the worker.
    
    Deadlock.
    
    Fix: do not wait in the worker, have put_ldev() trigger the
    state change D_FAILED -> D_DISKLESS when necessary.
    put_ldev() cannot do the state change directly, as it may or may not
    already hold various spinlocks. We queue a short work instead.
    Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
    Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
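
The fix described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the DRBD kernel code: every `model_*` name, the fixed-size queue, and the `go_diskless_queued` flag are hypothetical, and the real driver uses kernel atomics, spinlocks, and its own worker queue. The model shows the idea of letting the last reference drop queue the D_FAILED -> D_DISKLESS transition, so the worker never has to block while waiting for `local_cnt` to reach zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the fix; NOT the DRBD kernel code. */
enum model_disk_state { D_DISKLESS, D_FAILED, D_UP_TO_DATE };

struct model_dev;
typedef void (*model_work_fn)(struct model_dev *dev);

#define MODEL_QUEUE_CAP 8

struct model_dev {
    atomic_int local_cnt;            /* references to the local disk */
    atomic_bool go_diskless_queued;  /* queue the transition only once */
    enum model_disk_state disk;
    /* Worker queue; single-threaded here, lock-protected in a real system. */
    model_work_fn queue[MODEL_QUEUE_CAP];
    size_t head, tail;
};

static void model_init(struct model_dev *dev)
{
    atomic_init(&dev->local_cnt, 0);
    atomic_init(&dev->go_diskless_queued, false);
    dev->disk = D_UP_TO_DATE;
    dev->head = dev->tail = 0;
}

static void model_queue_work(struct model_dev *dev, model_work_fn fn)
{
    assert(dev->tail - dev->head < MODEL_QUEUE_CAP);
    dev->queue[dev->tail++ % MODEL_QUEUE_CAP] = fn;
}

/* Run one queued item; returns false if the queue was empty. */
static bool model_run_worker_once(struct model_dev *dev)
{
    if (dev->head == dev->tail)
        return false;
    dev->queue[dev->head++ % MODEL_QUEUE_CAP](dev);
    return true;
}

/* The "short work": runs in worker context, where a state change is safe. */
static void model_go_diskless(struct model_dev *dev)
{
    if (dev->disk == D_FAILED)
        dev->disk = D_DISKLESS;
}

static void model_get_ldev(struct model_dev *dev)
{
    atomic_fetch_add(&dev->local_cnt, 1);
}

/* Dropping the last reference queues the transition instead of doing it
 * directly, because the caller may hold locks. */
static void model_put_ldev(struct model_dev *dev)
{
    int left = atomic_fetch_sub(&dev->local_cnt, 1) - 1;
    assert(left >= 0);
    if (left == 0 && dev->disk == D_FAILED &&
        !atomic_exchange(&dev->go_diskless_queued, true))
        model_queue_work(dev, model_go_diskless);
}

/* Disk failure handling: mark the disk failed and return without waiting.
 * Assumption of this model: the handler holds its own reference, so the
 * final put always triggers the transition, whoever performs it. */
static void model_disk_failed(struct model_dev *dev)
{
    model_get_ldev(dev);
    dev->disk = D_FAILED;
    /* cancelling the resync would happen here */
    model_put_ldev(dev);
}
```

In this model, a holder of a reference (standing in for flush-147:0) can still queue work and have the worker run it after the disk has failed, because the worker is never parked waiting on the reference count. Only when that holder finally calls `model_put_ldev()` does the transition to D_DISKLESS get queued and then executed.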
drbd_main.c 105 KB