    rbd: reset BACKOFF if unable to re-queue · 588377d6
    Alex Elder authored
    If ceph_fault() is unable to queue work after a delay, it sets the
    BACKOFF connection flag so con_work() will attempt to do so.
    
    In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
    result in newly-queued work, it simply ignores this condition and
    proceeds as if no backoff delay were desired.  There are two
    problems with this--one of which is a bug.
    
    The first problem is simply that the intended behavior is to back
    off, and if we aren't able to queue the work item to run after a
    delay we're not doing that.
    
    The only reason queue_delayed_work() won't queue work is if the
    provided work item is already queued.  In the messenger, this
    means that con_work() is already scheduled to be run again.  So
    if we simply set the BACKOFF flag again when this occurs, we know
    the next con_work() call will again attempt to hold off activity
    on the connection until after the delay.
    
    The second problem--the bug--is a leak of a reference count.  If
    queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
    the connection reference held on entry to con_work().  However,
    processing is (was) allowed to continue, and at the end of the
    function a second con->ops->put() is called.
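
    The two code paths can be modeled in user space. What follows is a
    minimal sketch, not the messenger code: struct fake_con, fake_put(),
    fake_queue_delayed(), con_work_old() and con_work_new() are
    hypothetical stand-ins for the connection, con->ops->put(),
    queue_delayed_work() and con_work(). It assumes the fix re-sets
    BACKOFF and skips processing when the work item cannot be queued.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the connection; not the real libceph type. */
struct fake_con {
    int refs;          /* references held on the connection */
    bool backoff;      /* models the BACKOFF connection flag */
    bool work_queued;  /* true if the work item is already pending */
    int processed;     /* times the connection was actually serviced */
};

/* Models queue_delayed_work(): fails only if the item is already queued. */
static bool fake_queue_delayed(struct fake_con *con)
{
    if (con->work_queued)
        return false;
    con->work_queued = true;
    return true;
}

/* Models con->ops->put(): drops one reference. */
static void fake_put(struct fake_con *con)
{
    con->refs--;
}

/* Behavior described as buggy: a put on failure, then a second put. */
static void con_work_old(struct fake_con *con)
{
    if (con->backoff) {
        con->backoff = false;
        if (fake_queue_delayed(con))
            return;          /* reference passes to the re-queued work */
        fake_put(con);       /* drops the reference held on entry... */
    }
    con->processed++;        /* ...yet processing continues, no delay */
    fake_put(con);           /* second put for the same reference */
}

/* Assumed shape of the fix: re-set BACKOFF, skip processing, put once. */
static void con_work_new(struct fake_con *con)
{
    if (con->backoff) {
        con->backoff = false;
        if (fake_queue_delayed(con))
            return;          /* reference passes to the re-queued work */
        con->backoff = true; /* next con_work() run retries the delay */
        fake_put(con);       /* single put for the entry reference */
        return;
    }
    con->processed++;
    fake_put(con);
}
```

    Starting from refs = 2 (one for the running call, one for the work
    item that is already queued) with backoff and work_queued both set,
    con_work_old() ends with refs = 0 and the connection serviced
    immediately, while con_work_new() ends with refs = 1, the connection
    untouched, and backoff still set for the pending run.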
    
    This patch fixes both problems.
    Signed-off-by: Alex Elder <elder@inktank.com>
    Reviewed-by: Sage Weil <sage@inktank.com>
messenger.c 70.9 KB