    ceph: fix mds sync() race with completing requests · 80fc7314
    Sage Weil authored
    The wait_unsafe_requests() helper dropped the mdsc mutex to wait
    for each request to complete, and then examined r_node to get the
    next request after retaking the lock.  But the request completion
    removes the request from the tree, so r_node was always undefined
    at this point.  Since it's a small race, it usually led to a
    valid request, but not always.  The result was an occasional
    crash in rb_next() while dereferencing node->rb_left.
    
    Fix this by clearing the rb_node when removing the request from
    the request tree, and not walking off into the weeds when we
    are done waiting for a request.  Since the request we waited on
    will _always_ be out of the request tree, take a ref on the next
    request before dropping the mutex, in the hopes that it will still
    be in the tree afterward.  If it has been removed too, that's ok:
    we can start over from the beginning (and traverse over older read
    requests again).
    Signed-off-by: Sage Weil <sage@newdream.net>
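The traversal pattern described in the message can be sketched in userspace C. This is a minimal illustration, not the kernel code: a sorted singly linked list stands in for the rbtree, the `in_set` flag plays the role of an `RB_EMPTY_NODE()` check on a cleared node, and all names here (`struct req`, `req_set`, `wait_up_to`, `complete_one`, `complete_two`, `run_demo`) are hypothetical. The mutex and the blocking wait are replaced by a callback so the example runs single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request; in_set mimics "node is linked". */
struct req {
    unsigned long tid;
    int refs;
    bool in_set;
    struct req *next;
};

struct req_set { struct req *head; };

typedef void (*wait_fn)(struct req_set *, struct req *);

static void req_get(struct req *r) { r->refs++; }

static void req_put(struct req *r)
{
    assert(r->refs > 0);
    if (--r->refs == 0)
        free(r);
}

/* Append a request (callers add in increasing tid order); the set holds one ref. */
static void set_add(struct req_set *s, unsigned long tid)
{
    struct req *r = calloc(1, sizeof(*r)), **p = &s->head;
    assert(r);
    r->tid = tid;
    r->refs = 1;
    r->in_set = true;
    while (*p)
        p = &(*p)->next;
    *p = r;
}

/* Unlink a request and clear its linkage so a stale pointer is detectable. */
static void set_remove(struct req_set *s, struct req *r)
{
    struct req **p = &s->head;
    while (*p && *p != r)
        p = &(*p)->next;
    if (!*p)
        return;
    *p = r->next;
    r->next = NULL;
    r->in_set = false;
    req_put(r);                     /* drop the set's ref */
}

/* Wait on every request with tid <= want_tid; returns the number of waits. */
int wait_up_to(struct req_set *s, unsigned long want_tid, wait_fn do_wait)
{
    struct req *r, *next;
    int waited = 0;

restart:
    r = s->head;                    /* oldest request */
    while (r && r->tid <= want_tid) {
        next = r->next;             /* find next BEFORE waiting */
        req_get(r);
        if (next)
            req_get(next);          /* keep next alive across the wait */
        do_wait(s, r);              /* real code drops and retakes the mutex here */
        waited++;
        req_put(r);                 /* r is out of the set; never read r->next now */
        if (!next)
            break;                  /* there was no next request, so we are done */
        if (!next->in_set) {        /* next completed while we waited */
            req_put(next);
            goto restart;
        }
        req_put(next);              /* still linked, so the set's ref keeps it alive */
        r = next;
    }
    return waited;
}

/* Completion removes only the request we waited on. */
void complete_one(struct req_set *s, struct req *r) { set_remove(s, r); }

/* Simulated race: the following request also completes during the wait. */
void complete_two(struct req_set *s, struct req *r)
{
    struct req *n = r->next;
    set_remove(s, r);
    if (n)
        set_remove(s, n);
}

int run_demo(wait_fn do_wait, unsigned long want_tid)
{
    struct req_set s = { NULL };
    int waited;

    set_add(&s, 1);
    set_add(&s, 2);
    set_add(&s, 3);
    waited = wait_up_to(&s, want_tid, do_wait);
    while (s.head)                  /* release anything left in the set */
        set_remove(&s, s.head);
    return waited;
}
```

With `complete_two`, the request after the one being waited on disappears during the wait, so the walk detects the cleared linkage, drops its extra ref, and restarts from the oldest remaining request instead of following a stale pointer.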
mds_client.c 75.5 KB