• Ilya Dryomov's avatar
    libceph: fix ceph_msg_revoke() · c83dd1dd
    Ilya Dryomov authored
    commit 67645d76 upstream.
    
    There are a number of problems with revoking a "was sending" message:
    
    (1) We never make any attempt to revoke data - only kvecs contibute to
    con->out_skip.  However, once the header (envelope) is written to the
    socket, our peer learns data_len and sets itself to expect at least
    data_len bytes to follow front or front+middle.  If ceph_msg_revoke()
    is called while the messenger is sending message's data portion,
    anything we send after that call is counted by the OSD towards the now
    revoked message's data portion.  The effects vary, the most common one
    is the eventual hang - higher layers get stuck waiting for the reply to
    the message that was sent out after ceph_msg_revoke() returned and
    treated by the OSD as a bunch of data bytes.  This is what Matt ran
    into.
    
    (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
    is wrong.  If ceph_msg_revoke() is called before the tag is sent out or
    while the messenger is sending the header, we will get a connection
    reset, either due to a bad tag (0 is not a valid tag) or a bad header
    CRC, which kind of defeats the purpose of revoke.  Currently the kernel
    client refuses to work with header CRCs disabled, but that will likely
    change in the future, making this even worse.
    
    (3) con->out_skip is not reset on connection reset, leading to one or
    more spurious connection resets if we happen to get a real one between
    con->out_skip is set in ceph_msg_revoke() and before it's cleared in
    write_partial_skip().
    
    Fixing (1) and (3) is trivial.  The idea behind fixing (2) is to never
    zero the tag or the header, i.e. send out tag+header regardless of when
    ceph_msg_revoke() is called.  That way the header is always correct, no
    unnecessary resets are induced and revoke stands ready for disabled
    CRCs.  Since ceph_msg_revoke() rips out con->out_msg, introduce a new
    "message out temp" and copy the header into it before sending.
    Reported-by: default avatarMatt Conner <matt.conner@keepertech.com>
    Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
    Tested-by: default avatarMatt Conner <matt.conner@keepertech.com>
    Reviewed-by: default avatarSage Weil <sage@redhat.com>
    Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
    c83dd1dd
messenger.c 84.7 KB