    ceph: fix possible deadlock when holding Fwb to get inline_data · 825978fd
    Xiubo Li authored
    1, Mount with wsync.
    2, Create a file with O_RDWR; the request is sent to mds.0:
    
       ceph_atomic_open()-->
         ceph_mdsc_do_request(openc)
         finish_open(file, dentry, ceph_open)-->
           ceph_open()-->
             ceph_init_file()-->
               ceph_init_file_info()-->
                 ceph_uninline_data()-->
                 {
                   ...
                   if (inline_version == 1 || /* initial version, no data */
                       inline_version == CEPH_INLINE_NONE)
                         goto out_unlock;
                   ...
                 }
    
    The inline_version will be 1, which is the initial version for a newly
    created file. ci->i_inline_version is therefore left at 1 instead of
    being set to CEPH_INLINE_NONE, which is buggy.
    
    3, Immediately do a buffered write to the file:
    
       ceph_write_iter()-->
         ceph_get_caps(file, need=Fw, want=Fb, ...);
         generic_perform_write()-->
           a_ops->write_begin()-->
             ceph_write_begin()-->
               netfs_write_begin()-->
                 netfs_begin_read()-->
                   netfs_rreq_submit_slice()-->
                     netfs_read_from_server()-->
                       rreq->netfs_ops->issue_read()-->
                         ceph_netfs_issue_read()-->
                         {
                           ...
                           if (ci->i_inline_version != CEPH_INLINE_NONE &&
                               ceph_netfs_issue_op_inline(subreq))
                             return;
                           ...
                         }
         ceph_put_cap_refs(ci, Fwb);
    
    ceph_netfs_issue_op_inline() then sends a getattr(Fsr) request to
    mds.1.
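
    The check in step 3 can be reduced to a single decision. In this sketch
    pick_read_target() and the read_target values are hypothetical,
    CEPH_INLINE_NONE is again a locally defined sentinel, and the model
    ignores everything else ceph_netfs_issue_read() and
    ceph_netfs_issue_op_inline() do; it only shows that an inode left at
    version 1 is routed to the MDS rather than the OSDs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed "no inline data" sentinel, defined locally for this sketch. */
#define CEPH_INLINE_NONE ((uint64_t)-1)

/* Hypothetical labels for where a read is served from in this model. */
enum read_target { READ_FROM_OSD, READ_INLINE_VIA_MDS };

/* Models the condition in ceph_netfs_issue_read(): any inode whose inline
 * version is not CEPH_INLINE_NONE takes the inline path, which (per the
 * commit message) sends a getattr(Fsr) request to an MDS. */
static enum read_target pick_read_target(uint64_t i_inline_version)
{
	if (i_inline_version != CEPH_INLINE_NONE)
		return READ_INLINE_VIA_MDS;
	return READ_FROM_OSD;
}
```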
    
    4, mds.1 then requests the rd lock on CInode::filelock from the auth
    MDS, mds.0. To grant it, mds.0 must move CInode::filelock from excl to
    sync, which requires revoking the Fxwb caps from the clients.
    
    Meanwhile, the kernel client is already holding the Fwb caps while
    waiting for the getattr(Fsr) reply.
    
    The result is a deadlock!
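
    Steps 3 and 4 form a cycle in a wait-for graph. The sketch below models
    the three parties as nodes, with an edge from A to B when A cannot make
    progress until B acts, and detects the cycle with a depth-first search.
    The names (enum party, struct wait_graph, scenario_deadlocks()) are
    hypothetical; the client_holds_fwb flag assumes that if the client were
    not pinning Fwb, mds.0's revoke could complete and that edge would
    disappear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical participants in the scenario from the commit message. */
enum party { CLIENT, MDS0, MDS1, NPARTY };

/* waits_for[a][b] is true when a cannot make progress until b acts. */
struct wait_graph {
	bool waits_for[NPARTY][NPARTY];
};

/* DFS colouring: 0 = unvisited, 1 = on the current path, 2 = finished.
 * Reaching a node that is still on the current path means a cycle. */
static bool dfs(const struct wait_graph *g, int u, int state[NPARTY])
{
	state[u] = 1;
	for (int v = 0; v < NPARTY; v++) {
		if (!g->waits_for[u][v])
			continue;
		if (state[v] == 1)
			return true;
		if (state[v] == 0 && dfs(g, v, state))
			return true;
	}
	state[u] = 2;
	return false;
}

static bool has_deadlock(const struct wait_graph *g)
{
	int state[NPARTY] = { 0 };

	for (int u = 0; u < NPARTY; u++)
		if (state[u] == 0 && dfs(g, u, state))
			return true;
	return false;
}

/* Builds the wait-for edges from steps 3 and 4 and checks for a cycle. */
static bool scenario_deadlocks(bool client_holds_fwb)
{
	struct wait_graph g = { 0 };

	g.waits_for[CLIENT][MDS1] = true; /* client awaits the getattr(Fsr) reply */
	g.waits_for[MDS1][MDS0] = true;   /* mds.1 awaits the filelock rd lock */
	if (client_holds_fwb)
		g.waits_for[MDS0][CLIENT] = true; /* mds.0 awaits the Fxwb revoke */
	return has_deadlock(&g);
}
```

    In this model, removing any one edge breaks the cycle.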
    
    URL: https://tracker.ceph.com/issues/55377
    Signed-off-by: Xiubo Li <xiubli@redhat.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>