    libceph: defer __complete_request() to a workqueue · 88bc1922
    Ilya Dryomov authored
    In the common case, req->r_callback is called by handle_reply() on the
    ceph-msgr worker thread without any locks.  If handle_reply() fails, it
    is called with both osd->lock and osdc->lock.  In the map check case,
    it is called with just osdc->lock but held for write.  Finally, if the
    request is aborted because of -ENOSPC or by ceph_osdc_abort_requests(),
    it is called directly on the submitter's thread, again with both locks.
    
    req->r_callback on the submitter's thread is relatively new (introduced
    in 4.12) and ripe for deadlocks -- e.g. writeback worker thread waiting
    on itself:
    
      inode_wait_for_writeback+0x26/0x40
      evict+0xb5/0x1a0
      iput+0x1d2/0x220
      ceph_put_wrbuffer_cap_refs+0xe0/0x2c0 [ceph]
      writepages_finish+0x2d3/0x410 [ceph]
      __complete_request+0x26/0x60 [libceph]
      complete_request+0x2e/0x70 [libceph]
      __submit_request+0x256/0x330 [libceph]
      submit_request+0x2b/0x30 [libceph]
      ceph_osdc_start_request+0x25/0x40 [libceph]
      ceph_writepages_start+0xdfe/0x1320 [ceph]
      do_writepages+0x1f/0x70
      __writeback_single_inode+0x45/0x330
      writeback_sb_inodes+0x26a/0x600
      __writeback_inodes_wb+0x92/0xc0
      wb_writeback+0x274/0x330
      wb_workfn+0x2d5/0x3b0
    
    Defer __complete_request() to a workqueue in all failure cases so it's
    never on the same thread as ceph_osdc_start_request() and always called
    with no locks held.
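
As a rough userspace illustration of the deferral described above (not the kernel patch itself): a minimal pthreads sketch in which complete_request() only records the result and queues the request, and a dedicated worker thread invokes the callback with no locks held. The names struct request, struct completion_wq, wq_start(), wq_stop() and record_thread() are hypothetical and exist only for this example; the kernel uses its own workqueue API rather than a hand-rolled queue.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request with a completion callback. */
struct request {
    void (*callback)(struct request *req); /* plays the role of r_callback */
    int result;
    pthread_t completed_on;                /* thread that ran the callback */
    bool done;
    struct request *next;
};

/* Hypothetical single-threaded completion queue (userspace analogy). */
struct completion_wq {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct request *head, *tail;
    bool stopping;
    pthread_t worker;
};

static void *wq_worker(void *arg)
{
    struct completion_wq *wq = arg;

    pthread_mutex_lock(&wq->lock);
    for (;;) {
        while (!wq->head && !wq->stopping)
            pthread_cond_wait(&wq->cond, &wq->lock);
        if (!wq->head)
            break;                         /* stopping and fully drained */
        struct request *req = wq->head;
        wq->head = req->next;
        if (!wq->head)
            wq->tail = NULL;
        pthread_mutex_unlock(&wq->lock);
        req->callback(req);                /* runs with no locks held */
        pthread_mutex_lock(&wq->lock);
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

static int wq_start(struct completion_wq *wq)
{
    wq->head = wq->tail = NULL;
    wq->stopping = false;
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->cond, NULL);
    return pthread_create(&wq->worker, NULL, wq_worker, wq);
}

/* Failure path: record the error and defer the callback to the worker. */
static void complete_request(struct completion_wq *wq, struct request *req,
                             int err)
{
    assert(req->callback != NULL);
    req->result = err;
    req->next = NULL;
    pthread_mutex_lock(&wq->lock);
    if (wq->tail)
        wq->tail->next = req;
    else
        wq->head = req;
    wq->tail = req;
    pthread_cond_signal(&wq->cond);
    pthread_mutex_unlock(&wq->lock);
}

/* Drain pending completions, then join the worker. */
static void wq_stop(struct completion_wq *wq)
{
    pthread_mutex_lock(&wq->lock);
    wq->stopping = true;
    pthread_cond_signal(&wq->cond);
    pthread_mutex_unlock(&wq->lock);
    pthread_join(wq->worker, NULL);
    pthread_cond_destroy(&wq->cond);
    pthread_mutex_destroy(&wq->lock);
}

/* Example callback: remember which thread ran the completion. */
static void record_thread(struct request *req)
{
    req->completed_on = pthread_self();
    req->done = true;
}
```

A submitter would call complete_request(&wq, &req, -ENOSPC) on its failure path and return immediately; because the callback always runs on the worker, it can never end up waiting on the submitting thread itself, which is the pattern the trace above shows. Build with -pthread.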
    
    Link: http://tracker.ceph.com/issues/23978
    Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    Acked-by: Jeff Layton <jlayton@redhat.com>
    Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
osd_client.c 141 KB