    gfs2: Cancel remote delete work asynchronously · 486408d6
    Andreas Gruenbacher authored
    In gfs2_inode_lookup and gfs2_create_inode, we're calling
    gfs2_cancel_delete_work which currently cancels any remote delete work
    (delete_work_func) synchronously.  This means that if the work is
    currently running, it will wait for it to finish.  We're doing this to
    prevent a previous instance of an inode from having any influence on the
    next instance.
    
    However, delete_work_func uses gfs2_inode_lookup internally, and we can
    end up in a deadlock when delete_work_func gets interrupted at the wrong
    time.  For example,
    
      (1) An inode's iopen glock has delete work queued, but the inode
          itself has been evicted from the inode cache.
    
      (2) The delete work is preempted before reaching gfs2_inode_lookup.
    
      (3) Another process recreates the inode (gfs2_create_inode).  It tries
          to cancel any outstanding delete work, which blocks waiting for
          the ongoing delete work to finish.
    
      (4) The delete work calls gfs2_inode_lookup, which blocks waiting for
          gfs2_create_inode to instantiate and unlock the new inode =>
          deadlock.
    
    It turns out that when the delete work notices that its inode has been
    re-instantiated, it will do nothing.  This means that it's safe to
    cancel the delete work asynchronously.  This prevents the kind of
    deadlock described above.
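
    The scenario above can be modelled in a few lines of userspace C.  This
    is only a sketch under stated assumptions, not the gfs2 code: the names
    work_model, inode_model, cancel_async, cancel_sync_would_block,
    delete_work_body and scenario_new_inode_deleted are hypothetical, and
    the "work item" is reduced to two flags.  The two cancel helpers mimic
    the difference between the kernel's cancel_delayed_work() and
    cancel_delayed_work_sync(): only the latter waits for a callback that
    is already running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a delayed work item; not a kernel structure. */
struct work_model {
	bool pending;	/* queued, callback not yet started */
	bool running;	/* callback currently executing */
};

/* Hypothetical model of the inode as seen by the delete work. */
struct inode_model {
	bool reinstantiated;	/* a new instance of the inode exists */
	bool deleted;		/* set if the delete work acted on it */
};

/*
 * Asynchronous cancel: dequeue the work if it has not started yet and
 * return immediately.  Returns true if pending work was removed.  It
 * never waits for a running callback.
 */
static bool cancel_async(struct work_model *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/*
 * A synchronous cancel would also have to wait while the callback runs.
 * This helper only reports whether the caller would block.
 */
static bool cancel_sync_would_block(const struct work_model *w)
{
	return w->running;
}

/* Delete work body: do nothing if the inode has been re-instantiated. */
static void delete_work_body(struct work_model *w, struct inode_model *ip)
{
	if (!ip->reinstantiated)
		ip->deleted = true;
	w->running = false;
}

/* Replays steps (1) to (4); returns true if the new inode was deleted. */
static bool scenario_new_inode_deleted(void)
{
	/* (1) Delete work is queued; the old inode has been evicted. */
	struct work_model w = { .pending = true, .running = false };
	struct inode_model ip = { .reinstantiated = false, .deleted = false };

	/* (2) The work starts and is preempted before the inode lookup. */
	w.pending = false;
	w.running = true;

	/* (3) The creator cancels.  A synchronous cancel would block here. */
	assert(cancel_sync_would_block(&w));
	assert(!cancel_async(&w));	/* nothing left to dequeue, no waiting */
	ip.reinstantiated = true;	/* creator instantiates the new inode */

	/* (4) The work resumes, sees the new instance, and does nothing. */
	delete_work_body(&w, &ip);
	return ip.deleted;
}
```

    In this model the creator never waits on the running work, so the
    wait-for cycle from steps (3) and (4) cannot form, and the resumed work
    leaves the new inode untouched.
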
    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Bob Peterson <rpeterso@redhat.com>