• Darrick J. Wong's avatar
    vfs: remove lockdep bogosity in __sb_start_write · 22843291
    Darrick J. Wong authored
    __sb_start_write has some weird looking lockdep code that claims to
    exist to handle nested freeze locking requests from xfs.  The code as
    written seems broken -- if we think we hold a read lock on any of the
    higher freeze levels (e.g. we hold SB_FREEZE_WRITE and are trying to
    lock SB_FREEZE_PAGEFAULT), it converts a blocking lock attempt into a
    trylock.
    
    However, it's not correct to downgrade a blocking lock attempt to a
    trylock unless the downgrading code or the callers are prepared to deal
    with that situation.  Neither __sb_start_write nor its callers handle
    this at all.  For example:
    
    sb_start_pagefault ignores the return value completely, with the result
    that if xfs_filemap_fault loses a race with a different thread trying to
    fsfreeze, it will proceed without pagefault freeze protection (thereby
    breaking locking rules) and then unlocks the pagefault freeze lock that
    it doesn't own on its way out (thereby corrupting the lock state), which
    leads to a system hang shortly afterwards.
    
    Normally, this won't happen because our ownership of a read lock on a
    higher freeze protection level blocks fsfreeze from grabbing a write
    lock on that higher level.  *However*, if lockdep is offline,
    lock_is_held_type unconditionally returns 1, which means that
    percpu_rwsem_is_held returns 1, which means that __sb_start_write
    unconditionally converts blocking freeze lock attempts into trylocks,
    even when we *don't* hold anything that would block a fsfreeze.
    
    Apparently this all held together until 5.10-rc1, when bugs in lockdep
    caused lockdep to shut itself off early in an fstests run, and once
    fstests gets to the "race writes with freezer" tests, kaboom.  This
    might explain the long trail of vanishingly infrequent livelocks in
    fstests after lockdep goes offline that I've never been able to
    diagnose.
    
    We could fix it by spinning on the trylock if wait==true, but AFAICT the
    locking works fine if lockdep is not built at all (and I didn't see any
    complaints running fstests overnight), so remove this snippet entirely.
    
    NOTE: Commit f4b554af in 2015 created the current weird logic (which
    used to exist in a different form in commit 5accdf82
    
     from 2012) in
    __sb_start_write.  XFS solved this whole problem in the late 2.6 era by
    creating a variant of transactions (XFS_TRANS_NO_WRITECOUNT) that don't
    grab intwrite freeze protection, thus making lockdep's solution
    unnecessary.  The commit claims that Dave Chinner explained that the
    trylock hack + comment could be removed, but nobody ever did.
    Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Reviewed-by: default avatarJan Kara <jack@suse.cz>
    22843291
super.c 47.3 KB