• Dave Chinner's avatar
    xfs: stop holding ILOCK over filldir callbacks · dbad7c99
    Dave Chinner authored
    The recent change to the readdir locking made in 40194ecc ("xfs:
    reinstate the ilock in xfs_readdir") for CXFS directory sanity was
    probably the wrong thing to do. Deep in the readdir code we
    can take page faults in the filldir callback, and so taking a page
    fault while holding an inode ilock creates a new set of locking
    issues that lockdep warns all over the place about.
    
    The locking order for regular inodes w.r.t. page faults is io_lock
    -> pagefault -> mmap_sem -> ilock. The directory readdir code now
    triggers ilock -> page fault -> mmap_sem. While we cannot deadlock
    at this point, it inverts all the locking patterns that lockdep
    normally sees on XFS inodes, and so triggers lockdep. We worked
    around this with commit 93a8614e ("xfs: fix directory inode iolock
    lockdep false positive"), but that then just moved the lockdep
    warning to deeper in the page fault path and triggered on security
    inode locks. Fixing the shmem issue there just moved the lockdep
    reports somewhere else, and now we are getting false positives from
    filesystem freezing annotations getting confused.
    
    Further, if we enter memory reclaim in a readdir path, we now get
    lockdep warning about potential deadlocks because the ilock is held
    when we enter reclaim. This, again, is different to a regular file
    in that we never allow memory reclaim to run while holding the ilock
    for regular files. Hence lockdep now throws
    ilock->kmalloc->reclaim->ilock warnings.
    
    Basically, the problem is that the ilock is being used to protect
    the directory data and the inode metadata, whereas for a regular
    file the iolock protects the data and the ilock protects the
    metadata. From the VFS perspective, the i_mutex serialises all
    accesses to the directory data, and so not holding the ilock for
    readdir doesn't matter. The issue is that CXFS doesn't access
    directory data via the VFS, so it has no "data serialisaton"
    mechanism. Hence we need to hold the IOLOCK in the correct places to
    provide this low level directory data access serialisation.
    
    The ilock can then be used just when the extent list needs to be
    read, just like we do for regular files. The directory modification
    code can take the iolock exclusive when the ilock is also taken,
    and this then ensures that readdir is correct excluded while
    modifications are in progress.
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    dbad7c99
xfs_inode.c 96.8 KB