• Filipe Manana's avatar
    Btrfs: add semaphore to synchronize direct IO writes with fsync · 5f9a8a51
    Filipe Manana authored
    Due to the optimization of lockless direct IO writes (the inode's i_mutex
    is not held) introduced in commit 38851cc1 ("Btrfs: implement unlocked
    dio write"), we started having races between such writes with concurrent
    fsync operations that use the fast fsync path. These races were addressed
    in the patches titled "Btrfs: fix race between fsync and lockless direct
    IO writes" and "Btrfs: fix race between fsync and direct IO writes for
    prealloc extents". The races happened because the direct IO path, like
    every other write path, does create extent maps followed by the
    corresponding ordered extents while the fast fsync path collected first
    ordered extents and then it collected extent maps. This made it possible
    to log file extent items (based on the collected extent maps) without
    waiting for the corresponding ordered extents to complete (get their IO
    done). The two fixes mentioned before added a solution that consists of
    making the direct IO path create first the ordered extents and then the
    extent maps, while the fsync path attempts to collect any new ordered
    extents once it collects the extent maps. This was simple and did not
    require adding any synchonization primitive to any data structure (struct
    btrfs_inode for example) but it makes things more fragile for future
    development endeavours and adds an exceptional approach compared to the
    other write paths.
    
    This change adds a read-write semaphore to the btrfs inode structure and
    makes the direct IO path create the extent maps and the ordered extents
    while holding read access on that semaphore, while the fast fsync path
    collects extent maps and ordered extents while holding write access on
    that semaphore. The logic for direct IO write path is encapsulated in a
    new helper function that is used both for cow and nocow direct IO writes.
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Reviewed-by: default avatarJosef Bacik <jbacik@fb.com>
    5f9a8a51
tree-log.c 152 KB