• Dave Chinner's avatar
    xfs: remote attribute blocks aren't really userdata · 292378ed
    Dave Chinner authored
    When adding a new remote attribute, we write the attribute to the
    new extent before the allocation transaction is committed. This
    means we cannot reuse busy extents as that violates crash
    consistency semantics. Hence we currently treat remote attribute
    extent allocation like userdata because it has the same overwrite
    ordering constraints as userdata.
    
    Unfortunately, this also allows the allocator to incorrectly apply
    extent size hints to the remote attribute extent allocation. This
    results in interesting failures, such as transaction block
    reservation overruns and in-memory inode attribute fork corruption.
    
    To fix this, we need to separate the busy extent reuse configuration
    from the userdata configuration. This changes the definition of
    XFS_BMAPI_METADATA slightly - it now means that allocation is
    metadata and reuse of busy extents is acceptible due to the metadata
    ordering semantics of the journal. If this flag is not set, it
    means the allocation is that has unordered data writeback, and hence
    busy extent reuse is not allowed. It no longer implies the
    allocation is for user data, just that the data write will not be
    strictly ordered. This matches the semantics for both user data
    and remote attribute block allocation.
    
    As such, This patch changes the "userdata" field to a "datatype"
    field, and adds a "no busy reuse" flag to the field.
    When we detect an unordered data extent allocation, we immediately set
    the no reuse flag. We then set the "user data" flags based on the
    inode fork we are allocating the extent to. Hence we only set
    userdata flags on data fork allocations now and consider attribute
    fork remote extents to be an unordered metadata extent.
    
    The result is that remote attribute extents now have the expected
    allocation semantics, and the data fork allocation behaviour is
    completely unchanged.
    
    It should be noted that there may be other ways to fix this (e.g.
    use ordered metadata buffers for the remote attribute extent data
    write) but they are more invasive and difficult to validate both
    from a design and implementation POV. Hence this patch takes the
    simple, obvious route to fixing the problem...
    Reported-and-tested-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
    Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
    Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
    Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
    292378ed
xfs_bmap_util.c 47.5 KB