• Eric Whitney's avatar
    ext4: shrink race window in ext4_should_retry_alloc() · efc61345
    Eric Whitney authored
    When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it
    fails at significant rates on the two test scenarios that disable
    delayed allocation (ext3conv and data_journal) and force actual block
    allocation for the fallocate and pwrite functions in the test.  The
    failure rate on 5.10 for both ext3conv and data_journal on one test
    system typically runs about 85%.  On 5.11, the failure rate on ext3conv
    sometimes drops to as low as 1% while the rate on data_journal
    increases to nearly 100%.
    
    The observed failures are largely due to ext4_should_retry_alloc()
    cutting off block allocation retries when s_mb_free_pending (used to
    indicate that a transaction in progress will free blocks) is 0.
    However, free space is usually available when this occurs during runs
    of generic/371.  It appears that a thread attempting to allocate
    blocks is just missing transaction commits in other threads that
    increase the free cluster count and reset s_mb_free_pending while
    the allocating thread isn't running.  Explicitly testing for free space
    availability avoids this race.
    
    The current code uses a post-increment operator in the conditional
    expression that determines whether the retry limit has been exceeded.
    This means that the conditional expression uses the value of the
    retry counter before it's increased, resulting in an extra retry cycle.
    The current code actually retries twice before hitting its retry limit
    rather than once.
    
    Increasing the retry limit to 3 from the current actual maximum retry
    count of 2 in combination with the change described above reduces the
    observed failure rate to less that 0.1% on both ext3conv and
    data_journal with what should be limited impact on users sensitive to
    the overhead caused by retries.
    
    A per filesystem percpu counter exported via sysfs is added to allow
    users or developers to track the number of times the retry limit is
    exceeded without resorting to debugging methods.  This should provide
    some insight into worst case retry behavior.
    Signed-off-by: default avatarEric Whitney <enwlinux@gmail.com>
    Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.comSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
    efc61345
balloc.c 26.9 KB