• Harshad Shirwadkar's avatar
    ext4: improve cr 0 / cr 1 group scanning · 196e402a
    Harshad Shirwadkar authored
    Instead of traversing through groups linearly, scan groups in specific
    orders at cr 0 and cr 1. At cr 0, we want to find groups that have the
    largest free order >= the order of the request. So, with this patch,
    we maintain lists for each possible order and insert each group into a
    list based on the largest free order in its buddy bitmap. During cr 0
    allocation, we traverse these lists in the increasing order of largest
    free orders. This allows us to find a group with the best available cr
    0 match in constant time. If nothing can be found, we fallback to cr 1
    immediately.
    
    At CR1, the story is slightly different. We want to traverse in the
    order of increasing average fragment size. For CR1, we maintain a rb
    tree of groupinfos which is sorted by average fragment size. Instead
    of traversing linearly, at CR1, we traverse in the order of increasing
    average fragment size, starting at the most optimal group. This brings
    down cr 1 search complexity to log(num groups).
    
    For cr >= 2, we just perform the linear search as before. Also, in
    case of lock contention, we intermittently fallback to linear search
    even in CR 0 and CR 1 cases. This allows us to proceed during the
    allocation path even in case of high contention.
    
    There is an opportunity to do optimization at CR2 too. That's because
    at CR2 we only consider groups where bb_free counter (number of free
    blocks) is greater than the request extent size. That's left as future
    work.
    
    All the changes introduced in this patch are protected under a new
    mount option "mb_optimize_scan".
    
    With this patchset, following experiment was performed:
    
    Created a highly fragmented disk of size 65TB. The disk had no
    contiguous 2M regions. Following command was run consecutively for 3
    times:
    
    time dd if=/dev/urandom of=file bs=2M count=10
    
    Here are the results with and without cr 0/1 optimizations introduced
    in this patch:
    
    |---------+------------------------------+---------------------------|
    |         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
    |---------+------------------------------+---------------------------|
    | 1st run | 5m1.871s                     | 2m47.642s                 |
    | 2nd run | 2m28.390s                    | 0m0.611s                  |
    | 3rd run | 2m26.530s                    | 0m1.255s                  |
    |---------+------------------------------+---------------------------|
    Signed-off-by: default avatarHarshad Shirwadkar <harshadshirwadkar@gmail.com>
    Reported-by: default avatarkernel test robot <lkp@intel.com>
    Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
    Reviewed-by: default avatarAndreas Dilger <adilger@dilger.ca>
    Link: https://lore.kernel.org/r/20210401172129.189766-6-harshadshirwadkar@gmail.comSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
    196e402a
mballoc.c 178 KB