    btrfs: submit IO synchronously for fast checksum implementations · da023618
    Christoph Hellwig authored
    Most modern hardware supports very fast accelerated crc32c calculation.
    If that is supported, the CPU overhead of the checksum calculation is
    very limited, and offloading the calculation to special worker threads
    adds a lot of overhead for no gain.
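The decision described above can be sketched in user-space C as follows. This is an illustration under stated assumptions, not the btrfs code: `struct fs_info_sketch`, its `csum_impl_fast` flag, `choose_submit_mode()` and `submit_write()` are hypothetical names, and the bitwise `crc32c_sw()` is only a portable stand-in for an accelerated implementation (on x86, SSE4.2 provides a `crc32` instruction that computes CRC32C).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-filesystem state (not a real btrfs structure). */
struct fs_info_sketch {
    bool csum_impl_fast;  /* assumed flag: is the checksum implementation fast? */
    unsigned int queued;  /* writes handed off to (imaginary) worker threads */
};

enum submit_mode { SUBMIT_SYNC, SUBMIT_ASYNC };

/* Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
 * Stand-in only: a genuinely fast path would use hardware acceleration. */
static uint32_t crc32c_sw(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Checksum inline when it is cheap; otherwise defer to a worker. */
static enum submit_mode choose_submit_mode(const struct fs_info_sketch *fs)
{
    assert(fs != NULL);
    return fs->csum_impl_fast ? SUBMIT_SYNC : SUBMIT_ASYNC;
}

/* Hypothetical write submission: returns true if the checksum was computed
 * inline and stored in *csum, false if the work was queued instead. */
static bool submit_write(struct fs_info_sketch *fs, const void *buf,
                         size_t len, uint32_t *csum)
{
    if (choose_submit_mode(fs) == SUBMIT_ASYNC) {
        fs->queued++; /* a real system would enqueue buf for a worker here */
        return false;
    }
    *csum = crc32c_sw(buf, len);
    return true;
}
```

The trade-off being modelled: handing a write to a worker thread costs queueing and wakeup overhead no matter how cheap the checksum is, so offloading only pays off when the checksum itself is expensive. In this sketch the "fast" decision is simply a boolean supplied by the caller; how the kernel determines it is not shown.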
    
    E.g. on an Intel Optane device, the offloading actually slows down even
    1M buffered writes with fio considerably:
    
    Unpatched:
    
    write: IOPS=3316, BW=3316MiB/s (3477MB/s)(200GiB/61757msec); 0 zone resets
    
    With synchronous CRCs:
    
    write: IOPS=4882, BW=4882MiB/s (5119MB/s)(200GiB/41948msec); 0 zone resets
    
    There was a lot of variation during the unpatched run, with throughput
    dropping as low as 1100MB/s, while the synchronous CRC version has about
    the same peak write speed but much shallower dips, and fewer kworkers
    churning around. Both tests had fio saturated at 100% CPU.
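The reported numbers can be cross-checked with a little arithmetic: each run wrote 200 GiB (204800 MiB), so dividing by the runtime should reproduce the bandwidth, and fio's MB/s figure is the MiB/s figure scaled by 2^20/10^6. A minimal C sketch of that check follows; `TOTAL_MIB`, `mib_per_s()` and `mib_to_mb()` are illustrative helpers, not part of the patch.

```c
#include <assert.h>

/* Total data written in each fio run: 200 GiB, expressed in MiB. */
#define TOTAL_MIB (200.0 * 1024.0)

/* Average bandwidth in MiB/s from total MiB written and elapsed milliseconds. */
static double mib_per_s(double total_mib, double msec)
{
    assert(msec > 0.0);
    return total_mib / (msec / 1000.0);
}

/* Convert MiB/s (2^20 bytes) to MB/s (10^6 bytes); fio prints both. */
static double mib_to_mb(double mib)
{
    return mib * 1048576.0 / 1e6;
}
```

Working it through gives 204800 / 61.757 s ≈ 3316 MiB/s unpatched and 204800 / 41.948 s ≈ 4882 MiB/s with synchronous CRCs, a speedup of about 1.47x (roughly 32% less wall-clock time for the same 200 GiB).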
    
    (thanks to Jens Axboe via Chris Mason for the benchmarking)
    Reviewed-by: Chris Mason <clm@fb.com>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>