    dio: optimize cache misses in the submission path · 65dd2aa9
    Andi Kleen authored
    Some investigation of a transaction processing workload showed that a
    major consumer of cycles in __blockdev_direct_IO is the cache miss taken
    when accessing the block size.  The lookup has to walk the pointer chain
    from block_dev to gendisk to queue, and each hop can miss in the cache.
    
    The block size is needed early on to check the alignment of offsets and
    sizes.  The device block size is only consulted if the check against the
    inode block size fails, yet the costly block device state is fetched
    unconditionally.
    
    - Reorganize the code to only fetch block dev state when actually
      needed.
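    A minimal userspace sketch of the reorganized check, assuming
    power-of-two block sizes.  The structs bdev, gendisk and queue and the
    helpers bdev_block_size() and dio_aligned() are hypothetical stand-ins,
    not the kernel's actual types or code; the point is only that the device
    chain is dereferenced on the fallback path alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * block_device -> gendisk -> request queue chain. */
struct queue   { unsigned int logical_block_size; };
struct gendisk { struct queue *queue; };
struct bdev    { struct gendisk *disk; };

/* Pointer-chasing walk: every hop is a dependent load that may miss. */
static unsigned int bdev_block_size(const struct bdev *bdev)
{
	return bdev->disk->queue->logical_block_size;
}

/* Returns true if offset and len are aligned to the inode block size or,
 * failing that, to the (smaller) device block size.  Block sizes are
 * assumed to be powers of two.  The device is only read on the fallback. */
static bool dio_aligned(uint64_t offset, uint64_t len,
			unsigned int inode_block_size, const struct bdev *bdev)
{
	uint64_t mask = inode_block_size - 1;

	if (((offset | len) & mask) == 0)
		return true;	/* common case: block dev never touched */

	assert(bdev != NULL);
	mask = bdev_block_size(bdev) - 1;
	return ((offset | len) & mask) == 0;
}
```

    Passing a NULL device works for inode-aligned requests precisely
    because the fast path never dereferences it.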
    
    Then prefetch the block dev state early in the direct IO path.  This is
    worth it because a substantial amount of code now runs before the block
    dev is actually touched, which gives the prefetch time to complete.
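    A sketch of the prefetch pattern, using the GCC/Clang builtin
    __builtin_prefetch, which is what the kernel's generic prefetch() in
    <linux/prefetch.h> expands to on architectures without their own
    version.  submit() and setup_work() are hypothetical, and the sketch
    does not reproduce which fields the actual patch prefetches.  Note that
    prefetching a deeper link such as bdev->disk->queue would itself have to
    load bdev first, so only the first hop is hinted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic kernel fallback definition; the hint never changes semantics. */
#define prefetch(p) __builtin_prefetch(p)

/* Hypothetical stand-ins for the block_dev -> gendisk -> queue chain. */
struct queue   { unsigned int logical_block_size; };
struct gendisk { struct queue *queue; };
struct bdev    { struct gendisk *disk; };

/* Hypothetical placeholder for the setup code that runs before the
 * block device is needed. */
static uint64_t setup_work(uint64_t n)
{
	uint64_t sum = 0;

	for (uint64_t i = 0; i < n; i++)
		sum += i;
	return sum;
}

static unsigned int submit(const struct bdev *bdev, uint64_t n, uint64_t *out)
{
	assert(out != NULL);
	/* Issue the hint first.  A prefetch never faults, so this is
	 * harmless even when bdev is NULL. */
	prefetch(bdev);
	*out = setup_work(n);	/* gives the load time to complete */
	if (bdev == NULL)
		return 0;
	return bdev->disk->queue->logical_block_size;
}
```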
    
    - I also added some unlikely() annotations to make it clear to the
      compiler that the block device fetch code is not normally executed.
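    The kernel's likely()/unlikely() macros wrap GCC's __builtin_expect; the
    definitions below match the plain (non-branch-profiling) versions in
    <linux/compiler.h>.  dev_block_size() and dio_check() are hypothetical,
    and the slow_path_calls counter exists only so the sketch can show that
    the fallback runs solely for requests misaligned with the inode block
    size.  The hint affects code layout, not the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical slow path standing in for the block_dev -> gendisk ->
 * queue walk; the counter is only here to make it observable. */
static unsigned int slow_path_calls;

static unsigned int dev_block_size(void)
{
	slow_path_calls++;
	return 512;
}

static bool dio_check(uint64_t off, uint64_t len, unsigned int inode_bs)
{
	assert(inode_bs != 0 && (inode_bs & (inode_bs - 1)) == 0);

	/* Tell the compiler the device fallback is the cold branch. */
	if (unlikely((off | len) & (inode_bs - 1)))
		return ((off | len) & (dev_block_size() - 1)) == 0;
	return true;
}
```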
    
    This gave a small but measurable improvement (about 0.3%) on a large
    database benchmark.
    
    [akpm@linux-foundation.org: coding-style fixes]
    [sfr@canb.auug.org.au: using prefetch requires including prefetch.h]
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
    Cc: Jeff Moyer <jmoyer@redhat.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>