    io_uring: speedup provided buffer handling · cc3cec83
    Jens Axboe authored
    In testing high frequency workloads with provided buffers, we spend a
    lot of time in allocating and freeing the buffer units themselves.
    Rather than repeatedly free and alloc them, add a recycling cache
    instead. There are two caches:
    
    - ctx->io_buffers_cache. This is the one we grab from in the submission
      path, and it's protected by ctx->uring_lock. For inline completions,
      we can recycle straight back to this cache and not need any extra
      locking.
    
    - ctx->io_buffers_comp. If we're not under uring_lock, then we use this
      list to recycle buffers. It's protected by the completion_lock.
    
    On adding a new buffer, check io_buffers_cache. If it's empty, check if
    we can splice entries from ctx->io_buffers_comp.
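
The two-cache scheme described above can be sketched in userspace C roughly as follows. This is only an illustration, not the kernel code from this commit: `struct sketch_buf`, `struct sketch_ctx`, the `sketch_*` helpers and the `nr_allocs` counter are hypothetical names, pthread mutexes stand in for `uring_lock` and `completion_lock`, and a plain singly linked list stands in for whatever list type the kernel uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one provided-buffer unit. */
struct sketch_buf {
    struct sketch_buf *next;
};

/* Hypothetical stand-in for the ring context; only the cache fields matter. */
struct sketch_ctx {
    pthread_mutex_t uring_lock;          /* protects io_buffers_cache */
    pthread_mutex_t completion_lock;     /* protects io_buffers_comp */
    struct sketch_buf *io_buffers_cache; /* submission-side free list */
    struct sketch_buf *io_buffers_comp;  /* completion-side free list */
    unsigned long nr_allocs;             /* counts real allocations (demo only) */
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
    pthread_mutex_init(&ctx->uring_lock, NULL);
    pthread_mutex_init(&ctx->completion_lock, NULL);
    ctx->io_buffers_cache = NULL;
    ctx->io_buffers_comp = NULL;
    ctx->nr_allocs = 0;
}

/* Get a buffer unit. Caller must hold uring_lock. */
static struct sketch_buf *sketch_buf_get(struct sketch_ctx *ctx)
{
    struct sketch_buf *buf;

    if (!ctx->io_buffers_cache) {
        /* Cache is empty: take over everything recycled on the completion side. */
        pthread_mutex_lock(&ctx->completion_lock);
        ctx->io_buffers_cache = ctx->io_buffers_comp;
        ctx->io_buffers_comp = NULL;
        pthread_mutex_unlock(&ctx->completion_lock);
    }

    buf = ctx->io_buffers_cache;
    if (buf) {
        ctx->io_buffers_cache = buf->next;
        buf->next = NULL;
        return buf;
    }

    /* Both caches were empty: fall back to a real allocation. */
    buf = malloc(sizeof(*buf));
    if (buf) {
        buf->next = NULL;
        ctx->nr_allocs++;
    }
    return buf;
}

/* Recycle from an inline completion. Caller must hold uring_lock. */
static void sketch_buf_put_inline(struct sketch_ctx *ctx, struct sketch_buf *buf)
{
    assert(buf != NULL);
    buf->next = ctx->io_buffers_cache;
    ctx->io_buffers_cache = buf;
}

/* Recycle when uring_lock is not held; takes completion_lock itself. */
static void sketch_buf_put_comp(struct sketch_ctx *ctx, struct sketch_buf *buf)
{
    assert(buf != NULL);
    pthread_mutex_lock(&ctx->completion_lock);
    buf->next = ctx->io_buffers_comp;
    ctx->io_buffers_comp = buf;
    pthread_mutex_unlock(&ctx->completion_lock);
}
```

In this sketch the extra lock is only taken on the get path when the submission-side cache has run dry, so the common case of get followed by an inline put touches nothing but state already covered by `uring_lock`.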
    
    This cuts about 5-10% of overhead from provided buffers, bringing it
    pretty close to the non-provided path.
    
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring.c 285 KB