    io_uring: add support for ring mapped supplied buffers · c7fb1942
    Jens Axboe authored
    Provided buffers allow an application to supply io_uring with buffers
    that can then be grabbed for a read/receive request, when the data
    source is ready to deliver data. The existing scheme relies on using
    IORING_OP_PROVIDE_BUFFERS to do that, but it can be difficult to use
    in real world applications. It's pretty efficient if the application
    is able to supply back batches of provided buffers when they have been
    consumed and the application is ready to recycle them, but if
    fragmentation occurs in the buffer space, it can become difficult to
    supply enough buffers at a time. This hurts efficiency.
    
    Add a register op, IORING_REGISTER_PBUF_RING, which allows an application
    to setup a shared queue for each buffer group of provided buffers. The
    application can then supply buffers simply by adding them to this ring,
    and the kernel can consume them just as easily. The application adds
    buffers at the ring tail, which it shares with the kernel; the ring head
    remains private to the kernel, which advances it as buffers are consumed.
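
    Wrapped up, registering a ring and publishing an initial set of buffers
    might look like the sketch below. It leans on the liburing helpers that
    wrap this interface (io_uring_register_buf_ring() and the
    io_uring_buf_ring_*() accessors); NR_BUFS, BUF_GROUP and the buffer pool
    are illustrative, and error handling is elided:

        #include <liburing.h>
        #include <stdlib.h>

        #define NR_BUFS   128   /* ring entries, must be a power of 2 */
        #define BUF_SIZE  4096
        #define BUF_GROUP 0     /* illustrative buffer group ID */

        static struct io_uring_buf_ring *setup_pbuf_ring(struct io_uring *ring,
                                                         unsigned char *pool)
        {
            struct io_uring_buf_reg reg = { };
            struct io_uring_buf_ring *br;
            int i;

            /* shared ring memory: one struct io_uring_buf per entry */
            if (posix_memalign((void **) &br, 4096,
                               NR_BUFS * sizeof(struct io_uring_buf)))
                return NULL;

            reg.ring_addr = (unsigned long) br;
            reg.ring_entries = NR_BUFS;
            reg.bgid = BUF_GROUP;
            if (io_uring_register_buf_ring(ring, &reg, 0))
                return NULL;

            /* add the initial buffers, then publish them via the tail */
            io_uring_buf_ring_init(br);
            for (i = 0; i < NR_BUFS; i++)
                io_uring_buf_ring_add(br, pool + i * BUF_SIZE, BUF_SIZE, i,
                                      io_uring_buf_ring_mask(NR_BUFS), i);
            io_uring_buf_ring_advance(br, NR_BUFS);
            return br;
        }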
    
    Provided buffers set up with IORING_REGISTER_PBUF_RING cannot use
    IORING_OP_{PROVIDE,REMOVE}_BUFFERS for adding entries to or removing
    entries from the ring; they must use the mapped ring. Mapped provided
    buffer rings can
    co-exist with normal provided buffers, just not within the same group ID.
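
    On the I/O side nothing changes versus classic provided buffers: a
    request selects a buffer via IOSQE_BUFFER_SELECT, and recycling becomes
    just an add plus a tail update on the ring. A hedged sketch, reusing the
    setup from the previous snippet (the single-buffer replenish here mirrors
    the 'Replenish 1' case in the benchmark below):

        static void recv_and_recycle(struct io_uring *ring,
                                     struct io_uring_buf_ring *br,
                                     unsigned char *pool, int sockfd)
        {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            struct io_uring_cqe *cqe;
            unsigned short bid;

            /* ask the kernel to pick a buffer from group BUF_GROUP */
            io_uring_prep_recv(sqe, sockfd, NULL, BUF_SIZE, 0);
            sqe->flags |= IOSQE_BUFFER_SELECT;
            sqe->buf_group = BUF_GROUP;
            io_uring_submit(ring);

            io_uring_wait_cqe(ring, &cqe);
            if (cqe->res >= 0 && (cqe->flags & IORING_CQE_F_BUFFER)) {
                /* the selected buffer ID is carried in the CQE flags */
                bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
                /* ... consume cqe->res bytes at pool + bid * BUF_SIZE ... */

                /* recycle this one buffer and publish the new tail */
                io_uring_buf_ring_add(br, pool + bid * BUF_SIZE, BUF_SIZE,
                                      bid, io_uring_buf_ring_mask(NR_BUFS), 0);
                io_uring_buf_ring_advance(br, 1);
            }
            io_uring_cqe_seen(ring, cqe);
        }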
    
    To gauge overhead of the existing scheme and evaluate the mapped ring
    approach, a simple NOP benchmark was written. It uses a ring of 128
    entries, and submits/completes 32 at a time. 'Replenish' is how
    many buffers are provided back at a time after they have been
    consumed:
    
    Test			Replenish			NOPs/sec
    ================================================================
    No provided buffers	NA				~30M
    Provided buffers	32				~16M
    Provided buffers	 1				~10M
    Ring buffers		32				~27M
    Ring buffers		 1				~27M
    
    The ring mapped buffers perform almost as well as not using provided
    buffers at all, and performance doesn't depend on whether 1 or more
    buffers are provided back at a time. This means applications can just
    replenish as they go, rather than needing to batch and compact, further
    reducing overhead in the application. The NOP benchmark above doesn't
    need to do any compaction, so that overhead isn't even reflected in the
    test.

    Co-developed-by: Dylan Yudaken <dylany@fb.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>