• Sagi Grimberg's avatar
    nvme-tcp: optimize queue io_cpu assignment for multiple queue maps · 40510a63
    Sagi Grimberg authored
    Currently, queue io_cpu assignment is done sequentially for default,
    read and poll queues based on queue id. This causes miss-alignment between
    context of CPU initiating I/O and the I/O worker thread processing
    queued requests or completions.
    
    Change to modify queue io_cpu assignment to take into account queue
    maps offset. Each queue io_cpu will start at zero for each queue map.
    This essentially aligns read/poll queues to start over the same range as
    default queues.
    
    Testing performed by Mark with:
    - ram device (nvmet)
    - single CPU core (pinned)
    - 100% 4k reads
    - engine io_uring (not using sq_thread option)
    - hipri flag set
    
    Micro-benchmark results show a net gain of:
    - increase of 18%-29% in IOPs
    - reduction of 16%-22% in average latency
    - reduction of 7%-23% in 99.99% latency
    
    Baseline:
    ========
    QDepth/Batch	| IOPs [k]	| Avg. Lat [us]	| 99.99% Lat [us]
    -----------------------------------------------------------------
    1/1 		| 32.4		| 30.11		| 50.94
    32/8		| 179		| 168.20	| 371
    
    CPU alignment:
    =============
    QDepth/Batch	| IOPs [k]	| Avg. Lat [us]	| 99.99% Lat [us]
    -----------------------------------------------------------------
    1/1 		| 38.5		|   25.18	| 39.16
    32/8		| 231		|   130.75	| 343
    Reported-by: default avatarMark Wunderlich <mark.wunderlich@intel.com>
    Signed-off-by: default avatarSagi Grimberg <sagi@grimberg.me>
    Signed-off-by: default avatarKeith Busch <kbusch@kernel.org>
    40510a63
tcp.c 61.5 KB