    net: Allow to use SMP threads for backlog NAPI. · dad6b977
    Sebastian Andrzej Siewior authored
    Backlog NAPI is a per-CPU NAPI struct only (with no device behind it)
    used by drivers that don't do NAPI themselves, by RPS, and by parts of
    the stack that need to avoid recursive deadlocks while processing a
    packet.
    
    Non-NAPI drivers use the CPU-local backlog NAPI. If RPS is enabled, a
    flow is computed for the skb and, based on that flow, the skb can be
    enqueued on a remote CPU. Scheduling (raising) the softirq for the
    backlog NAPI on the remote CPU isn't trivial because a softirq is only
    scheduled on the local CPU and performed after the hardirq is done.
    To schedule the softirq on the remote CPU, an IPI is sent to it, and
    the IPI handler schedules the backlog NAPI on what is then the local
    CPU.
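
    The steering decision above can be modelled in user space. This is a
    simplified sketch under stated assumptions, not the kernel's code: the
    flow hash is taken as given, the target CPU is picked from a
    hypothetical RPS CPU map with the same multiply-shift scaling as the
    kernel's reciprocal_scale() helper, and the result only says whether
    the softirq can be raised locally or an IPI to the remote CPU is
    needed. All identifiers below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of enqueueing an skb on a backlog (model only). */
enum backlog_action {
	RAISE_LOCAL_SOFTIRQ, /* target is this CPU: raise NET_RX here */
	SEND_IPI,            /* target is remote: IPI it to schedule its backlog */
};

/* Same multiply-shift as the kernel's reciprocal_scale():
 * maps a 32-bit hash onto [0, n) without a division. */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Pick the CPU whose backlog receives the flow (hypothetical RPS map). */
static unsigned int rps_pick_cpu(uint32_t flow_hash,
				 const unsigned int *rps_map, uint32_t map_len)
{
	assert(map_len > 0);
	return rps_map[scale_hash(flow_hash, map_len)];
}

/* A softirq can only be raised locally; a remote backlog needs an IPI. */
static enum backlog_action enqueue_to_backlog_model(unsigned int this_cpu,
						    unsigned int target_cpu)
{
	return target_cpu == this_cpu ? RAISE_LOCAL_SOFTIRQ : SEND_IPI;
}
```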
    
    On PREEMPT_RT, interrupts are force-threaded. Softirqs are raised
    within the interrupt thread and processed after the interrupt handler
    has completed, still within the context of the interrupt thread. The
    softirq is thus handled in the context where it originated.
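
    A minimal user-space model of "handled in the context where it
    originated", assuming the force-threaded handler runs inside a
    BH-disabled section (the kernel's irq_forced_thread_fn() brackets the
    handler with local_bh_disable()/local_bh_enable()). A softirq raised by
    the handler stays pending and runs when the outermost BH-disabled
    section ends, still inside the interrupt thread. The struct and
    function names are hypothetical, and the softirq machinery is reduced
    to a single NET_RX flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU softirq state, reduced to the NET_RX softirq. */
struct cpu_model {
	int bh_disable_count;      /* nesting depth of BH-disabled sections */
	bool net_rx_pending;       /* NET_RX raised but not yet processed */
	int net_rx_runs_in_thread; /* NET_RX runs done by the irq thread */
};

/* Process whatever was raised; in this model only NET_RX exists. */
static void run_pending_softirqs(struct cpu_model *c)
{
	if (c->net_rx_pending) {
		c->net_rx_pending = false;
		c->net_rx_runs_in_thread++;
	}
}

static void bh_disable(struct cpu_model *c)
{
	c->bh_disable_count++;
}

/* Leaving the outermost BH-disabled section runs pending softirqs. */
static void bh_enable(struct cpu_model *c)
{
	assert(c->bh_disable_count > 0);
	if (--c->bh_disable_count == 0)
		run_pending_softirqs(c);
}

/* Hypothetical driver handler that raises NET_RX (e.g. via netif_rx()). */
static void demo_handler(struct cpu_model *c)
{
	c->net_rx_pending = true;
}

/* Model of a force-threaded handler: it runs with BH disabled, so the
 * softirq it raises is processed in bh_enable(), still in this thread. */
static void forced_irq_thread(struct cpu_model *c,
			      void (*handler)(struct cpu_model *))
{
	bh_disable(c);
	handler(c);
	bh_enable(c);
}
```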
    
    With force-threaded interrupts enabled, ksoftirqd is woken up if a
    softirq is raised from hardirq context, which is the case if it is
    raised from an IPI. Additionally, there is a warning on PREEMPT_RT if
    the softirq is raised from the idle thread.
    This was done for two reasons:
    - With threaded interrupts the processing should happen in thread
      context (where it originated) and ksoftirqd is the only thread for
      this context if raised from hardirq. Using the currently running task
      instead would "punish" a random task.
    - Once ksoftirqd is active, it consumes all further softirqs until it
      stops running. This changed recently and is no longer the case.
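
    The hardirq case can be summarised as a small decision function. It
    loosely follows what the kernel's invoke_softirq() does on interrupt
    exit, but it is a hypothetical simplification: stack switching,
    nesting, early boot and the PREEMPT_RT handling of BH-disabled
    sections are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome for a softirq raised in hardirq context. */
enum softirq_runner {
	RUN_ON_IRQ_EXIT, /* processed right after the hardirq returns */
	WAKE_KSOFTIRQD,  /* deferred to this CPU's ksoftirqd thread */
};

/* Interrupts are force-threaded with the "threadirqs" boot option and
 * always on PREEMPT_RT. */
static bool force_irqthreads_model(bool preempt_rt, bool threadirqs)
{
	return preempt_rt || threadirqs;
}

/* Where a softirq raised from hardirq context (e.g. the RPS IPI) runs. */
static enum softirq_runner hardirq_exit_model(bool force_threaded)
{
	/* Threaded: no interrupt thread owns this softirq, so it is handed
	 * to ksoftirqd instead of "punishing" the interrupted task. */
	return force_threaded ? WAKE_KSOFTIRQD : RUN_ON_IRQ_EXIT;
}
```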
    
    Instead of keeping the backlog NAPI in ksoftirqd (in force-threaded/
    PREEMPT_RT setups), I am proposing NAPI threads for backlog.
    The "proper" setup with threaded NAPI is not doable because those
    threads are not pinned to an individual CPU and can be modified by the
    user. Additionally, a dummy network device would have to be assigned,
    and CPU hotplug has to be considered if additional CPUs show up.
    All this can probably be done/solved, but the smpboot threads already
    provide this infrastructure.
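
    Why the smpboot threads fit can be sketched with a small model. The
    kernel's struct smp_hotplug_thread lets a subsystem register
    thread_should_run() and thread_fn() callbacks, and the framework
    creates one thread per CPU, pinned to it and managed across CPU
    hotplug. The model below only mirrors those two callbacks in user
    space; the descriptor, the "hotplug" function and the dispatch loop are
    hypothetical stand-ins, and no real threads are created.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8

/* Hypothetical mirror of the two smp_hotplug_thread callbacks. */
struct percpu_thread_desc {
	int (*thread_should_run)(unsigned int cpu);
	void (*thread_fn)(unsigned int cpu);
	const char *thread_comm; /* hypothetical thread name pattern */
};

/* Model state: online CPUs, pending backlog, number of polls done. */
static bool cpu_is_online[MAX_CPUS];
static bool backlog_sched[MAX_CPUS];
static int backlog_polls[MAX_CPUS];

static int backlog_should_run(unsigned int cpu)
{
	return backlog_sched[cpu];
}

static void backlog_run(unsigned int cpu)
{
	backlog_polls[cpu]++;
	backlog_sched[cpu] = false; /* queue drained, no longer scheduled */
}

static const struct percpu_thread_desc backlog_desc = {
	.thread_should_run = backlog_should_run,
	.thread_fn = backlog_run,
	.thread_comm = "backlog_napi/%u",
};

/* CPU hotplug: the framework, not the user, brings up the pinned thread. */
static void cpu_up_model(unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	cpu_is_online[cpu] = true;
}

/* One pass of the per-CPU threads; each only handles its own CPU. */
static int run_percpu_threads_once(const struct percpu_thread_desc *d)
{
	int ran = 0;

	for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (cpu_is_online[cpu] && d->thread_should_run(cpu)) {
			d->thread_fn(cpu);
			ran++;
		}
	}
	return ran;
}
```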
    
    Sending UDP packets over loopback expects that the packet is processed
    within the call. Delaying it by handing it over to the thread hurts
    performance, and it makes no difference to the outcome whether the
    context switch happens immediately after the enqueue or later, to
    process a few packets in a batch.
    There is no need to always use the thread if the backlog NAPI is
    requested on the local CPU; not using it there restores the loopback
    throughput. After enabling RPS on loopback, the performance drops to
    roughly the same value with the IPI and with the thread.
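
    A sketch of the resulting scheduling choice, assuming a hypothetical
    helper that only knows whether backlog threads are enabled and which
    CPU the backlog belongs to. The local case keeps using the softirq, so
    the work runs once the current context re-enables bottom halves; only
    a remote backlog goes to that CPU's thread (or, without threads, gets
    an IPI). This models the behaviour described above; it is not the
    patch's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ways a backlog NAPI can get scheduled (model only). */
enum backlog_sched_path {
	LOCAL_SOFTIRQ,       /* add to this CPU's poll list, raise NET_RX */
	WAKE_BACKLOG_THREAD, /* wake the remote CPU's backlog NAPI thread */
	SEND_RPS_IPI,        /* classic path: IPI the remote CPU */
};

static enum backlog_sched_path schedule_backlog_model(bool backlog_threads,
						      unsigned int this_cpu,
						      unsigned int target_cpu)
{
	/* Local request: no context switch, e.g. loopback stays in-call. */
	if (target_cpu == this_cpu)
		return LOCAL_SOFTIRQ;
	/* Remote request: the thread replaces the IPI when enabled. */
	return backlog_threads ? WAKE_BACKLOG_THREAD : SEND_RPS_IPI;
}
```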
    
    Create NAPI threads for backlog if requested during boot. The thread
    runs the inner loop of napi_threaded_poll(); only the wait part is
    different. It checks for NAPI_STATE_SCHED (the backlog NAPI cannot be
    disabled).
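
    The thread body described above can be modelled as follows. This is a
    hypothetical user-space sketch: a boolean stands in for
    NAPI_STATE_SCHED, the poll function handles at most one budget of
    packets per call and clears the flag once it runs out of work, and the
    sleep/wakeup part of the real thread is omitted. Because the backlog
    NAPI cannot be disabled, the loop needs no disable check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's backlog NAPI. */
struct backlog_model {
	bool sched;    /* stands in for NAPI_STATE_SCHED */
	int qlen;      /* packets queued on the backlog */
	int processed; /* packets handled so far */
};

/* One poll: handle up to @budget packets; complete when work < budget. */
static int backlog_poll(struct backlog_model *b, int budget)
{
	int work = b->qlen < budget ? b->qlen : budget;

	b->qlen -= work;
	b->processed += work;
	if (work < budget)
		b->sched = false; /* queue empty: NAPI completes */
	return work;
}

/* What the thread does once woken: if sched is clear it would go back
 * to sleep; otherwise it polls until the NAPI completes.
 * Returns the number of poll calls made. */
static int backlog_thread_wakeup(struct backlog_model *b, int budget)
{
	int polls = 0;

	assert(budget > 0);
	while (b->sched) {
		backlog_poll(b, budget);
		polls++;
	}
	return polls;
}
```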
    
    The NAPI threads for backlog are optional; they have to be enabled via
    the boot argument "thread_backlog_napi". They are mandatory on
    PREEMPT_RT to avoid the wakeup of ksoftirqd from the IPI.
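
    A simplified model of the opt-in logic, assuming a plain
    space-separated command line without quoting. The parser and the
    use_backlog_threads_model() name are hypothetical; in the kernel, boot
    arguments are handled by the generic parameter parsing code rather
    than by a helper like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parser: true if @name appears as "name" or "name=value"
 * in the space-separated @cmdline. No quoting support. */
static bool cmdline_has_param(const char *cmdline, const char *name)
{
	size_t len = strlen(name);
	const char *p = cmdline;

	assert(len > 0);
	while (*p) {
		while (*p == ' ')
			p++;
		if (*p == '\0')
			break;
		const char *end = p + strcspn(p, " ");
		/* Match the whole name, not just a prefix of a longer token. */
		if ((size_t)(end - p) >= len && strncmp(p, name, len) == 0 &&
		    (p[len] == ' ' || p[len] == '=' || p[len] == '\0'))
			return true;
		p = end;
	}
	return false;
}

/* Backlog threads: always on PREEMPT_RT, otherwise opt-in at boot. */
static bool use_backlog_threads_model(bool preempt_rt, const char *cmdline)
{
	return preempt_rt || cmdline_has_param(cmdline, "thread_backlog_napi");
}
```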
    
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>