    page_pool: handle page recycle for NUMA_NO_NODE condition · 44768dec
    Jesper Dangaard Brouer authored
    The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is
    not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.
    
    The goal of the NUMA changes in commit d5394610 ("page_pool: Don't
    recycle non-reusable pages") was to have RX-pages that belong to the
    same NUMA node as the CPU processing the RX-packet during softirq/NAPI,
    as illustrated by the performance measurements.
    
    This patch moves the NUMA checks out of the fast-path, and at the same
    time solves the NUMA_NO_NODE issue.
    
    First realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE
    will look up the current CPU nid (NUMA ID) via numa_mem_id(), which is
    used as the preferred nid.  It is only in rare situations, e.g. when a
    NUMA zone runs dry, that a page doesn't get allocated from the
    preferred nid.  The page_pool API allows drivers to control the nid
    themselves by setting pool->p.nid.
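    
    The nid selection described above can be sketched as a small userspace
    model. This is an illustration under stated assumptions, not the kernel
    code: preferred_nid() is a hypothetical helper, and its second argument
    stands in for the value numa_mem_id() would return on the running CPU.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)	/* same value the kernel uses */

/*
 * Hypothetical model of choosing the preferred nid.
 * current_cpu_nid stands in for numa_mem_id() on the running CPU.
 */
static int preferred_nid(int pool_nid, int current_cpu_nid)
{
	/* A running CPU always belongs to a real node */
	assert(current_cpu_nid >= 0);

	/* No explicit nid configured: follow the CPU doing the work */
	if (pool_nid == NUMA_NO_NODE)
		return current_cpu_nid;

	/* Driver picked a nid explicitly: honour it */
	return pool_nid;
}
```

    With pool_nid = NUMA_NO_NODE the result follows the CPU's node, while an
    explicit pool_nid is returned unchanged.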
    
    This patch moves the NUMA check to the point where the alloc cache is
    refilled, via dequeuing/consuming pages from the ptr_ring. Thus, we can
    allow placing pages from a remote NUMA node into the ptr_ring, as the
    dequeue/consume step will check the NUMA node. All current drivers
    using page_pool alloc/refill the RX-ring from the same CPU that runs
    the softirq/NAPI process.
    
    Drivers that control the nid explicitly also use page_pool_update_nid
    when changing the nid at runtime.  To speed up the transition to the
    new nid, the alloc cache is now flushed on nid changes.  This forces
    pages to come from the ptr_ring, which does the appropriate nid check.
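    
    A minimal userspace sketch of the flush-on-nid-change idea follows. The
    struct nid_pool_model type and update_nid_model() are hypothetical names
    for illustration; the model only counts pages instead of tracking real
    struct page pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the pool state */
struct nid_pool_model {
	int nid;		/* models pool->p.nid */
	size_t cache_count;	/* pages currently in the alloc cache */
	size_t released;	/* pages handed back to the page allocator */
};

/* Model of a nid update: record the new nid, then flush the alloc cache */
static void update_nid_model(struct nid_pool_model *pool, int new_nid)
{
	/* Either NUMA_NO_NODE (-1) or a real node number */
	assert(new_nid >= -1);

	if (pool->nid == new_nid)
		return;		/* nothing changed, keep the cache */

	pool->nid = new_nid;

	/* Drop every cached page so the next allocation refills from the
	 * ring, where the nid check happens. */
	while (pool->cache_count) {
		pool->cache_count--;
		pool->released++;
	}
}
```

    After a change of nid the cache is empty, so no page cached for the old
    nid can be handed out without passing the check at refill time.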
    
    For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA
    node, we accept that transitioning the alloc cache doesn't happen
    immediately. The preferred nid changes at runtime by consulting
    numa_mem_id() on the CPU processing RX-packets.
    
    Note that, to avoid stressing the page buddy allocator and to avoid
    doing too much work under softirq with preemption disabled, the NUMA
    check at ptr_ring dequeue will break the refill cycle when it detects
    a NUMA mismatch. This causes a slower transition, but it is done on
    purpose.
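    
    The refill behaviour described above can be sketched in userspace as
    follows. This is a model under assumptions, not the patch itself:
    struct page_model, struct pool_model, CACHE_REFILL and
    refill_alloc_cache() are hypothetical names, and a plain array without
    wrap-around stands in for the ptr_ring.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_REFILL 4	/* illustrative batch size, not the kernel's value */

struct page_model { int nid; };	/* hypothetical stand-in for struct page */

struct pool_model {
	struct page_model *ring[16];	/* stand-in for the ptr_ring */
	size_t ring_head, ring_tail;	/* no wrap-around, for brevity */
	struct page_model *cache[CACHE_REFILL];
	size_t cache_count;
	size_t released;		/* pages handed back to the allocator */
};

/*
 * Refill the alloc cache from the ring and return one cached page, or NULL
 * when no matching page was cached.
 */
static struct page_model *refill_alloc_cache(struct pool_model *pool,
					     int pref_nid)
{
	struct page_model *page = NULL;

	/* Refill is only attempted once the cache has run empty */
	assert(pool->cache_count == 0);

	while (pool->ring_head != pool->ring_tail) {
		page = pool->ring[pool->ring_head++];
		if (page->nid != pref_nid) {
			/* Remote page: release it and end this refill cycle,
			 * leaving the rest of the ring for later calls. */
			pool->released++;
			page = NULL;
			break;
		}
		pool->cache[pool->cache_count++] = page;
		if (pool->cache_count >= CACHE_REFILL)
			break;
	}

	/* Hand the most recently cached page to the caller */
	if (pool->cache_count > 0)
		page = pool->cache[--pool->cache_count];
	return page;
}
```

    Only one mismatching page is released per refill cycle, which is why a
    ring full of remote pages drains gradually rather than in a single
    softirq pass.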
    
    Fixes: d5394610 ("page_pool: Don't recycle non-reusable pages")
    Reported-by: Li RongQing <lirongqing@baidu.com>
    Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>