1. 30 Nov, 2023 22 commits
  2. 29 Nov, 2023 14 commits
  3. 28 Nov, 2023 4 commits
    • Merge branch 'net-page_pool-add-netlink-based-introspection' · a3799729
      Paolo Abeni authored
      Jakub Kicinski says:
      
      ====================
      net: page_pool: add netlink-based introspection
      
      We recently started to deploy newer kernels / drivers at Meta,
      making significant use of page pools for the first time.
      We immediately ran into page pool leak warnings, both real and
      false positives. As Eric pointed out/predicted, there's no
      guarantee that applications will read / close their sockets,
      so a page pool page may be stuck in a socket (but not leaked)
      forever. This happens a lot in our fleet. Most of these are
      obviously due to application bugs, but we should not be printing
      kernel warnings for minor application resource leaks.
      
      Conversely, page pool memory may get leaked at runtime, and
      we have no way to detect / track that unless someone reconfigures
      the NIC and thereby destroys the page pools which leaked the pages.
      
      The solution presented here is to expose the memory use of page
      pools via netlink. This allows for continuous monitoring of memory
      used by page pools, regardless of whether they have been destroyed
      or not. The sample in patch 15 can print the memory use and
      recycling efficiency:
      
      $ ./page-pool
          eth0[2]	page pools: 10 (zombies: 0)
      		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
      		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
      
      v4:
       - use dev_net(netdev)->loopback_dev
       - extend inflight doc
      v3: https://lore.kernel.org/all/20231122034420.1158898-1-kuba@kernel.org/
       - ID is still here, can't decide if it matters
       - rename destroyed -> detach-time, good enough?
       - fix build for netsec
      v2: https://lore.kernel.org/r/20231121000048.789613-1-kuba@kernel.org
       - hopefully fix build with PAGE_POOL=n
      v1: https://lore.kernel.org/all/20231024160220.3973311-1-kuba@kernel.org/
       - The main change compared to the RFC is that the API now exposes
         outstanding references and byte counts even for "live" page pools.
         The warning is no longer printed if the page pool is accessible
         via netlink.
      RFC: https://lore.kernel.org/all/20230816234303.3786178-1-kuba@kernel.org/
      ====================
      
      Link: https://lore.kernel.org/r/20231126230740.2148636-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
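      A minimal monitoring sketch for the interface described above, assuming
      the in-tree YNL Python helpers that cli.py is built on, and assuming the
      per-pool attribute names 'inflight' and 'inflight-mem' (the changelog
      above only confirms 'detach-time'); treat it as an illustration, not as
      the sample shipped in patch 15:

      # Sketch: sum outstanding page references / bytes per interface.
      # Assumes it is run from tools/net/ynl/ so "lib" and the spec resolve.
      from collections import defaultdict
      from lib import YnlFamily

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      refs = defaultdict(int)
      mem = defaultdict(int)
      for pp in ynl.dump('page-pool-get', {}):
          ifindex = pp.get('ifindex', 0)              # group by interface
          refs[ifindex] += pp.get('inflight', 0)      # assumed attribute name
          mem[ifindex] += pp.get('inflight-mem', 0)   # assumed attribute name
      for ifindex in sorted(refs):
          print(f'ifindex {ifindex}: refs {refs[ifindex]} bytes {mem[ifindex]}')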
    • tools: ynl: add sample for getting page-pool information · 637567e4
      Jakub Kicinski authored
      Regenerate the tools/ code after netdev spec changes.
      
      Add sample to query page-pool info in a concise fashion:
      
      $ ./page-pool
          eth0[2]	page pools: 10 (zombies: 0)
      		refs: 41984 bytes: 171966464 (refs: 0 bytes: 0)
      		recycling: 90.3% (alloc: 656:397681 recycle: 89652:270201)
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
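      As a rough sketch of where the first line of that output comes from,
      the snippet below groups pools per interface and counts detached
      ("zombie") pools; it assumes the YNL Python helpers used by cli.py and
      treats the presence of a 'detach-time' attribute (see the cover letter
      changelog) as the zombie marker; the in-tree C sample may do this
      differently:

      # Sketch: count live vs. detached ("zombie") page pools per interface.
      from collections import Counter
      from lib import YnlFamily   # in-tree helper, tools/net/ynl/lib

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      live, zombies = Counter(), Counter()
      for pp in ynl.dump('page-pool-get', {}):
          ifindex = pp.get('ifindex', 0)
          if 'detach-time' in pp:    # assumed: set once the pool is detached
              zombies[ifindex] += 1
          else:
              live[ifindex] += 1
      for ifindex in sorted(set(live) | set(zombies)):
          print(f'ifindex {ifindex}: page pools: {live[ifindex]} '
                f'(zombies: {zombies[ifindex]})')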
    • net: page_pool: mute the periodic warning for visible page pools · be009667
      Jakub Kicinski authored
      Mute the periodic "stalled pool shutdown" warning if the page pool
      is visible to user space. Rolling out a driver using page pools
      to just a few hundred hosts at Meta surfaces applications which
      fail to reap their broken sockets. Obviously it's best if the
      applications are fixed, but we don't generally print warnings
      for application resource leaks. Admins can now depend on the
      netlink interface for getting page pool info to detect buggy
      apps.
      
      While at it, include the ID of the pool in the message; in rare
      cases (pools from a destroyed netns) this will make finding the
      pool with a debugger easier.
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
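      An admin-side check along the lines this message suggests might look
      roughly like the sketch below: it flags detached pools that still hold
      pages, i.e. the situation the muted "stalled pool shutdown" warning
      used to report. The attribute names 'detach-time', 'inflight' and
      'inflight-mem' are assumptions based on this series' changelog, and the
      YNL helper usage mirrors cli.py:

      # Sketch: report page pools detached from their driver but still
      # holding pages (e.g. pages sitting in sockets of buggy applications).
      from lib import YnlFamily   # in-tree helper, tools/net/ynl/lib

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      for pp in ynl.dump('page-pool-get', {}):
          if 'detach-time' in pp and pp.get('inflight', 0):
              print(f"pool id {pp['id']} (ifindex {pp.get('ifindex', 0)}): "
                    f"{pp['inflight']} pages / "
                    f"{pp.get('inflight-mem', 0)} bytes outstanding")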
    • net: page_pool: expose page pool stats via netlink · d49010ad
      Jakub Kicinski authored
      Dump the stats via netlink. More clever approaches, like dumping
      the stats for each CPU individually to see where the packets get
      consumed, can be implemented in the future.
      
      A trimmed example from a real (but recently booted) system:
      
      $ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \
                 --dump page-pool-stats-get
      [{'info': {'id': 19, 'ifindex': 2},
        'alloc-empty': 48,
        'alloc-fast': 3024,
        'alloc-refill': 0,
        'alloc-slow': 48,
        'alloc-slow-high-order': 0,
        'alloc-waive': 0,
        'recycle-cache-full': 0,
        'recycle-cached': 0,
        'recycle-released-refcnt': 0,
        'recycle-ring': 0,
        'recycle-ring-full': 0},
       {'info': {'id': 18, 'ifindex': 2},
        'alloc-empty': 66,
        'alloc-fast': 11811,
        'alloc-refill': 35,
        'alloc-slow': 66,
        'alloc-slow-high-order': 0,
        'alloc-waive': 0,
        'recycle-cache-full': 1145,
        'recycle-cached': 6541,
        'recycle-released-refcnt': 0,
        'recycle-ring': 1275,
        'recycle-ring-full': 0},
       {'info': {'id': 17, 'ifindex': 2},
        'alloc-empty': 73,
        'alloc-fast': 62099,
        'alloc-refill': 413,
      ...
      Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
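      As a rough illustration of how a recycling figure like the one in the
      earlier sample can be derived from this dump, the sketch below
      aggregates the per-pool stats in Python; the exact formula used by the
      in-tree sample may differ; this one simply treats cached plus ring
      recycles as "recycled" and fast plus slow allocations as "allocated":

      # Sketch: approximate recycling percentage per interface from the
      # page-pool-stats-get dump (same attribute names as printed above).
      from collections import defaultdict
      from lib import YnlFamily   # in-tree YNL helper used by cli.py

      ynl = YnlFamily('../../../Documentation/netlink/specs/netdev.yaml')

      alloc = defaultdict(int)
      recycle = defaultdict(int)
      for s in ynl.dump('page-pool-stats-get', {}):
          ifindex = s['info'].get('ifindex', 0)
          alloc[ifindex] += s.get('alloc-fast', 0) + s.get('alloc-slow', 0)
          recycle[ifindex] += (s.get('recycle-cached', 0) +
                               s.get('recycle-ring', 0))
      for ifindex in sorted(alloc):
          pct = 100.0 * recycle[ifindex] / alloc[ifindex] if alloc[ifindex] else 0.0
          print(f'ifindex {ifindex}: recycling {pct:.1f}%')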