1. 19 Apr, 2018 29 commits
  2. 18 Apr, 2018 11 commits
    • Merge branch 'ipv6-Separate-data-structures-for-FIB-and-data-path' · 0565de29
      David S. Miller authored
      David Ahern says:
      
      ====================
      net/ipv6: Separate data structures for FIB and data path
      
      IPv6 uses the same data struct for both control plane (FIB entries) and
      data path (dst entries). This struct has elements needed for both paths
      adding memory overhead and complexity (taking a dst hold in most places
      but an additional reference on rt6i_ref in a few). Furthermore, because
      of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.
      
      This patch set separates FIB entries from dst entries, better aligning
      IPv6 code with IPv4, simplifying the reference counting and allowing
      FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
      the first step to a number of performance and scalability changes.
      
      The end result of this patch set:
        - FIB entries (fib6_info):
              /* size: 208, cachelines: 4, members: 25 */
              /* sum members: 207, holes: 1, sum holes: 1 */
      
        - dst entries (rt6_info):
             /* size: 240, cachelines: 4, members: 11 */
      
      Versus the single rt6_info struct today for both paths:
            /* size: 320, cachelines: 5, members: 28 */
      
      This amounts to a 35% reduction in memory use for FIB entries and a
      25% reduction for dst entries.
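
      The percentages follow directly from the pahole sizes quoted above.
      A minimal sketch of the arithmetic, where pct_reduction and
      print_savings are illustrative names and not part of the patch set:

```c
#include <stdio.h>

/* Percentage saved when a struct shrinks from old_size to new_size bytes. */
static double pct_reduction(unsigned int old_size, unsigned int new_size)
{
	return 100.0 * (double)(old_size - new_size) / (double)old_size;
}

/* Print the savings using the sizes from the cover letter:
 * 320 bytes for the combined struct, 208 and 240 after the split. */
static void print_savings(void)
{
	printf("fib6_info: %.0f%%\n", pct_reduction(320, 208));
	printf("rt6_info:  %.0f%%\n", pct_reduction(320, 240));
}
```

      Calling print_savings() reports the two reductions for the sizes
      given in the cover letter.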
      
      With respect to locking, FIB entries use RCU and a single atomic
      counter with fib6_info_hold and fib6_info_release helpers to manage
      the reference counting. dst entries use only the traditional dst
      refcounts with dst_hold and dst_release.
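
      The hold/release pattern on a single atomic counter can be sketched
      in userspace C11 as follows. This is an illustration only: struct
      fib_entry and the fib_entry_* functions are hypothetical stand-ins,
      not the kernel's fib6_info helpers, and the sketch frees the entry
      immediately, whereas in-kernel code also has to account for
      concurrent RCU readers before freeing.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a FIB entry; not the kernel struct. */
struct fib_entry {
	atomic_int refcnt;	/* the single atomic reference counter */
	int metric;		/* placeholder payload */
};

/* Allocate an entry that starts with one reference owned by the caller. */
static struct fib_entry *fib_entry_alloc(void)
{
	struct fib_entry *e = calloc(1, sizeof(*e));

	if (e)
		atomic_init(&e->refcnt, 1);
	return e;
}

/* Take an additional reference on an entry the caller already holds. */
static void fib_entry_hold(struct fib_entry *e)
{
	atomic_fetch_add_explicit(&e->refcnt, 1, memory_order_relaxed);
}

/* Drop one reference. Returns true if it was the last one and the
 * entry was freed; the pointer must not be used after that. */
static bool fib_entry_release(struct fib_entry *e)
{
	if (atomic_fetch_sub_explicit(&e->refcnt, 1,
				      memory_order_acq_rel) == 1) {
		free(e);
		return true;
	}
	return false;
}
```

      A holder such as an address structure would call fib_entry_hold when
      it stores the pointer and fib_entry_release when it drops it; the
      entry goes away only when the final reference is released.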
      
      FIB entries for host routes are referenced by inet6_ifaddr and
      ifacaddr6. In both cases, additional holds are taken -- similar to
      what is done for devices.
      
      This set is the first of many changes to improve the scalability of the
      IPv6 code. Follow-on changes include:
      - consolidating duplicate fib6_info references like IPv4 does with
        duplicate fib_info
      
      - moving fib6_info into a slab cache to avoid allocation roundups to
        power of 2 (the 208 size becomes a 256 actual allocation)
      
      - allowing FIB lookups without generating a dst (e.g., most rt6_lookup
        users just want to verify the egress device). This means moving dst
        allocation to the other side of fib6_rule_lookup, which again aligns
        with IPv4 behavior
      
      - using separate standalone nexthop objects which have performance
        benefits beyond fib_info consolidation
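
      The roundup mentioned in the slab cache item (208 bytes becoming a
      256 byte allocation) can be modeled with a simple power-of-2
      rounding function. This is a simplified sketch: roundup_pow2 and
      wasted_bytes are illustrative names, and a real general-purpose
      allocator may offer additional size classes that this model ignores.

```c
#include <stddef.h>

/* Smallest power of 2 that is >= n (returns 1 for n == 0). */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes lost per object when a request of n bytes is rounded up. */
static size_t wasted_bytes(size_t n)
{
	return roundup_pow2(n) - n;
}
```

      Under this model a 208 byte request occupies 256 bytes, leaving 48
      bytes unused per entry, which is the overhead a dedicated cache
      sized for the struct would avoid.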
      
      At this point I am not seeing any refcount leaks or underflows, no
      oops or bug_ons, or warnings from kasan, so I think it is ready for
      others to beat up on it finding errors in code paths I have missed.
      
      v2 changes
      - rebased to top of tree
      - improved commit message on patch 7
      
      v1 changes
      - rebased to top of tree
      - fix memory leak of metrics as noted by Ido
      - MTU fixes based on pmtu tests (thanks Stefano Brivio for writing)
      
      RFC v2 changes
      - improved commit messages
      - move common metrics code from dst.c to net/ipv4/metrics.c (comment
        from DaveM)
      - address comments from Wei Wang and Martin KaFai Lau (let me know if
        I missed something)
      - fixes detected by kernel test robots
        + added fib6_metric_set to change metric on a FIB entry which could
          be pointing to read-only dst_default_metrics
        + 0day testing found a problem with an intermediate patch; added
          dst_hold_safe on rt->from. Code is removed 3 patches later
      - allow cacheinfo to handle NULL dst; means only expires is pushed to
        userspace
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Remove unused code and variables for rt6_info · 77634cc6
      David Ahern authored
      Drop unneeded elements from rt6_info struct and rearrange layout to
      something more relevant for the data path.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Flip FIB entries to fib6_info · 8d1c802b
      David Ahern authored
      Convert all code paths referencing a FIB entry from
      rt6_info to fib6_info.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: separate handling of FIB entries from dst based routes · 93531c67
      David Ahern authored
      Last step before flipping the data type for FIB entries:
      - use fib6_info_alloc to create FIB entries in ip6_route_info_create
        and addrconf_dst_alloc
      - use fib6_info_release in place of dst_release, ip6_rt_put and
        rt6_release
      - remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt
      - when purging routes, drop per-cpu routes
      - replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release
      - use rt->from since it points to the FIB entry
      - drop references to exception bucket, fib6_metrics and per-cpu from
        dst entries (those are relevant for fib entries only)
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: introduce fib6_info struct and helpers · a64efe14
      David Ahern authored
      Add fib6_info struct and alloc, destroy, hold and release helpers.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Cleanup exception and cache route handling · 23fb93a4
      David Ahern authored
      IPv6 FIB will only contain FIB entries with exception routes added to
      the FIB entry. Once this transformation is complete, FIB lookups will
      return a fib6_info with the lookup functions still returning a dst
      based rt6_info. The current code uses rt6_info for both paths and
      overloads the rt6_info variable usually called 'rt'.
      
      This patch introduces a new 'f6i' variable name for the result of the FIB
      lookup and keeps 'rt' as the dst based return variable. 'f6i' becomes a
      fib6_info in a later patch which is why it is introduced as f6i now;
      avoids the additional churn in the later patch.
      
      In addition, remove RTF_CACHE and dst checks from fib6 add and delete
      since they cannot happen now and will never happen after the data
      type flip.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Add gfp_flags to route add functions · acb54e3c
      David Ahern authored
      Most FIB entries can be added using memory allocated with GFP_KERNEL.
      Add gfp_flags to ip6_route_add and addrconf_dst_alloc. Code paths that
      can be reached from the packet path (e.g., ndisc and autoconfig) or
      atomic notifiers use GFP_ATOMIC; paths from user context (adding
      addresses and routes) use GFP_KERNEL.
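
      The rule described here, sleeping allocations only from user
      context, can be sketched as a small decision function. Everything
      below is hypothetical: the enums stand in for GFP_KERNEL and
      GFP_ATOMIC and for the caller categories named in this message, and
      none of it is the kernel's actual interface.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL and GFP_ATOMIC. */
enum alloc_mode {
	ALLOC_MAY_SLEEP,	/* models GFP_KERNEL */
	ALLOC_NO_SLEEP		/* models GFP_ATOMIC */
};

/* Hypothetical labels for the caller categories the message lists. */
enum route_caller {
	CALLER_USER_ROUTE_ADD,
	CALLER_USER_ADDR_ADD,
	CALLER_NDISC,
	CALLER_AUTOCONF,
	CALLER_ATOMIC_NOTIFIER
};

/* Only requests arriving from user context are allowed to sleep. */
static bool caller_may_sleep(enum route_caller c)
{
	return c == CALLER_USER_ROUTE_ADD || c == CALLER_USER_ADDR_ADD;
}

/* Pick the allocation mode a route add from this caller should use. */
static enum alloc_mode pick_alloc_mode(enum route_caller c)
{
	return caller_may_sleep(c) ? ALLOC_MAY_SLEEP : ALLOC_NO_SLEEP;
}
```

      Passing the mode in as a parameter, as the patch does with
      gfp_flags, lets each call site state what its context permits
      instead of forcing every path to use the most restrictive option.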
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Create a neigh_lookup for FIB entries · f8a1b43b
      David Ahern authored
      The router discovery code has a FIB entry and wants to validate the
      gateway has a neighbor entry. Refactor the existing dst_neigh_lookup
      for IPv6 and create a new function that takes the gateway and device
      and returns a neighbor entry. Use the new function in
      ndisc_router_discovery to validate the gateway.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Move dst flags to booleans in fib entries · 3b6761d1
      David Ahern authored
      Continuing to wean FIB paths off of dst_entry, use a bool to hold
      requests for certain dst settings. Add a helper to convert the
      flags to DST flags when a FIB entry is converted to a dst_entry.
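
      A helper of the kind described might look like the sketch below.
      The field names, the EX_DST_* bit values and ex_fib_dst_flags are
      all assumptions made for illustration; the message does not name the
      specific settings, and the kernel's real DST flag constants are
      defined elsewhere.

```c
#include <stdbool.h>

/* Illustrative bit values; not the kernel's DST_* definitions. */
#define EX_DST_NOCOUNT	0x1u
#define EX_DST_NOPOLICY	0x2u
#define EX_DST_HOST	0x4u

/* Hypothetical FIB entry that records requested dst settings as bools. */
struct ex_fib_entry {
	bool dst_nocount;
	bool dst_nopolicy;
	bool dst_host;
};

/* Translate the bools into a flag mask at the point where a dst entry
 * is created from the FIB entry. */
static unsigned int ex_fib_dst_flags(const struct ex_fib_entry *f)
{
	unsigned int flags = 0;

	if (f->dst_nocount)
		flags |= EX_DST_NOCOUNT;
	if (f->dst_nopolicy)
		flags |= EX_DST_NOPOLICY;
	if (f->dst_host)
		flags |= EX_DST_HOST;
	return flags;
}
```

      Keeping the requests as plain bools means the FIB side no longer
      needs to know the dst flag encoding; only the conversion helper does.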
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Add rt6_info create function for ip6_pol_route_lookup · dec9b0e2
      David Ahern authored
      ip6_pol_route_lookup is the lookup function for ip6_route_lookup and
      rt6_lookup. At the moment it returns either a reference to a FIB entry
      or a cached exception. To move FIB entries to a separate struct, this
      lookup function needs to convert FIB entries to an rt6_info that is
      returned to the caller.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Add fib6_null_entry · 421842ed
      David Ahern authored
      ip6_null_entry will stay a dst based return for lookups that fail to
      match an entry.
      
      Add a new fib6_null_entry which constitutes the root node and leafs
      for fibs. Replace existing references to ip6_null_entry with the
      new fib6_null_entry when dealing with FIBs.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>