1. 19 Apr, 2018 36 commits
  2. 18 Apr, 2018 4 commits
    • Merge branch 'ipv6-Separate-data-structures-for-FIB-and-data-path' · 0565de29
      David S. Miller authored
      David Ahern says:
      
      ====================
      net/ipv6: Separate data structures for FIB and data path
      
      IPv6 uses the same data struct for both control plane (FIB entries) and
      data path (dst entries). This struct has elements needed for both paths,
      adding memory overhead and complexity (taking a dst hold in most places
      but an additional reference on rt6i_ref in a few). Furthermore, because
      of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.
      
      This patch set separates FIB entries from dst entries, better aligning
      IPv6 code with IPv4, simplifying the reference counting and allowing
      FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
      the first step toward a number of performance and scalability changes.
      
      The end result of this patch set:
        - FIB entries (fib6_info):
              /* size: 208, cachelines: 4, members: 25 */
              /* sum members: 207, holes: 1, sum holes: 1 */
      
        - dst entries (rt6_info):
              /* size: 240, cachelines: 4, members: 11 */
      
      Versus the single rt6_info struct today for both paths:
            /* size: 320, cachelines: 5, members: 28 */
      
      This amounts to a 35% reduction in memory use for FIB entries and a
      25% reduction for dst entries.
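
      The percentages follow directly from the struct sizes quoted above
      (320 bytes before, 208 and 240 bytes after). A minimal sketch of the
      arithmetic, using a hypothetical helper name that is not part of the
      patch set:

```c
#include <assert.h>

/*
 * Percentage reduction when a struct shrinks from old_size to new_size
 * bytes. Integer division, so the result is truncated.
 * pct_reduction() is an illustrative name, not a kernel function.
 */
static unsigned int pct_reduction(unsigned int old_size, unsigned int new_size)
{
	assert(old_size > 0 && new_size <= old_size);
	return (old_size - new_size) * 100 / old_size;
}
```

      With the sizes from the pahole output, pct_reduction(320, 208) gives 35
      and pct_reduction(320, 240) gives 25.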
      
      With respect to locking, FIB entries use RCU and a single atomic
      counter with fib6_info_hold and fib6_info_release helpers to manage
      the reference counting. dst entries use only the traditional dst
      refcounts with dst_hold and dst_release.
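
      The single-counter scheme can be illustrated with a small userspace
      sketch built on C11 atomics. The demo_entry type and the demo_entry_*
      helpers below are hypothetical stand-ins, not the kernel's fib6_info
      or its fib6_info_hold and fib6_info_release helpers. The sketch frees
      the object as soon as the last reference is dropped and does not model
      RCU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a FIB entry; not the kernel's struct fib6_info. */
struct demo_entry {
	atomic_int ref;	/* single atomic reference counter */
	int metric;	/* placeholder payload */
};

/* Allocate an entry that starts with one reference owned by the caller. */
static struct demo_entry *demo_entry_alloc(int metric)
{
	struct demo_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	atomic_init(&e->ref, 1);
	e->metric = metric;
	return e;
}

/* Take an additional reference on an entry the caller already holds. */
static void demo_entry_hold(struct demo_entry *e)
{
	atomic_fetch_add(&e->ref, 1);
}

/*
 * Drop one reference. Frees the entry and returns true if this was the
 * last reference; otherwise returns false.
 */
static bool demo_entry_release(struct demo_entry *e)
{
	int old = atomic_fetch_sub(&e->ref, 1);

	assert(old > 0);	/* catch refcount underflow */
	if (old == 1) {
		free(e);
		return true;
	}
	return false;
}
```

      Every hold is paired with exactly one release, so an entry is freed
      only after its last user has dropped it.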
      
      FIB entries for host routes are referenced by inet6_ifaddr and
      ifacaddr6. In both cases, additional holds are taken -- similar to
      what is done for devices.
      
      This set is the first of many changes to improve the scalability of the
      IPv6 code. Follow-on changes include:
      - consolidating duplicate fib6_info references like IPv4 does with
        duplicate fib_info
      
      - moving fib6_info into a slab cache to avoid allocation roundups to
        power of 2 (the 208 size becomes a 256 actual allocation)
      
      - allowing FIB lookups without generating a dst (e.g., most rt6_lookup
        users just want to verify the egress device). This means moving dst
        allocation to the other side of fib6_rule_lookup, which again aligns
        with IPv4 behavior
      
      - using separate standalone nexthop objects which have performance
        benefits beyond fib_info consolidation
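
      The roundup mentioned in the slab cache item (208 bytes becoming a 256
      byte allocation) can be sketched as a plain power-of-2 rounding. The
      helper name is hypothetical, and the sketch does not model the size
      classes of any real allocator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round n up to the next power of two. n must be non-zero and small
 * enough that the result still fits in size_t.
 * roundup_pow2() is an illustrative name, not a kernel function.
 */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	assert(n > 0 && n <= SIZE_MAX / 2 + 1);
	while (p < n)
		p <<= 1;
	return p;
}
```

      For the 208 byte fib6_info quoted earlier, roundup_pow2(208) gives 256,
      which means 48 bytes per entry are lost to rounding in this simple model.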
      
      At this point I am not seeing any refcount leaks or underflows, no
      oops or bug_ons, or warnings from kasan, so I think it is ready for
      others to beat up on it finding errors in code paths I have missed.
      
      v2 changes
      - rebased to top of tree
      - improved commit message on patch 7
      
      v1 changes
      - rebased to top of tree
      - fix memory leak of metrics as noted by Ido
      - MTU fixes based on pmtu tests (thanks Stefano Brivio for writing)
      
      RFC v2 changes
      - improved commit messages
      - move common metrics code from dst.c to net/ipv4/metrics.c (comment
        from DaveM)
      - address comments from Wei Wang and Martin KaFai Lau (let me know if
        I missed something)
      - fixes detected by kernel test robots
        + added fib6_metric_set to change metric on a FIB entry which could
          be pointing to read-only dst_default_metrics
        + 0day testing found a problem with an intermediate patch; added
          dst_hold_safe on rt->from. Code is removed 3 patches later
      - allow cacheinfo to handle NULL dst; means only expires is pushed to
        userspace
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Remove unused code and variables for rt6_info · 77634cc6
      David Ahern authored
      Drop unneeded elements from rt6_info struct and rearrange layout to
      something more relevant for the data path.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Flip FIB entries to fib6_info · 8d1c802b
      David Ahern authored
      Convert all code paths referencing a FIB entry from
      rt6_info to fib6_info.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: separate handling of FIB entries from dst based routes · 93531c67
      David Ahern authored
      Last step before flipping the data type for FIB entries:
      - use fib6_info_alloc to create FIB entries in ip6_route_info_create
        and addrconf_dst_alloc
      - use fib6_info_release in place of dst_release, ip6_rt_put and
        rt6_release
      - remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt
      - when purging routes, drop per-cpu routes
      - replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release
      - use rt->from since it points to the FIB entry
      - drop references to exception bucket, fib6_metrics and per-cpu from
        dst entries (those are relevant for fib entries only)
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>