1. 03 Jun, 2019 26 commits
  2. 02 Jun, 2019 4 commits
  3. 01 Jun, 2019 5 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · c1e9e01d
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset container Netfilter/IPVS update for net-next:
      
      1) Add UDP tunnel support for ICMP errors in IPVS.
      
      Julian Anastasov says:
      
      This patchset is a followup to the commit that adds UDP/GUE tunnel:
      "ipvs: allow tunneling with gue encapsulation".
      
      What we do is to put tunnel real servers in hash table (patch 1),
      add function to lookup tunnels (patch 2) and use it to strip the
      embedded tunnel headers from ICMP errors (patch 3).
      
      2) Extend xt_owner to match for supplementary groups, from
         Lukasz Pawelczyk.
      
      3) Remove unused oif field in flow_offload_tuple object, from
         Taehee Yoo.
      
      4) Release basechain counters from workqueue to skip synchronize_rcu()
         call. From Florian Westphal.
      
      5) Replace skb_make_writable() by skb_ensure_writable(). Patchset
         from Florian Westphal.
      
      6) Checksum support for gue encapsulation in IPVS, from Jacky Hu.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1e9e01d
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 0462eaac
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf-next 2019-05-31
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      Lots of exciting new features in the first PR of this developement cycle!
      The main changes are:
      
      1) misc verifier improvements, from Alexei.
      
      2) bpftool can now convert btf to valid C, from Andrii.
      
      3) verifier can insert explicit ZEXT insn when requested by 32-bit JITs.
         This feature greatly improves BPF speed on 32-bit architectures. From Jiong.
      
      4) cgroups will now auto-detach bpf programs. This fixes issue of thousands
         bpf programs got stuck in dying cgroups. From Roman.
      
      5) new bpf_send_signal() helper, from Yonghong.
      
      6) cgroup inet skb programs can signal CN to the stack, from Lawrence.
      
      7) miscellaneous cleanups, from many developers.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0462eaac
    • Alan Maguire's avatar
      selftests/bpf: measure RTT from xdp using xdping · cd538502
      Alan Maguire authored
      xdping allows us to get latency estimates from XDP.  Output looks
      like this:
      
      ./xdping -I eth4 192.168.55.8
      Setting up XDP for eth4, please wait...
      XDP setup disrupts network connectivity, hit Ctrl+C to quit
      
      Normal ping RTT data
      [Ignore final RTT; it is distorted by XDP using the reply]
      PING 192.168.55.8 (192.168.55.8) from 192.168.55.7 eth4: 56(84) bytes of data.
      64 bytes from 192.168.55.8: icmp_seq=1 ttl=64 time=0.302 ms
      64 bytes from 192.168.55.8: icmp_seq=2 ttl=64 time=0.208 ms
      64 bytes from 192.168.55.8: icmp_seq=3 ttl=64 time=0.163 ms
      64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.275 ms
      
      4 packets transmitted, 4 received, 0% packet loss, time 3079ms
      rtt min/avg/max/mdev = 0.163/0.237/0.302/0.054 ms
      
      XDP RTT data:
      64 bytes from 192.168.55.8: icmp_seq=5 ttl=64 time=0.02808 ms
      64 bytes from 192.168.55.8: icmp_seq=6 ttl=64 time=0.02804 ms
      64 bytes from 192.168.55.8: icmp_seq=7 ttl=64 time=0.02815 ms
      64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.02805 ms
      
      The xdping program loads the associated xdping_kern.o BPF program
      and attaches it to the specified interface.  If run in client
      mode (the default), it will add a map entry keyed by the
      target IP address; this map will store RTT measurements, current
      sequence number etc.  Finally in client mode the ping command
      is executed, and the xdping BPF program will use the last ICMP
      reply, reformulate it as an ICMP request with the next sequence
      number and XDP_TX it.  After the reply to that request is received
      we can measure RTT and repeat until the desired number of
      measurements is made.  This is why the sequence numbers in the
      normal ping are 1, 2, 3 and 8.  We XDP_TX a modified version
      of ICMP reply 4 and keep doing this until we get the 4 replies
      we need; hence the networking stack only sees reply 8, where
      we have XDP_PASSed it upstream since we are done.
      
      In server mode (-s), xdping simply takes ICMP requests and replies
      to them in XDP rather than passing the request up to the networking
      stack.  No map entry is required.
      
      xdping can be run in native XDP mode (the default, or specified
      via -N) or in skb mode (-S).
      
      A test program test_xdping.sh exercises some of these options.
      
      Note that native XDP does not seem to XDP_TX for veths, hence -N
      is not tested.  Looking at the code, it looks like XDP_TX is
      supported so I'm not sure if that's expected.  Running xdping in
      native mode for ixgbe as both client and server works fine.
      
      Changes since v4
      
      - close fds on cleanup (Song Liu)
      
      Changes since v3
      
      - fixed seq to be __be16 (Song Liu)
      - fixed fd checks in xdping.c (Song Liu)
      
      Changes since v2
      
      - updated commit message to explain why seq number of last
        ICMP reply is 8 not 4 (Song Liu)
      - updated types of seq number, raddr and eliminated csum variable
        in xdpclient/xdpserver functions as it was not needed (Song Liu)
      - added XDPING_DEFAULT_COUNT definition and usage specification of
        default/max counts (Song Liu)
      
      Changes since v1
       - moved from RFC to PATCH
       - removed unused variable in ipv4_csum() (Song Liu)
       - refactored ICMP checks into icmp_check() function called by client
         and server programs and reworked client and server programs due
         to lack of shared code (Song Liu)
       - added checks to ensure that SKB and native mode are not requested
         together (Song Liu)
      Signed-off-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      cd538502
    • David S. Miller's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 33aae282
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2019-05-31
      
      This series contains updates to the iavf driver.
      
      Nathan Chancellor converts the use of gnu_printf to printf.
      
      Aleksandr modifies the driver to limit the number of RSS queues to the
      number of online CPUs in order to avoid creating misconfigured RSS
      queues.
      
      Gustavo A. R. Silva converts a couple of instances where sizeof() can be
      replaced with struct_size().
      
      Alice makes the remaining changes to the iavf driver to cleanup all the
      old "i40evf" references in the driver to iavf, including the file names
      that still contained the old driver reference.  There was no functional
      changes made, just cosmetic to reduce any confusion going forward now
      that the iavf driver is the virtual function driver for both i40e and
      ice drivers.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33aae282
    • Jiong Wang's avatar
      bpf: doc: update answer for 32-bit subregister question · c231c22a
      Jiong Wang authored
      There has been quite a few progress around the two steps mentioned in the
      answer to the following question:
      
        Q: BPF 32-bit subregister requirements
      
      This patch updates the answer to reflect what has been done.
      
      v2:
       - Add missing full stop. (Song Liu)
       - Minor tweak on one sentence. (Song Liu)
      
      v1:
       - Integrated rephrase from Quentin and Jakub
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c231c22a
  4. 31 May, 2019 5 commits
    • Alexei Starovoitov's avatar
      Merge branch 'map-charge-cleanup' · d168286d
      Alexei Starovoitov authored
      Roman Gushchin says:
      
      ====================
      During my work on memcg-based memory accounting for bpf maps
      I've done some cleanups and refactorings of the existing
      memlock rlimit-based code. It makes it more robust, unifies
      size to pages conversion, size checks and corresponding error
      codes. Also it adds coverage for cgroup local storage and
      socket local storage maps.
      
      It looks like some preliminary work on the mm side might be
      required to start working on the memcg-based accounting,
      so I'm sending these patches as a separate patchset.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d168286d
    • Roman Gushchin's avatar
      bpf: move memory size checks to bpf_map_charge_init() · c85d6913
      Roman Gushchin authored
      Most bpf map types doing similar checks and bytes to pages
      conversion during memory allocation and charging.
      
      Let's unify these checks by moving them into bpf_map_charge_init().
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c85d6913
    • Roman Gushchin's avatar
      bpf: rework memlock-based memory accounting for maps · b936ca64
      Roman Gushchin authored
      In order to unify the existing memlock charging code with the
      memcg-based memory accounting, which will be added later, let's
      rework the current scheme.
      
      Currently the following design is used:
        1) .alloc() callback optionally checks if the allocation will likely
           succeed using bpf_map_precharge_memlock()
        2) .alloc() performs actual allocations
        3) .alloc() callback calculates map cost and sets map.memory.pages
        4) map_create() calls bpf_map_init_memlock() which sets map.memory.user
           and performs actual charging; in case of failure the map is
           destroyed
        <map is in use>
        1) bpf_map_free_deferred() calls bpf_map_release_memlock(), which
           performs uncharge and releases the user
        2) .map_free() callback releases the memory
      
      The scheme can be simplified and made more robust:
        1) .alloc() calculates map cost and calls bpf_map_charge_init()
        2) bpf_map_charge_init() sets map.memory.user and performs actual
          charge
        3) .alloc() performs actual allocations
        <map is in use>
        1) .map_free() callback releases the memory
        2) bpf_map_charge_finish() performs uncharge and releases the user
      
      The new scheme also allows to reuse bpf_map_charge_init()/finish()
      functions for memcg-based accounting. Because charges are performed
      before actual allocations and uncharges after freeing the memory,
      no bogus memory pressure can be created.
      
      In cases when the map structure is not available (e.g. it's not
      created yet, or is already destroyed), on-stack bpf_map_memory
      structure is used. The charge can be transferred with the
      bpf_map_charge_move() function.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b936ca64
    • Roman Gushchin's avatar
      bpf: group memory related fields in struct bpf_map_memory · 3539b96e
      Roman Gushchin authored
      Group "user" and "pages" fields of bpf_map into the bpf_map_memory
      structure. Later it can be extended with "memcg" and other related
      information.
      
      The main reason for a such change (beside cosmetics) is to pass
      bpf_map_memory structure to charging functions before the actual
      allocation of bpf_map.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3539b96e
    • Roman Gushchin's avatar
      bpf: add memlock precharge for socket local storage · d50836cd
      Roman Gushchin authored
      Socket local storage maps lack the memlock precharge check,
      which is performed before the memory allocation for
      most other bpf map types.
      
      Let's add it in order to unify all map types.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Acked-by: default avatarSong Liu <songliubraving@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d50836cd