1. 16 Jul, 2018 24 commits
  2. 15 Jul, 2018 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 2aa4a337
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-07-15
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Various arm32 JIT improvements to optimize code emission
         and make the JIT code itself more robust, from Russell.
      
      2) Support simultaneous driver and offloaded XDP in order to allow for advanced
         use-cases where some work is offloaded to the NIC and some to the host. Also
         add ability for bpftool to load programs and maps beyond just the cgroup case,
         from Jakub.
      
      3) Add BPF JIT support in nfp for multiplication as well as division. For the
         latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.
      
      4) Add BTF pretty print functionality to bpftool in plain and JSON output
         format, from Okash.
      
      5) Add build and installation of the BPF helper man page to bpftool, from Quentin.
      
      6) Add a TCP BPF callback for listening sockets which is triggered right after
         the socket transitions to TCP_LISTEN state, from Andrey.
      
      7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
         tree and prints all attached programs, from Roman.
      
      8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
         packets, from Jesper.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2aa4a337
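Item 3 of the pull request mentions emulating division with a reciprocal algorithm. As a rough illustration of the general idea only (not the code the nfp JIT emits), the sketch below precomputes a multiplier and two shifts for a fixed 32-bit divisor, then divides using a multiply, an add, a subtract and shifts. The names `recip`, `recip_value`, `recip_divide` and `fls32` are made up for this example.

```c
#include <stdint.h>

/* Precomputed constants for dividing by one fixed 32-bit divisor. */
struct recip {
	uint32_t m;   /* magic multiplier */
	uint8_t sh1;  /* first shift, 0 or 1 */
	uint8_t sh2;  /* second shift */
};

/* Position of the highest set bit, counting from 1; 0 when x is 0. */
int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Derive the constants for divisor d; d must be non-zero. */
struct recip recip_value(uint32_t d)
{
	struct recip r;
	int l = fls32(d - 1);
	uint64_t m = ((1ULL << 32) * ((1ULL << l) - d)) / d + 1;

	r.m = (uint32_t)m;
	r.sh1 = (uint8_t)(l < 1 ? l : 1);
	r.sh2 = (uint8_t)(l > 1 ? l - 1 : 0);
	return r;
}

/* Compute a / d without a divide instruction. */
uint32_t recip_divide(uint32_t a, struct recip r)
{
	uint32_t t = (uint32_t)(((uint64_t)a * r.m) >> 32);

	return (t + ((a - t) >> r.sh1)) >> r.sh2;
}
```

The constants depend only on the divisor, so this approach pays off when the divisor is known ahead of time and the expensive setup can be done once.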
  3. 14 Jul, 2018 15 commits
    • Merge branch 'bpf-tcp-listen-cb' · 13f7432b
      Daniel Borkmann authored
      Andrey Ignatov says:
      
      ====================
      This patchset adds TCP-BPF callback for listening sockets.
      
      Patch 0001 provides more details and is the main patch in the set.
      
      Patch 0006 adds selftest for the new callback.
      
      Other patches are bug fixes and improvements in TCP-BPF selftest
      to make it easier to extend in 0006.
      ====================
      Acked-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      13f7432b
    • selftests/bpf: Test case for BPF_SOCK_OPS_TCP_LISTEN_CB · 78d8e26d
      Andrey Ignatov authored
      Cover the new TCP-BPF callback in test_tcpbpf: when listen() is called on
      a socket, set BPF_SOCK_OPS_STATE_CB_FLAG so that the BPF_SOCK_OPS_STATE_CB
      callback is invoked on future state transitions, and when such a
      transition happens (TCP_LISTEN -> TCP_CLOSE), track it in the map and
      verify it in user space later.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      78d8e26d
    • selftests/bpf: Better verification in test_tcpbpf · 2044e4ef
      Andrey Ignatov authored
      Reduce the amount of copy/paste for debug info when the result is verified
      in the test, and keep that info together with the values being checked so
      that they won't get out of sync.
      
      It also improves the debug experience: instead of manually checking which
      fields don't match in the debug output, only the unexpected field is
      printed.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2044e4ef
    • selftests/bpf: Switch test_tcpbpf_user to cgroup_helpers · c65267e5
      Andrey Ignatov authored
      Switch to cgroup_helpers to simplify the code and fix cgroup cleanup:
      previously the cgroup was not cleaned up after the test.
      
      It also removes the SYSTEM macro, which only printed an error but didn't
      terminate the test.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c65267e5
    • selftests/bpf: Fix const'ness in cgroup_helpers · 04c13411
      Andrey Ignatov authored
      The lack of const in the cgroup helpers' signatures forces callers to
      write ugly client code. Fix it.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      04c13411
    • bpf: Sync bpf.h to tools/ · 060a7fcc
      Andrey Ignatov authored
      Sync BPF_SOCK_OPS_TCP_LISTEN_CB related UAPI changes to tools/.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      060a7fcc
    • bpf: Add BPF_SOCK_OPS_TCP_LISTEN_CB · f333ee0c
      Andrey Ignatov authored
      Add a new TCP-BPF callback that is called on listen(2) right after the
      socket transitions to the TCP_LISTEN state.
      
      It fills the gap for listening sockets in TCP-BPF. For example, a BPF
      program can set BPF_SOCK_OPS_STATE_CB_FLAG when a socket becomes listening
      and track the later transition from TCP_LISTEN to TCP_CLOSE with the
      BPF_SOCK_OPS_STATE_CB callback.
      
      Previously there was no way to do this with TCP-BPF, and other options were
      much harder to work with. E.g. socket state tracking can be done with
      tracepoints (either raw or regular), but they can't be attached to a cgroup
      and their lifetime has to be managed separately.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f333ee0c
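The control flow described in this commit (enable state callbacks when the socket starts listening, then count the later TCP_LISTEN -> TCP_CLOSE transition) can be modelled in plain user-space C. This is only a sketch of the logic, not a loadable BPF program: every `model_*` and `MODEL_*` name is a hypothetical stand-in for the corresponding kernel definition, and the old/new state argument order is an assumption of this model.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the sock_ops operation codes. */
enum model_op {
	MODEL_OP_STATE_CB,
	MODEL_OP_TCP_LISTEN_CB,
};

/* Hypothetical stand-ins for the two TCP states of interest. */
enum model_state {
	MODEL_TCP_CLOSE,
	MODEL_TCP_LISTEN,
};

#define MODEL_STATE_CB_FLAG (1u << 0)

/* Simplified context: operation, old/new state in args, enabled flags. */
struct model_sock_ops {
	enum model_op op;
	uint32_t args[2];   /* args[0] = old state, args[1] = new state */
	uint32_t cb_flags;
};

/* Counters a real program would keep in a BPF map. */
struct model_stats {
	uint32_t listen_calls;
	uint32_t listen_to_close;
};

/* One invocation of the modelled program. */
void model_prog(struct model_sock_ops *ctx, struct model_stats *stats)
{
	switch (ctx->op) {
	case MODEL_OP_TCP_LISTEN_CB:
		/* Socket just entered LISTEN: ask for state-change callbacks. */
		ctx->cb_flags |= MODEL_STATE_CB_FLAG;
		stats->listen_calls++;
		break;
	case MODEL_OP_STATE_CB:
		/* Record only the LISTEN -> CLOSE transition. */
		if (ctx->args[0] == MODEL_TCP_LISTEN &&
		    ctx->args[1] == MODEL_TCP_CLOSE)
			stats->listen_to_close++;
		break;
	}
}
```

In a real program the flag would be set through the sock_ops helper for callback flags and the counters would live in a map that user space reads back, as the selftest in this series does.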
    • Merge branch 'mlxsw-VRRP' · f5c64e56
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Add VRRP support
      
      When a router that is acting as the default gateway of a host stops
      functioning, the host will encounter packet loss until the router starts
      functioning again.
      
      To increase the reliability of the default gateway without performing
      reconfiguration on the host, a host can use a Virtual Router Redundancy
      Protocol (VRRP) Router. This virtual router is composed of several
      routers where only one is actually forwarding packets from the host (the
      master router) while the other routers act as backup routers. The
      election of the master router is determined by the VRRP protocol [1].
      
      Packets addressed to the virtual router are always sent to the virtual
      router MAC address (IPv4: 00-00-5E-00-01-XX, IPv6: 00-00-5E-00-02-XX).
      Such packets can only be accepted by the master router and must be
      discarded by the backup routers.
      
      In Linux, VRRP is usually implemented by configuring a macvlan with the
      virtual router MAC on top of the router interface that is connected to
      the host / LAN. The macvlan on the master router is assigned the virtual
      IP (VIP) that the host uses as its gateway.
      
      In order to support VRRP in mlxsw, we first need to enable macvlan upper
      devices on top of mlxsw netdevs and their uppers. This is done by the
      first patch, which also takes care of sanitizing macvlan configurations
      that are not currently supported by the driver.
      
      The second patch directs packets whose destination MAC address matches
      that of a macvlan to the router so that they will undergo an L3 lookup. This is
      consistent with the kernel's behavior where the macvlan's Rx handler
      will re-inject such packets to the Rx path so that they will be picked
      up by the IPvX protocol handlers and undergo an L3 lookup. Note that the
      driver prevents the macvlans from being enslaved to other devices, to
      ensure the packets will be picked up by the protocol handler and not by
      another Rx handler.
      
      The third patch adds packet traps for VRRP control packets for both IPv4
      and IPv6. Finally, the last patch optimizes the reception of VRRP MACs
      by potentially skipping one L2 lookup for them.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f5c64e56
    • mlxsw: spectrum_router: Optimize processing of VRRP MACs · c3a49540
      Ido Schimmel authored
      Hosts using a VRRP router send their packets with a destination MAC of
      the VRRP router which is of the following form [1]:
      
      IPv4 - 00-00-5E-00-01-{VRID}
      IPv6 - 00-00-5E-00-02-{VRID}
      
      Where VRID is the ID of the virtual router. Such packets are directed to
      the router block in the ASIC by an FDB entry that was added in the
      previous patch.
      
      However, in certain cases it is possible to skip this FDB lookup and
      send such packets directly to the router. This is accomplished by adding
      these special MAC addresses to the RIF cache. If the cache is hit, the
      packet will skip the L2 lookup and ingress the router with the RIF
      specified in the cache entry.
      
      1. https://tools.ietf.org/html/rfc5798#section-7.3
      
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3a49540
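The virtual router MAC format quoted in this commit is easy to express in code. The sketch below builds and recognizes such addresses; it only illustrates the address layout and says nothing about how the driver programs the RIF cache. The names `vrrp_family`, `vrrp_mac_build` and `vrrp_mac_match` are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The fifth MAC byte distinguishes IPv4 (01) from IPv6 (02). */
enum vrrp_family {
	VRRP_IPV4 = 0x01,
	VRRP_IPV6 = 0x02,
};

/* Fixed leading bytes shared by all VRRP virtual router MACs. */
static const uint8_t vrrp_mac_prefix[4] = { 0x00, 0x00, 0x5e, 0x00 };

/* Fill mac with 00-00-5E-00-{01|02}-{VRID}. */
void vrrp_mac_build(enum vrrp_family family, uint8_t vrid, uint8_t mac[6])
{
	memcpy(mac, vrrp_mac_prefix, sizeof(vrrp_mac_prefix));
	mac[4] = (uint8_t)family;
	mac[5] = vrid;
}

/* Return true when mac has the VRRP virtual router form. */
bool vrrp_mac_match(const uint8_t mac[6])
{
	if (memcmp(mac, vrrp_mac_prefix, sizeof(vrrp_mac_prefix)) != 0)
		return false;
	return mac[4] == VRRP_IPV4 || mac[4] == VRRP_IPV6;
}
```

For instance, building the IPv4 address for VRID 5 yields 00-00-5E-00-01-05.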
    • mlxsw: spectrum: Add VRRP traps · 11566d34
      Ido Schimmel authored
      Virtual Router Redundancy Protocol packets are used to communicate the
      state of the Master router associated with the virtual router ID (VRID).
      
      These are link-local multicast packets sent with IP protocol 112 that
      are trapped in the router block in the ASIC.
      
      Add a trap for these packets and mark the trapped packets to prevent
      them from potentially being re-flooded by the bridge driver.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11566d34
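To make the classification above concrete, here is a small software check for the IPv4 case: protocol number 112 and a destination in the link-local multicast range 224.0.0.0/24. It is a sketch of the matching rule only; the ASIC trap itself is configured by the driver, and the function name `ipv4_is_vrrp` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VRRP_IP_PROTO 112

/* hdr points at the start of an IPv4 header of at least len bytes. */
bool ipv4_is_vrrp(const uint8_t *hdr, size_t len)
{
	/* Need a full minimal header and IP version 4. */
	if (len < 20 || (hdr[0] >> 4) != 4)
		return false;

	/* Byte 9 carries the IP protocol number. */
	if (hdr[9] != VRRP_IP_PROTO)
		return false;

	/* Bytes 16..19 are the destination; require 224.0.0.x. */
	return hdr[16] == 224 && hdr[17] == 0 && hdr[18] == 0;
}
```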
    • mlxsw: spectrum_router: Direct macvlans' MACs to router · 2db99378
      Ido Schimmel authored
      An IP packet received on a netdev with a macvlan upper whose MAC matches
      the packet's destination MAC will be re-injected to the Rx path as if it
      had been received by the macvlan, and will then undergo an L3 lookup.
      
      Reflect this functionality to the ASIC by programming FDB entries that
      will direct MACs of macvlan uppers to the router.
      
      In a similar fashion to router interfaces (RIFs) that are programmed
      upon the addition of the first IP address on an interface and destroyed
      upon the removal of the last IP address, the FDB entries for the macvlan
      are added and destroyed based on the addition of the first and removal
      of the last IP address on the macvlan.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2db99378
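The lifetime rule in the last paragraph (program on the first IP address, tear down on the last) amounts to reference counting. The following is a minimal model of that rule, assuming one counter per macvlan; the struct and function names are hypothetical and the boolean stands in for the actual FDB programming done by the driver.

```c
#include <stdbool.h>

/* Per-macvlan bookkeeping in this model. */
struct macvlan_model {
	unsigned int addr_count;  /* IP addresses currently configured */
	bool fdb_programmed;      /* stands in for the FDB entry in the ASIC */
};

/* An IP address was added to the macvlan. */
void macvlan_addr_add(struct macvlan_model *m)
{
	/* First address: direct the macvlan's MAC to the router. */
	if (m->addr_count++ == 0)
		m->fdb_programmed = true;
}

/* An IP address was removed from the macvlan. */
void macvlan_addr_del(struct macvlan_model *m)
{
	if (m->addr_count == 0)
		return;

	/* Last address gone: remove the FDB entry again. */
	if (--m->addr_count == 0)
		m->fdb_programmed = false;
}
```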
    • mlxsw: spectrum: Enable macvlan upper devices · c5516185
      Ido Schimmel authored
      In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to
      be directed to the router we need to enable macvlan uppers on top of
      mlxsw netdevs.
      
      Allow macvlan upper devices on top of mlxsw netdevs and sanitize
      configurations that can't work. For example, a macvlan can't be enslaved
      to a bridge as without ACLs the device doesn't take the destination MAC
      into account when classifying a packet to a bridge instance (i.e., a
      FID).
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5516185
    • tcp: remove redundant rcv_nxt update · ff0432e5
      Yafang Shao authored
      tcp_rcv_nxt_update() is already executed in tcp_data_queue().
      This line is redundant.
      
      See below:
      	tcp_queue_rcv
      		tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
      	tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq); <<<< redundant
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ff0432e5
    • bpf: btf: print map dump and lookup with btf info · 2d3feca8
      Okash Khawaja authored
      This patch augments the output of bpftool's map dump and map lookup
      commands to print data alongside btf info, if the corresponding btf
      info is available. The output of each of the map dump and map lookup
      commands is augmented in two ways:
      
      1. when neither -j nor -p is supplied, btf-ful map data is printed
      with human readability as the aim. This means no commitment to json or
      backward compatibility.
      
      2. when either -j or -p is supplied, a new json object named
      "formatted" is added for each key-value pair. This object contains the
      same data as the key-value pair, but with btf info. The "formatted" object
      promises json and backward compatibility. Below is a sample output.
      
      $ bpftool map dump -p id 8
      [{
              "key": ["0x0f","0x00","0x00","0x00"
              ],
              "value": ["0x03", "0x00", "0x00", "0x00", ...
              ],
              "formatted": {
                      "key": 15,
                      "value": {
                              "int_field":  3,
                              ...
                      }
              }
      }
      ]
      
      This patch calls btf_dumper introduced in previous patch to accomplish
      the above. Indeed, btf-ful info is only displayed if btf data for the
      given map is available. Otherwise existing output is displayed as-is.
      Signed-off-by: Okash Khawaja <osk@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2d3feca8
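In the sample output of this commit, the raw key bytes 0x0f 0x00 0x00 0x00 appear as 15 in the "formatted" object. That is what a little-endian reading of a 4-byte integer gives, as the sketch below shows. It assumes a little-endian host for the sample; `le32_decode` is a name made up here and is not part of bpftool.

```c
#include <stdint.h>

/* Interpret four raw bytes as a little-endian 32-bit integer. */
uint32_t le32_decode(const uint8_t bytes[4])
{
	return (uint32_t)bytes[0] |
	       ((uint32_t)bytes[1] << 8) |
	       ((uint32_t)bytes[2] << 16) |
	       ((uint32_t)bytes[3] << 24);
}
```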
    • bpf: btf: add btf print functionality · b12d6ec0
      Okash Khawaja authored
      This consumes functionality exported in the previous patch. It does the
      main job of printing with BTF data. This is used in the following patch
      to provide a more readable output of a map's dump. It relies on
      json_writer to do json printing. Below is sample output where map keys
      are ints and values are of type struct A:
      
      typedef int int_type;
      enum E {
              E0,
              E1,
      };
      
      struct B {
              int x;
              int y;
      };
      
      struct A {
              int m;
              unsigned long long n;
              char o;
              int p[8];
              int q[4][8];
              enum E r;
              void *s;
              struct B t;
              const int u;
              int_type v;
              unsigned int w1: 3;
              unsigned int w2: 3;
      };
      
      $ sudo bpftool map dump id 14
      [{
              "key": 0,
              "value": {
                  "m": 1,
                  "n": 2,
                  "o": "c",
                  "p": [15,16,17,18,15,16,17,18
                  ],
                  "q": [[25,26,27,28,25,26,27,28
                      ],[35,36,37,38,35,36,37,38
                      ],[45,46,47,48,45,46,47,48
                      ],[55,56,57,58,55,56,57,58
                      ]
                  ],
                  "r": 1,
                  "s": 0x7ffd80531cf8,
                  "t": {
                      "x": 5,
                      "y": 10
                  },
                  "u": 100,
                  "v": 20,
                  "w1": 0x7,
                  "w2": 0x3
              }
          }
      ]
      
      This patch uses json's {} and [] to imply struct/union and array. More
      explicit information can be added later. For example, a command line
      option can be introduced to print whether a key or value is struct
      or union, name of a struct etc. This will however come at the expense
      of duplicating info when, for example, printing an array of structs.
      Enums are printed as ints without their names.
      Signed-off-by: Okash Khawaja <osk@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      b12d6ec0
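The sample struct in this commit contains two 3-bit bitfields, w1 and w2, printed as 0x7 and 0x3. Printing such members requires pulling a run of bits out of the raw value. The sketch below shows one way to do that, assuming bits are numbered from the least significant bit of each byte (as on a typical little-endian target) and that w1 and w2 are packed at bit offsets 0 and 3; `bits_extract` is a hypothetical helper, not the function bpftool uses.

```c
#include <stdint.h>

/*
 * Extract nr_bits (at most 64) starting at bit_offset from data,
 * where bit 0 is the least significant bit of data[0].
 */
uint64_t bits_extract(const uint8_t *data, unsigned int bit_offset,
		      unsigned int nr_bits)
{
	uint64_t value = 0;
	unsigned int i;

	for (i = 0; i < nr_bits; i++) {
		unsigned int bit = bit_offset + i;

		/* Copy one source bit into position i of the result. */
		if (data[bit / 8] & (1u << (bit % 8)))
			value |= 1ULL << i;
	}
	return value;
}
```

Under the stated packing assumption, the byte 0x1f holds w1 = 0x7 in bits 0 to 2 and w2 = 0x3 in bits 3 to 5.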