1. 16 Jul, 2018 16 commits
    • Boris Pismenny's avatar
      tls: Fix zerocopy_from_iter iov handling · 47187998
      Boris Pismenny authored
      zerocopy_from_iter iterates over the message, but it doesn't revert the
      updates made by the iov iteration. This patch fixes it. Now, the iov can
      be used after calling zerocopy_from_iter.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      47187998
    • Boris Pismenny's avatar
      tls: Add rx inline crypto offload · 4799ac81
      Boris Pismenny authored
      This patch completes the generic infrastructure to offload TLS crypto to a
      network device. It enables the kernel to skip decryption and
      authentication of some skbs marked as decrypted by the NIC. In the fast
      path, all packets received are decrypted by the NIC and the performance
      is comparable to plain TCP.
      
      This infrastructure doesn't require a TCP offload engine. Instead, the
      NIC only decrypts packets that contain the expected TCP sequence number.
      Out-Of-Order TCP packets are provided unmodified. As a result, at the
      worst case a received TLS record consists of both plaintext and ciphertext
      packets. These partially decrypted records must be reencrypted,
      only to be decrypted.
      
      The notable differences between SW KTLS Rx and this offload are as
      follows:
      1. Partial decryption - Software must handle the case of a TLS record
      that was only partially decrypted by HW. This can happen due to packet
      reordering.
      2. Resynchronization - tls_read_size calls the device driver to
      resynchronize HW after HW lost track of TLS record framing in
      the TCP stream.
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4799ac81
    • Boris Pismenny's avatar
      tls: Fill software context without allocation · b190a587
      Boris Pismenny authored
      This patch allows tls_set_sw_offload to fill the context in case it was
      already allocated previously.
      
      We will use it in TLS_DEVICE to fill the RX software context.
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b190a587
    • Boris Pismenny's avatar
      tls: Split tls_sw_release_resources_rx · 39f56e1a
      Boris Pismenny authored
      This patch splits tls_sw_release_resources_rx into two functions one
      which releases all inner software tls structures and another that also
      frees the containing structure.
      
      In TLS_DEVICE we will need to release the software structures without
      freeeing the containing structure, which contains other information.
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      39f56e1a
    • Boris Pismenny's avatar
      tls: Split decrypt_skb to two functions · dafb67f3
      Boris Pismenny authored
      Previously, decrypt_skb also updated the TLS context.
      Now, decrypt_skb only decrypts the payload using the current context,
      while decrypt_skb_update also updates the state.
      
      Later, in the tls_device Rx flow, we will use decrypt_skb directly.
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dafb67f3
    • Boris Pismenny's avatar
      tls: Refactor tls_offload variable names · d80a1b9d
      Boris Pismenny authored
      For symmetry, we rename tls_offload_context to
      tls_offload_context_tx before we add tls_offload_context_rx.
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d80a1b9d
    • Boris Pismenny's avatar
      tcp: Don't coalesce decrypted and encrypted SKBs · 41ed9c04
      Boris Pismenny authored
      Prevent coalescing of decrypted and encrypted SKBs in GRO
      and TCP layer.
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarIlya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      41ed9c04
    • Boris Pismenny's avatar
      net: Add TLS rx resync NDO · 16e4edc2
      Boris Pismenny authored
      Add new netdev tls op for resynchronizing HW tls context
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      16e4edc2
    • Ilya Lesokhin's avatar
      net: Add TLS RX offload feature · 14136564
      Ilya Lesokhin authored
      This patch adds a netdev feature to configure TLS RX inline crypto offload.
      Signed-off-by: default avatarIlya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14136564
    • Boris Pismenny's avatar
      net: Add decrypted field to skb · 784abe24
      Boris Pismenny authored
      The decrypted bit is propogated to cloned/copied skbs.
      This will be used later by the inline crypto receive side offload
      of tls.
      Signed-off-by: default avatarBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: default avatarIlya Lesokhin <ilyal@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      784abe24
    • David S. Miller's avatar
      Merge branch 'mvpp2-add-debugfs-interface' · cc98419a
      David S. Miller authored
      Maxime Chevallier says:
      
      ====================
      net: mvpp2: add debugfs interface
      
      The PPv2 Header Parser and Classifier are not straightforward to debug,
      having easy access to some of the many lookup tables configuration is
      helpful during development and debug.
      
      This series adds a basic debugfs interface, allowing to read data from
      the Header Parser and some of the Classifier tables.
      
      For now, the interface is read-only, and contains only some basic info.
      
      This was actually used during RSS development, and might be useful to
      troubleshoot some issues we might find.
      
      The first patch of the series converts the mvpp2 files to SPDX, which
      eases adding the new debugfs dedicated file.
      
      The second patch adds the interface, and exposes basic Header Parser data.
      
      The 3rd patch adds a hit counter for the Header Parser TCAM.
      
      The 4th patch exposes classifier info.
      
      The 5th patch adds some hit counters for some of the classifier engines.
      
      Changes since V1:
      - Rebased on the lastest net-next
      - Made cls_flow_get non static so that it can be used in mvpp2_debugfs
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc98419a
    • Maxime Chevallier's avatar
      net: mvpp2: debugfs: add classifier hit counters · f9d30d5b
      Maxime Chevallier authored
      The classification operations that are used for RSS make use of several
      lookup tables. Having hit counters for these tables is really helpful
      to determine what flows were matched by ingress traffic, and see the
      path of packets among all the classifier tables.
      
      This commit adds hit counters for the 3 tables used at the moment :
      
       - The decoding table (also called lookup_id table), that links flows
         identified by the Header Parser to the flow table.
      
         There's one entry per flow, located at :
         .../mvpp2/<controller>/flows/XX/dec_hits
      
         Note that there are 21 flows in the decoding table, whereas there are
         52 flows in the Header Parser. That's because there are several kind
         of traffic that will match a given flow. Reading the hit counter from
         one sub-flow will clear all hit counter that have the same flow_id.
      
         This also applies to the flow_hits.
      
       - The flow table, that contains all the different lookups to be
         performed by the classifier for each packet of a given flow. The match
         is done on the first entry of the flow sequence.
      
       - The C2 engine entries, that are used to assign the default rx queue,
         and enable or disable RSS for a given port.
      
         There's one entry per flow, located at:
         .../mvpp2/<controller>/flows/XX/flow_hits
      
         There is one C2 entry per port, so the c2 hit counter is located at :
         .../mvpp2/<controller>/ethX/c2_hits
      
      All hit counter values are 16-bits clear-on-read values.
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f9d30d5b
    • Maxime Chevallier's avatar
      net: mvpp2: debugfs: add entries for classifier flows · dba1d918
      Maxime Chevallier authored
      The classifier configuration for RSS is quite complex, with several
      lookup tables being used. This commit adds useful info in debugfs to
      see how the different tables are configured :
      
      Added 2 new entries in the per-port directory :
      
        - .../eth0/default_rxq : The default rx queue on that port
        - .../eth0/rss_enable : Indicates if RSS is enabled in the C2 entry
      
      Added the 'flows' directory :
      
        It contains one entry per sub-flow. a 'sub-flow' is a unique path from
        Header Parser to the flow table. Multiple sub-flows can point to the
        same 'flow' (each flow has an id from 8 to 29, which is its index in the
        Lookup Id table) :
      
        - .../flows/00/...
                   /01/...
                   ...
                   /51/id : The flow id. There are 21 unique flows. There's one
                             flow per combination of the following parameters :
                             - L4 protocol (TCP, UDP, none)
                             - L3 protocol (IPv4, IPv6)
                             - L3 parameters (Fragmented or not)
                             - L2 parameters (Vlan tag presence or not)
                    .../type : The flow type. This is an even higher level flow,
                               that we manipulate with ethtool. It can be :
                               "udp4" "tcp4" "udp6" "tcp6" "ipv4" "ipv6" "other".
                    .../eth0/...
                    .../eth1/engine : The hash generation engine used for this
      	                        flow on the given port
                        .../hash_opts : The hash generation options indicating on
                                        what data we base the hash (vlan tag, src
                                        IP, src port, etc.)
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dba1d918
    • Maxime Chevallier's avatar
      net: mvpp2: debugfs: add hit counter stats for Header Parser entries · 1203341c
      Maxime Chevallier authored
      One helpful feature to help debug the Header Parser TCAM filter in PPv2
      is to be able to see if the entries did match something when a packet
      comes in. This can be done by using the built-in hit counter for TCAM
      entries.
      
      This commit implements reading the counter, and exposing its value on
      debugfs for each filter entry.
      
      The counter is a 16-bits clear-on-read value, located at:
       .../mvpp2/<controller>/parser/XXX/hits
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1203341c
    • Maxime Chevallier's avatar
      net: mvpp2: add a debugfs interface for the Header Parser · 21da57a2
      Maxime Chevallier authored
      Marvell PPv2 Packer Header Parser has a TCAM based filter, that is not
      trivial to configure and debug. Being able to dump TCAM entries from
      userspace can be really helpful to help development of new features
      and debug existing ones.
      
      This commit adds a basic debugfs interface for the PPv2 driver, focusing
      on TCAM related features.
      
      <mnt>/mvpp2/ --- f2000000.ethernet
                    \- f4000000.ethernet --- parser --- 000 ...
                                          |          \- 001
                                          |          \- ...
                                          |          \- 255 --- ai
                                          |                  \- header_data
                                          |                  \- lookup_id
                                          |                  \- sram
                                          |                  \- valid
                                          \- eth1 ...
                                          \- eth2 --- mac_filter
                                                   \- parser_entries
                                                   \- vid_filter
      
      There's one directory per PPv2 instance, named after pdev->name to make
      sure names are uniques. In each of these directories, there's :
      
       - one directory per interface on the controller, each containing :
      
         - "mac_filter", which lists all filtered addresses for this port
           (based on TCAM, not on the kernel's uc / mc lists)
      
         - "parser_entries", which lists the indices of all valid TCAM
            entries that have this port in their port map
      
         - "vid_filter", which lists the vids allowed on this port, based on
           TCAM
      
       - one "parser" directory (the parser is common to all ports), containing :
      
         - one directory per TCAM entry (256 of them, from 0 to 255), each
           containing :
      
           - "ai" : Contains the 1 byte Additional Info field from TCAM, and
      
           - "header_data" : Contains the 8 bytes Header Data extracted from
             the packet
      
           - "lookup_id" : Contains the 4 bits LU_ID
      
           - "sram" : contains the raw SRAM data, which is the result of the TCAM
      		lookup. This readonly at the moment.
      
           - "valid" : Indicates if the entry is valid of not.
      
      All entries are read-only, and everything is output in hex form.
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      21da57a2
    • Antoine Tenart's avatar
      net: mvpp2: switch to SPDX identifiers · f1e37e31
      Antoine Tenart authored
      Use the appropriate SPDX license identifiers and drop the license text.
      This patch is only cosmetic.
      Signed-off-by: default avatarAntoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f1e37e31
  2. 15 Jul, 2018 1 commit
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 2aa4a337
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-07-15
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Various different arm32 JIT improvements in order to optimize code emission
         and make the JIT code itself more robust, from Russell.
      
      2) Support simultaneous driver and offloaded XDP in order to allow for advanced
         use-cases where some work is offloaded to the NIC and some to the host. Also
         add ability for bpftool to load programs and maps beyond just the cgroup case,
         from Jakub.
      
      3) Add BPF JIT support in nfp for multiplication as well as division. For the
         latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.
      
      4) Add BTF pretty print functionality to bpftool in plain and JSON output
         format, from Okash.
      
      5) Add build and installation to the BPF helper man page into bpftool, from Quentin.
      
      6) Add a TCP BPF callback for listening sockets which is triggered right after
         the socket transitions to TCP_LISTEN state, from Andrey.
      
      7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
         tree and prints all attached programs, from Roman.
      
      8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
         packets, from Jesper.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2aa4a337
  3. 14 Jul, 2018 20 commits
  4. 13 Jul, 2018 3 commits
    • Jesper Dangaard Brouer's avatar
      samples/bpf: xdp_redirect_cpu handle parsing of double VLAN tagged packets · d23b27c0
      Jesper Dangaard Brouer authored
      People noticed that the code match on IEEE 802.1ad (ETH_P_8021AD) ethertype,
      and this implies Q-in-Q or double tagged VLANs.  Thus, we better parse
      the next VLAN header too.  It is even marked as a TODO.
      
      This is relevant for real world use-cases, as XDP cpumap redirect can be
      used when the NIC RSS hashing is broken.  E.g. the ixgbe driver HW cannot
      handle double tagged VLAN packets, and places everything into a single
      RX queue.  Using cpumap redirect, users can redistribute traffic across
      CPUs to solve this, which is faster than the network stacks RPS solution.
      
      It is left as an exerise how to distribute the packets across CPUs.  It
      would be convenient to use the RX hash, but that is not _yet_ exposed
      to XDP programs. For now, users can code their own hash, as I've demonstrated
      in the Suricata code (where Q-in-Q is handled correctly).
      Reported-by: default avatarFlorian Maury <florian.maury-cv@x-cli.eu>
      Reported-by: default avatarMarek Majkowski <marek@cloudflare.com>
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      d23b27c0
    • Nikolay Aleksandrov's avatar
      net: ipmr: add support for passing full packet on wrong vif · c921c207
      Nikolay Aleksandrov authored
      This patch adds support for IGMPMSG_WRVIFWHOLE which is used to pass
      full packet and real vif id when the incoming interface is wrong.
      While the RP and FHR are setting up state we need to be sending the
      registers encapsulated with all the data inside otherwise we lose it.
      The RP then decapsulates it and forwards it to the interested parties.
      Currently with WRONGVIF we can only be sending empty register packets
      and will lose that data.
      This behaviour can be enabled by using MRT_PIM with
      val == IGMPMSG_WRVIFWHOLE. This doesn't prevent IGMPMSG_WRONGVIF from
      happening, it happens in addition to it, also it is controlled by the same
      throttling parameters as WRONGVIF (i.e. 1 packet per 3 seconds currently).
      Both messages are generated to keep backwards compatibily and avoid
      breaking someone who was enabling MRT_PIM with val == 4, since any
      positive val is accepted and treated the same.
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c921c207
    • Daniel Borkmann's avatar
      Merge branch 'bpf-xdp-driver-and-hw' · ee15f7cd
      Daniel Borkmann authored
      Jakub Kicinski says:
      
      ====================
      This set is adding support for loading driver and offload XDP
      at the same time.  This enables advanced use cases where some
      of the work is offloaded to the NIC and some is done by the host.
      Separate netlink attributes are added for each mode of operation.
      Driver callbacks for offload are cleaned up a little, including
      removal of .prog_attached flag.
      ====================
      Acked-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      ee15f7cd