1. 27 Jul, 2018 24 commits
    • Merge branch 'mlxsw-Support-DSCP-prioritization-and-rewrite' · 2e279c93
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Support DSCP prioritization and rewrite
      
      Petr says:
      
      On ingress, a network device such as a switch assigns to packets
      priority based on various criteria. Common options include interpreting
      PCP and DSCP fields according to user configuration. When a packet
      egresses the switch, a reverse process may rewrite PCP and/or DSCP
      headers according to packet priority.
      
      So far, mlxsw has supported prioritization based on PCP (802.1p priority
      tag). This patch set introduces support for prioritization based on
      DSCP, and DSCP rewrite.
      
      To configure the DSCP-to-priority maps, the user is expected to invoke
      ieee_setapp and ieee_delapp DCBNL ops, e.g. by using lldptool:
      
      To decide whether or not to pay attention to DSCP values, the Spectrum
      switch recognizes a per-port configuration of trust level. Until the
      first APP rule is added for a given port, this port's trust level stays
      at PCP, meaning that PCP is used for packet prioritization. With the
      first DSCP APP rule, the port is configured to trust DSCP instead, and
      it stays there until all DSCP APP rules are removed again.
      
      Besides the DSCP (value 5) selector, another selector that plays into
      packet prioritization is Ethernet type (value 1) with PID of 0. Such APP
      entries denote default priority[1]:
      
      With this patch set, mlxsw uses these values to configure priority for
      DSCP values not explicitly specified in DSCP APP map. In the future we
      expect to also use this to configure default port priority for untagged
      packets.
      
      Access to DSCP-to-priority map, priority-to-DSCP map, and default
      priority for a port is exposed through three new DCB helpers. Like the
      already-existing dcb_ieee_getapp_mask() helper, these helpers operate in
      terms of bitmaps, to support the arbitrary M:N mapping that the APP
      rules allow. Such an interface presents all the relevant information from
      the APP database without necessitating exposition of iterators, locking
      or other complex primitives. It is up to the driver to then digest the
      mapping in a way that the device supports. In this patch set, mlxsw
      resolves conflicts by favoring higher-numbered DSCP values and
      priorities.
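
A minimal sketch of the conflict-resolution rule described above, assuming the bitmap-style maps that the new DCB helpers return. It is written in Python for illustration only; the function names are hypothetical and this is not the mlxsw driver code. Leaving priorities without any DSCP bit out of the rewrite map is also an assumption.

```python
from typing import Dict, Optional

NUM_DSCP = 64  # DSCP is a 6-bit field
NUM_PRIO = 8   # IEEE 802.1Q priorities 0..7


def highest_set_bit(mask: int) -> Optional[int]:
    """Return the index of the highest set bit, or None for an empty mask."""
    return mask.bit_length() - 1 if mask else None


def resolve_dscp_to_prio(dscp_prio_mask: Dict[int, int],
                         default_prio_mask: int = 0) -> Dict[int, int]:
    """Pick one priority per DSCP value, favoring the highest-numbered one.

    DSCP values with no APP rule fall back to the highest default priority
    (or 0 if no default-priority rule exists; that fallback is an assumption).
    """
    default_prio = highest_set_bit(default_prio_mask) or 0
    result = {}
    for dscp in range(NUM_DSCP):
        prio = highest_set_bit(dscp_prio_mask.get(dscp, 0))
        result[dscp] = default_prio if prio is None else prio
    return result


def resolve_prio_to_dscp(prio_dscp_mask: Dict[int, int]) -> Dict[int, int]:
    """Pick one DSCP value per priority for rewrite, favoring the highest."""
    result = {}
    for prio in range(NUM_PRIO):
        dscp = highest_set_bit(prio_dscp_mask.get(prio, 0))
        if dscp is not None:
            result[prio] = dscp
    return result
```

For example, a DSCP value mapped to priorities 1 and 3 resolves to priority 3, and a priority mapped to DSCP 10 and 12 is rewritten to DSCP 12.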
      
      In this patchset:
      
      - Patch #1 fixes a bug in DCB APP database management.
      - Patch #2 adds the getters described above.
      - Patches #3-#6 add Spectrum configuration registers.
      - Patch #7 adds the mlxsw logic that configures the device according to
        APP rules.
      - Patch #8 adds a self-test. The test is added to the subdirectory
        drivers/net/mlxsw. Even though it's not particularly specific to
        mlxsw, it's not suitable for running on soft devices (which don't
        support the ieee_getapp et al.), and thus isn't a good fit for the
        general net/forwarding directory.
      
      [1] 802.1Q-2014, Table D-9
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Add test for trust-DSCP · d159261f
      Petr Machata authored
      Add a test that exercises the new code. Send DSCP-tagged packets, and
      observe how they are prioritized in the switch and the DSCP is updated
      on egress again.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Support ieee_setapp, ieee_delapp · b2b1dab6
      Petr Machata authored
      The APP TLVs are used for communicating priority-to-protocol ID maps for
      a given netdevice. Support the following APP TLVs:
      
      - DSCP (selector 5) to configure priority-to-DSCP code point maps. Use
        these maps to configure packet priority on ingress, and DSCP code
        point rewrite on egress.
      
      - Default priority (selector 1, PID 0) to configure priority for the
        DSCP code points that don't have one assigned by the DSCP selector. In
        future this could also be used for assigning default port priority
        when a packet arrives without DSCP tagging.
      
      Besides setting up the maps themselves, also configure port trust level
      and rewrite bits.
      
      Port trust level determines whether, for a packet arriving through a
      certain port, the priority should be determined based on PCP or DSCP
      header fields. So far, mlxsw kept the device default of trust-PCP. Now,
      as soon as the first DSCP APP TLV is configured, switch to trust-DSCP.
      Only when all DSCP APP TLVs are removed, switch back to trust-PCP again.
      Note that the default priority APP TLV doesn't impact the trust level
      configuration.
      
      Rewrite bits determine whether DSCP and PCP fields of egressing packets
      should be updated according to switch priority. When port trust is
      switched to DSCP, enable rewrite of DSCP field.
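
The trust-level transitions described above can be sketched as a small state model. This is an illustrative Python sketch, not the driver implementation: the class and method names are hypothetical, and disabling DSCP rewrite when the port falls back to trust-PCP is an assumption (the text only states that rewrite is enabled when trust switches to DSCP).

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

SEL_ETHERTYPE = 1  # IEEE APP selector: Ethernet type (PID 0 means default priority)
SEL_DSCP = 5       # IEEE APP selector: DSCP


@dataclass
class PortQos:
    """Hypothetical per-port model of APP rules, trust level and rewrite."""
    apps: Set[Tuple[int, int, int]] = field(default_factory=set)  # (selector, pid, prio)
    trust: str = "pcp"
    rewrite_dscp: bool = False

    def _update(self) -> None:
        # Only DSCP APP rules affect the trust level; default-priority
        # rules (selector 1, PID 0) do not.
        has_dscp = any(sel == SEL_DSCP for sel, _, _ in self.apps)
        self.trust = "dscp" if has_dscp else "pcp"
        self.rewrite_dscp = has_dscp  # assumption: rewrite follows trust

    def setapp(self, selector: int, pid: int, prio: int) -> None:
        self.apps.add((selector, pid, prio))
        self._update()

    def delapp(self, selector: int, pid: int, prio: int) -> None:
        self.apps.discard((selector, pid, prio))
        self._update()
```

Adding a default-priority rule leaves the port at trust-PCP; the first DSCP rule switches it to trust-DSCP, and it switches back only once the last DSCP rule is deleted.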
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add QoS Priority to DSCP Mapping Register · 55fb71f4
      Petr Machata authored
      This register controls mapping from Priority to DSCP for purposes of
      rewrite. Note that rewrite happens as the packet is transmitted provided
      that the DSCP rewrite bit is enabled for the packet.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add QoS ReWrite Enable Register · e67131d9
      Petr Machata authored
      This register configures the rewrite enable (whether PCP or DSCP value
      in packet should be updated according to packet priority) per receive
      port.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add QoS Priority Trust State Register · 746da42a
      Petr Machata authored
      The QPTS register controls the port policy to calculate the switch
      priority and packet color based on incoming packet fields.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: reg: Add QoS Port DSCP to Priority Mapping Register · 02837d72
      Petr Machata authored
      The QPDPM register controls the mapping from DSCP field to Switch
      Priority for IP packets.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dcb: Add priority-to-DSCP map getters · b67c540b
      Petr Machata authored
      On ingress, a network device such as a switch assigns to packets
      priority based on various criteria. Common options include interpreting
      PCP and DSCP fields according to user configuration. When a packet
      egresses the switch, a reverse process may rewrite PCP and/or DSCP
      values according to packet priority.
      
      The following three functions support a) obtaining a DSCP-to-priority
      map or vice versa, and b) finding default-priority entries in APP
      database.
      
      The DCB subsystem supports for APP entries a very generous M:N mapping
      between priorities and protocol identifiers. Understandably,
      several (say) DSCP values can map to the same priority. But this
      asymmetry holds the other way around as well--one priority can map to
      several DSCP values. For this reason, the following functions operate in
      terms of bitmaps, with ones in positions that match some APP entry.
      
      - dcb_ieee_getapp_dscp_prio_mask_map() to compute for a given netdevice
        a map of DSCP-to-priority-mask, which gives for each DSCP value a
        bitmap of priorities related to that DSCP value by APP, along the
        lines of dcb_ieee_getapp_mask().
      
      - dcb_ieee_getapp_prio_dscp_mask_map() similarly to compute for a given
        netdevice a map from priorities to a bitmap of DSCPs.
      
      - dcb_ieee_getapp_default_prio_mask() which finds all default-priority
        rules for a given port in APP database, and returns a mask of
        priorities allowed by these default-priority rules.
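
To illustrate what the three getters compute, here is a rough Python model over an in-memory list of APP entries. It is only a sketch under stated assumptions: the `App` tuple and the function names are hypothetical stand-ins, the per-netdevice filtering and locking done by the kernel are omitted, and the real helpers are C functions with their own signatures.

```python
from typing import Iterable, List, NamedTuple

SEL_ETHERTYPE = 1  # Ethernet type selector; PID 0 denotes default priority
SEL_DSCP = 5       # DSCP selector
NUM_DSCP = 64
NUM_PRIO = 8


class App(NamedTuple):
    """Hypothetical stand-in for one APP entry of a single port."""
    selector: int
    protocol: int
    priority: int


def dscp_prio_mask_map(apps: Iterable[App]) -> List[int]:
    """For each DSCP value, a bitmap of the priorities mapped to it."""
    masks = [0] * NUM_DSCP
    for app in apps:
        if app.selector == SEL_DSCP and app.protocol < NUM_DSCP:
            masks[app.protocol] |= 1 << app.priority
    return masks


def prio_dscp_mask_map(apps: Iterable[App]) -> List[int]:
    """For each priority, a bitmap of the DSCP values mapped to it."""
    masks = [0] * NUM_PRIO
    for app in apps:
        if app.selector == SEL_DSCP and app.protocol < NUM_DSCP:
            masks[app.priority] |= 1 << app.protocol
    return masks


def default_prio_mask(apps: Iterable[App]) -> int:
    """Bitmap of priorities named by default-priority rules."""
    mask = 0
    for app in apps:
        if app.selector == SEL_ETHERTYPE and app.protocol == 0:
            mask |= 1 << app.priority
    return mask
```

Because the results are bitmaps, an M:N configuration such as one DSCP value mapped to two priorities is reported in full, and the caller decides how to reduce it.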
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dcb: For wild-card lookups, use priority -1, not 0 · 08193d1a
      Petr Machata authored
      The function dcb_app_lookup walks the list of specified DCB APP entries,
      looking for one that matches the given criteria: ifindex, selector,
      protocol ID and optionally also priority. The "don't care" value for
      priority is set to 0, because that priority has not been allowed under
      CEE regime, which predates the IEEE standardization.
      
      Under IEEE, 0 is a valid priority number. But because dcb_app_lookup
      considers zero a wild card, attempts to add an APP entry with priority 0
      fail when other entries exist for a given ifindex / selector / PID
      triplet.
      
      Fix by changing the wild-card value to -1.
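
The effect of the wild-card value can be reproduced with a simplified model of the lookup. The Python sketch below is an illustration only; `Entry`, `lookup` and `add_entry` are hypothetical names that mirror the behaviour described above, not the kernel's dcb_app_lookup.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical stand-in for one DCB APP table entry."""
    ifindex: int
    selector: int
    protocol: int
    priority: int


def lookup(entries: List[Entry], ifindex: int, selector: int, protocol: int,
           prio: int, wildcard: int) -> Optional[Entry]:
    """Find an entry matching the triplet and, unless prio is the
    wild-card value, the priority as well."""
    for e in entries:
        if (e.ifindex == ifindex and e.selector == selector
                and e.protocol == protocol
                and (prio == wildcard or e.priority == prio)):
            return e
    return None


def add_entry(entries: List[Entry], new: Entry, wildcard: int) -> bool:
    """Add `new` unless the lookup reports it as already present."""
    if lookup(entries, new.ifindex, new.selector, new.protocol,
              new.priority, wildcard) is not None:
        return False
    entries.append(new)
    return True
```

With a wild-card of 0, adding a priority-0 entry next to an existing priority-3 entry for the same triplet is rejected, because priority 0 is read as "don't care"; with a wild-card of -1 the same addition succeeds.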
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: don't dump chains only held by actions · 1f3ed383
      Jiri Pirko authored
      In case a chain is empty and not explicitly created by a user,
      such chain should not exist. The only exception is if there is
      an action "goto chain" pointing to it. In that case, don't show the
      chain in the dump. Track the chain references held by actions and
      use them to find out if a chain should or should not be shown
      in chain dump.
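
One way to picture the reference tracking described above is a chain that counts all references and, separately, the references held by actions. The Python sketch below is a hypothetical model for illustration; the names and the exact counting scheme are assumptions rather than the net/sched implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chain:
    """Hypothetical model of a chain with two reference counters."""
    index: int
    refcnt: int = 0         # all references (filters, explicit creation, actions)
    action_refcnt: int = 0  # subset held by "goto chain" actions

    def hold(self, by_action: bool = False) -> None:
        self.refcnt += 1
        if by_action:
            self.action_refcnt += 1

    def held_by_actions_only(self) -> bool:
        # True when every outstanding reference comes from an action.
        return self.refcnt > 0 and self.refcnt == self.action_refcnt


def dump(chains: List[Chain]) -> List[int]:
    """Return the indexes of chains that should appear in the dump."""
    return [c.index for c in chains if not c.held_by_actions_only()]
```

A chain that exists only because a "goto chain" action points at it is skipped; as soon as a filter or the user also holds it, it appears in the dump.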
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 7a49d3d4
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2018-07-27
      
      1) Extend the output_mark to also support the input direction
         and masking the mark values before applying to the skb.
      
      2) Add a new lookup key for the upcoming xfrm interfaces.
      
      3) Extend the xfrm lookups to match xfrm interface IDs.
      
      4) Add virtual xfrm interfaces. The purpose of these interfaces
         is to overcome the design limitations that the existing
         VTI devices have.
      
        The main limitations that we see with the current VTI are the
        following:
      
        VTI interfaces are L3 tunnels with configurable endpoints.
        For xfrm, the tunnel endpoints are already determined by the SA.
        So the VTI tunnel endpoints must be either the same as on the
        SA or wildcards. In case VTI tunnel endpoints are same as on
        the SA, we get a one to one correlation between the SA and
        the tunnel. So each SA needs its own tunnel interface.
      
        On the other hand, we can have only one VTI tunnel with
        wildcard src/dst tunnel endpoints in the system because the
        lookup is based on the tunnel endpoints. The existing tunnel
        lookup won't work with multiple tunnels with wildcard
        tunnel endpoints. Some use cases require more than one
        VTI tunnel of this type, for example if somebody has multiple
        namespaces and every namespace requires such a VTI.
      
        VTI needs separate interfaces for IPv4 and IPv6 tunnels.
        So when routing to a VTI, we have to know to which address
        family this traffic class is going to be encapsulated.
        This is a limitation because it makes routing more complex
        and it is not always possible to know what happens behind the
        VTI, e.g. when the VTI is moved to some namespace.
      
        VTI works just with tunnel mode SAs. We need generic interfaces
        that ensure transformation, regardless of the xfrm mode and
        the encapsulated address family.
      
        VTI is configured with a combination of GRE keys and xfrm marks.
        With this we have to deal with some extra cases in the generic
        tunnel lookup because the GRE keys on the VTI are actually
        not GRE keys, the GRE keys were just reused for something else.
        All extensions to the VTI interfaces would require to add
        even more complexity to the generic tunnel lookup.
      
        So to overcome this, we developed xfrm interfaces with the
        following design goals:
      
        It should be possible to tunnel IPv4 and IPv6 through the same
        interface.
      
        No limitation on xfrm mode (tunnel, transport and beet).
      
        Should be a generic virtual interface that ensures IPsec
        transformation, no need to know what happens behind the
        interface.
      
        Interfaces should be configured with a new key that must match a
        new policy/SA lookup key.
      
        The lookup logic should stay in the xfrm codebase, no need to
        change or extend generic routing and tunnel lookups.
      
        Should be possible to use IPsec hardware offloads of the underlying
        interface.
      
      5) Remove xfrm pcpu policy cache. This was added after the flowcache
         removal, but it turned out to make things even worse.
         From Florian Westphal.
      
      6) Allow to update the set mark on SA updates.
         From Nathan Harold.
      
      7) Convert some timestamps to time64_t.
         From Arnd Bergmann.
      
      8) Don't check the offload_handle in xfrm code,
         it is an opaque data cookie for the driver.
         From Shannon Nelson.
      
      9) Remove xfrmi interface ID from flowi. After this patch
         no generic code is touched anymore to do xfrm interface
         lookups. From Benedict Wong.
      
      10) Allow to update the xfrm interface ID on SA updates.
          From Nathan Harold.
      
      11) Don't pass zero to ERR_PTR() in xfrm_resolve_and_create_bundle.
          From YueHaibing.
      
      12) Return more detailed errors on xfrm interface creation.
          From Benedict Wong.
      
      13) Use PTR_ERR_OR_ZERO instead of IS_ERR + PTR_ERR.
          From the kbuild test robot.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: fix ptr_ret.cocci warnings · c6f5e017
      kbuild test robot authored
      net/xfrm/xfrm_interface.c:692:1-3: WARNING: PTR_ERR_OR_ZERO can be used
      
       Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
      
      Generated by: scripts/coccinelle/api/ptr_ret.cocci
      
      Fixes: 44e2b838 ("xfrm: Return detailed errors from xfrmi_newlink")
      CC: Benedict Wong <benedictwong@google.com>
      Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • Merge tag 'mlx5e-updates-2018-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ecbcd689
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5e-updates-2018-07-26 (XDP redirect)
      
      This series from Tariq adds the support for device-out XDP redirect.
      
      Start with simple RX and XDP cleanups:
      - Replace call to MPWQE free with dealloc in interface down flow
      - Do not recycle RX pages in interface down flow
      - Gather all XDP pre-requisite checks in a single function
      - Restrict the combination of large MTU and XDP
      
      Since XDP logic will now be called from the TX side as well, the
      generic XDP TX logic is no longer RX-only. Tariq therefore creates
      a new xdp.c file, moves the XDP-related code into it, and generalizes
      the code to support XDP TX for XDP redirect, such as the XDP TX SQ
      structures and XDP counters.
      
      XDP redirect support:
      Add implementation for the ndo_xdp_xmit callback.
      
      Dedicate a new set of XDP-SQ instances to satisfy the XDP_REDIRECT
      requests.  These instances are totally separated from the existing
      XDP-SQ objects that satisfy local XDP_TX actions.
      
      Performance tests:
      
      xdp_redirect_map from ConnectX-5 to ConnectX-5.
      CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      Packet-rate of 64B packets.
      
      Single queue: 7 Mpps.
      Multi queue: 55 Mpps.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: make debug dirs' dentries static · f61b6db3
      Jakub Kicinski authored
      The root directories of netdevsim should only be used by the core
      to create per-device subdirectories, so limit their visibility to
      the core file.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'docs-net-Convert-netdev-FAQ-to-RST' · 472f5975
      David S. Miller authored
      Tobin C. Harding says:
      
      ====================
      docs: net: Convert netdev-FAQ to RST
      
      Jon answered all the tree questions on v1 so if you will please take
      this through your tree that would be awesome.
      
      v2:
       - Fix typo 'canonical_path_format' (thanks Edward)
       - Add patch fixing references netdev-FAQ
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: Update references to netdev-FAQ · 287f4fa9
      Tobin C. Harding authored
      File 'Documentation/networking/netdev-FAQ.txt' has been converted to RST
      format.  We should update all links/references to point to the new file.
      
      Update references to netdev-FAQ
      Signed-off-by: Tobin C. Harding <me@tobin.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: net: Convert netdev-FAQ to restructured text · 96398ddf
      Tobin C. Harding authored
      Preferred kernel docs format is now restructured text.  Convert
      netdev-FAQ.txt to restructured text.
      
       - Add SPDX license identifier.
      
       - Change file heading 'Information you need to know about netdev' to
        'netdev FAQ' to better suit displayed index (in HTML).
      
       - Change question/answer layout to suit rst.  Copy format in
         Documentation/bpf/bpf_devel_QA.rst
      
       - Fix indentation of code snippets
      
       - If multiple consecutive URLs appear put them in a list (to maintain
        whitespace).
      
       - Use uniform spelling of 'bug fix' throughout document (not bugfix or
         bug-fix).
      
       - Add double back ticks to 'net' and 'net-next' when referring to the
         trees.
      
       - Use rst references for Documentation/ links.
      
       - Add rst label 'netdev-FAQ' for referencing by other docs files.
      
       - Remove stale entry from Documentation/networking/00-INDEX
      Signed-off-by: Tobin C. Harding <me@tobin.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: Add rest label the_canonical_patch_format · f58252cd
      Tobin C. Harding authored
      In preparation to convert Documentation/network/netdev-FAQ.rst to
      restructured text format.  We would like to be able to reference 'the
      canonical patch format' section.
      
      Add rest label: 'the_canonical_patch_format'.
      Signed-off-by: Tobin C. Harding <me@tobin.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: adaptec: Replace mdelay() with msleep() in starfire_init_one() · d8ad2f31
      Jia-Ju Bai authored
      starfire_init_one() is never called in atomic context.
      It calls mdelay() to busily wait, which is not necessary.
      mdelay() can be replaced with msleep().
      
      This is found by a static analysis tool named DCNS written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: hisax: config: Replace GFP_ATOMIC with GFP_KERNEL · 055d624f
      Jia-Ju Bai authored
      hisax_cs_new() and hisax_cs_setup() are never called in atomic context.
      They call kmalloc() and kzalloc() with GFP_ATOMIC, which is not necessary.
      GFP_ATOMIC can be replaced with GFP_KERNEL.
      
      This is found by a static analysis tool named DCNS written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: hisax: callc: Replace GFP_ATOMIC with GFP_KERNEL in init_PStack() · 87935aa7
      Jia-Ju Bai authored
      init_PStack() is never called in atomic context.
      It calls kmalloc() with GFP_ATOMIC, which is not necessary.
      GFP_ATOMIC can be replaced with GFP_KERNEL.
      
      This is found by a static analysis tool named DCNS written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: mISDN: netjet: Replace GFP_ATOMIC with GFP_KERNEL in nj_probe() · 9d8009de
      Jia-Ju Bai authored
      nj_probe() is never called in atomic context.
      It calls kzalloc() with GFP_ATOMIC, which is not necessary.
      GFP_ATOMIC can be replaced with GFP_KERNEL.
      
      This is found by a static analysis tool named DCNS written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • isdn: mISDN: hfcpci: Replace GFP_ATOMIC with GFP_KERNEL in hfc_probe() · 8c957d66
      Jia-Ju Bai authored
      hfc_probe() is never called in atomic context.
      It calls kzalloc() with GFP_ATOMIC, which is not necessary.
      GFP_ATOMIC can be replaced with GFP_KERNEL.
      
      This is found by a static analysis tool named DCNS written by myself.
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: make hns_dsaf_roce_reset non static · ff7b9126
      YueHaibing authored
      hns_dsaf_roce_reset is exported and used in hns_roce_hw_v1.c.
      In commit 336a443b ("net: hns: Make many functions static") I wrongly
      made it static.
      
      drivers/infiniband/hw/hns/hns_roce_hw_v1.o: In function `hns_roce_v1_reset':
      hns_roce_hw_v1.c:(.text+0x37ac): undefined reference to `hns_dsaf_roce_reset'
      hns_roce_hw_v1.c:(.text+0x37cc): undefined reference to `hns_dsaf_roce_reset'
      
      Fixes: 336a443b ("net: hns: Make many functions static")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 Jul, 2018 16 commits