1. 23 Jan, 2023 3 commits
    • docs: ethtool-netlink: document interface for MAC Merge layer · 37000004
      Vladimir Oltean authored
      Show details about the structures passed back and forth related to MAC
      Merge layer configuration, state and statistics. The rendered htmldocs
      will be much more verbose due to the kerneldoc references.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37000004
    • net: ethtool: add support for MAC Merge layer · 2b30f829
      Vladimir Oltean authored
      The MAC merge sublayer (IEEE 802.3-2018 clause 99) is one of 2
      specifications (the other being Frame Preemption; IEEE 802.1Q-2018
      clause 6.7.2), which work together to minimize latency caused by frame
      interference at TX. The overall goal of TSN is for normal traffic and
      traffic with a bounded deadline to be able to cohabitate on the same L2
      network and not bother each other too much.
      
      The standards achieve this (partly) by introducing the concept of
      preemptible traffic, i.e. Ethernet frames that have a custom value for
      the Start-of-Frame-Delimiter (SFD), and these frames can be fragmented
      and reassembled at L2 on a link-local basis. The non-preemptible frames
      are called express traffic, they are transmitted using a normal SFD, and
      they can preempt preemptible frames, therefore having lower latency,
      which can matter at lower (100 Mbps) link speeds, or at high MTUs (jumbo
      frames around 9K). Preemption is not recursive, i.e. a P frame cannot
      preempt another P frame. Preemption also does not depend upon priority,
      or otherwise said, an E frame with prio 0 will still preempt a P frame
      with prio 7.
      
      In terms of implementation, the standards talk about the presence of an
      express MAC (eMAC) which handles express traffic, and a preemptible MAC
      (pMAC) which handles preemptible traffic, and these MACs are multiplexed
      on the same MII by a MAC merge layer.
      
      To support frame preemption, the definition of the SFD was generalized
      to SMD (Start-of-mPacket-Delimiter), where an mPacket is essentially an
      Ethernet frame fragment, or a complete frame. Stations unaware of an SMD
      value different from the standard SFD will treat P frames as error
      frames. To prevent that from happening, a negotiation process is
      defined.
      
      On RX, packets are dispatched to the eMAC or pMAC after being filtered
      by their SMD. On TX, the eMAC/pMAC classification decision is taken by
      the 802.1Q spec, based on packet priority (each of the 8 user priority
      values may have an admin-status of preemptible or express).
      
      The MAC Merge layer and the Frame Preemption parameters have some degree
      of independence in terms of how software stacks are supposed to deal
      with them. The activation of the MM layer is supposed to be controlled
      by an LLDP daemon (after it has been communicated that the link partner
      also supports it), after which a (hardware-based or not) verification
      handshake takes place, before actually enabling the feature. So the
      process is intended to be relatively plug-and-play. Whereas FP settings
      are supposed to be coordinated across a network using something
      approximating NETCONF.
      
      The support contained here is exclusively for the 802.3 (MAC Merge)
      portions and not for the 802.1Q (Frame Preemption) parts. This API is
      sufficient for an LLDP daemon to do its job. The FP adminStatus variable
      from 802.1Q is outside the scope of an LLDP daemon.
      
      I have taken a few creative licenses and augmented the Linux kernel UAPI
      compared to the standard managed objects recommended by IEEE 802.3.
      These are:
      
      - ETHTOOL_A_MM_PMAC_ENABLED: According to Figure 99-6: Receive
        Processing state diagram, a MAC Merge layer is always supposed to be
        able to receive P frames. However, this implies keeping the pMAC
        powered on, which will consume needless power in applications where FP
        will never be used. If LLDP is used, the reception of an Additional
        Ethernet Capabilities TLV from the link partner is sufficient
        indication that the pMAC should be enabled. So my proposal is that in
        Linux, we keep the pMAC turned off by default and that user space
        turns it on when needed.
      
      - ETHTOOL_A_MM_VERIFY_ENABLED: The IEEE managed object is called
        aMACMergeVerifyDisableTx. I opted for consistency (positive logic) in
        the boolean netlink attributes offered, so this is also positive here.
        Other than the meaning being reversed, they correspond to the same
        thing.
      
      - ETHTOOL_A_MM_MAX_VERIFY_TIME: I found it most reasonable for a LLDP
        daemon to maximize the verifyTime variable (delay between SMD-V
        transmissions), to maximize its chances that the LP replies. IEEE says
        that the verifyTime can range between 1 and 128 ms, but the NXP ENETC
        stupidly keeps this variable in a 7 bit register, so the maximum
        supported value is 127 ms. I could have chosen to hardcode this in the
        LLDP daemon to a lower value, but why not let the kernel expose its
        supported range directly.
      
      - ETHTOOL_A_MM_TX_MIN_FRAG_SIZE: the standard managed object is called
        aMACMergeAddFragSize, and expresses the "additional" fragment size
        (on top of ETH_ZLEN), whereas this expresses the absolute value of the
        fragment size.
      
      - ETHTOOL_A_MM_RX_MIN_FRAG_SIZE: there doesn't appear to exist a managed
        object mandated by the standard, but user space clearly needs to know
        what is the minimum supported fragment size of our local receiver,
        since LLDP must advertise a value no lower than that.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b30f829
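The ETHTOOL_A_MM_TX_MIN_FRAG_SIZE note in the commit above contrasts the standard's "additional" fragment size with the absolute value exposed over netlink. A minimal sketch of that conversion follows. It assumes the additional size counts extra 64-octet units (ETH_ZLEN plus the 4-octet FCS) and ranges over 0..3; both assumptions come from a reading of IEEE 802.3 clause 99, not from the commit message, and the function names are hypothetical.

```python
ETH_ZLEN = 60     # minimum Ethernet frame length without FCS, in octets
ETH_FCS_LEN = 4   # frame check sequence length, in octets


def add_frag_size_to_min(add_frag_size: int) -> int:
    """Convert an "additional" fragment size (assumed 0..3) to an absolute size."""
    if not 0 <= add_frag_size <= 3:
        raise ValueError("additional fragment size must be in the range 0..3")
    # Assumption: each step adds one more 64-octet unit; the FCS is not counted.
    return (ETH_ZLEN + ETH_FCS_LEN) * (1 + add_frag_size) - ETH_FCS_LEN


def min_to_add_frag_size(min_frag_size: int) -> int:
    """Return the smallest additional size whose absolute size covers min_frag_size."""
    for add_frag_size in range(4):
        if add_frag_size_to_min(add_frag_size) >= min_frag_size:
            return add_frag_size
    raise ValueError("fragment size too large to express as an additional size")


if __name__ == "__main__":
    for add in range(4):
        print(add, add_frag_size_to_min(add))
```

Under these assumptions the four possible absolute values are 60, 124, 188 and 252 octets, so a receiver that reports a minimum in between has to be rounded up to the next step.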
    • net/sock: Introduce trace_sk_data_ready() · 40e0b090
      Peilin Ye authored
      As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
      callback implementations.  For example:
      
      <...>
        iperf-609  [002] .....  70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
        iperf-609  [002] .....  70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
      <...>
      Suggested-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40e0b090
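For consumers of the tracepoint output shown in the commit above, a small parsing sketch may help. It assumes the event text keeps the `family=N protocol=N func=NAME` layout of the two sample lines; the helper name is hypothetical and nothing here is part of the kernel patch itself.

```python
import re

# Matches the event payload as it appears in the sample trace lines.
SK_DATA_READY_RE = re.compile(
    r"sk_data_ready: family=(?P<family>\d+) "
    r"protocol=(?P<protocol>\d+) func=(?P<func>\S+)"
)


def parse_sk_data_ready(line):
    """Return the event fields as a dict, or None if the line is another event."""
    match = SK_DATA_READY_RE.search(line)
    if match is None:
        return None
    return {
        "family": int(match.group("family")),
        "protocol": int(match.group("protocol")),
        "func": match.group("func"),
    }


if __name__ == "__main__":
    sample = ("  iperf-609  [002] .....  70.660425: sk_data_ready: "
              "family=2 protocol=6 func=sock_def_readable")
    print(parse_sk_data_ready(sample))
```

Family 2 is AF_INET and protocol 6 is IPPROTO_TCP, so the sample lines come from IPv4 TCP sockets.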
  2. 21 Jan, 2023 17 commits
  3. 20 Jan, 2023 20 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b3c588cd
      Jakub Kicinski authored
      drivers/net/ipa/ipa_interrupt.c
      drivers/net/ipa/ipa_interrupt.h
        9ec9b2a3 ("net: ipa: disable ipa interrupt during suspend")
        8e461e1f ("net: ipa: introduce ipa_interrupt_enable()")
        d50ed355 ("net: ipa: enable IPA interrupt handlers separate from registration")
      https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/
      https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b3c588cd
    • Merge tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 5deaa985
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from wireless, bluetooth, bpf and netfilter.
      
        Current release - regressions:
      
         - Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
           addrconf", fix nsna_ping mode of team
      
         - wifi: mt76: fix bugs in Rx queue handling and DMA mapping
      
         - eth: mlx5:
            - add missing mutex_unlock in error reporter
            - protect global IPsec ASO with a lock
      
        Current release - new code bugs:
      
         - rxrpc: fix wrong error return in rxrpc_connect_call()
      
        Previous releases - regressions:
      
         - bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
      
         - wifi:
            - mac80211: fix crashes on Rx due to incorrect initialization of
              rx->link and rx->link_sta
            - mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
              aggregation handling, crashes
            - brcmfmac: fix regression for Broadcom PCIe wifi devices
            - rndis_wlan: prevent buffer overflow in rndis_query_oid
      
         - netfilter: conntrack: handle tcp challenge acks during connection
           reuse
      
         - sched: avoid grafting on htb_destroy_class_offload when destroying
      
         - virtio-net: correctly enable callback during start_xmit, fix stalls
      
         - tcp: avoid the lookup process failing to get sk in ehash table
      
         - ipa: disable ipa interrupt during suspend
      
         - eth: stmmac: enable all safety features by default
      
        Previous releases - always broken:
      
         - bpf:
            - fix pointer-leak due to insufficient speculative store bypass
              mitigation (Spectre v4)
            - skip task with pid=1 in send_signal_common() to avoid a splat
            - fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
              PERF_BPF_EVENT_PROG_UNLOAD events
            - fix potential deadlock in htab_lock_bucket from same bucket
              index but different map_locked index
      
         - bluetooth:
            - fix a buffer overflow in mgmt_mesh_add()
            - hci_qca: fix driver shutdown on closed serdev
            - ISO: fix possible circular locking dependency
            - CIS: hci_event: fix invalid wait context
      
         - wifi: brcmfmac: fixes for survey dump handling
      
         - mptcp: explicitly specify sock family at subflow creation time
      
         - netfilter: nft_payload: incorrect arithmetics when fetching VLAN
           header bits
      
         - tcp: fix rate_app_limited to default to 1
      
         - l2tp: close all race conditions in l2tp_tunnel_register()
      
         - eth: mlx5: fixes for QoS config and eswitch configuration
      
         - eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
      
         - eth: stmmac: fix invalid call to mdiobus_get_phy()
      
        Misc:
      
         - ethtool: add netlink attr in rss get reply only if the value is not
           empty"
      
      * tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
        Revert "Merge branch 'octeontx2-af-CPT'"
        tcp: fix rate_app_limited to default to 1
        bnxt: Do not read past the end of test names
        net: stmmac: enable all safety features by default
        octeontx2-af: add mbox to return CPT_AF_FLT_INT info
        octeontx2-af: update cpt lf alloc mailbox
        octeontx2-af: restore rxc conf after teardown sequence
        octeontx2-af: optimize cpt pf identification
        octeontx2-af: modify FLR sequence for CPT
        octeontx2-af: add mbox for CPT LF reset
        octeontx2-af: recover CPT engine when it gets fault
        net: dsa: microchip: ksz9477: port map correction in ALU table entry register
        selftests/net: toeplitz: fix race on tpacket_v3 block close
        net/ulp: use consistent error code when blocking ULP
        octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
        tcp: avoid the lookup process failing to get sk in ehash table
        Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
        MAINTAINERS: add networking entries for Willem
        net: sched: gred: prevent races when adding offloads to stats
        l2tp: prevent lockdep issue in l2tp_tunnel_register()
        ...
      5deaa985
    • Revert "Merge branch 'octeontx2-af-CPT'" · 45a919bb
      Jakub Kicinski authored
      This reverts commit b4fbf0b2, reversing
      changes made to 6c977c5c.
      
      This seems like net-next material.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      45a919bb
    • Merge branch 'octeontx2-af-miscellaneous-changes-for-cpt' · 7a590bd6
      Jakub Kicinski authored
      Srujana Challa says:
      
      ====================
      octeontx2-af: Miscellaneous changes for CPT
      
      This patchset consists of miscellaneous changes for CPT.
      - Adds a new mailbox to reset the requested CPT LF.
      - Modifies the FLR sequence as suggested by the HW team.
      - Adds support to recover CPT engines when they fault.
      - Updates CPT inbound inline IPsec configuration mailbox,
        as per new generation of the OcteonTX2 chips.
      - Adds a new mailbox to return CPT FLT Interrupt info.
      ====================
      
      Link: https://lore.kernel.org/r/20230118120354.1017961-1-schalla@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7a590bd6
    • octeontx2-af: add mbox to return CPT_AF_FLT_INT info · b814cc90
      Srujana Challa authored
      CPT HW triggers the CPT AF FLT interrupt when CPT engines
      hit uncorrectable errors; AF is the one which receives
      the interrupt and recovers the engines.
      This patch adds a mailbox for CPT VFs to request info on
      faulted and recovered CPT engines.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b814cc90
    • octeontx2-af: update cpt lf alloc mailbox · d1e1de10
      Srujana Challa authored
      The CN10K CPT coprocessor contains a context processor
      to accelerate updates to the IPsec security association
      contexts. The context processor contains a context cache.
      This patch updates CPT LF ALLOC mailbox to config ctx_ilen
      requested by VFs. CPT_LF_ALLOC:ctx_ilen is the size of
      initial context fetch.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d1e1de10
    • octeontx2-af: restore rxc conf after teardown sequence · e2784acb
      Nithin Dabilpuram authored
      CN10K CPT coprocessor includes a component named RXC which
      is responsible for reassembly of inner IP packets. RXC has
      the feature to evict oldest entries based on age/threshold.
      The age/threshold is being set to minimum values to evict
      all entries at the time of teardown.
      This patch adds code to restore timeout and threshold config
      after teardown sequence is complete as it is global config.
      Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e2784acb
    • octeontx2-af: optimize cpt pf identification · 41b166e5
      Srujana Challa authored
      Optimize CPT PF identification in mbox handling for faster
      mbox response by doing it at AF driver probe instead of
      every mbox message.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      41b166e5
    • octeontx2-af: modify FLR sequence for CPT · 5c22fce6
      Srujana Challa authored
      On OcteonTX2 platform CPT instruction enqueue is only
      possible via LMTST operations.
      The existing FLR sequence mentioned in HRM requires
      a dummy LMTST to CPT but LMTST can't be submitted from
      AF driver. So, HW team provided a new sequence to avoid
      dummy LMTST. This patch adds code for the same.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5c22fce6
    • octeontx2-af: add mbox for CPT LF reset · b7e41527
      Srujana Challa authored
      On OcteonTX2 SoC, the admin function (AF) is the only one with all
      privileges to configure HW and allocate resources; PFs and their VFs
      have to request AF via mailbox for all their needs.
      This patch adds a new mailbox for CPT VFs to request a CPT LF
      reset.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b7e41527
    • octeontx2-af: recover CPT engine when it gets fault · e625dad8
      Srujana Challa authored
      When CPT engine has uncorrectable errors, it will get halted and
      must be disabled and re-enabled. This patch adds code for the same.
      Signed-off-by: Srujana Challa <schalla@marvell.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e625dad8
    • ice: use GNSS subsystem instead of TTY · c7ef8221
      Arkadiusz Kubalewski authored
      Previously, GNSS support was implemented as a TTY driver, which allowed
      access to the GNSS receiver on /dev/ttyGNSS_<bus><func>.
      
      Use generic GNSS subsystem API instead of implementing own TTY driver.
      The receiver is accessible on /dev/gnss<id>. In case of multiple receivers
      in the OS, correct device can be found by enumerating either:
      - /sys/class/net/<eth port>/device/gnss/
      - /sys/class/gnss/gnss<id>/device/
      
      Using GNSS subsystem is superior to implementing own TTY driver, as the
      GNSS subsystem was designed solely for this purpose. It also implements
      TTY driver but in a common and defined way.
      
      From user perspective, there is no difference in communicating with a
      device, except new path to the device shall be used. The device will
      provide same information to the userspace as the old one, and can be used
      in the same way, i.e.:
      old # gpsmon /dev/ttyGNSS_2100_0
      new # gpsmon /dev/gnss0
      There is no other impact on userspace tools.
      
      User expecting onboard GNSS receiver support is required to enable
      CONFIG_GNSS=y/m in kernel config.
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
      Signed-off-by: Michal Michalik <michal.michalik@intel.com>
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7ef8221
    • net: hns: Switch to use acpi_evaluate_dsm_typed() · 498fe810
      Andy Shevchenko authored
      The acpi_evaluate_dsm_typed() provides a way to check the type of the
      object evaluated by _DSM call. Use it instead of open coded variant.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      498fe810
    • ACPI: utils: Add acpi_evaluate_dsm_typed() and acpi_check_dsm() stubs · 1b94ad7c
      Andy Shevchenko authored
      When the ACPI part of a driver is optional the methods used in it
      are expected to be available even if CONFIG_ACPI=n. This is not
      the case for _DSM related methods. Add stubs for
      acpi_evaluate_dsm_typed() and acpi_check_dsm() methods.
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b94ad7c
    • tcp: fix rate_app_limited to default to 1 · 300b655d
      David Morley authored
      The initial default value of 0 for tp->rate_app_limited was incorrect,
      since a flow is indeed application-limited until it first sends
      data. Fixing the default to be 1 is generally correct but also
      specifically will help user-space applications avoid using the initial
      tcpi_delivery_rate value of 0 that persists until the connection has
      some non-zero bandwidth sample.
      
      Fixes: eb8329e0 ("tcp: export data delivery rate")
      Suggested-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David Morley <morleyd@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: David Morley <morleyd@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      300b655d
    • bnxt: Do not read past the end of test names · d3e599c0
      Kees Cook authored
      Test names were being concatenated based on an offset beyond the end of
      the first name, which tripped the buffer overflow detection logic:
      
       detected buffer overflow in strnlen
       [...]
       Call Trace:
       bnxt_ethtool_init.cold+0x18/0x18
      
      Refactor struct hwrm_selftest_qlist_output to use an actual array,
      and adjust the concatenation to use snprintf() rather than a series of
      strncat() calls.
      Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
      Link: https://lore.kernel.org/lkml/Y8F%2F1w1AZTvLglFX@x1-carbon/
      Tested-by: Niklas Cassel <Niklas.Cassel@wdc.com>
      Fixes: eb513658 ("bnxt_en: Add basic ethtool -t selftest support.")
      Cc: Michael Chan <michael.chan@broadcom.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3e599c0
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · ba197fde
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-01-19 (ice)
      
      This series contains updates to ice driver only.
      
      Tsotne and Anatolii implement new handling, and AdminQ command, for
      firmware LLDP, adding a pending notification to allow for proper
      cleanup between TC changes.
      
      Amritha extends support for drop action outside of switchdev.
      
      Siddaraju adjusts restriction for PTP HW clock adjustments.
      
      Ani removes an unneeded non-null check and improves reporting of some link
      modes to utilize more appropriate values.
      
      Jesse adds checks to ensure PF VSI type.
      
      Przemek combines duplicate checks of the same condition into one check.
      
      Tony makes various cleanups to code: removes comments for cppcheck
      suppressions, reduces scope of some variables, changes some return
      statements to reflect an explicit 0 return, matches naming for function
      declaration and definition, adds local variable for readability, and
      fixes indenting.
      
      Sergey separates DDP (Dynamic Device Personalization) code into its own
      file.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba197fde
    • Merge branch 'net-dcb-rewrite-table' · f5339209
      David S. Miller authored
      Daniel Machon says:
      
      ====================
      net: Introduce new DCB rewrite table
      
      There is currently no support for per-port egress mapping of priority to PCP and
      priority to DSCP. Some support for expressing egress mapping of PCP is supported
      through ip link, with the 'egress-qos-map', however this command only maps
      priority to PCP, and for vlan interfaces only. DCB APP already has support for
      per-port ingress mapping of PCP/DEI, DSCP and a bunch of other stuff. So why not
      take advantage of this fact, and add a new table that does the reverse.
      
      This patch series introduces the new DCB rewrite table. Whereas the DCB
      APP table deals with ingress mapping of PID (protocol identifier) to priority,
      the rewrite table deals with egress mapping of priority to PID.
      
      It is indeed possible to integrate rewrite in the existing APP table, by
      introducing new dedicated rewrite selectors, and altering existing functions
      to treat rewrite entries specially. However, I feel like this is not a good
      solution, and will pollute the APP namespace. APP is well-defined in IEEE, and
      some userspace relies on advertised entries - for this fact, separating APP and
      rewrite into two completely separate objects, seems to me the best solution.
      
      The new table shares much functionality with the APP table, and as such, much
      existing code is reused, or slightly modified, to work for both.
      
      ================================================================================
      DCB rewrite table in a nutshell
      ================================================================================
      The table is implemented as a simple linked list, and uses the same lock as the
      APP table. New functions for getting, setting and deleting entries have been
      added, and these are exported, so they can be used by the stack or drivers.
      Additionally, new dcbnl_setrewr and dcbnl_delrewr hooks have been added, to
      support hardware offload of the entries.
      
      ================================================================================
      Sparx5 per-port PCP rewrite support
      ================================================================================
      Sparx5 supports PCP egress mapping through two eight-entry switch tables.
      One table maps QoS class 0-7 to PCP for DE0 (DP levels mapped to
      drop-eligibility 0) and the other for DE1. DCB does currently not have support
      for expressing DP/color, so instead, the tagged DEI bit will reflect the DP
      levels, for any rewrite entries > 7 ('de').
      
      The driver will take apptrust (contributed earlier) into consideration, so
      that the mapping tables are only used if PCP is trusted *and* the rewrite table
      has active mappings; otherwise classified PCP (same as frame PCP) will be used
      instead.
      
      ================================================================================
      Sparx5 per-port DSCP rewrite support
      ================================================================================
      Sparx5 supports DSCP egress mapping through a single 32-entry table. This table
      maps classified QoS class and DP level to classified DSCP, and is consulted by
      the switch Analyzer Classifier at ingress. At egress, the frame DSCP can be
      rewritten from either classified DSCP or frame DSCP.
      
      The driver will take apptrust into consideration, so that the mapping tables
      are only used if DSCP is trusted *and* the rewrite table has active mappings;
      otherwise frame DSCP will be used instead.
      
      ================================================================================
      Patches
      ================================================================================
      Patch #1 modifies dcb_app_add to work for both APP and rewrite
      
      Patch #2 adds dcbnl_app_table_setdel() for setting and deleting both APP and
               rewrite entries.
      
      Patch #3 adds the rewrite table and all required functions, offload hooks and
               bookkeeping for maintaining it.
      
      Patch #4 adds two new helper functions for getting a priority to PCP bitmask
               map, and a priority to DSCP bitmask map.
      
      Patch #5 adds support for PCP rewrite in the Sparx5 driver.
      Patch #6 adds support for DSCP rewrite in the Sparx5 driver.
      
      ================================================================================
      v2 -> v3:
        in dcbnl_ieee_fill() use nla_nest_start() instead of the _noflag() version.
        Also, cancel the rewrite nest in case of an error (Petr Machata).
      
      v1 -> v2:
        In dcb_setrewr() change proto to u16 as it ought to be, and remove zero
        initialization of err. (Dan Carpenter).
        Change name of dcbnl_apprewr_setdel -> dcbnl_app_table_setdel and change the
        function signature to take a single function pointer. Update uses accordingly
        (Petr Machata).
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f5339209
    • net: microchip: sparx5: add support for DSCP rewrite · 246c77f6
      Daniel Machon authored
      Add support for DSCP rewrite in Sparx5 driver. On egress DSCP is
      rewritten from either classified DSCP, or frame DSCP. Classified DSCP is
      determined by the Analyzer Classifier on ingress, and is mapped from
      classified QoS class and DP level. Classification of DSCP is by default
      enabled for all ports.
      
      It is required that DSCP is trusted for the egress port *and* rewrite
      table is not empty, in order to rewrite DSCP based on classified DSCP,
      otherwise DSCP is always rewritten from frame DSCP.
      
      classified_dscp = qos_dscp_map[8 * dp_level + qos_class];
      if (active_mappings && dscp_is_trusted)
      	rewritten_dscp = classified_dscp
      else
      	rewritten_dscp = frame_dscp
      
      To rewrite DSCP to 20 for any frames with priority 7:
      
      $ dcb apptrust set dev eth0 order dscp
      $ dcb rewr add dev eth0 7:20 <-- not in iproute2/dcb yet
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      246c77f6
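The pseudocode in the DSCP rewrite commit above can be turned into a runnable sketch. The table layout (32 entries indexed by `8 * dp_level + qos_class`) and the trust condition are taken from the commit message; the function name, the way the table is filled, and the sample frame DSCP of 46 are illustrative assumptions only, not the driver's code.

```python
QOS_CLASSES = 8   # QoS classes 0-7
DP_LEVELS = 4     # drop-precedence levels 0-3, giving a 32-entry table


def rewritten_dscp(qos_dscp_map, dp_level, qos_class, frame_dscp,
                   active_mappings, dscp_is_trusted):
    """Pick the egress DSCP the way the commit's pseudocode describes."""
    classified_dscp = qos_dscp_map[QOS_CLASSES * dp_level + qos_class]
    if active_mappings and dscp_is_trusted:
        return classified_dscp
    return frame_dscp


if __name__ == "__main__":
    # Assumption: one rewrite entry "7:20" applied at every DP level.
    table = [0] * (QOS_CLASSES * DP_LEVELS)
    for dp in range(DP_LEVELS):
        table[QOS_CLASSES * dp + 7] = 20
    print(rewritten_dscp(table, 0, 7, 46, True, True))    # classified DSCP
    print(rewritten_dscp(table, 0, 7, 46, True, False))   # frame DSCP kept
```

The second call shows the fallback: with DSCP untrusted the frame's own DSCP is used even though a mapping exists.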
    • net: microchip: sparx5: add support for PCP rewrite · 2234879f
      Daniel Machon authored
      Add support for rewrite of PCP and DEI, based on classified Quality of
      Service (QoS) class and Drop-Precedence (DP) level.
      
      The DCB rewrite table is queried for mappings between priority and
      PCP/DEI. The classified DP level is then encoded in the DEI bit, if a
      mapping for DEI exists.
      
      Sparx5 has four DP levels, where by default, 0 is mapped to DE0 and 1-3
      are mapped to DE1. If a mapping exists where DEI=1, then all classified
      DP levels mapped to DE1 will set the DEI bit. The other way around for
      DEI=0. Effectively, this means that the tagged DEI bit will reflect the
      DP level for any mappings where DEI=1.
      
      Map priority=1 to PCP=1 and DEI=1:
      $ dcb rewr add dev eth0 pcp-prio 1:1de
      
      Map priority=7 to PCP=2 and DEI=0
      $ dcb rewr add dev eth0 pcp-prio 7:2nd
      
      Also, sparx5_dcb_ieee_dscp_setdel() has been refactored, to work for
      both APP and rewrite entries.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2234879f
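The two `pcp-prio` examples in the PCP rewrite commit above use a `PRIORITY:PCP` notation with a `de` or `nd` suffix. A hedged sketch of how such an entry could be decoded is shown here. It relies only on the two examples given (`1:1de` is priority 1, PCP 1, DEI 1; `7:2nd` is priority 7, PCP 2, DEI 0); the 0-7 range checks and the function name are assumptions, and this is not the iproute2 parser.

```python
def parse_pcp_rewrite(entry):
    """Decode 'PRIORITY:PCP{de|nd}' into a (priority, pcp, dei) tuple."""
    prio_text, sep, pcp_text = entry.partition(":")
    if not sep:
        raise ValueError("expected PRIORITY:PCP{de|nd}")
    suffix = pcp_text[-2:]
    if suffix not in ("de", "nd"):
        raise ValueError("PCP must end in 'de' or 'nd'")
    priority = int(prio_text)
    pcp = int(pcp_text[:-2])
    # Assumption: both values are 3-bit quantities.
    if not 0 <= priority <= 7 or not 0 <= pcp <= 7:
        raise ValueError("priority and PCP must be in the range 0..7")
    dei = 1 if suffix == "de" else 0
    return priority, pcp, dei


if __name__ == "__main__":
    for text in ("1:1de", "7:2nd"):
        print(text, parse_pcp_rewrite(text))
```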