1. 18 Oct, 2018 40 commits
    • sctp: use sk_wmem_queued to check for writable space · cd305c74
      Xin Long authored
      sk->sk_wmem_queued is used to count the size of chunks in out queue
      while sk->sk_wmem_alloc is for counting the size of chunks that have
      been sent. sctp is increasing both of them before enqueuing the chunks,
      and using sk->sk_wmem_alloc to check for writable space.

      However, sk_wmem_alloc is also increased by 1 for the skb allocated
      for sending in sctp_packet_transmit() but it will not wake up the
      waiters when sk_wmem_alloc is decreased in this skb's destructor.

      If msg size is equal to sk_sndbuf and sendmsg is waiting for sndbuf,
      the check 'msg_len <= sctp_wspace(asoc)' in sctp_wait_for_sndbuf()
      will keep waiting if there's a skb allocated in sctp_packet_transmit,
      and later even if this skb gets freed, the waiting thread will never
      get woken up.

      This issue has been there since the very beginning, so we change to use
      sk->sk_wmem_queued to check for writable space as sk_wmem_queued is
      not increased for the skb allocated for sending, as TCP also does.

      SOCK_SNDBUF_LOCK check is also removed here as it's for tx buf auto
      tuning which I will add in another patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd305c74
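The accounting problem described in the sctp commit above can be illustrated with a small model. This is only a sketch, not kernel code: the class `SockModel` and its methods are hypothetical names, and the only behaviour taken from the commit message is that the transmit skb adds 1 to sk_wmem_alloc but not to sk_wmem_queued.

```python
from dataclasses import dataclass


@dataclass
class SockModel:
    """Toy model of the two send-buffer counters (hypothetical, not kernel code)."""
    sndbuf: int
    wmem_alloc: int = 0    # stands in for sk->sk_wmem_alloc
    wmem_queued: int = 0   # stands in for sk->sk_wmem_queued

    def enqueue_chunk(self, size: int) -> None:
        # Both counters grow when a chunk is queued.
        self.wmem_alloc += size
        self.wmem_queued += size

    def alloc_tx_skb(self) -> None:
        # The packet skb bumps only wmem_alloc (by 1, per the commit message).
        self.wmem_alloc += 1

    def wspace_old(self) -> int:
        # Writable space as computed before the change.
        return max(self.sndbuf - self.wmem_alloc, 0)

    def wspace_new(self) -> int:
        # Writable space as computed after the change.
        return max(self.sndbuf - self.wmem_queued, 0)


sk = SockModel(sndbuf=1000)
sk.alloc_tx_skb()                       # a packet skb is in flight
msg_len = 1000                          # message as large as the send buffer
old_can_send = msg_len <= sk.wspace_old()
new_can_send = msg_len <= sk.wspace_new()
```

In this model the old check fails by exactly one byte while the packet skb exists, which is the situation in which the commit says the sender sleeps without ever being woken.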
    • sctp: count both sk and asoc sndbuf with skb truesize and sctp_chunk size · 605c0ac1
      Xin Long authored
      Now it's confusing that asoc sndbuf_used is doing memory accounting with
      SCTP_DATA_SNDSIZE(chunk) + sizeof(sk_buff) + sizeof(sctp_chunk) while sk
      sk_wmem_alloc is doing that with skb->truesize + sizeof(sctp_chunk).

      It also causes sctp_prsctp_prune to count the wrong amount of freed
      memory when sndbuf_policy is not set.

      To make this right and also keep asoc sndbuf_used, sk sk_wmem_alloc and
      sk_wmem_queued consistent, use skb->truesize + sizeof(sctp_chunk)
      for them.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      605c0ac1
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 2d0f0ca2
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2018-10-17
      
      This series adds support for the new igc driver.
      
      The igc driver is the new client driver supporting the Intel I225
      Ethernet Controller, which supports 2.5GbE speeds.  The reason for
      creating a new client driver, instead of adding support for the new
      device in e1000e, is that the silicon behaves more like devices
      supported in igb driver.  It also did not make sense to add a client
      part to the igb driver, which supports only 1GbE server parts.

      This initial set of patches is designed for basic support (i.e. link and
      pass traffic).  Follow-on patch series will add more advanced support
      like VLAN, Wake-on-LAN, etc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d0f0ca2
    • Merge tag 'mlx5-updates-2018-10-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 99e9acd8
      David S. Miller authored
      mlx5-updates-2018-10-17
      
      ========================================================================
      
      From Or Gerlitz <ogerlitz@mellanox.com>:
      
      This series from Paul adds support to mlx5 e-switch tc offloading of multiple priorities and chains.
      
      This is made of four building blocks (along with a few minor driver refactors):
      
      [1] Split FDB fast path prio to multiple namespaces
      
      Currently the FDB name-space contains two priorities, fast path (p0) and slow path (p1).
      The slow path contains the per representor SQ send-to-vport TX rule and the match-all
      RX miss rule. As a pre-step to support multi-chains and priorities, we split the FDB fast path
      to multiple namespaces  (sub namespaces), each with multiple priorities.
      
      [2] E-Switch chains and priorities
      
      A chain is a group of priorities. We use the fdb parallel sub-namespaces to implement chains,
      and a flow table for each priority in them.
      
      Because these namespaces are parallel and in series to the slow path
      fdb, the chains aren't connected to each other (but to the slow path),
      and one must use an explicit goto action to reach a different chain.
      
      Flow tables for the priorities are created on demand and destroyed
      once not used.
      
      [3] Add a no-append flow insertion mode, use it for TC offloads
      
      Enhance the driver fs core, such that if a no-append flag is set by the caller,
      we add a new FTE, instead of appending the actions of the inserted rule when
      the same match already exists.
      
      For encap rules, we defer the HW offloading till we have a valid neighbor. This can
      result in the packet hitting a lower priority rule in the HW DP. Use the no-append API
      to push these packets to the slow path FDB table, so they go to the TC kernel DP as done
      before priorities were supported.
      
      [4] Offloading tc priorities and chains for eswitch flows
      
      Using [1], [2] and [3] above we add the support for offloading both chains
      and priorities. To get to a new chain, use the tc goto action. We support
      a fixed prio range 1-16, and chains 0-3.
      =============================================================================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99e9acd8
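Building blocks [3] and [4] of the mlx5 series above can be sketched in a few lines. This is a toy model under stated assumptions, not the driver's flow steering code: `FlowTable`, `FlowTableEntry`, `add_rule` and `is_offloadable` are hypothetical names. Only the append versus no-append behaviour and the supported ranges (prio 1-16, chains 0-3) come from the merge message.

```python
from dataclasses import dataclass, field


@dataclass
class FlowTableEntry:
    match: tuple
    actions: list = field(default_factory=list)


class FlowTable:
    """Toy flow table (hypothetical), showing append vs. no-append insertion."""

    def __init__(self):
        self.entries = []

    def add_rule(self, match, actions, no_append=False):
        if not no_append:
            # Default mode: merge the actions into an FTE with the same match.
            for fte in self.entries:
                if fte.match == match:
                    fte.actions.extend(actions)
                    return fte
        # No-append mode (or no existing match): always create a new FTE.
        fte = FlowTableEntry(match, list(actions))
        self.entries.append(fte)
        return fte


def is_offloadable(chain: int, prio: int) -> bool:
    # Ranges quoted in the merge message: chains 0-3, prio 1-16.
    return 0 <= chain <= 3 and 1 <= prio <= 16


appended = FlowTable()
appended.add_rule(("dst_mac", "aa:bb"), ["count"])
appended.add_rule(("dst_mac", "aa:bb"), ["fwd_vport_1"])

separate = FlowTable()
separate.add_rule(("dst_mac", "aa:bb"), ["count"])
separate.add_rule(("dst_mac", "aa:bb"), ["fwd_slow_path"], no_append=True)
```

After these calls `appended` holds one entry with both actions, while `separate` holds two entries with the same match.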
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next · 8f18da47
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net-next): ipsec-next 2018-10-18
      
      1) Remove an unnecessary dev->tstats check in xfrmi_get_stats64.
         From Li RongQing.
      
      2) We currently do a sizeof(element) instead of a sizeof(array)
         check when initializing the ovec array of the secpath.
         Currently this array can have only one element, so code is
         OK but error-prone. Change this to do a sizeof(array)
         check so that we can add more elements in future.
         From Li RongQing.
      
      3) Improve xfrm IPv6 address hashing by using the complete IPv6
         addresses for a hash. From Michal Kubecek.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f18da47
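Item 2 of the ipsec-next pull request above turns on the difference between the size of one element and the size of the whole array. The sketch below uses Python's ctypes to show why the two agree only while the array has a single element. The structure `Ovec` and its fields are hypothetical stand-ins, not the kernel's secpath definition.

```python
import ctypes


class Ovec(ctypes.Structure):
    # Hypothetical element type; the field layout is an assumption.
    _fields_ = [("handle", ctypes.c_void_p), ("flags", ctypes.c_uint32)]


def element_and_array_size(count: int):
    """Return (sizeof(element), sizeof(array)) for an array of `count` elements."""
    array = (Ovec * count)()
    return ctypes.sizeof(Ovec), ctypes.sizeof(array)


elem_1, arr_1 = element_and_array_size(1)   # sizes coincide, so the old check works
elem_4, arr_4 = element_and_array_size(4)   # sizes diverge once elements are added
```

With one element both checks cover the same number of bytes. With four elements, a sizeof(element) check would cover only a quarter of the array.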
    • net: skbuff.h: Mark expected switch fall-throughs · 82385b0d
      Gustavo A. R. Silva authored
      In preparation for enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      82385b0d
    • net: ena: enable Low Latency Queues · 9fd25592
      Arthur Kiyanovski authored
      Use the new API to enable usage of LLQ.
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fd25592
    • net: ena: Fix Kconfig dependency on X86 · 8c590f97
      Netanel Belgazal authored
      The Kconfig limitation of X86 is too wide.
      The ENA driver only requires a little endian dependency.
      
      Change the dependency to be on little endian CPU.
      Signed-off-by: Netanel Belgazal <netanel@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c590f97
    • Merge branch 'tcp_bbr-TCP-BBR-changes-for-EDT-pacing-model' · a58598a4
      David S. Miller authored
      Neal Cardwell says:
      
      ====================
      tcp_bbr: TCP BBR changes for EDT pacing model
      
      Two small patches for TCP BBR to follow up with Eric's recent work to change
      the TCP and fq pacing machinery to an "earliest departure time" (EDT) model:
      
      - The first patch adjusts the TCP BBR logic to work with the new
        "earliest departure time" (EDT) pacing model.
      
      - The second patch adjusts the TCP BBR logic to centralize the setting
        of gain values, to simplify the code and prepare for future changes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a58598a4
    • tcp_bbr: centralize code to set gains · cf33e25c
      Neal Cardwell authored
      Centralize the code that sets gains used for computing cwnd and pacing
      rate. This simplifies the code and makes it easier to change the state
      machine or (in the future) dynamically change the gain values and
      ensure that the correct gain values are always used.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf33e25c
    • tcp_bbr: adjust TCP BBR for departure time pacing · a87c83d5
      Neal Cardwell authored
      Adjust TCP BBR for the new departure time pacing model in the recent
      commit ab408b6d ("tcp: switch tcp and sch_fq to new earliest
      departure time model").
      
      With TSQ and pacing at lower layers, there are often several skbs
      queued in the pacing layer, and thus there is less data "in the
      network" than "in flight".
      
      With departure time pacing at lower layers (e.g. fq or potential
      future NICs), the data in the pacing layer now has a pre-scheduled
      ("baked-in") departure time that cannot be changed, even if the
      congestion control algorithm decides to use a new pacing rate.
      
      This means that there can be a non-trivial lag between when BBR makes
      a pacing rate change and when the inter-skb pacing delays
      change. After a pacing rate change, the number of packets in the
      network can gradually evolve to be higher or lower, depending on
      whether the sending rate is higher or lower than the delivery
      rate. Thus ignoring this lag can cause significant overshoot, with the
      flow ending up with too many or too few packets in the network.
      
      This commit changes BBR to adapt its pacing rate based on the amount
      of data in the network that it estimates has already been "baked in"
      by previous departure time decisions. We estimate the number of our
      packets that will be in the network at the earliest departure time
      (EDT) for the next skb scheduled as:
      
         in_network_at_edt = inflight_at_edt - (EDT - now) * bw
      
      If we're increasing the amount of data in the network ("in_network"),
      then we want to know if the transmit of the EDT skb will push
      in_network above the target, so our answer includes
      bbr_tso_segs_goal() from the skb departing at EDT. If we're decreasing
      in_network, then we want to know if in_network will sink too low just
      before the EDT transmit, so our answer does not include the segments
      from the skb departing at EDT.
      
      Why do we treat the pacing_gain > 1.0 case and the pacing_gain < 1.0
      case differently? The in_network curve is a step function: in_network
      goes up on transmits, and down on ACKs. The event on which in_network
      crosses our target value therefore differs, depending on whether we're
      concerned about in_network potentially going too high or too low:
      
       o if pushing in_network up (pacing_gain > 1.0),
         then in_network goes above target upon a transmit event
      
       o if pushing in_network down (pacing_gain < 1.0),
         then in_network goes below target upon an ACK event
      
      This commit changes the BBR state machine to use this estimated
      "packets in network" value to make its decisions.
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a87c83d5
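The estimate in the BBR commit above, in_network_at_edt = inflight_at_edt - (EDT - now) * bw, can be worked through numerically. The sketch below is a simplified model rather than the kernel implementation: the function names, the time unit (milliseconds) and the bandwidth unit (packets per millisecond) are assumptions chosen to keep the arithmetic exact. The treatment of pacing_gain follows the commit text: the departing skb's segments are counted only when in_network is being pushed up.

```python
def in_network_at_edt(inflight_at_edt: int, edt: int, now: int, bw: int) -> int:
    """Packets expected to be in the network at the earliest departure time."""
    interval = max(edt - now, 0)           # EDT is never treated as being in the past
    delivered = interval * bw              # packets expected to leave the network
    return max(inflight_at_edt - delivered, 0)


def estimate(inflight_now: int, tso_segs_goal: int, edt: int, now: int,
             bw: int, pacing_gain: float) -> int:
    inflight_at_edt = inflight_now
    if pacing_gain > 1.0:
        # Pushing in_network up: include the skb that departs at EDT.
        inflight_at_edt += tso_segs_goal
    return in_network_at_edt(inflight_at_edt, edt, now, bw)


# 100 packets in flight, next skb carries 2 segments, EDT is 10 ms away,
# delivery rate is 1 packet per ms.
probing_up = estimate(100, 2, edt=10, now=0, bw=1, pacing_gain=1.25)
draining = estimate(100, 2, edt=10, now=0, bw=1, pacing_gain=0.75)
```

Here `probing_up` is 102 - 10 = 92 and `draining` is 100 - 10 = 90, so the two cases differ by exactly the segments of the skb scheduled at EDT.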
    • net/ncsi: Add NCSI Broadcom OEM command · cb10c7c0
      Vijay Khemka authored
      This patch adds OEM Broadcom commands and response handling. It also
      defines an OEM Get MAC Address handler to get and configure the device.

      ncsi_oem_gma_handler_bcm: This handler sends the NCSI Broadcom command
      for getting the MAC address.
      ncsi_rsp_handler_oem_bcm: This handles the response received for all
      Broadcom OEM commands.
      ncsi_rsp_handler_oem_bcm_gma: This handles the get MAC address response
      and sets the address on the device.
      Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
      Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cb10c7c0
    • Merge branch 'mscc-fixes' · 1010c17e
      David S. Miller authored
      Gustavo A. R. Silva says:
      
      ====================
      fix signedness bug and memory leak in mscc driver
      
      This patchset aims to fix a signedness bug in function
      vsc85xx_downshift_get() and a memory leak in function
      vsc8574_config_pre_init().
      
      Changes in v3:
       - Add Quentin's Reviewed-by to commit log in patch 2/2.
       - Post the series to netdev.
      
      Changes in v2:
       - Add Quentin's Reviewed-by to commit log in patch 1/2.
       - Jump to out label so all functions in the driver exit with the PHY
         set to access the standard page. Thanks to Quentin Schulz for
         pointing this out.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1010c17e
    • net: phy: mscc: fix memory leak in vsc8574_config_pre_init · 47d20212
      Gustavo A. R. Silva authored
      In case memory resources for *fw* were successfully allocated,
      release them before return.
      
      Addresses-Coverity-ID: 1473968 ("Resource leak")
      Fixes: 00d70d8e ("net: phy: mscc: add support for VSC8574 PHY")
      Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47d20212
    • net: phy: mscc: fix signedness bug in vsc85xx_downshift_get · e519869a
      Gustavo A. R. Silva authored
      Currently, the error handling for the call to function
      phy_read_paged() doesn't work because *reg_val* is of
      type u16 (16 bits, unsigned), which makes it impossible
      for it to hold a value less than 0.
      
      Fix this by changing the type of variable *reg_val* to int.
      
      Addresses-Coverity-ID: 1473970 ("Unsigned compared against 0")
      Fixes: 6a0bfbbe ("net: phy: mscc: migrate to phy_select/restore_page functions")
      Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e519869a
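The signedness bug fixed in the commit above can be reproduced in a small model. Python integers are unbounded, so the u16 storage is emulated by masking to 16 bits. The stub `read_paged_stub`, the helper `to_u16` and the error value -5 (standing in for a negative errno) are assumptions for illustration only.

```python
def to_u16(value: int) -> int:
    """Emulate storing a value in a 16-bit unsigned variable."""
    return value & 0xFFFF


def read_paged_stub(fail: bool) -> int:
    # Hypothetical stand-in for a register read: negative on error.
    return -5 if fail else 0x000A


def downshift_get_buggy(fail: bool):
    reg_val = to_u16(read_paged_stub(fail))   # u16 reg_val
    error_detected = reg_val < 0              # can never be true
    return error_detected, reg_val


def downshift_get_fixed(fail: bool):
    reg_val = read_paged_stub(fail)           # int reg_val
    error_detected = reg_val < 0
    return error_detected, reg_val


buggy = downshift_get_buggy(fail=True)
fixed = downshift_get_fixed(fail=True)
```

In the buggy variant the error code wraps to 65531 and is then treated as register contents. The variant that keeps the value signed reports the failure.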
    • net: fix warning in af_unix · 33c4368e
      Kyeongdon Kim authored
      This fixes the "'hash' may be used uninitialized in this function"
      
      net/unix/af_unix.c:1041:20: warning: 'hash' may be used uninitialized in this function [-Wmaybe-uninitialized]
        addr->hash = hash ^ sk->sk_type;
      Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33c4368e
    • net: dsa: mv88e6xxx: Fix 88E6141/6341 2500mbps SERDES speed · 26422340
      Marek Behún authored
      This is a fix for the port_set_speed method for the Topaz family.
      Currently the same method is used as for the Peridot family, but
      this is wrong for the SERDES port.
      
      On Topaz, the SERDES port is port 5, not 9 and 10 as in Peridot.
      Moreover setting alt_bit on Topaz only makes sense for port 0 (for
      differentiating 100mbps vs 200mbps). The SERDES port does not
      support more than 2500mbps, so alt_bit does not make any difference.
      Signed-off-by: Marek Behún <marek.behun@nic.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26422340
    • Merge branch 'octeontx2-af-NPA-and-NIX-blocks-initialization' · e943d94e
      David S. Miller authored
      Sunil Goutham says:
      
      ====================
      octeontx2-af: NPA and NIX blocks initialization
      
      This patchset is a continuation to earlier submitted patch series
      to add a new driver for Marvell's OcteonTX2 SOC's
      Resource virtualization unit (RVU) admin function driver.
      
      octeontx2-af: Add RVU Admin Function driver
      https://www.spinics.net/lists/netdev/msg528272.html
      
      This patch series adds logic for the following.
      - Modified register polling loop to use time_before(jiffies, timeout),
        as suggested by Arnd Bergmann.
      - Support to forward interface link status notifications sent by
        firmware to registered PFs mapped to a CGX::LMAC.
      - Support to set CGX LMAC in loopback mode, retrieve stats,
        configure DMAC filters at CGX level etc.
      - Network pool allocator (NPA) functional block initialization,
        admin queue support, NPALF aura/pool contexts memory allocation, init
        and deinit.
      - Network interface controller (NIX) functional block basic init,
        admin queue support, NIXLF RQ/CQ/SQ HW contexts memory allocation,
        init and deinit.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e943d94e
    • octeontx2-af: Support for disabling NIX RQ/SQ/CQ contexts · 557dd485
      Geetha sowjanya authored
      This patch adds support for a RVU PF/VF to disable all RQ/SQ/CQ
      contexts of a NIX LF via mbox. This will be used by PF/VF drivers
      upon teardown or while freeing up HW resources.
      
      A HW context which is not INIT'ed cannot be modified and a
      RVU PF/VF driver may or may not INIT all the RQ/SQ/CQ contexts.
      So a bitmap is introduced to keep track of enabled NIX RQ/SQ/CQ
      contexts, so that only enabled hw contexts are disabled upon LF
      teardown.
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      557dd485
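The bitmap bookkeeping described in the commit above (track which contexts were INIT'ed so that only those are disabled at teardown) can be sketched as follows. The class `ContextBitmap` and the function `disable_enabled_contexts` are hypothetical names, and the hardware operation is replaced by a callback.

```python
class ContextBitmap:
    """Toy bitmap tracking which HW contexts have been enabled (hypothetical)."""

    def __init__(self, nbits: int):
        self.nbits = nbits
        self.bits = 0

    def _check(self, idx: int) -> None:
        if not 0 <= idx < self.nbits:
            raise IndexError(f"context index {idx} out of range")

    def set(self, idx: int) -> None:
        self._check(idx)
        self.bits |= 1 << idx

    def clear(self, idx: int) -> None:
        self._check(idx)
        self.bits &= ~(1 << idx)

    def test(self, idx: int) -> bool:
        self._check(idx)
        return bool(self.bits & (1 << idx))

    def enabled(self) -> list:
        return [i for i in range(self.nbits) if self.test(i)]


def disable_enabled_contexts(bitmap: ContextBitmap, disable_fn) -> None:
    # Touch only the contexts that were actually enabled.
    for idx in bitmap.enabled():
        disable_fn(idx)
        bitmap.clear(idx)


rq_bitmap = ContextBitmap(8)
rq_bitmap.set(1)
rq_bitmap.set(5)
disabled = []
disable_enabled_contexts(rq_bitmap, disabled.append)
```

Only contexts 1 and 5 are passed to the disable callback. The six contexts that were never enabled are left alone.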
    • octeontx2-af: NIX AQ instruction enqueue support · ffb0abd7
      Sunil Goutham authored
      Add support for a RVU PF/VF to submit instructions to NIX AQ
      via mbox. Instructions can be to init/write/read RQ/SQ/CQ/RSS
      contexts. In case of read, context will be returned as part of
      response to the mbox msg received.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ffb0abd7
    • octeontx2-af: Alloc bitmaps for NIX Tx scheduler queues · 709a4f0c
      Sunil Goutham authored
      Allocate bitmaps and memory for PFVF mapping info for
      maintaining NIX transmit scheduler queues.
      PF/VF drivers will request alloc, free, etc. of
      Tx schedulers via mailbox.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      709a4f0c
    • octeontx2-af: NIX LSO config for TSOv4/v6 offload · 59360e98
      Sunil Goutham authored
      Config LSO formats for TSOv4 and TSOv6 offloads.
      These formats tell HW which fields in the TCP packet's
      headers have to be updated while performing segmentation
      offload.
      
      Also report the LSO format indices to PF/VF drivers as part
      of the response to the NIX_LF_ALLOC mbox msg. These indices are
      used in SQE extension headers while framing SQE for pkt
      transmission with TSO offload.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59360e98
    • octeontx2-af: NIX block LF initialization · cb30711a
      Sunil Goutham authored
      Upon receiving NIX_LF_ALLOC mbox message allocate memory for
      NIXLF's CQ, SQ, RQ, CINT, QINT and RSS HW contexts and configure
      respective base iova HW. Enable caching of contexts into NIX NDC.
      
      Return SQ buffer (SQB) size, this PF/VF's MAC address, etc.
      to the mbox msg sender.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cb30711a
    • octeontx2-af: NIX block admin queue init · aba53d5d
      Sunil Goutham authored
      Initialize NIX admin queue (AQ), i.e. alloc memory for
      AQ instructions and for the results. All NIX LFs will submit
      instructions to AQ to init/write/read RQ/SQ/CQ/RSS contexts
      and in case of read, get context from result memory.
      
      Also before configuring/using NIX block calibrate X2P bus
      and check if NIX interfaces like CGX and LBK are in active
      and working state.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aba53d5d
    • octeontx2-af: Support for disabling NPA Aura/Pool contexts · 57856dde
      Geetha sowjanya authored
      This patch adds support for a RVU PF/VF to disable all Aura/Pool
      contexts of a NPA LF via mbox. This will be used by PF/VF drivers
      upon teardown or while freeing up HW resources.
      
      A HW context which is not INIT'ed cannot be modified and a
      RVU PF/VF driver may or may not INIT all the Aura/Pool contexts.
      So a bitmap is introduced to keep track of enabled NPA Aura/Pool
      contexts, so that only enabled hw contexts are disabled upon LF
      teardown.
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57856dde
    • octeontx2-af: NPA AQ instruction enqueue support · 4a3581cd
      Sunil Goutham authored
      Add support for a RVU PF/VF to submit instructions to NPA AQ
      via mbox. Instructions can be to init/write/read Aura/Pool/Qint
      contexts. In case of read, context will be returned as part of
      response to the mbox msg received.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a3581cd
    • octeontx2-af: NPA block LF initialization · 3fa4c323
      Sunil Goutham authored
      Upon receiving NPA_LF_ALLOC mbox message allocate memory for
      NPALF's aura, pool and qint contexts and configure the same
      to HW. Enable caching of contexts into NPA NDC.
      
      Return pool related info like stack size, num pointers per
      stack page, etc. to the mbox msg sender.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3fa4c323
    • octeontx2-af: NPA block admin queue init · 7a37245e
      Sunil Goutham authored
      Initialize NPA admin queue (AQ), i.e. alloc memory for
      AQ instructions and for the results. All NPA LFs will submit
      instructions to AQ to init/write/read Aura/Pool contexts
      and in case of read, get context from result memory.
      
      Added some common APIs for allocating memory for a queue
      and get IOVA in return, these APIs will be used by
      NIX AQ and for other purposes.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a37245e
    • octeontx2-af: Enable or disable CGX internal loopback · 23999b30
      Geetha sowjanya authored
      Add support to enable or disable internal loopback mode in CGX.
      New mbox IDs CGX_INTLBK_ENABLE/DISABLE added for this.
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Signed-off-by: Linu Cherian <lcherian@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23999b30
    • octeontx2-af: Forward CGX link notifications to PFs · 61071a87
      Linu Cherian authored
      Upon receiving notification from firmware the CGX event handler
      in the AF driver gets the current link info such as status, speed,
      duplex etc from CGX driver and sends it across to PFs who have
      registered to receive such notifications.
      
      To support the above:
       - Mbox messaging support for sending msgs from AF to PF has been added.
       - Added mbox msgs so that PFs can register/unregister for link events.
       - Link notifications are sent to a PF under two scenarios:
        1. When an asynchronous link change notification is received from
           firmware with the notification flag turned on for that PF.
        2. Upon a notification turn-on request, the current link status is
           sent to the PF.
      
      Also added a new mailbox msg using which RVU PF/VF can retrieve
      their mapped CGX LMAC's current link info. Link info includes
      status, speed, duplex and lmac type.
      Signed-off-by: Linu Cherian <lcherian@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61071a87
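The two notification scenarios listed in the commit above can be modelled with a small registry. This is a sketch under assumptions, not the AF driver's mailbox code: `LinkNotifier` and its methods are hypothetical, and "sending" a message is reduced to appending to a list.

```python
class LinkNotifier:
    """Toy model of AF-to-PF link event forwarding (hypothetical)."""

    def __init__(self, link_info: dict):
        self.link_info = dict(link_info)
        self.registered = set()
        self.sent = []                     # (pf, link_info) pairs "sent" so far

    def _send(self, pf: int) -> None:
        self.sent.append((pf, dict(self.link_info)))

    def register(self, pf: int) -> None:
        # Scenario 2: on a turn-on request, send the current link status.
        self.registered.add(pf)
        self._send(pf)

    def unregister(self, pf: int) -> None:
        self.registered.discard(pf)

    def on_link_change(self, link_info: dict) -> None:
        # Scenario 1: asynchronous change, forwarded to registered PFs only.
        self.link_info = dict(link_info)
        for pf in sorted(self.registered):
            self._send(pf)


notifier = LinkNotifier({"status": "up", "speed": 10000, "duplex": "full"})
notifier.register(0)
notifier.on_link_change({"status": "down", "speed": 0, "duplex": "full"})
notifier.unregister(0)
notifier.on_link_change({"status": "up", "speed": 10000, "duplex": "full"})
```

PF 0 receives two messages: the current status at registration and the later link-down event. The final change is not forwarded because the PF has unregistered by then.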
    • octeontx2-af: Support for MAC address filters in CGX · 96be2e0d
      Vidhya Raman authored
      This patch adds support for setting MAC address filters in CGX
      for PF interfaces. Also PF interfaces can be put in promiscuous
      mode. Dataplane PFs access this functionality using mailbox
      messages to the AF driver.
      Signed-off-by: Vidhya Raman <vraman@marvell.com>
      Signed-off-by: Stanislaw Kardach <skardach@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96be2e0d
    • octeontx2-af: Support to retrieve CGX LMAC stats · 66208910
      Christina Jacob authored
      This patch adds support for a RVU PF/VF driver to retrieve
      its mapped CGX LMAC Rx and Tx stats from AF via mbox.
      A new mailbox msg is added for this.
      Signed-off-by: Christina Jacob <cjacob@marvell.com>
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66208910
    • octeontx2-af: CGX Rx/Tx enable/disable mbox handlers · 1435f66a
      Sunil Goutham authored
      Added new mailbox msgs for RVU PF/VFs to request AF
      to enable/disable their mapped CGX::LMAC Rx & Tx.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Linu Cherian <lcherian@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1435f66a
    • octeontx2-af: Improve register polling loop · 6ca3ee2f
      Sunil Goutham authored
      Instead of looping on an integer timeout, use time_before(jiffies, timeout),
      so that maximum poll time is capped.
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ca3ee2f
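The idea behind the polling change above, bounding the loop by elapsed time rather than by an iteration count, can be sketched in user space with a monotonic clock. This is an analogy, not the driver code: `poll_reg`, `make_reg` and the timeout values are assumptions, and `time.monotonic()` plays the role that jiffies and time_before() play in the kernel.

```python
import time


def poll_reg(read_reg, mask: int, zero: bool,
             timeout_s: float = 0.05, sleep_s: float = 0.0005) -> bool:
    """Poll until (reg & mask) is zero or non-zero as requested, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        val = read_reg()
        if zero and not (val & mask):
            return True
        if not zero and (val & mask):
            return True
        if time.monotonic() >= deadline:
            return False                   # maximum poll time is capped
        time.sleep(sleep_s)


def make_reg(ready_after: int):
    """Hypothetical register whose bit 0 becomes set on the Nth read."""
    reads = {"n": 0}

    def read() -> int:
        reads["n"] += 1
        return 0x1 if reads["n"] >= ready_after else 0x0

    return read


became_ready = poll_reg(make_reg(3), mask=0x1, zero=False)
timed_out = poll_reg(lambda: 0x0, mask=0x1, zero=False, timeout_s=0.01)
```

With a fixed iteration count, the worst-case wait depends on how long each iteration happens to take. With a deadline, it is bounded by the timeout plus one sleep interval.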
    • Merge branch 'mlxsw-Add-VxLAN-support' · 53e50a6e
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Add VxLAN support
      
      This patchset adds support for VxLAN offload in the mlxsw driver.
      
      With regards to the forwarding plane, VxLAN support is composed of two
      main parts: Encapsulation and decapsulation.
      
      In the device, NVE encapsulation (and VxLAN in particular) takes place
      in the bridge. A packet can be encapsulated using VxLAN either because
      it hit an FDB entry that forwards it to the router with the IP of the
      remote VTEP or because it was flooded, in which case it is sent to a
      list of remote VTEPs (in addition to local ports). In either case, the
      VNI is derived from the filtering identifier (FID) the packet was
      classified to at ingress and the underlay source IP is taken from a
      device global configuration.
      
      VxLAN decapsulation takes place in the underlay router, where packets
      that hit a local route that corresponds to the source IP of the local
      VTEP are decapsulated and injected to the bridge. The packets are
      classified to a FID based on the VNI they came with.
      
      The first six patches export the required APIs in the VxLAN and mlxsw
      drivers in order to allow for the introduction of the NVE core in the
      next two patches. The NVE core is designed to support a variety of NVE
      encapsulations (e.g., VxLAN, NVGRE) and different ASICs, but currently
      only VxLAN and Spectrum are supported. Spectrum-2 support will be added
      in the future.
      
      The last 10 patches add support for VxLAN decapsulation and
      encapsulation and include the addition of the required switchdev APIs in
      the VxLAN driver. These APIs allow capable drivers to get a notification
      about the addition / deletion of FDB entries to / from the VxLAN's FDB.
      
      Subsequent patchsets will add selftests (generic and mlxsw-specific),
      data plane learning, FDB extack and vetoing and support for VLAN-aware
      bridges (one VNI per VxLAN device model).
      
      v2:
      * Implement netif_is_vxlan() using rtnl_link_ops->kind (Jakub & Stephen)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53e50a6e
    • mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation · 1231e04f
      Ido Schimmel authored
      In the device, VxLAN encapsulation takes place in the FDB table where
      certain {MAC, FID} entries are programmed with an underlay unicast IP.
      MAC addresses that are not programmed in the FDB are flooded to the
      relevant local ports and also to a list of underlay unicast IPs that are
      programmed using the all zeros MAC address in the VxLAN driver.
      
      One difference between the hardware and software data paths is the fact
      that in the software data path there are two FDB lookups prior to the
      encapsulation of the packet. First in the bridge's FDB table using {MAC,
      VID} and another in the VxLAN's FDB table using {MAC, VNI}.
      
      Therefore, when a new VxLAN FDB entry is notified, it is only programmed
      to the device if there is a corresponding entry in the bridge's FDB
      table. Similarly, when a new bridge FDB entry pointing to the VxLAN
      device is notified, it is only programmed to the device if there is a
      corresponding entry in the VxLAN's FDB table.
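
      The two-table rule above can be sketched in plain C. This is a
      simplified user-space model, not the mlxsw code: the ex_fdb type and
      the ex_* helpers are hypothetical, and the tables are keyed by MAC
      only, whereas the real bridge FDB is keyed by {MAC, VID} and the real
      VxLAN FDB by {MAC, VNI}.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical software FDB: a flat array of 6-byte MAC addresses. */
struct ex_fdb {
	const unsigned char (*macs)[6];
	size_t count;
};

/* Return true if the table contains the given MAC address. */
static bool ex_fdb_has(const struct ex_fdb *fdb, const unsigned char mac[6])
{
	for (size_t i = 0; i < fdb->count; i++)
		if (memcmp(fdb->macs[i], mac, 6) == 0)
			return true;
	return false;
}

/*
 * An entry is programmed to the device only when it is present in both
 * software tables, regardless of which table was notified first.
 */
static bool ex_should_offload(const struct ex_fdb *bridge_fdb,
			      const struct ex_fdb *vxlan_fdb,
			      const unsigned char mac[6])
{
	return ex_fdb_has(bridge_fdb, mac) && ex_fdb_has(vxlan_fdb, mac);
}
```

      In this model the same check runs on a notification from either
      table, so the order in which the two entries are added does not
      matter.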
      
      Note that the above scheme will result in a discrepancy between both
      data paths if only one FDB table is populated in the software data path.
      For example, if only the bridge's FDB is populated with an entry
      pointing to a VxLAN device, then a packet hitting the entry will only be
      flooded by the kernel to remote VTEPs, whereas the device will also flood
      the packet to other local ports that are members of the VLAN.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Enable VxLAN enslavement to bridges · 1c30d183
      Ido Schimmel authored
      Enslavement of VxLAN devices to offloaded bridges was never forbidden by
      mlxsw, but this patch makes sure the required configuration is performed
      in order to allow VxLAN encapsulation and decapsulation to take place in
      the device.
      
      The patch handles both the case where a VxLAN device is enslaved to an
      already offloaded bridge and the case where the first mlxsw port is
      enslaved to a bridge that already has a VxLAN device configured.
      
      Invalid configurations are sanitized and an error string is returned via
      extack.
      
      Since encapsulation and decapsulation do not occur when the VxLAN device
      is down, the driver makes sure to enable / disable these functionalities
      based on NETDEV_PRE_UP and NETDEV_DOWN events.
      
      Note that NETDEV_PRE_UP is used in favor of NETDEV_UP, as the former
      allows the operation to be vetoed, if necessary.
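
      The enable / disable behavior and the veto can be modeled roughly as
      follows. This is a user-space sketch, not the driver's notifier: the
      ex_event values stand in for NETDEV_PRE_UP and NETDEV_DOWN, and the
      ex_nve_state type and its config_valid field are assumptions made for
      illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-ins for the NETDEV_PRE_UP and NETDEV_DOWN events. */
enum ex_event {
	EX_PRE_UP,
	EX_DOWN,
};

/* Hypothetical per-device state tracked by the driver. */
struct ex_nve_state {
	bool config_valid;	/* whether the VxLAN configuration is supported */
	bool enabled;		/* whether encap / decap is enabled in the device */
};

/*
 * Return 0 on success or a negative errno to veto the operation.
 * A veto is only meaningful for the pre-up event, since the device
 * has not been brought up yet at that point.
 */
static int ex_handle_event(struct ex_nve_state *st, enum ex_event ev)
{
	switch (ev) {
	case EX_PRE_UP:
		if (!st->config_valid)
			return -EINVAL;	/* veto: the device stays down */
		st->enabled = true;
		return 0;
	case EX_DOWN:
		st->enabled = false;
		return 0;
	}
	return 0;
}
```
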
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: switchdev: Allow clearing FDB entry offload indication · e9ba0fbc
      Ido Schimmel authored
      Currently, an FDB entry only ceases being offloaded when it is deleted.
      This changes with VxLAN encapsulation.
      
      Devices capable of performing VxLAN encapsulation usually have only one
      FDB table, unlike the software data path which has two - one in the
      bridge driver and another in the VxLAN driver.
      
      Therefore, bridge FDB entries pointing to a VxLAN device are only
      offloaded if there is a corresponding entry in the VxLAN FDB.
      
      Allow clearing the offload indication in case the corresponding entry
      was deleted from the VxLAN FDB.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Notify for each remote of a removed FDB entry · 045a5a99
      Petr Machata authored
      When notifications are sent about FDB activity, and an FDB entry with
      several remotes is removed, the notification is sent only for the first
      destination. That makes it impossible to distinguish between the case
      where only this first remote is removed, and the one where the FDB entry
      is removed as a whole.
      
      Therefore send one notification for each remote of a removed FDB entry.
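
      A minimal sketch of the per-remote notification, assuming a singly
      linked list of remotes. The ex_* types, the callback signature and the
      logging helper are hypothetical and only model the behavior; they are
      not the vxlan driver's data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical remote destination of an FDB entry. */
struct ex_remote {
	uint32_t ip;
	struct ex_remote *next;
};

/* Hypothetical FDB entry holding a list of remotes. */
struct ex_fdb_entry {
	struct ex_remote *remotes;
};

typedef void (*ex_notify_fn)(const struct ex_remote *rd, void *ctx);

/*
 * On removal of the whole entry, notify once per remote rather than
 * only for the first one. Returns the number of notifications sent.
 */
static size_t ex_notify_entry_removed(const struct ex_fdb_entry *f,
				      ex_notify_fn notify, void *ctx)
{
	size_t sent = 0;

	for (const struct ex_remote *rd = f->remotes; rd; rd = rd->next) {
		notify(rd, ctx);
		sent++;
	}
	return sent;
}

/* Example listener: records the remote IPs it was notified about. */
struct ex_notify_log {
	uint32_t ips[8];
	size_t n;
};

static void ex_record_remote(const struct ex_remote *rd, void *ctx)
{
	struct ex_notify_log *log = ctx;

	if (log->n < 8)
		log->ips[log->n++] = rd->ip;
}
```

      A listener that sees a notification for every remote can tell that
      the entry as a whole is gone, whereas a single notification means
      only that one remote was removed.
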
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Support marking RDSTs as offloaded · 0efe1173
      Petr Machata authored
      Offloaded bridge FDB entries are marked with NTF_OFFLOADED. Implement a
      similar mechanism for VXLAN, where a given remote destination can be
      marked as offloaded.
      
      To that end, introduce a new event, SWITCHDEV_VXLAN_FDB_OFFLOADED,
      through which the marking is communicated to the vxlan driver. To
      identify which RDST should be marked as offloaded, a
      switchdev_notifier_vxlan_fdb_info is passed to the listeners. The
      "offloaded" flag in that object determines whether the offloaded mark
      should be set or cleared.
      
      When sending offloaded FDB entries over netlink, mark them with
      NTF_OFFLOADED.
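
      The marking flow can be modeled as below. This is a reduced sketch
      under stated assumptions: ex_vxlan_fdb_info carries only a remote IP
      and the "offloaded" flag, which is far less than the real
      switchdev_notifier_vxlan_fdb_info, and EX_NTF_OFFLOADED is a stand-in
      constant whose value is assumed rather than taken from the kernel
      headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the NTF_OFFLOADED flag; the value here is an assumption. */
#define EX_NTF_OFFLOADED 0x20

/* Hypothetical, reduced remote destination (RDST). */
struct ex_rdst {
	uint32_t remote_ip;
	bool offloaded;
};

/* Hypothetical, reduced form of the notifier payload. */
struct ex_vxlan_fdb_info {
	uint32_t remote_ip;
	bool offloaded;	/* true: set the mark, false: clear it */
};

/*
 * Handle the "offloaded" event: find the RDST the info refers to and
 * set or clear its mark. Returns true if a matching RDST was found.
 */
static bool ex_mark_offloaded(struct ex_rdst *rdsts, size_t n,
			      const struct ex_vxlan_fdb_info *info)
{
	for (size_t i = 0; i < n; i++) {
		if (rdsts[i].remote_ip == info->remote_ip) {
			rdsts[i].offloaded = info->offloaded;
			return true;
		}
	}
	return false;
}

/* Flags reported when the entry is dumped over netlink. */
static uint8_t ex_ntf_flags(const struct ex_rdst *rd)
{
	return rd->offloaded ? EX_NTF_OFFLOADED : 0;
}
```
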
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>