1. 17 Jul, 2020 19 commits
    • Merge tag 'mlx5-updates-2020-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d44a919a
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2020-07-16
      
      Fixes:
      1) Fix build break when CONFIG_XPS is not set
      2) Fix missing switch_id for representors
      
      Updates:
      1) IPsec XFRM RX offloads from Raed and Huy.
        - Added IPSec RX steering flow tables to NIC RX
        - Refactoring of the existing FPGA IPSec, to add support
          for ConnectX IPsec.
        - RX data path handling for IPSec traffic
        - Synchronize offloading device ESN with xfrm received SN
      
      2) Parav allows E-Switch to switch to switchdev mode directly without
         the need to go through legacy mode first.
      
      3) From Tariq, Misc updates including:
         3.1) indirect calls for RX and XDP handlers
         3.2) Make MLX5_EN_TLS non-prompt as it should always be enabled when
              TLS and MLX5_EN are selected.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: alteon: Avoid some useless memset · 721dab2b
      Christophe JAILLET authored
      Avoid a memset after a call to 'dma_alloc_coherent()'.
      This is useless since
      commit 518a2f19 ("dma-mapping: zero memory returned from dma_alloc_*")
      
      Replace a kmalloc+memset with a corresponding kzalloc.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: alteon: switch from 'pci_' to 'dma_' API · f4079e5d
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'ace_allocate_descriptors()' and
      'ace_init()' GFP_KERNEL can be used because both functions are called from
      the probe function and no lock is acquired.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sungem: switch from 'pci_' to 'dma_' API · 8d4f62ca
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'gem_init_one()', GFP_KERNEL can be used
      because it is a probe function and no lock is acquired.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: decnet: af_decnet: Simplify goto loop. · e0c3f4c4
      Suraj Upadhyay authored
      Replace goto loop with while loop.
      Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-dsack-multi-seg' · c4fefd5a
      David S. Miller authored
      Priyaranjan Jha says:
      
      ====================
      tcp: improve handling of DSACK covering multiple segments
      
      Currently, while processing DSACK, we assume DSACK covers only one
      segment. This leads to significant underestimation of no. of duplicate
      segments with LRO/GRO. Also, the existing SNMP counters, TCPDSACKRecv
      and TCPDSACKOfoRecv, make similar assumption for DSACK, which makes them
      unusable for estimating spurious retransmit rates.
      
      This patch series fixes the segment accounting with DSACK, by estimating
      number of duplicate segments based on: (DSACKed sequence range) / MSS.
      It also introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks
      the estimated number of duplicate segments.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add SNMP counter for no. of duplicate segments reported by DSACK · e3a5a1e8
      Priyaranjan Jha authored
      There are two existing SNMP counters, TCPDSACKRecv and TCPDSACKOfoRecv,
      which are incremented depending on whether the DSACKed range is below
      the cumulative ACK sequence number or not. Unfortunately, these both
      implicitly assume each DSACK covers only one segment. This makes these
      counters unusable for estimating spurious retransmit rates,
      or real/non-spurious loss rate.
      
      This patch introduces a new SNMP counter, TCPDSACKRecvSegs, which tracks
      the estimated number of duplicate segments based on:
      (DSACKed sequence range) / MSS. This counter is usable for estimating
      spurious retransmit rates, or real/non-spurious loss rate.
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix segment accounting when DSACK range covers multiple segments · a71d77e6
      Priyaranjan Jha authored
      Currently, while processing DSACK, we assume DSACK covers only one
      segment. This leads to significant underestimation of DSACKs with
      LRO/GRO. This patch fixes segment accounting with DSACK by estimating
      segment count from DSACK sequence range / MSS.
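
      The estimate described above can be sketched in userspace C as
      follows. The function name is hypothetical, and rounding a partial
      segment up to a whole one is an assumption of this sketch; the commit
      message only states "DSACK sequence range / MSS".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate how many segments a DSACK block covers.
 * The DSACKed range is [start_seq, end_seq); mss is the segment size
 * in bytes. Rounding up is an assumption of this sketch.
 */
uint32_t dsack_dup_segs(uint32_t start_seq, uint32_t end_seq, uint32_t mss)
{
	uint32_t seq_len;

	assert(mss > 0);

	/* Unsigned subtraction stays correct across 32-bit wraparound. */
	seq_len = end_seq - start_seq;

	/* Ceiling division written so that it cannot overflow. */
	return seq_len / mss + (seq_len % mss != 0);
}
```

      With an MSS of 1460, a DSACK range of 4380 bytes (as a GRO-aggregated
      receiver might report) counts as three duplicate segments rather
      than one.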
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yousuk Seung <ysseung@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sun: cassini: switch from 'pci_' to 'dma_' API · dcc82bb0
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'cas_tx_tiny_alloc()', GFP_KERNEL can be used
      because a few lines below in its only caller, 'cas_alloc_rxds()', is also
      called. This function makes an explicit use of GFP_KERNEL.
      
      When memory is allocated in 'cas_init_one()', GFP_KERNEL can be used
      because it is a probe function and no lock is acquired.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: silence warning in subflow_data_ready() · 8c728940
      Davide Caratti authored
      Since commit d47a7215 ("mptcp: fix race in subflow_data_ready()"), it
      is possible to observe a regression in MP_JOIN kselftests. For sockets in
      TCP_CLOSE state, it's not sufficient to just wake up the main socket: we
      also need to ensure that received data are made available to the reader.
      Silence the WARN_ON_ONCE() in these cases: it preserves the syzkaller fix
      and restores kselftests when they are run as follows:
      
        # while true; do
        > make KBUILD_OUTPUT=/tmp/kselftest TARGETS=net/mptcp kselftest
        > done
      Reported-by: Florian Westphal <fw@strlen.de>
      Fixes: d47a7215 ("mptcp: fix race in subflow_data_ready()")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/47
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'usbnet-multicast-filter-support-for-cdc-ncm-devices' · 79814d81
      David S. Miller authored
      Bjørn Mork says:
      
      ====================
      usbnet: multicast filter support for cdc ncm devices
      
      This revives a 2 year old patch set from Miguel Rodríguez
      Pérez, which appears to have been lost somewhere along the
      way.  I've based it on the last version I found (v4), and
      added one patch which I believe must have been missing in
      the original.
      
      I kept Oliver's ack on one of the patches, since both the patch and
      the motivation are still the same.  Hope this is OK.
      
      Thanks to the anonymous user <wxcafe@wxcafe.net> for bringing up this
      problem in https://bugs.debian.org/965074
      
      This is only build and load tested by me.  I don't have any device
      where I can test the actual functionality.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ncm: hook into set_rx_mode to admit multicast traffic · e10dcb1b
      Miguel Rodríguez Pérez authored
      We set set_rx_mode to usbnet_cdc_update_filter provided
      by cdc_ether that simply admits all multicast traffic
      if there is more than one multicast filter configured.
      Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ncm: add .ndo_set_rx_mode to cdc_ncm_netdev_ops · 37a2ebdd
      Miguel Rodríguez Pérez authored
      The cdc_ncm driver overrides the net_device_ops structure used by usbnet
      to be able to hook into .ndo_change_mtu. However, the structure was
      missing the .ndo_set_rx_mode field, preventing the driver from
      hooking into usbnet's set_rx_mode. This patch adds the missing callback to
      usbnet_set_rx_mode in net_device_ops.
      Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usbnet: export usbnet_set_rx_mode() · 1ea2b748
      Bjørn Mork authored
      This function can be reused by other usbnet minidrivers.
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ether: export usbnet_cdc_update_filter · e506adde
      Miguel Rodríguez Pérez authored
      This makes the function available to other drivers, like cdc_ncm.
      Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal>
      Acked-by: Oliver Neukum <oneukum@suse.com>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cdc_ether: use dev->intf to get interface information · 0226009c
      Miguel Rodríguez Pérez authored
      usbnet_cdc_update_filter was getting the interface number from the
      usb_interface struct in cdc_state->control. However, cdc_ncm does
      not initialize that structure in its bind function, but uses
      cdc_ncm_ctx instead. Getting intf directly from struct usbnet solves
      the problem.
      Signed-off-by: Miguel Rodríguez Pérez <miguel@det.uvigo.gal>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: openvswitch: reorder masks array based on usage · eac87c41
      Eelco Chaudron authored
      This patch reorders the masks array every 4 seconds based on their
      usage count. This greatly reduces the masks per packet hit, and
      hence improves the overall performance, especially in the OVS/OVN
      case for OpenShift.
      
      Here are some results from the OVS/OVN OpenShift test, which use
      8 pods, each pod having 512 uperf connections, each connection
      sends a 64-byte request and gets a 1024-byte response (TCP).
      All uperf clients are on 1 worker node while all uperf servers are
      on the other worker node.
      
      Kernel without this patch     :  7.71 Gbps
      Kernel with this patch applied: 14.52 Gbps
      
      We also ran some tests to verify that the rebalance activity does not
      lower the flow insertion rate, which it does not.
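
      As an illustration of usage-based reordering, the sketch below sorts
      a small array so that the most frequently hit entries are probed
      first. The struct and function names are hypothetical stand-ins, and
      a plain qsort() replaces whatever the patch does internally; the
      periodic 4-second trigger is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry of the masks array. */
struct mask_entry {
	int id;			/* identifies the mask */
	uint64_t usage_count;	/* packets that hit this mask */
};

/* qsort() comparator: highest usage count first. */
static int cmp_usage_desc(const void *a, const void *b)
{
	const struct mask_entry *ma = a;
	const struct mask_entry *mb = b;

	if (ma->usage_count > mb->usage_count)
		return -1;
	if (ma->usage_count < mb->usage_count)
		return 1;
	return 0;
}

/* Reorder the array so a linear lookup probes the hottest masks first. */
void rebalance_masks(struct mask_entry *masks, size_t n)
{
	assert(masks != NULL || n == 0);
	qsort(masks, n, sizeof(*masks), cmp_usage_desc);
}
```

      A lookup that walks the array front to back then reaches the
      dominant masks after fewer probes, which is the effect the
      measurements above attribute to the patch.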
      Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
      Tested-by: Andrew Theurer <atheurer@redhat.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: sfp: Cotsworks SFF module EEPROM fixup · b18432c5
      Chris Healy authored
      Some Cotsworks SFF have invalid data in the first few bytes of the
      module EEPROM.  This results in these modules not being detected as
      valid modules.
      
      Address this by poking the correct EEPROM values into the module
      EEPROM when the model/PN match and the existing module EEPROM contents
      are not correct.
      Signed-off-by: Chris Healy <cphealy@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: continue searching for C45 MMDs even if first returned ffff:ffff · bba238ed
      Vladimir Oltean authored
      At the time of introduction, in commit bdeced75 ("net: dsa: felix:
      Add PCS operations for PHYLINK"), support for the Lynx PCS inside Felix
      was relying, for USXGMII support, on the fact that get_phy_device() is
      able to parse the Lynx PCS "device-in-package" registers for this C45
      MDIO device and identify it correctly.
      
      However, this was actually working somewhat by mistake (in the sense
      that, even though it was detected, it was detected for the wrong
      reasons).
      
      The get_phy_c45_ids() function works by iterating through all MMDs
      starting from 1 (MDIO_MMD_PMAPMD) and stops at the first one which
      returns a non-zero value in the "device-in-package" register pair,
      proceeding to see what that non-zero value is.
      
      For the Felix PCS, the first MMD (1, for the PMA/PMD) returns a non-zero
      value of 0xffffffff in the "device-in-package" registers. There is a
      code branch which is supposed to treat this case and flag it as wrong,
      and normally, this would have caught my attention when adding initial
      support for this PCS:
      
      	if ((devs_in_pkg & 0x1fffffff) == 0x1fffffff) {
      		/* If mostly Fs, there is no device there, then let's probe
      		 * MMD 0, as some 10G PHYs have zero Devices In package,
      		 * e.g. Cortina CS4315/CS4340 PHY.
      		 */
      
      However, this code never actually kicked in, it seems, because this
      snippet from get_phy_c45_devs_in_pkg() was basically sabotaging itself,
      by returning 0xfffffffe instead of 0xffffffff:
      
      	/* Bit 0 doesn't represent a device, it indicates c22 regs presence */
      	*devices_in_package &= ~BIT(0);
      
      Then the rest of the code just carried on thinking "ok, MMD 1 (PMA/PMD)
      says that there are 31 devices in that package, each having a device id
      of ffff:ffff, that's perfectly fine, let's go ahead and probe this PHY
      device".
      
      But after cleanup commit 320ed3bf ("net: phy: split
      devices_in_package"), this got "fixed", and now devs_in_pkg is no longer
      0xfffffffe, but 0xffffffff. So now, get_phy_device is returning -ENODEV
      for the Lynx PCS, because the semantics have remained mostly unchanged:
      the loop stops at the first MMD that returns a non-zero value, and that
      is MMD 1.
      
      But the Lynx PCS is simply a clause 37 PCS which implements the required
      MAC-side functionality for USXGMII (when operated in C45 mode, which is
      where C45 devices-in-package detection is relevant to). Of course it
      will fail the PMD/PMA test (MMD 1), since it is not a PHY. But it does
      implement detection for MDIO_MMD_PCS (3):
      
      - MDIO_DEVS1=0x008a, MDIO_DEVS2=0x0000,
      - MDIO_DEVID1=0x0083, MDIO_DEVID2=0xe400
      
      Let get_phy_c45_ids() continue searching for valid MMDs, and don't
      assume that every phy_device has a PMA/PMD MMD implemented.
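
      The behaviour argued for above can be modelled in userspace C. This
      is a sketch, not the phylib code: the function names and the callback
      type are hypothetical, and the mock assumes that every MMD other than
      1 and 3 reads back as zero, which the commit message does not state.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_MMDS 32

/* Callback returning the 32-bit "devices-in-package" value of one MMD. */
typedef uint32_t (*read_devs_fn)(int mmd);

/* The "mostly Fs" test quoted above: nothing answered on this MMD. */
int devs_in_pkg_is_empty(uint32_t devs_in_pkg)
{
	return (devs_in_pkg & 0x1fffffff) == 0x1fffffff;
}

/*
 * Return the first MMD, starting from 1, whose devices-in-package value
 * is neither zero nor mostly Fs, or -1 if no such MMD exists.
 */
int first_valid_mmd(read_devs_fn read_devs)
{
	int mmd;

	assert(read_devs != 0);
	for (mmd = 1; mmd < NUM_MMDS; mmd++) {
		uint32_t devs = read_devs(mmd);

		if (devs == 0 || devs_in_pkg_is_empty(devs))
			continue;	/* keep searching instead of giving up */
		return mmd;
	}
	return -1;
}

/*
 * Mock of the Lynx PCS values given above: MMD 1 reads ffff:ffff and
 * MMD 3 reports MDIO_DEVS2=0x0000, MDIO_DEVS1=0x008a. All other MMDs
 * returning 0 is an assumption of this sketch.
 */
uint32_t lynx_pcs_read_devs(int mmd)
{
	switch (mmd) {
	case 1:
		return 0xffffffffu;
	case 3:
		return 0x0000008au;
	default:
		return 0;
	}
}
```

      Note that devs_in_pkg_is_empty(0xfffffffe) is false, which matches
      the explanation of why the check never fired while bit 0 was being
      cleared before the comparison.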
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Jul, 2020 21 commits
    • Merge branch 'net-sched-do-not-drop-root-lock-in-tcf_qevent_handle' · 4291dc1a
      Jakub Kicinski authored
      Petr Machata says:
      
      ====================
      net: sched: Do not drop root lock in tcf_qevent_handle()
      
      Mirred currently does not mix well with blocks executed after the qdisc
      root lock is taken. This includes classification blocks (such as in PRIO,
      ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by
      mirred can cause deadlocks: either when the thread of execution attempts to
      take the lock a second time, or when two threads end up waiting on each
      other's locks.
      
      The qevent patchset attempted to not introduce further badness of this
      sort, and dropped the lock before executing the qevent block. However this
      led to too little locking and races between qdisc configuration and packet
      enqueue in the RED qdisc.
      
      Before the deadlock issues are solved in a way that can be applied across
      many qdiscs reasonably easily, do for qevents what is done for the
      classification blocks and just keep holding the root lock.
      
      That is done in patch #1. Patch #2 then drops the now unnecessary root_lock
      argument from Qdisc_ops.enqueue.
      ====================
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" · ac5c66f2
      Petr Machata authored
      This reverts commit aebe4426.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sched: Do not drop root lock in tcf_qevent_handle() · 55f656cd
      Petr Machata authored
      Mirred currently does not mix well with blocks executed after the qdisc
      root lock is taken. This includes classification blocks (such as in PRIO,
      ETS, DRR qdiscs) and qevents. The locking caused by the packet mirrored by
      mirred can cause deadlocks: either when the thread of execution attempts to
      take the lock a second time, or when two threads end up waiting on each
      other's locks.
      
      The qevent patchset attempted to not introduce further badness of this
      sort, and dropped the lock before executing the qevent block. However this
      led to too little locking and races between qdisc configuration and packet
      enqueue in the RED qdisc.
      
      Before the deadlock issues are solved in a way that can be applied across
      many qdiscs reasonably easily, do for qevents what is done for the
      classification blocks and just keep holding the root lock.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/mlx5e: CT: Map 128 bits labels to 32 bit map ID · 54b154ec
      Eli Britstein authored
      The 128 bits ct_label field is matched using a 32 bit hardware register.
      As such, only the lower 32 bits of ct_label field are offloaded. Change
      this logic to support setting and matching higher bits too.
      Map the 128 bits data to a unique 32 bits ID. Matching is done as exact
      match of the mapping ID of key & mask.
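
      A minimal sketch of mapping 128-bit values to unique 32-bit IDs is
      shown below. All names are hypothetical, and a fixed-size table with
      a linear scan stands in for whatever mapping structure the driver
      actually uses. Following the description above, a caller would pass
      the already masked value (key & mask) as the label.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LABELS 64	/* arbitrary table size for this sketch */

/* A 128-bit label stored as four 32-bit words. */
struct label128 {
	uint32_t w[4];
};

struct label_map {
	struct label128 labels[MAX_LABELS];
	uint32_t count;
};

/*
 * Return the 32-bit ID for a label, allocating a new ID on first use.
 * IDs start at 1; 0 means the table is full.
 */
uint32_t label_map_get_id(struct label_map *map, const struct label128 *label)
{
	uint32_t i;

	assert(map != 0 && label != 0);

	/* Reuse the existing ID when the same 128 bits were seen before. */
	for (i = 0; i < map->count; i++)
		if (memcmp(&map->labels[i], label, sizeof(*label)) == 0)
			return i + 1;

	if (map->count == MAX_LABELS)
		return 0;

	map->labels[map->count] = *label;
	return ++map->count;
}
```

      Because equal 128-bit values always yield the same ID, an exact
      match on the 32-bit ID is equivalent to an exact match on the full
      masked label, including its upper bits.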
      Signed-off-by: Eli Britstein <elibr@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Maor Dickman <maord@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Do not request completion on every single UMR WQE · 0bdc89b3
      Tariq Toukan authored
      UMR WQEs are posted in bulks, and HW is notified once per bulk.
      Reduce the number of completions by requesting one only for
      the last WQE of the bulk.
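
      The idea can be sketched with a toy descriptor. The struct and
      function below are hypothetical and do not reflect the driver's WQE
      layout; they only show the completion flag being set on the last
      entry of a bulk.

```c
#include <assert.h>
#include <stddef.h>

/* Toy work-queue entry: only the completion request flag is modelled. */
struct sketch_wqe {
	int request_completion;	/* non-zero: HW should report a completion */
};

/* Prepare a bulk of n WQEs; returns how many completions were requested. */
size_t prepare_bulk(struct sketch_wqe *wqes, size_t n)
{
	size_t i;

	assert(wqes != NULL || n == 0);

	/* Only the final WQE of the bulk asks for a completion. */
	for (i = 0; i < n; i++)
		wqes[i].request_completion = (i == n - 1);

	return n ? 1 : 0;
}
```

      For a bulk of n entries this requests one completion instead of n.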
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: RX, Avoid indirect call in representor CQE handling · 2901a5c6
      Tariq Toukan authored
      Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
      when/if CONFIG_RETPOLINE=y.
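
      The idea behind INDIRECT_CALL_2() can be imitated in plain C: compare
      the function pointer against the two most likely targets and call
      those directly, keeping the indirect call only as a fallback. The
      macro and handler names below are hypothetical and this is not the
      kernel's definition.

```c
#include <assert.h>

typedef int (*handler_fn)(int);

int handle_a(int x) { return x + 1; }
int handle_b(int x) { return x * 2; }
int handle_other(int x) { return -x; }

/*
 * Userspace imitation: if f equals one of the two expected targets,
 * the call is emitted as a direct call to that function; otherwise
 * fall back to the indirect call through f.
 */
#define DIRECT_OR_INDIRECT_2(f, f2, f1, ...)		\
	((f) == (f2) ? f2(__VA_ARGS__) :		\
	 (f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

int dispatch(handler_fn f, int arg)
{
	assert(f != 0);
	return DIRECT_OR_INDIRECT_2(f, handle_a, handle_b, arg);
}
```

      The result is the same whichever branch is taken; the benefit is
      that the common cases avoid the retpoline cost of an indirect
      branch.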
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Avoid indirect call in TX flow · 93761ca1
      Tariq Toukan authored
      Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
      when/if CONFIG_RETPOLINE=y.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: IPsec: Add Connect-X IPsec ESN update offload support · 7ed92f97
      Raed Salem authored
      Synchronize offloading device ESN with xfrm received SN
      by updating an existing IPsec HW context with the new SN.
      Signed-off-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload · b2ac7541
      Raed Salem authored
      On the receive flow, inspect received packets for an IPsec offload
      indication using the CQE. For IPsec-offloaded packets, propagate the
      offload status and stack handle to the stack for further processing.
      
      Supported statuses:
      - Offload ok.
      - Authentication failure.
      - Bad trailer indication.
      
      Connect-X IPsec does not use mlx5e_ipsec_handle_rx_cqe.
      
      With RX-only offload, bandwidth improves from 4 to 4.6 Gbps with one
      tunnel and from 23 to 38 Gbps with 24 tunnels. Below is the iperf3
      performance report for two 24-core servers with Intel(R) Xeon(R)
      CPU E5-2620 v3 @ 2.40GHz and ConnectX6-DX.
      We use one thread per IPsec tunnel.
      
      ----------------------------------------------------------------------
      Mode           |  Num tunnel | BW     | Send CPU util | Recv CPU util
                     |             | (Gbps) | (Average %)   | (Average %)
      ----------------------------------------------------------------------
      Crypto offload | 1           | 4.6    | 4.2           | 14.5
      ----------------------------------------------------------------------
      Crypto offload | 24          | 38     | 73            | 63
      ----------------------------------------------------------------------
      Non-offload    | 1           | 4      | 4             | 13
      ----------------------------------------------------------------------
      Non-offload    | 24          | 23     | 52            | 67
      ----------------------------------------------------------------------
      Signed-off-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      b2ac7541
    • Huy Nguyen's avatar
      net/mlx5e: IPsec: Add IPsec steering in local NIC RX · 5e466345
      Huy Nguyen authored
      Introduce the decrypt FT, the RX error FT and the default rules.
      
      The IPsec RX decrypt flow table is pointed to by the TTC
      (Traffic Type Classifier) ESP steering rules.
      The decrypt flow table has two flow groups. The first flow group
      holds the decrypt steering rules programmed via the "ip xfrm s" interface.
      The second flow group has a default rule that forwards all non-offloaded
      ESP packets to the TTC ESP default RSS TIR.
      
      The RX error flow table is the destination of the decrypt steering rules
      in the IPsec RX decrypt flow table. It has a fixed rule with a single
      copy action that copies ipsec_syndrome to metadata_regB[0:6]. The IPsec
      syndrome is used to filter out non-IPsec packets and to return the IPsec
      crypto offload status in the RX flow. The destination of the RX error
      flow table is the TTC ESP default RSS TIR.
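
      Reading the syndrome back on the software side then amounts to masking
      a bit field. The sketch below assumes that metadata_regB[0:6] denotes
      the seven least-significant bits of a 32-bit metadata value; the macro
      and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: bits [0:6] are the seven least-significant bits. */
#define DEMO_SYNDROME_BITS 7u
#define DEMO_SYNDROME_MASK ((1u << DEMO_SYNDROME_BITS) - 1u) /* 0x7f */

/* Extract the IPsec syndrome from a (hypothetical) metadata register value. */
static inline uint32_t demo_ipsec_syndrome(uint32_t metadata_reg_b)
{
	return metadata_reg_b & DEMO_SYNDROME_MASK;
}
```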
      
      All the FTs (decrypt FT and error FT) are created only when IPsec SAs
      are added. If there are no IPsec SAs, the FTs are removed.
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5e466345
    • Huy Nguyen's avatar
      net/mlx5: Add IPsec related Flow steering entry's fields · 78fb6122
      Huy Nguyen authored
      Add FTE actions IPsec ENCRYPT/DECRYPT
      Add ipsec_obj_id field in FTE
      Add new action field MLX5_ACTION_IN_FIELD_IPSEC_SYNDROME
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Raed Salem <raeds@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      78fb6122
    • Raed Salem's avatar
      net/mlx5: IPsec: Add HW crypto offload support · 2d64663c
      Raed Salem authored
      This patch adds support for Connect-X IPsec crypto offload
      by implementing the routines required by the IPsec acceleration layer,
      which delegates IPsec offloads to the Connect-X routines.
      
      In Connect-X IPsec, a Security Association (SA) is added or deleted
      by allocating or freeing a HW context for an encryption/decryption key
      and a HW context for the matching SA (IPsec object).
      The Security Policy (SP) is added or deleted by creating or removing
      matching Tx/Rx steering rules with an action of encryption/decryption
      respectively, executed using the previously allocated SA HW context.
      
      When a new xfrm state (SA) is added:
      - Use a separate crypto key HW context.
      - Create a separate IPsec context in HW to include the SA properties:
       - AES-GCM salt.
       - ICV properties (ICV length, implicit IV).
       - On supported devices, also update the ESN.
       - Associate the allocated crypto key with this IPsec context.
      
      Introduce a new compilation flag MLX5_IPSEC for it.
      
      Downstream patches will implement the Rx/Tx steering
      and add the ESN update.
      Signed-off-by: Raed Salem <raeds@mellanox.com>
      Signed-off-by: Huy Nguyen <huyn@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      2d64663c
    • Raed Salem's avatar
      net/mlx5: Accel, Add core IPsec support for the Connect-X family · 9a6ad1ad
      Raed Salem authored
      This sets the base for downstream patches that support
      the new IPsec implementation of the Connect-X family.
      
      The following modifications are made:
      - Remove accel layer dependency from MLX5_FPGA_IPSEC.
      - Introduce accel_ipsec_ops, each IPsec device will
        have to support these ops.
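
      The ops-table approach can be pictured with the following sketch, in
      which a generic layer calls through a per-device table instead of
      depending on one backend. The struct layout, field names and backend
      functions are invented for illustration and do not reflect the actual
      accel_ipsec_ops definition.

```c
#include <stddef.h>

/* Hypothetical ops table; the fields are illustrative, not the driver's. */
struct demo_ipsec_ops {
	const char *name;
	int (*add_sa)(unsigned int spi);
};

/* Two stand-in backends that simply report which one handled the call. */
static int demo_fpga_add_sa(unsigned int spi) { (void)spi; return 1; }
static int demo_connectx_add_sa(unsigned int spi) { (void)spi; return 2; }

static const struct demo_ipsec_ops demo_fpga_ops = {
	.name = "fpga",
	.add_sa = demo_fpga_add_sa,
};

static const struct demo_ipsec_ops demo_connectx_ops = {
	.name = "connectx",
	.add_sa = demo_connectx_add_sa,
};

/* Generic layer: dispatch through whichever ops table the device provides. */
static int demo_accel_add_sa(const struct demo_ipsec_ops *ops, unsigned int spi)
{
	if (!ops || !ops->add_sa)
		return -1; /* no IPsec-capable device or op not implemented */
	return ops->add_sa(spi);
}
```

      The generic layer has no compile-time dependency on either backend,
      which is the decoupling the first bullet describes.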
      Signed-off-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      9a6ad1ad
    • Parav Pandit's avatar
      net/mlx5: E-switch, Reduce dependency on num_vfs during mode set · ea2128fd
      Parav Pandit authored
      Currently only the ECPF can enable the eswitch when SR-IOV is disabled.
      
      Allow the PF to enable the eswitch when SR-IOV is disabled as well.
      Load VF vports when the eswitch is already enabled.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      ea2128fd
    • Parav Pandit's avatar
      net/mlx5: E-switch, Avoid function change handler for non ECPF · 3d5f41ca
      Parav Pandit authored
      For a non-ECPF eswitch manager function, vports are already
      enabled/disabled when the eswitch is enabled/disabled respectively.
      Simplify the function change handler for such an eswitch manager function.
      
      Therefore, the ECPF is the only function that retains the PF/VF function
      change handler.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      3d5f41ca
    • Tariq Toukan's avatar
      net/mlx5: Make MLX5_EN_TLS non-prompt · e21feb88
      Tariq Toukan authored
      TLS runs only over Eth, and the Eth driver is the only user of
      the core TLS functionality.
      There is no point in having the core functionality without its use
      in the Eth driver.
      Hence, let both TLS core implementations depend on MLX5_CORE_EN,
      and select MLX5_EN_TLS.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Raed Salem <raeds@mellanox.com>
      Reviewed-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      e21feb88
    • Saeed Mahameed's avatar
      net/mlx5e: Fix build break when CONFIG_XPS is not set · 8b5ec43d
      Saeed Mahameed authored
      mlx5e_accel_sk_get_rxq is only used in ktls_rx.c, which already
      depends on XPS to be compiled. Move it from the generic en_accel.h
      header into ktls_rx.c to fix the build break below:
      
      In file included from
      ../drivers/net/ethernet/mellanox/mlx5/core/en_main.c:49:0:
      ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h:
      In function ‘mlx5e_accel_sk_get_rxq’:
      ../drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h:153:12:
      error: implicit declaration of function ‘sk_rx_queue_get’ ...
        int rxq = sk_rx_queue_get(sk);
                  ^~~~~~~~~~~~~~~
      
      Fixes: 1182f365 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      8b5ec43d
    • Parav Pandit's avatar
      net/mlx5e: Fix missing switch_id for representors · 1315971f
      Parav Pandit authored
      The commit cited in the Fixes tag failed to set the switch id of the PF
      and VF ports. As a result, flows cannot be offloaded; a simple command
      like the one below fails to offload with the error that follows.
      
      tc filter add dev ens2f0np0 parent ffff: prio 1 flower \
       dst_mac 00:00:00:00:00:00/00:00:00:00:00:00 skip_sw \
       action mirred egress redirect dev ens2f0np0pf0vf0
      
      Error: mlx5_core: devices are not on same switch HW, can't offload forwarding.
      
      Hence, fix it by setting the switch id for each PF and VF representor
      port, as was done before the cited commit.
      
      Fixes: 71ad8d55 ("devlink: Replace devlink_port_attrs_set parameters with a struct")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      1315971f
    • Vladimir Oltean's avatar
      net: mscc: ocelot: rethink Kconfig dependencies again · 89e35f66
      Vladimir Oltean authored
      Having the users of MSCC_OCELOT_SWITCH_LIB depend on REGMAP_MMIO was a
      bad idea, since that symbol is not user-selectable. So we should have
      kept a 'select REGMAP_MMIO'.
      
      When we do that, we run into 2 more problems:
      
      - By depending on GENERIC_PHY, we are causing a recursive dependency.
        But it looks like GENERIC_PHY has no other dependencies, and other
        drivers select it, so we can select it too:
      
      drivers/of/Kconfig:69:error: recursive dependency detected!
      drivers/of/Kconfig:69:  symbol OF_IRQ depends on IRQ_DOMAIN
      kernel/irq/Kconfig:68:  symbol IRQ_DOMAIN is selected by REGMAP
      drivers/base/regmap/Kconfig:7:  symbol REGMAP default is visible depending on REGMAP_MMIO
      drivers/base/regmap/Kconfig:39: symbol REGMAP_MMIO is selected by MSCC_OCELOT_SWITCH_LIB
      drivers/net/ethernet/mscc/Kconfig:15:   symbol MSCC_OCELOT_SWITCH_LIB is selected by MSCC_OCELOT_SWITCH
      drivers/net/ethernet/mscc/Kconfig:22:   symbol MSCC_OCELOT_SWITCH depends on GENERIC_PHY
      drivers/phy/Kconfig:8:  symbol GENERIC_PHY is selected by PHY_BCM_NS_USB3
      drivers/phy/broadcom/Kconfig:41:        symbol PHY_BCM_NS_USB3 depends on MDIO_BUS
      drivers/net/phy/Kconfig:13:     symbol MDIO_BUS depends on MDIO_DEVICE
      drivers/net/phy/Kconfig:6:      symbol MDIO_DEVICE is selected by PHYLIB
      drivers/net/phy/Kconfig:254:    symbol PHYLIB is selected by ARC_EMAC_CORE
      drivers/net/ethernet/arc/Kconfig:19:    symbol ARC_EMAC_CORE is selected by ARC_EMAC
      drivers/net/ethernet/arc/Kconfig:25:    symbol ARC_EMAC depends on OF_IRQ
      
      - By depending on PHYLIB, we are causing a recursive dependency. PHYLIB
        only has a single dependency, "depends on NETDEVICES", which we are
        already depending on, so we can again hack our way into conformance by
        turning the PHYLIB dependency into a select.
      
      drivers/of/Kconfig:69:error: recursive dependency detected!
      drivers/of/Kconfig:69:  symbol OF_IRQ depends on IRQ_DOMAIN
      kernel/irq/Kconfig:68:  symbol IRQ_DOMAIN is selected by REGMAP
      drivers/base/regmap/Kconfig:7:  symbol REGMAP default is visible depending on REGMAP_MMIO
      drivers/base/regmap/Kconfig:39: symbol REGMAP_MMIO is selected by MSCC_OCELOT_SWITCH_LIB
      drivers/net/ethernet/mscc/Kconfig:15:   symbol MSCC_OCELOT_SWITCH_LIB is selected by MSCC_OCELOT_SWITCH
      drivers/net/ethernet/mscc/Kconfig:22:   symbol MSCC_OCELOT_SWITCH depends on PHYLIB
      drivers/net/phy/Kconfig:254:    symbol PHYLIB is selected by ARC_EMAC_CORE
      drivers/net/ethernet/arc/Kconfig:19:    symbol ARC_EMAC_CORE is selected by ARC_EMAC
      drivers/net/ethernet/arc/Kconfig:25:    symbol ARC_EMAC depends on OF_IRQ
      
      Fixes: f4d0323b ("net: mscc: ocelot: convert MSCC_OCELOT_SWITCH into a library")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      89e35f66
    • John Ogness's avatar
      af_packet: TPACKET_V3: replace busy-wait loop · 632ca50f
      John Ogness authored
      A busy-wait loop is used to implement waiting for bits to be copied
      from the skb to the kernel buffer before retiring a block. This is
      a problem on PREEMPT_RT because the copying task could be preempted
      by the busy-waiting task and thus live lock in the busy-wait loop.
      
      Replace the busy-wait logic with an rwlock_t. This provides lockdep
      coverage and makes the code RT ready.
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      632ca50f
    • Jakub Kicinski's avatar
      Merge branch 'net-fec-a-few-improvements' · 999cf8ae
      Jakub Kicinski authored
      Sergey Organov says:
      
      ====================
      net: fec: a few improvements
      
      This is a collection of simple improvements that reduce and/or
      simplify code. They were developed out of an attempt to use a DP83640
      PTP PHY connected to the built-in FEC (which has its own PTP support) of
      the iMX 6SX micro-controller. The primary bug-fix has now been submitted
      separately, and this is the rest of the changes.
      
      NOTE: the patches are developed and tested on 4.9.146, and rebased on
      top of recent 'net-next/master', where, besides visual inspection, I
      only tested that they do compile.
      ====================
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      999cf8ae