1. 05 Aug, 2020 9 commits
    • selftests: rtnetlink: make kci_test_encap() return sub-test result · 72f70c15
      Po-Hsu Lin authored
      kci_test_encap() is actually composed of two different sub-tests,
      kci_test_encap_vxlan() and kci_test_encap_fou().
      
      Therefore we should check the results of these two in
      kci_test_encap() so that the script is aware of the pass / fail status.
      Otherwise it will generate a false-negative result like below:
          $ sudo ./test.sh
          PASS: policy routing
          PASS: route get
          PASS: preferred_lft addresses have expired
          PASS: promote_secondaries complete
          PASS: tc htb hierarchy
          PASS: gre tunnel endpoint
          PASS: gretap
          PASS: ip6gretap
          PASS: erspan
          PASS: ip6erspan
          PASS: bridge setup
          PASS: ipv6 addrlabel
          PASS: set ifalias 5b193daf-0a08-46d7-af2c-e7aadd422ded for test-dummy0
          PASS: vrf
          PASS: vxlan
          FAIL: can't add fou port 7777, skipping test
          PASS: macsec
          PASS: bridge fdb get
          PASS: neigh get
          $ echo $?
          0
      Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: rtnetlink: correct the final return value for the test · c2a4d274
      Po-Hsu Lin authored
      The return value "ret" is reset to 0 at the beginning of each
      sub-test in rtnetlink.sh, so this test will always pass as long as the
      last sub-test has passed:
          $ sudo ./rtnetlink.sh
          PASS: policy routing
          PASS: route get
          PASS: preferred_lft addresses have expired
          PASS: promote_secondaries complete
          PASS: tc htb hierarchy
          PASS: gre tunnel endpoint
          PASS: gretap
          PASS: ip6gretap
          PASS: erspan
          PASS: ip6erspan
          PASS: bridge setup
          PASS: ipv6 addrlabel
          PASS: set ifalias a39ee707-e36b-41d3-802f-63179ed4d580 for test-dummy0
          PASS: vrf
          PASS: vxlan
          FAIL: can't add fou port 7777, skipping test
          PASS: macsec
          PASS: ipsec
          3,7c3,7
          < sa[0]    spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1
          < sa[0]    key=0x31323334 35363738 39303132 33343536
          < sa[1] rx ipaddr=0x00000000 00000000 00000000 c0a87b03
          < sa[1]    spi=0x00000009 proto=0x32 salt=0x64636261 crypt=1
          < sa[1]    key=0x31323334 35363738 39303132 33343536
          ---
          > sa[0]    spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
          > sa[0]    key=0x34333231 38373635 32313039 36353433
          > sa[1] rx ipaddr=0x00000000 00000000 00000000 037ba8c0
          > sa[1]    spi=0x00000009 proto=0x32 salt=0x61626364 crypt=1
          > sa[1]    key=0x34333231 38373635 32313039 36353433
          FAIL: ipsec_offload incorrect driver data
          FAIL: ipsec_offload
          PASS: bridge fdb get
          PASS: neigh get
          $ echo $?
          0
      
      Make "ret" a local variable in all sub-tests.
      Also, check the sub-test results in kci_test_rtnl() and return the
      final result for this test.
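      As a rough illustration of the failure mode, the C sketch below models
      each sub-test as a function returning 0 on pass and nonzero on failure.
      It is not the shell script itself: subtest_fn, run_buggy(), run_fixed()
      and the two example sub-tests are hypothetical names. The buggy variant
      resets one shared result before every sub-test, so only the last outcome
      survives; the fixed variant keeps a local result per sub-test and
      accumulates them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a sub-test returns 0 on pass, nonzero on failure. */
typedef int (*subtest_fn)(void);

/* Two example sub-tests. */
int subtest_pass(void) { return 0; }
int subtest_fail(void) { return 1; }

/* Buggy shape: one shared result that every sub-test resets. */
static int ret;

int run_buggy(const subtest_fn *tests, size_t n)
{
	assert(tests != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		ret = 0;		/* reset at the start of each sub-test */
		if (tests[i]() != 0)
			ret = 1;
	}
	return ret;			/* only the last sub-test counts */
}

/* Fixed shape: a local result per sub-test, accumulated by the caller. */
int run_fixed(const subtest_fn *tests, size_t n)
{
	int final = 0;

	assert(tests != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int local = tests[i]();

		if (local != 0)
			final = 1;
	}
	return final;
}
```

      With a failing sub-test followed by a passing one, run_buggy() returns 0
      while run_fixed() returns 1, which matches the "echo $?" output of 0
      shown above despite the FAIL lines.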
      Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: use detected device id instead of DT one on mismatch · 0b0e2997
      Vladimir Oltean authored
      Although we can detect the chip revision 100% at runtime, it is useful
      to specify it in the device tree compatible string too, because
      otherwise there would be no way to assess the correctness of device tree
      bindings statically, without booting a board (only some switch versions
      have internal RGMII delays and/or an SGMII port).
      
      But for testing the P/Q/R/S support, what I have is a reworked board
      with the SJA1105T replaced by a pin-compatible SJA1105Q, and I don't
      want to keep a separate device tree blob just for this one-off board.
      Since just the chip has been replaced, its RGMII delay setup is
      inherently the same (meaning: delays added by the PHY on the slave
      ports, and by PCB traces on the fixed-link CPU port).
      
      For this board, I'd rather have the driver shout at me, but go ahead and
      use what it found even if it doesn't match what it's been told is there.
      
      [    2.970826] sja1105 spi0.1: Device tree specifies chip SJA1105T but found SJA1105Q, please fix it!
      [    2.980010] sja1105 spi0.1: Probed switch chip: SJA1105Q
      [    3.005082] sja1105 spi0.1: Enabled switch tagging
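      The selection policy can be sketched in C roughly as follows. The enum
      values, the name table and choose_chip() are hypothetical, and the real
      driver's detection logic and per-chip tables are not reproduced here.
      The point is only that a mismatch produces a warning (worded like the
      log line above) while the detected chip wins.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical chip identifiers; the driver's real ID tables differ. */
enum chip {
	CHIP_SJA1105E, CHIP_SJA1105T, CHIP_SJA1105P,
	CHIP_SJA1105Q, CHIP_SJA1105R, CHIP_SJA1105S,
	CHIP_COUNT,
};

static const char *const chip_name[CHIP_COUNT] = {
	"SJA1105E", "SJA1105T", "SJA1105P",
	"SJA1105Q", "SJA1105R", "SJA1105S",
};

/* Trust the detected chip; only complain when the DT disagrees. */
enum chip choose_chip(enum chip dt_chip, enum chip detected)
{
	assert(dt_chip < CHIP_COUNT && detected < CHIP_COUNT);
	if (dt_chip != detected)
		fprintf(stderr,
			"Device tree specifies chip %s but found %s, please fix it!\n",
			chip_name[dt_chip], chip_name[detected]);
	return detected;
}
```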
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-fix-a-mcast-issue-for-tipc-udp-media' · 273d405b
      David S. Miller authored
      Xin Long says:
      
      ====================
      net: fix a mcast issue for tipc udp media
      
      Patch 1 adds a function to get the dev by source address,
      which is used by Patch 2.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: set ub->ifindex for local ipv6 address · 5a6f6f57
      Xin Long authored
      Without ub->ifindex set for an ipv6 address in tipc_udp_enable(),
      ipv6_sock_mc_join() may make the wrong dev join the multicast
      address in enable_mcast(). As a result, tipc links are never
      created.
      
      So fix it by getting the right netdev and setting ub->ifindex,
      as is already done for ipv4 addresses.
      Reported-by: Shuang Li <shuali@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: add ipv6_dev_find() · 81f6cb31
      Xin Long authored
      This adds an ip_dev_find()-like function for ipv6, used to find
      the dev by saddr.
      
      It will be used by the TIPC protocol, so also export it.
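      A user-space model of the lookup, assuming a flat table that maps each
      local IPv6 address to the ifindex of the device owning it. The record
      type and find_ifindex_by_saddr() are hypothetical, and the real
      ipv6_dev_find() returns a struct net_device pointer rather than an
      ifindex. Returning 0 for "not found" mirrors how an interface index of 0
      means "no specific interface" when joining a multicast group, which is
      presumably how the wrong device could end up joining.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical record: one local IPv6 address and its owning device. */
struct iface_addr {
	int ifindex;
	struct in6_addr addr;
};

/* Return the ifindex of the device that owns saddr, or 0 if none does. */
int find_ifindex_by_saddr(const struct iface_addr *tbl, size_t n,
			  const struct in6_addr *saddr)
{
	assert(saddr != NULL && (tbl != NULL || n == 0));
	for (size_t i = 0; i < n; i++)
		if (memcmp(&tbl[i].addr, saddr, sizeof(*saddr)) == 0)
			return tbl[i].ifindex;
	return 0;
}
```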
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: openvswitch: silence suspicious RCU usage warning · 5845589e
      Tonghao Zhang authored
      ovs_flow_tbl_destroy() is always called from an RCU callback
      or an error path, so there is no need to check whether rcu_read_lock
      is held or lockdep_ovsl_is_held() is true.
      
      ovs_dp_cmd_fill_info() is always called with ovs_mutex held,
      so use rcu_dereference_ovsl() instead of rcu_dereference()
      in ovs_flow_tbl_masks_cache_size().
      
      Fixes: 9bf24f59 ("net: openvswitch: make masks cache size configurable")
      Cc: Eelco Chaudron <echaudro@redhat.com>
      Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com
      Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "vxlan: fix tos value before xmit" · a0dced17
      Hangbin Liu authored
      This reverts commit 71130f29.
      
      In commit 71130f29 ("vxlan: fix tos value before xmit") we wanted to
      make sure the tos values are filtered by RT_TOS() based on RFC1349.
      
             0     1     2     3     4     5     6     7
          +-----+-----+-----+-----+-----+-----+-----+-----+
          |   PRECEDENCE    |          TOS          | MBZ |
          +-----+-----+-----+-----+-----+-----+-----+-----+
      
      But RFC1349 has been obsoleted by RFC2474. The new DS field is defined as:
      
             0     1     2     3     4     5     6     7
          +-----+-----+-----+-----+-----+-----+-----+-----+
          |          DS FIELD, DSCP           | ECN FIELD |
          +-----+-----+-----+-----+-----+-----+-----+-----+
      
      So with
      
      IPTOS_TOS_MASK          0x1E
      RT_TOS(tos)		((tos)&IPTOS_TOS_MASK)
      
      the first 3 bits of the DSCP will get lost.
      
      To keep all the DSCP info on xmit, we should revert the patch and just push
      all tos bits to ip_tunnel_ecn_encap(), which will handle the ECN field later.
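      The loss can be checked with a few lines of C using the two macro
      definitions quoted above. Taking the EF code point as an example (DSCP
      46, i.e. a TOS byte of 0xB8), RT_TOS() keeps only 0x18, which reads back
      as DSCP 6. The dscp_of() and ecn_of() helpers are illustrative and simply
      split the byte according to the DS field layout shown above.

```c
#include <assert.h>
#include <stdint.h>

/* The two definitions quoted in the commit message. */
#define IPTOS_TOS_MASK	0x1E
#define RT_TOS(tos)	((tos) & IPTOS_TOS_MASK)

/* DS field layout: DSCP in the upper 6 bits, ECN in the lower 2. */
static inline uint8_t dscp_of(uint8_t tos) { return tos >> 2; }
static inline uint8_t ecn_of(uint8_t tos) { return tos & 0x03; }

/* The mask keeps only bits 1..4, so the top 3 DSCP bits never survive. */
static_assert(RT_TOS(0xE0) == 0, "precedence bits are dropped");
```

      The mask also clears bit 0, one of the two ECN bits.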
      
      Fixes: 71130f29 ("vxlan: fix tos value before xmit")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: only allow phase values lower than 1 period · c29f9aa3
      Vladimir Oltean authored
      Given the way we define the phase (the difference between the time of
      the signal's rising edge and the closest integer multiple of the
      period), it doesn't make sense to have a phase value equal to or
      larger than 1 period.
      
      So deny such settings coming from the user.
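      A sketch of the check, assuming a seconds-plus-nanoseconds time
      representation similar in spirit to the PTP ioctl structures. struct
      clk_time and phase_is_valid() are hypothetical names, and both values
      are assumed normalized (nsec below one billion). The phase is accepted
      only when it compares strictly below the period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000u

/* Hypothetical seconds + nanoseconds pair, assumed normalized. */
struct clk_time {
	int64_t sec;
	uint32_t nsec;
};

/* Accept a phase only if it is strictly smaller than one period. */
bool phase_is_valid(struct clk_time phase, struct clk_time period)
{
	assert(phase.nsec < NSEC_PER_SEC && period.nsec < NSEC_PER_SEC);
	if (phase.sec != period.sec)
		return phase.sec < period.sec;
	return phase.nsec < period.nsec;
}
```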
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Acked-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Aug, 2020 31 commits
    • farsync: switch from 'pci_' to 'dma_' API · 4c900a6b
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'fst_add_one()', GFP_KERNEL can be used
      because it is a probe function and no lock is acquired.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wan: wanxl: switch from 'pci_' to 'dma_' API · 24dd377a
      Christophe JAILLET authored
      The wrappers in include/linux/pci-dma-compat.h should go away.
      
      The patch has been generated with the coccinelle script below and has been
      hand modified to replace GFP_ with a correct flag.
      It has been compile tested.
      
      When memory is allocated in 'wanxl_pci_init_one()', GFP_KERNEL can be used
      because it is a probe function and no lock is acquired.
      Moreover, just a few lines above, GFP_KERNEL is already used.
      
      @@
      @@
      -    PCI_DMA_BIDIRECTIONAL
      +    DMA_BIDIRECTIONAL
      
      @@
      @@
      -    PCI_DMA_TODEVICE
      +    DMA_TO_DEVICE
      
      @@
      @@
      -    PCI_DMA_FROMDEVICE
      +    DMA_FROM_DEVICE
      
      @@
      @@
      -    PCI_DMA_NONE
      +    DMA_NONE
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_alloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3;
      @@
      -    pci_zalloc_consistent(e1, e2, e3)
      +    dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_free_consistent(e1, e2, e3, e4)
      +    dma_free_coherent(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_single(e1, e2, e3, e4)
      +    dma_map_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_single(e1, e2, e3, e4)
      +    dma_unmap_single(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4, e5;
      @@
      -    pci_map_page(e1, e2, e3, e4, e5)
      +    dma_map_page(&e1->dev, e2, e3, e4, e5)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_page(e1, e2, e3, e4)
      +    dma_unmap_page(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_map_sg(e1, e2, e3, e4)
      +    dma_map_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_unmap_sg(e1, e2, e3, e4)
      +    dma_unmap_sg(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
      +    dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_single_for_device(e1, e2, e3, e4)
      +    dma_sync_single_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
      +    dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2, e3, e4;
      @@
      -    pci_dma_sync_sg_for_device(e1, e2, e3, e4)
      +    dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
      
      @@
      expression e1, e2;
      @@
      -    pci_dma_mapping_error(e1, e2)
      +    dma_mapping_error(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_dma_mask(e1, e2)
      +    dma_set_mask(&e1->dev, e2)
      
      @@
      expression e1, e2;
      @@
      -    pci_set_consistent_dma_mask(e1, e2)
      +    dma_set_coherent_mask(&e1->dev, e2)
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: do not use VF device if link is down · 7c9864bb
      Stephen Hemminger authored
      If the accelerated networking SRIOV VF device has lost carrier,
      use the synthetic network device, which is available as a backup
      path. This is a rare case, since if the VF link goes down, the
      VMBus device will normally lose external connectivity as well.
      But if the communication is between two VMs on the same host,
      the VMBus device will still work.
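      The resulting transmit-path choice reduces to a small predicate. The
      sketch below is a hypothetical simplification: struct vf_state and
      pick_tx_path() are illustrative names, a NULL pointer stands for "no VF
      present", and any other conditions the driver applies before using the
      VF are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal view of the VF: only its carrier state. */
struct vf_state {
	bool carrier;
};

enum tx_path { TX_SYNTHETIC, TX_VF };

/* Use the VF only if one is present and its link is up; otherwise fall
 * back to the synthetic (VMBus) device. */
enum tx_path pick_tx_path(const struct vf_state *vf)
{
	if (vf != NULL && vf->carrier)
		return TX_VF;
	return TX_SYNTHETIC;
}
```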
      Reported-by: "Shah, Ashish N" <ashish.n.shah@intel.com>
      Fixes: 0c195567 ("netvsc: transparent VF management")
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-eth: Fix passing zero to 'PTR_ERR' warning · 02afa9c6
      YueHaibing authored
      Fix smatch warning:
      
      drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c:2419
       alloc_channel() warn: passing zero to 'PTR_ERR'
      
      setup_dpcon() should return ERR_PTR(err) instead of zero in the error
      handling case.
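      Why returning zero is a problem can be shown with a user-space
      re-creation of the kernel's error-pointer helpers (the real ones live in
      include/linux/err.h). setup_buggy(), setup_fixed() and alloc_result()
      are hypothetical stand-ins for setup_dpcon() and its caller, not the
      driver code: when the error path returns NULL, the caller's PTR_ERR()
      yields 0 and the failure looks like success, while ERR_PTR(-ENOMEM)
      carries a real error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space re-creation of the kernel's error-pointer helpers,
 * for illustration only. */
#define MAX_ERRNO 4095

static_assert(sizeof(void *) == sizeof(intptr_t), "pointers fit intptr_t");

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return ptr == NULL || IS_ERR(ptr);
}

static int dummy_object;

/* Hypothetical setup routines: the buggy one returns NULL on failure,
 * the fixed one encodes the error code in the pointer. */
void *setup_buggy(bool fail) { return fail ? NULL : &dummy_object; }
void *setup_fixed(bool fail) { return fail ? ERR_PTR(-ENOMEM) : &dummy_object; }

/* Hypothetical caller: convert a failed setup into an error code. */
long alloc_result(void *obj)
{
	if (IS_ERR_OR_NULL(obj))
		return PTR_ERR(obj);	/* 0 for NULL: the warned-about case */
	return 0;
}
```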
      
      Fixes: d7f5a9d8 ("dpaa2-eth: defer probe on object allocate")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: macb: Properly handle phylink on at91sam9x · f7ba7dbf
      Stefan Roese authored
      I just recently noticed that Ethernet has not worked since v5.5
      on the GARDENA smart Gateway, which is based on the AT91SAM9G25.
      Debugging showed that the "GEM bits" in the NCFGR register are now
      unconditionally accessed, which is incorrect for the !macb_is_gem()
      case.
      
      This patch adds the macb_is_gem() checks back to the code
      (in macb_mac_config() & macb_mac_link_up()), so that the GEM register
      bits are not accessed in this case any more.
      
      Fixes: 7897b071 ("net: macb: convert to phylink")
      Signed-off-by: Stefan Roese <sr@denx.de>
      Cc: Reto Schneider <reto.schneider@husqvarnagroup.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · ee895a30
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Flush the cleanup xtables worker to make sure destructors
         have completed, from Florian Westphal.
      
      2) iifgroup is matching erroneously, also from Florian.
      
      3) Add selftest for meta interface matching, from Florian Westphal.
      
      4) Move nf_ct_offload_timeout() to header, from Roi Dayan.
      
      5) Call nf_ct_offload_timeout() from flow_offload_add() to
         make sure garbage collection does not evict offloaded flow,
         from Roi Dayan.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: thunderx: use spin_lock_bh in nicvf_set_rx_mode_task() · bab9693a
      Xin Long authored
      A deadlock was triggered on the thunderx driver:
      
              CPU0                    CPU1
              ----                    ----
         [01] lock(&(&nic->rx_mode_wq_lock)->rlock);
                                 [11] lock(&(&mc->mca_lock)->rlock);
                                 [12] lock(&(&nic->rx_mode_wq_lock)->rlock);
         [02] <Interrupt> lock(&(&mc->mca_lock)->rlock);
      
      The path for each is:
      
        [01] worker_thread() -> process_one_work() -> nicvf_set_rx_mode_task()
        [02] mld_ifc_timer_expire()
        [11] ipv6_add_dev() -> ipv6_dev_mc_inc() -> igmp6_group_added() ->
        [12] dev_mc_add() -> __dev_set_rx_mode() -> nicvf_set_rx_mode()
      
      To fix it, bh needs to be disabled on [01], so that the timer on [02]
      won't be triggered until rx_mode_wq_lock is released. So change
      to use spin_lock_bh() instead of spin_lock().
      
      Thanks to Paolo for helping with this.
      
      v1->v2:
        - post to netdev.
      Reported-by: Rafael P. <rparrazo@redhat.com>
      Tested-by: Dean Nelson <dnelson@redhat.com>
      Fixes: 469998c8 ("net: thunderx: prevent concurrent data re-writing by nicvf_set_rx_mode")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Support-PMTU-discovery-with-bridged-UDP-tunnels' · 2ac24d6d
      David S. Miller authored
      Stefano Brivio says:
      
      ====================
      Support PMTU discovery with bridged UDP tunnels
      
      Currently, PMTU discovery for UDP tunnels only works if packets are
      routed to the encapsulating interfaces, not bridged.
      
      This results from the fact that we generally don't have valid routes
      to the senders we can use to relay ICMP and ICMPv6 errors, and makes
      PMTU discovery completely non-functional for VXLAN and GENEVE ports of
      both regular bridges and Open vSwitch instances.
      
      If the sender is local, and packets are forwarded to the port by a
      regular bridge, all it takes is to generate a corresponding route
      exception on the encapsulating device. The bridge then finds the route
      exception carrying the PMTU value estimate as it forwards frames, and
      relays ICMP messages back to the socket of the local sender. Patch 1/6
      fixes this case.
      
      If the sender resides on another node, we actually need to reply to
      IP and IPv6 packets ourselves and send these ICMP or ICMPv6 errors
      back, using the same encapsulating device. Patch 2/6, based on an
      original idea by Florian Westphal, adds the needed functionality,
      while patches 3/6 and 4/6 add matching support for VXLAN and GENEVE.
      
      Finally, 5/6 and 6/6 introduce selftests for all combinations of
      inner and outer IP versions, covering both VXLAN and GENEVE, with
      both regular bridges and Open vSwitch instances.
      
      v2: Add helper to check for any bridge port, skip oif check for PMTU
          routes for bridge ports only, split IPv4 and IPv6 helpers and
          functions (all suggested by David Ahern)
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: pmtu.sh: Add tests for UDP tunnels handled by Open vSwitch · 7b53682c
      Stefano Brivio authored
      The new tests check that IP and IPv6 packets exceeding the local PMTU
      estimate, forwarded by an Open vSwitch instance from another node,
      result in the correct route exceptions being created, and that
      communication with end-to-end fragmentation, over GENEVE and VXLAN
      Open vSwitch ports, is now possible as a result of PMTU discovery.
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: pmtu.sh: Add tests for bridged UDP tunnels · df40e39c
      Stefano Brivio authored
      The new tests check that IP and IPv6 packets exceeding the local PMTU
      estimate, both locally generated and forwarded by a bridge from
      another node, result in the correct route exceptions being created,
      and that communication with end-to-end fragmentation over VXLAN and
      GENEVE tunnels is now possible as a result of PMTU discovery.
      
      Some of the existing setup functions aren't generic enough to simply
      add a namespace and a bridge to the existing routing setup. This
      rework is in progress and we can easily shrink this once more generic
      topology functions are available.
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • geneve: Support for PMTU discovery on directly bridged links · c1a800e8
      Stefano Brivio authored
      If the interface is a bridge or Open vSwitch port, and we can't
      forward a packet because it exceeds the local PMTU estimate,
      trigger an ICMP or ICMPv6 reply to the sender, using the same
      interface to forward it back.
      
      If metadata collection is enabled, set destination and source
      addresses for the flow as if we were receiving the packet, so that
      Open vSwitch can match the ICMP error against the existing
      association.
      
      v2: Use netif_is_any_bridge_port() (David Ahern)
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Support for PMTU discovery on directly bridged links · fc68c995
      Stefano Brivio authored
      If the interface is a bridge or Open vSwitch port, and we can't
      forward a packet because it exceeds the local PMTU estimate,
      trigger an ICMP or ICMPv6 reply to the sender, using the same
      interface to forward it back.
      
      If metadata collection is enabled, reverse destination and source
      addresses, so that Open vSwitch is able to match this packet against
      the existing, reverse flow.
      
      v2: Use netif_is_any_bridge_port() (David Ahern)
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tunnels: PMTU discovery support for directly bridged IP packets · 4cb47a86
      Stefano Brivio authored
      It's currently possible to bridge Ethernet tunnels carrying IP
      packets directly to external interfaces without assigning them
      addresses and routes on the bridged network itself: this is the case
      for UDP tunnels bridged with a standard bridge or by Open vSwitch.
      
      PMTU discovery is currently broken with those configurations, because
      the encapsulation effectively decreases the MTU of the link, and
      while we are able to account for this using PMTU discovery on the
      lower layer, we don't have a way to relay ICMP or ICMPv6 messages
      needed by the sender, because we don't have valid routes to it.
      
      On the other hand, as a tunnel endpoint, we can't fragment packets
      as a general approach: this is for instance clearly forbidden for
      VXLAN by RFC 7348, section 4.3:
      
         VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
         fragment encapsulated VXLAN packets due to the larger frame size.
         The destination VTEP MAY silently discard such VXLAN fragments.
      
      The same paragraph recommends that the MTU over the physical network
      accommodate the encapsulation overhead, but this isn't a practical
      option for complex topologies, especially for typical Open vSwitch use
      cases.
      
      Further, it states that:
      
         Other techniques like Path MTU discovery (see [RFC1191] and
         [RFC1981]) MAY be used to address this requirement as well.
      
      Now, PMTU discovery already works for routed interfaces: we get
      route exceptions created by the encapsulation device as it receives
      ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
      we already rebuild those messages with the appropriate MTU and route
      them back to the sender.
      
      Add the missing bits for bridged cases:
      
      - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
        to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
        RFC 4443 section 2.4 for ICMPv6. This function is already called by
        UDP tunnels
      
      - a new function generating those ICMP or ICMPv6 replies. We can't
        reuse icmp_send() and icmp6_send() as we don't see the sender as a
        valid destination. This doesn't need to be generic, as we don't
        cover any other type of ICMP errors given that we only provide an
        encapsulation function to the sender
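      For the IPv4 side, the RFC 1122 section 3.2.2 restrictions can be
      sketched as a predicate over a simplified packet description. This is
      not the kernel's helper: struct pkt4 and the function names are
      hypothetical, addresses are in host byte order, directed and link-layer
      broadcasts are not detected, and the DF requirement reflects the usual
      rule that Fragmentation Needed is only sent for packets that may not be
      fragmented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 packet summary (host byte order). */
struct pkt4 {
	uint32_t saddr;
	uint32_t daddr;
	bool df;		/* Don't Fragment bit set */
	uint16_t frag_offset;	/* nonzero for non-initial fragments */
	bool is_icmp;
	uint8_t icmp_type;
};

/* ICMP error classes listed in RFC 1122 3.2.2: destination unreachable,
 * source quench, redirect, time exceeded, parameter problem. */
static bool icmp_type_is_error(uint8_t type)
{
	return type == 3 || type == 4 || type == 5 || type == 11 || type == 12;
}

/* Source must identify a single host: not 0/8, 127/8, multicast or
 * class E (the latter includes 255.255.255.255). */
static bool src_is_single_host(uint32_t addr)
{
	uint8_t first = addr >> 24;

	return first != 0 && first != 127 && first < 224;
}

/* Limited broadcast or multicast (224.0.0.0/4) destination. */
static bool dst_is_bcast_or_mcast(uint32_t addr)
{
	return addr == 0xFFFFFFFFu || (addr >> 28) == 0xE;
}

/* May we answer this packet with ICMP Fragmentation Needed? */
bool may_send_frag_needed(const struct pkt4 *p)
{
	assert(p != NULL);
	if (!p->df)		/* only meaningful for DF packets */
		return false;
	if (p->frag_offset != 0)
		return false;
	if (dst_is_bcast_or_mcast(p->daddr) || !src_is_single_host(p->saddr))
		return false;
	if (p->is_icmp && icmp_type_is_error(p->icmp_type))
		return false;
	return true;
}
```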
      
      While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
      we might receive GSO buffers here, and the passed headroom already
      includes the inner MAC length, so we don't have to account for it
      a second time (that would imply three MAC headers on the wire, but
      there are just two).
      
      This issue became visible while bridging IPv6 packets with 4500 bytes
      of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
      bytes of encapsulation headroom, we would advertise MTU as 3950, and
      we would reject fragmented IPv6 datagrams of 3958 bytes in size on the
      wire. We're exclusively dealing with network MTU here, though, so we
      could get Ethernet frames up to 3964 octets in that case.
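      The figures above can be reproduced with simple arithmetic, assuming the
      50 bytes break down as outer IPv4 (20), UDP (8), a GENEVE header without
      options (8) and the inner Ethernet header (14). Because the inner MAC
      header is already part of the headroom, the largest inner Ethernet frame
      is the PMTU minus the remaining 36 bytes, i.e. 3964. The enum and
      function names are illustrative.

```c
#include <assert.h>

/* Assumed breakdown of the 50 bytes of headroom (GENEVE, no options). */
enum {
	OUTER_IPV4_HLEN	= 20,
	UDP_HLEN	= 8,
	GENEVE_HLEN	= 8,
	INNER_ETH_HLEN	= 14,
};

static_assert(OUTER_IPV4_HLEN + UDP_HLEN + GENEVE_HLEN + INNER_ETH_HLEN == 50,
	      "matches the headroom quoted in the commit message");

/* Everything placed in front of the inner network-layer packet. */
static int headroom(void)
{
	return OUTER_IPV4_HLEN + UDP_HLEN + GENEVE_HLEN + INNER_ETH_HLEN;
}

/* Largest inner IP packet (network MTU) that fits the outer PMTU. */
int inner_network_mtu(int outer_pmtu)
{
	return outer_pmtu - headroom();
}

/* Largest inner Ethernet frame: the inner MAC header is already part of
 * the headroom, so it is added back rather than subtracted twice. */
int inner_frame_max(int outer_pmtu)
{
	return inner_network_mtu(outer_pmtu) + INNER_ETH_HLEN;
}
```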
      
      v2:
      - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
      - split IPv4/IPv6 functions (David Ahern)
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: route: Ignore output interface in FIB lookup for PMTU route · df23bb18
      Stefano Brivio authored
      Currently, processes sending traffic to a local bridge with an
      encapsulation device as a port don't get ICMP errors if they exceed
      the PMTU of the encapsulated link.
      
      David Ahern suggested this as a hack, but it actually looks like
      the correct solution: when we update the PMTU for a given destination
      by means of updating or creating a route exception, the encapsulation
      might trigger this because of PMTU discovery happening either on the
      encapsulation device itself, or its lower layer. This happens on
      bridged encapsulations only.
      
      The output interface shouldn't matter, because we already have a
      valid destination. Drop the output interface restriction from the
      associated route lookup.
      
      For UDP tunnels, we will now have a route exception created for the
      encapsulation itself, with an MTU value reflecting its headroom, which
      allows a bridge forwarding IP packets originated locally to deliver
      errors back to the sending socket.
      
      The behaviour is now consistent with IPv6 and verified with selftests
      pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in
      this series.
      
      v2:
      - reset output interface only for bridge ports (David Ahern)
      - add and use netif_is_any_bridge_port() helper (David Ahern)
      Suggested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge tag 'wireless-drivers-next-2020-08-04' of... · cabf06e5
      David S. Miller authored
      Merge tag 'wireless-drivers-next-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for v5.9
      
      Second set of patches for v5.9. mt76 has most of the patches this time.
      Otherwise it's just smaller fixes and cleanups to other drivers.
      
      There was a major conflict in mt76 driver between wireless-drivers and
      wireless-drivers-next. I solved that by merging the former to the
      latter.
      
      Major changes:
      
      rtw88
      
      * add support for ieee80211_ops::change_interface
      
      * add support for enabling and disabling beacon
      
      * add debugfs file for testing h2c
      
      mt76
      
      * ARP filter offload for 7663
      
      * runtime power management for 7663
      
      * testmode support for mfg calibration
      
      * support for more channels
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Joe Perches
      via-velocity: Use more typical logging styles · 93f4ddd6
      Joe Perches authored
      Use netdev_<level> in place of VELOCITY_PRT.
      Use pr_<level> in place of printk(KERN_<LEVEL> ...).
      
      Miscellanea:
      
      o Add pr_fmt to prefix pr_<level> output with "via-velocity: "
      o Remove now unused functions and macros
      o Realign some logging lines
      o Remove devname where pr_<level> is also used
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'hinic-mailbox-channel-enhancement' · a79da695
      David S. Miller authored
      Luo bin says:
      
      ====================
      hinic: mailbox channel enhancement
      
      Add support for generating a random mailbox ID for each VF to ensure
      that mailbox messages from VFs are valid, and make the PF check
      whether a command from a VF is supported before passing it to hw.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Luo bin
      hinic: add check for mailbox msg from VF · c8c29ec3
      Luo bin authored
      The PF should check whether a command from a VF is supported and its
      content is valid before passing it to hw.
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Luo bin
      hinic: add generating mailbox random index support · 088c5f0d
      Luo bin authored
      Add support for generating a random mailbox ID for each VF to ensure
      that mailbox messages received by the PF come from the correct VF.
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Kalle Valo
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git · 2cfd71f1
      Kalle Valo authored
      The mt76 driver had major conflicts within the mt7615 directory. To make
      it easier for everyone, merge wireless-drivers to wireless-drivers-next
      and solve those conflicts.
    • David S. Miller
      sfc: Fix build with CONFIG_RFS_ACCEL disabled. · da795540
      David S. Miller authored
         drivers/net/ethernet/sfc/ef100_nic.c:835:3: error: 'const struct efx_nic_type' has no member named 'filter_rfs_expire_one'
           835 |  .filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
               |   ^~~~~~~~~~~~~~~~~~~~~
      >> drivers/net/ethernet/sfc/ef100_nic.c:835:27: error: initialization of 'void (*)(struct efx_nic *, u32)' {aka 'void (*)(struct efx_nic *, unsigned int)'} from incompatible pointer type 'bool (*)(struct efx_nic *, u32,  unsigned int)' {aka '_Bool (*)(struct efx_nic *, unsigned int,  unsigned int)'} [-Werror=incompatible-pointer-types]
           835 |  .filter_rfs_expire_one = efx_mcdi_filter_rfs_expire_one,
               |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 2e7199bd
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2020-08-04
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      We've added 73 non-merge commits during the last 9 day(s) which contain
      a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).
      
      The main changes are:
      
      1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
         syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.
      
      2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
         in-kernel inspection, from Yonghong Song and Alexei Starovoitov.
      
      3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
         unwinder errors, from Song Liu.
      
      4) Allow cgroup local storage map to be shared between programs on the same
         cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.
      
      5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
         load instructions, from Jean-Philippe Brucker.
      
      6) Follow-up fixes on BPF socket lookup in combination with reuseport group
         handling. Also add related BPF selftests, from Jakub Sitnicki.
      
      7) Allow socket storage to be used in BPF_PROG_TYPE_CGROUP_SOCK-typed programs
         for socket create/release as well as bind functions, from Stanislav Fomichev.
      
      8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
         xdp_statistics, from Peilin Ye.
      
      9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.
      
      10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
          fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.
      
      11) Fix a bpftool segfault due to a missing program type name and make it more
          robust to prevent such gaps in the future, from Quentin Monnet.
      
      12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
          resolver issue, from John Fastabend.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge tag 'mlx5-updates-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 76769c38
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2020-08-03
      
      This patchset introduces some updates to mlx5 driver.
      
      1) Jakub converts mlx5 to use the new udp tunnel infrastructure,
         starting with a hack to allow drivers to request a static configuration
         of the default vxlan port, followed by a patch that converts mlx5.
      
      2) Parav implements the change_carrier ndo for VF eswitch representors,
         to speed up link state control of representor netdevices.
      
      3) Alex Vesker makes a simple update to software steering to fix an issue
         with the push vlan action sequence.
      
      4) Leon removes a redundant stack dump in an error flow.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David S. Miller
      Merge branch 'sfc-driver-for-EF100-family-NICs-part-2' · c4b83061
      David S. Miller authored
      Edward Cree says:
      
      ====================
      sfc: driver for EF100 family NICs, part 2
      
      This series implements the data path and various other functionality
       for Xilinx/Solarflare EF100 NICs.
      
      Changed from v2:
       * Improved error handling of design params (patch #3)
       * Removed 'inline' from .c file in patch #4
       * Don't report common stats to ethtool -S (patch #8)
      
      Changed from v1:
       * Fixed build errors on CONFIG_RFS_ACCEL=n (patch #5) and 32-bit
         (patch #8)
       * Dropped patch #10 (ethtool ops) as it's buggy and will need a
         bigger rework to fix.
      ====================
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Edward Cree
      sfc_ef100: add nic-type for VFs, and bind to them · d61592a1
      Edward Cree authored
      We don't yet have a .sriov_configure() to create them, though.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Edward Cree
      sfc_ef100: read pf_index at probe time · ef2c57b9
      Edward Cree authored
      We'll need it later, for VF representors.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Edward Cree
      sfc_ef100: functions for selftests · 43c3df0d
      Edward Cree authored
      Self-tests for event and interrupt reception and NVRAM.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Edward Cree
      sfc_ef100: statistics gathering · b593b6f1
      Edward Cree authored
      MAC stats work much the same as on EF10, with a periodic DMA to a region
       specified via an MCDI.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Edward Cree
      sfc_ef100: plumb in fini_dmaq · b780feac
      Edward Cree authored
      Bring down the TX and RX queues at ifdown, so that we can then fini the
       EVQs (otherwise the MC would return EBUSY because they're still in use).
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Edward Cree
      sfc_ef100: RX path for EF100 · 8e57daf7
      Edward Cree authored
      Includes RSS spreading.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Edward Cree