1. 28 Sep, 2020 7 commits
    • Merge branch 'net-core-fix-a-lockdep-splat-in-the-dev_addr_list' · 0ba56b89
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      net: core: fix a lockdep splat in the dev_addr_list.
      
      This patchset avoids a lockdep splat.

      When a stacked interface graph is changed, netif_addr_lock() is called
      recursively, and it internally calls spin_lock_nested().
      The parameter of spin_lock_nested() is 'dev->lower_level';
      this parameter is called the subclass.
      The problem with 'dev->lower_level' is that its value can change while
      it is being used as the subclass of spin_lock_nested().
      So spin_lock_nested() can be called recursively with the same
      subclass value, which lockdep interprets as a deadlock.
      To avoid this, a new variable is needed to serve as the parameter
      of spin_lock_nested().
      The first and second patches prepare for the third patch,
      which fixes the problem.

      The first patch adds __netdev_upper_dev_unlink().
      The existing netdev_upper_dev_unlink() is renamed to
      __netdev_upper_dev_unlink(), and netdev_upper_dev_unlink()
      is added as a wrapper around this function.

      The second patch adds the netdev_nested_priv structure.
      netdev_walk_all_{ upper | lower }_dev() pass both private functions
      and a "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      To make it easier to add common variables and functions,
      this new netdev_nested_priv structure is added.

      The third patch adds a new variable 'nested_level'
      to the net_device structure.
      This variable will be used as the parameter of spin_lock_nested() for
      dev->addr_list_lock.
      Using this variable avoids the lockdep splat.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add nested_level variable in net_device · 1fc70edb
      Taehee Yoo authored
      This patch adds a new variable 'nested_level' to the net_device
      structure.
      This variable will be used as the parameter of spin_lock_nested() for
      dev->addr_list_lock.

      netif_addr_lock() can be called recursively, so spin_lock_nested() is
      used instead of spin_lock(), with dev->lower_level as its parameter.
      However, the dev->lower_level value can be updated while it is in use,
      so lockdep warns about a possible deadlock scenario.

      When a stacked interface is deleted, netif_{uc | mc}_sync() is
      called recursively,
      so spin_lock_nested() is called recursively too.
      At this moment, the dev->lower_level variable is used as its parameter.
      The dev->lower_level value is updated immediately when interfaces are
      being linked or unlinked.
      Thus, after unlinking, dev->lower_level shouldn't be used as the
      parameter of spin_lock_nested().
      
          A (macvlan)
          |
          B (vlan)
          |
          C (bridge)
          |
          D (macvlan)
          |
          E (vlan)
          |
          F (bridge)
      
          A->lower_level : 6
          B->lower_level : 5
          C->lower_level : 4
          D->lower_level : 3
          E->lower_level : 2
          F->lower_level : 1
      
      When interface 'A' is removed, it releases its resources.
      At this moment, netif_addr_lock() is called.
      Then netdev_upper_dev_unlink() is called recursively,
      and dev->lower_level is updated.
      There is no problem in this case.

      But when the bridge module is removed, the 'C' and 'F' interfaces
      are removed at once.
      If 'F' is removed first, the lower_level values are as follows.
          A->lower_level : 5
          B->lower_level : 4
          C->lower_level : 3
          D->lower_level : 2
          E->lower_level : 1
          F->lower_level : 1
      
      Then 'C' is removed. At this moment, netif_addr_lock() is called
      recursively.
      The ordering is as follows.
      C(3)->D(2)->E(1)->F(1)
      At this moment, the lower_level values of 'E' and 'F' are the same,
      so lockdep warns about a possible deadlock scenario.

      To avoid this problem, a new variable 'nested_level' is added.
      Its value is the same as dev->lower_level - 1,
      but it is only updated in rtnl_unlock().
      So this variable can be used safely as the parameter of
      spin_lock_nested() in the rtnl context.
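
The scenario above can be illustrated with a small Python toy model. This is a sketch, not kernel code: the helper names `lower_levels` and `has_duplicate_subclass` are hypothetical, and the dictionary computed before the unlink merely stands in for 'nested_level', which the message says is only refreshed in rtnl_unlock().

```python
# Toy model of the A..F stack from the commit message (not kernel code).
CHAIN = ["A", "B", "C", "D", "E", "F"]  # upper -> lower


def lower_levels(chain):
    """lower_level = 1 + number of devices stacked below, as in the listing."""
    return {dev: len(chain) - i for i, dev in enumerate(chain)}


def has_duplicate_subclass(walk, levels):
    """True if two locks taken in one walk would use the same subclass."""
    subclasses = [levels[dev] for dev in walk]
    return len(subclasses) != len(set(subclasses))


before = lower_levels(CHAIN)

# 'F' is unlinked first: the remaining chain is renumbered immediately,
# while 'F' itself keeps lower_level 1.
after = lower_levels(CHAIN[:-1])
after["F"] = 1

walk = ["C", "D", "E", "F"]  # locking order when 'C' is removed

# Subclass taken from the live lower_level: 'E' and 'F' collide.
collides_live = has_duplicate_subclass(walk, after)

# Subclass taken from a value that is not refreshed during the unlink
# (assumption: lower_level - 1 as it was before 'F' was removed).
nested = {dev: level - 1 for dev, level in before.items()}
collides_frozen = has_duplicate_subclass(walk, nested)
```

With the values from the listings, the live walk uses subclasses 3, 2, 1, 1 and collides, while the frozen values 3, 2, 1, 0 are all distinct.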
      
      Test commands:
         ip link add br0 type bridge vlan_filtering 1
         ip link add vlan1 link br0 type vlan id 10
         ip link add macvlan2 link vlan1 type macvlan
         ip link add br3 type bridge vlan_filtering 1
         ip link set macvlan2 master br3
         ip link add vlan4 link br3 type vlan id 10
         ip link add macvlan5 link vlan4 type macvlan
         ip link add br6 type bridge vlan_filtering 1
         ip link set macvlan5 master br6
         ip link add vlan7 link br6 type vlan id 10
         ip link add macvlan8 link vlan7 type macvlan
      
         ip link set br0 up
         ip link set vlan1 up
         ip link set macvlan2 up
         ip link set br3 up
         ip link set vlan4 up
         ip link set macvlan5 up
         ip link set br6 up
         ip link set vlan7 up
         ip link set macvlan8 up
         modprobe -rv bridge
      
      Splat looks like:
      [   36.057436][  T744] WARNING: possible recursive locking detected
      [   36.058848][  T744] 5.9.0-rc6+ #728 Not tainted
      [   36.059959][  T744] --------------------------------------------
      [   36.061391][  T744] ip/744 is trying to acquire lock:
      [   36.062590][  T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30
      [   36.064922][  T744]
      [   36.064922][  T744] but task is already holding lock:
      [   36.066626][  T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.068851][  T744]
      [   36.068851][  T744] other info that might help us debug this:
      [   36.070731][  T744]  Possible unsafe locking scenario:
      [   36.070731][  T744]
      [   36.072497][  T744]        CPU0
      [   36.073238][  T744]        ----
      [   36.074007][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.075290][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.076590][  T744]
      [   36.076590][  T744]  *** DEADLOCK ***
      [   36.076590][  T744]
      [   36.078515][  T744]  May be due to missing lock nesting notation
      [   36.078515][  T744]
      [   36.080491][  T744] 3 locks held by ip/744:
      [   36.081471][  T744]  #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490
      [   36.083614][  T744]  #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.085942][  T744]  #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80
      [   36.088400][  T744]
      [   36.088400][  T744] stack backtrace:
      [   36.089772][  T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728
      [   36.091364][  T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   36.093630][  T744] Call Trace:
      [   36.094416][  T744]  dump_stack+0x77/0x9b
      [   36.095385][  T744]  __lock_acquire+0xbc3/0x1f40
      [   36.096522][  T744]  lock_acquire+0xb4/0x3b0
      [   36.097540][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.098657][  T744]  ? rtmsg_ifinfo+0x1f/0x30
      [   36.099711][  T744]  ? __dev_notify_flags+0xa5/0xf0
      [   36.100874][  T744]  ? rtnl_is_locked+0x11/0x20
      [   36.101967][  T744]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   36.103230][  T744]  _raw_spin_lock_bh+0x38/0x70
      [   36.104348][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.105461][  T744]  dev_set_rx_mode+0x19/0x30
      [   36.106532][  T744]  dev_set_promiscuity+0x36/0x50
      [   36.107692][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.108929][  T744]  dev_set_promiscuity+0x1e/0x50
      [   36.110093][  T744]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   36.111415][  T744]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   36.112728][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.113967][  T744]  ? __hw_addr_sync_one+0x23/0x50
      [   36.115135][  T744]  __dev_set_rx_mode+0x68/0x90
      [   36.116249][  T744]  dev_uc_sync+0x70/0x80
      [   36.117244][  T744]  dev_uc_add+0x50/0x60
      [   36.118223][  T744]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   36.119470][  T744]  __dev_open+0xd6/0x170
      [   36.120470][  T744]  __dev_change_flags+0x181/0x1d0
      [   36.121644][  T744]  dev_change_flags+0x23/0x60
      [   36.122741][  T744]  do_setlink+0x30a/0x11e0
      [   36.123778][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.124929][  T744]  ? __nla_validate_parse.part.6+0x45/0x8e0
      [   36.126309][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.127457][  T744]  __rtnl_newlink+0x546/0x8e0
      [   36.128560][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.129623][  T744]  ? deactivate_slab.isra.85+0x6a1/0x850
      [   36.130946][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.132102][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.133176][  T744]  ? is_bpf_text_address+0x5/0xe0
      [   36.134364][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.135445][  T744]  ? rcu_read_lock_sched_held+0x32/0x60
      [   36.136771][  T744]  ? kmem_cache_alloc_trace+0x2d8/0x380
      [   36.138070][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.139164][  T744]  rtnl_newlink+0x47/0x70
      [ ... ]
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: introduce struct netdev_nested_priv for nested interface infrastructure · eff74233
      Taehee Yoo authored
      Functions related to nested interface infrastructure such as
      netdev_walk_all_{ upper | lower }_dev() pass both private functions
      and "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      In order to make it easier to expand common variables and functions,
      this new netdev_nested_priv structure is added.
      
      In the following patch, a new member variable will be added into this
      struct to fix the lockdep issue.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add __netdev_upper_dev_unlink() · fe8300fd
      Taehee Yoo authored
      netdev_upper_dev_unlink() has to work differently depending on flags.
      The idea is the same as for __netdev_upper_dev_link().
      
      In the following patches, new flags will be added.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: remove a redundant goto chain check · 1aad8049
      Cong Wang authored
      All TC actions call tcf_action_check_ctrlact() to validate the
      goto chain, so this check in tcf_action_init_1() is actually
      redundant. Remove it to avoid the trouble of leaking memory.
      
      Fixes: e49d8c22 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()")
      Reported-by: Vlad Buslov <vladbu@mellanox.com>
      Suggested-by: Davide Caratti <dcaratti@redhat.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: fdb: don't flush ext_learn entries · f2f3729f
      Nikolay Aleksandrov authored
      When user-space software manages fdb entries externally, it should
      set the ext_learn flag, which marks the fdb entry as externally managed
      and avoids expiring it (such entries are treated as static fdbs). Unfortunately,
      on events where fdb entries are flushed (STP down, netlink fdb flush,
      etc.) these fdbs are also deleted automatically by the bridge. That in turn
      causes trouble for the managing user-space software (e.g. in MLAG setups
      we lose remote fdb entries on port flaps).
      These entries are completely externally managed, so we should avoid
      automatically deleting them. The only exception is offloaded entries
      (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED), which are flushed as
      before.
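
The flush rule described above can be written as a small predicate. The following Python sketch is illustrative only: the flag names come from the message, but the bit values and the function name `should_flush` are assumptions, and it looks at these two flags alone, ignoring any other condition the bridge may apply.

```python
# Flag names follow the commit message; the bit positions are made up here.
BR_FDB_ADDED_BY_EXT_LEARN = 1 << 0
BR_FDB_OFFLOADED = 1 << 1


def should_flush(flags):
    """Return True if a flush event should delete an entry with these flags."""
    ext_learn = bool(flags & BR_FDB_ADDED_BY_EXT_LEARN)
    offloaded = bool(flags & BR_FDB_OFFLOADED)
    # Externally learned entries survive a flush unless they are also offloaded.
    return not ext_learn or offloaded
```

Under this rule an entry carrying only the ext_learn flag is kept on a flush, while an entry carrying both flags is flushed as before.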
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · a4be47af
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2020-09-28
      
      1) Fix a build warning in ip_vti if CONFIG_IPV6 is not set.
         From YueHaibing.
      
      2) Restore IPCB on espintcp before handing the packet to xfrm
         as the information there is still needed.
         From Sabrina Dubroca.
      
      3) Fix pmtu updating for xfrm interfaces.
         From Sabrina Dubroca.
      
      4) Some xfrm state information was not cloned with xfrm_do_migrate.
         Fixes to clone the full xfrm state, from Antony Antony.
      
      5) Use the correct address family in xfrm_state_find. The struct
         flowi must always be interpreted along with the original
         address family. This got lost over the years.
         Fix from Herbert Xu.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 27 Sep, 2020 4 commits
  3. 26 Sep, 2020 7 commits
  4. 25 Sep, 2020 22 commits
    • net: ethernet: cavium: octeon_mgmt: use phy_start and phy_stop · 4663ff60
      Ivan Khoronzhuk authored
      To also start the PHY state machine, in the UP state as it should be,
      phy_start() has to be used; otherwise the state machine is not even
      triggered. After this change, negotiation is supposed to be triggered
      by the state machine (SM) workqueue.

      The previous usage was not correct, but the problem only appears after
      the following patch, so add this as a fix.
      
      Fixes: 74a992b3 ("net: phy: add phy_check_link_status")
      Signed-off-by: Ivan Khoronzhuk <ikhoronz@cisco.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Fix clock handling on remove path · ac322f86
      Wong Vee Khee authored
      While unloading the dwmac-intel driver, clk_disable_unprepare() is
      called twice: once in stmmac_dvr_remove() and once in
      intel_eth_pci_remove(). This causes a kernel panic on the second call.

      Remove the second call of clk_disable_unprepare() in
      intel_eth_pci_remove().
      
      Fixes: 09f012e6 ("stmmac: intel: Fix clock handling on error and remove paths")
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Voon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
      Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vmxnet3: fix cksum offload issues for non-udp tunnels · 1dac3b1b
      Ronak Doshi authored
      Commit dacce2be ("vmxnet3: add geneve and vxlan tunnel offload
      support") added support for encapsulation offload. However, the inner
      offload capability is to be restricted to UDP tunnels.
      
      This patch fixes the issue for non-udp tunnels by adding features
      check capability and filtering appropriate features for non-udp tunnels.
      
      Fixes: dacce2be ("vmxnet3: add geneve and vxlan tunnel offload support")
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · abe2f12d
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2020-09-25
      
      This series contains updates to the iavf and ice drivers.
      
      Sylwester fixes a crash with iavf resume due to getting the wrong pointers.
      
      Ani fixes a call trace in ice resume by calling pci_save_state().
      
      Jake fixes memory leaks in case of register_netdev() failure or
      ice_cfg_vsi_lan() failure for the ice driver.
      
      v2: Rebased; no other changes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-2020-09-25' of... · 4e1b469a
      David S. Miller authored
      Merge tag 'wireless-drivers-2020-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.9
      
      Second, and last, set of fixes for v5.9. Only one important regression
      fix for mt76.
      
      mt76
      
      * fix a regression in aggregation which appeared after mac80211 changes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ice: fix memory leak in ice_vsi_setup · f6a07271
      Jacob Keller authored
      During ice_vsi_setup, if ice_cfg_vsi_lan fails, it does not properly
      release memory associated with the VSI rings. If we had used devres
      allocations for the rings, this would be ok. However, we use kzalloc and
      kfree_rcu for these ring structures.
      
      Using the correct label to cleanup the rings during ice_vsi_setup
      highlights an issue in the ice_vsi_clear_rings function: it can leave
      behind stale ring pointers in the q_vectors structure.
      
      When releasing rings, we must also ensure that no q_vector associated
      with the VSI will point to this ring again. To resolve this, loop over
      all q_vectors and release their ring mapping. Because we are about to
      free all rings, no q_vector should remain pointing to any of the rings
      in this VSI.
      
      Fixes: 5513b920 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix memory leak if register_netdev_fails · 135f4b9e
      Jacob Keller authored
      The ice_setup_pf_sw function can cause a memory leak if register_netdev
      fails, due to accidentally failing to free the VSI rings. Fix the memory
      leak by using ice_vsi_release, ensuring we actually go through the full
      teardown process.
      
      This should be safe even if the netdevice is not registered because we
      will have set the netdev pointer to NULL, ensuring ice_vsi_release won't
      call unregister_netdev.
      
      An alternative fix would be moving management of the PF VSI netdev into
      the main VSI setup code. This is complicated and likely requires a
      significant refactor in how we manage VSIs.
      
      Fixes: 3a858ba3 ("ice: Add support for VSI allocation and deallocation")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix call trace on suspend · 466e4392
      Anirudh Venkataramanan authored
      It appears that the ice_suspend flow is missing a call to pci_save_state
      and this is triggering the message "State of device not saved by
      ice_suspend" and a call trace. Fix it.
      
      Fixes: 769c500d ("ice: Add advanced power mgmt for WoL")
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • iavf: Fix incorrect adapter get in iavf_resume · 75598a8f
      Sylwester Dziedziuch authored
      When calling iavf_resume there was a crash because the wrong
      function was used to get the iavf_adapter and net_device pointers.
      Change how iavf_resume gets the iavf_adapter and net_device
      pointers from pci_dev.
      
      Fixes: 5eae00c5 ("i40evf: main driver core")
      Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • xfrm: Use correct address family in xfrm_state_find · e94ee171
      Herbert Xu authored
      The struct flowi must never be interpreted by itself as its size
      depends on the address family.  Therefore it must always be grouped
      with its original family value.
      
      In this particular instance, the original family value is lost in
      the function xfrm_state_find.  Therefore we get a bogus read when
      it's coupled with the wrong family which would occur with inter-
      family xfrm states.
      
      This patch fixes it by keeping the original family value.
      
      Note that the same bug could potentially occur in LSM through
      the xfrm_state_pol_flow_match hook.  I checked the current code
      there and it seems to be safe for now as only secid is used which
      is part of struct flowi_common.  But that API should be changed
      so that we don't get new bugs in the future.  We could
      do that by replacing fl with just secid or adding a family field.
      
      Reported-by: syzbot+577fbac3145a6eb2e7a5@syzkaller.appspotmail.com
      Fixes: 48b8d783 ("[XFRM]: State selection update to use inner...")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • tcp: skip DSACKs with dubious sequence ranges · ad2b9b0f
      Priyaranjan Jha authored
      Currently, we use the length of the DSACKed range to compute the number
      of delivered packets. If the sequence range in a DSACK is corrupted,
      we can get a bogus dsacked/acked count and a bogus cwnd.

      This patch puts bounds on the DSACKed range to skip the update of data
      delivery and spurious retransmission information if the DSACK
      is unlikely to have been caused by the sender's action:
      - The DSACKed range shouldn't be greater than the maximum advertised rwnd.
      - The total no. of DSACKed segments shouldn't be greater than the total
        no. of retransmitted segs. Unlike spurious retransmits, network
        duplicates or corrupted DSACKs shouldn't be counted as delivery.
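
The two bounds listed above can be sketched as a small Python function. It is a simplified illustration, not the kernel implementation: the function and parameter names are hypothetical stand-ins for per-connection state, and sequence-number wraparound is ignored.

```python
def count_dsacked_segments(seq_len, mss, max_rwnd, dsacked_so_far, total_retrans):
    """Return how many DSACKed segments to credit, or 0 to skip the DSACK.

    seq_len:        length in bytes of the DSACKed sequence range
    mss:            segment size used to convert bytes to segments
    max_rwnd:       maximum advertised receive window, in bytes
    dsacked_so_far: DSACKed segments already counted on this connection
    total_retrans:  segments retransmitted so far on this connection
    """
    # Bound 1: a range larger than the maximum advertised rwnd is dubious.
    if seq_len > max_rwnd:
        return 0
    # Convert the byte range to whole segments, rounding up (at least one).
    dup_segs = max(1, (seq_len + mss - 1) // mss)
    # Bound 2: we cannot have more DSACKed segments than retransmitted ones.
    if dsacked_so_far + dup_segs > total_retrans:
        return 0
    return dup_segs
```

A caller would update its delivery and spurious-retransmission accounting only when the returned count is non-zero.
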
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/fsl: quieten expected MDIO access failures · 1ec8e748
      Jamie Iles authored
      MDIO reads can happen during PHY probing, and printing an error with
      dev_err can result in a large number of error messages during device
      probe.  On a platform with a serial console this can result in
      excessively long boot times in a way that looks like an infinite loop
      when multiple busses are present.  Since 0f183fd1 (net/fsl: enable
      extended scanning in xgmac_mdio) we perform more scanning so there are
      potentially more failures.
      
      Reduce the logging level to dev_dbg which is consistent with the
      Freescale enetc driver.
      
      Cc: Jeremy Linton <jeremy.linton@arm.com>
      Signed-off-by: Jamie Iles <jamie@nuviainc.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: really look for phy-mode in port nodes · 912aae27
      Helmut Grohne authored
      The previous implementation failed to account for the "ports" node. The
      actual port nodes are not child nodes of the switch node, but a "ports"
      node sits in between.
      
      Fixes: edecfa98 ("net: dsa: microchip: look for phy-mode in port nodes")
      Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/tls: race causes kernel panic · 38f7e1c0
      Rohit Maheshwari authored
      BUG: kernel NULL pointer dereference, address: 00000000000000b8
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 80000008b6fef067 P4D 80000008b6fef067 PUD 8b6fe6067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 12 PID: 23871 Comm: kworker/12:80 Kdump: loaded Tainted: G S
       5.9.0-rc3+ #1
       Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.1 03/29/2018
       Workqueue: events tx_work_handler [tls]
       RIP: 0010:tx_work_handler+0x1b/0x70 [tls]
       Code: dc fe ff ff e8 16 d4 a3 f6 66 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 8b
       6f 58 48 8b bd a0 04 00 00 48 85 ff 74 1c 48 8b 47 28 <48> 8b 90 b8 00 00 00 83
       e2 02 75 0c f0 48 0f ba b0 b8 00 00 00 00
       RSP: 0018:ffffa44ace61fe88 EFLAGS: 00010286
       RAX: 0000000000000000 RBX: ffff91da9e45cc30 RCX: dead000000000122
       RDX: 0000000000000001 RSI: ffff91da9e45cc38 RDI: ffff91d95efac200
       RBP: ffff91da133fd780 R08: 0000000000000000 R09: 000073746e657665
       R10: 8080808080808080 R11: 0000000000000000 R12: ffff91dad7d30700
       R13: ffff91dab6561080 R14: 0ffff91dad7d3070 R15: ffff91da9e45cc38
       FS:  0000000000000000(0000) GS:ffff91dad7d00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000000b8 CR3: 0000000906478003 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        process_one_work+0x1a7/0x370
        worker_thread+0x30/0x370
        ? process_one_work+0x370/0x370
        kthread+0x114/0x130
        ? kthread_park+0x80/0x80
        ret_from_fork+0x22/0x30
      
      tls_sw_release_resources_tx() waits for encrypt_pending, which
      can race, so we need changes similar to those in commit
      0cada332 here as well.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ethernet/broadcom: fix spelling typo · 0eb11dfe
      Wang Qing authored
      Modify the comment typo: "compliment" -> "complement".
      Signed-off-by: Wang Qing <wangqing@vivo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: fix fields offset in SG_CONFIG_REG_3 · 4ab810a4
      Xiaoliang Yang authored
      INIT_IPS and GATE_ENABLE fields have a wrong offset in SG_CONFIG_REG_3.
      This register is used by stream gate control of PSFP, and it has not
      been used before, because PSFP is not implemented in ocelot driver.
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: convert TAS link speed based on phylink speed · dba1e466
      Xiaoliang Yang authored
      state->speed holds a value of 10, 100, 1000 or 2500, but
      QSYS_TAG_CONFIG_LINK_SPEED expects a value of 0, 1, 2, 3. So convert the
      speed to a proper value.
      
      Fixes: de143c0e ("net: dsa: felix: Configure Time-Aware Scheduler via taprio offload")
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hinic: fix wrong return value of mac-set cmd · f68910a8
      Luo bin authored
      It should also be regarded as an error when the hw returns status=4 for
      the PF's set-mac cmd. Only when the PF returns status=4 to a VF should
      this cmd be given special treatment.
      
      Fixes: 7dd29ee1 ("hinic: add sriov feature support")
      Signed-off-by: Luo bin <luobin9@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/wan/x25_asy: Correct the ndo_open and ndo_stop functions · ed46cd1d
      Xie He authored
      1.
      Move the lapb_register/lapb_unregister calls into the ndo_open/ndo_stop
      functions.
      This makes the LAPB protocol start/stop when the network interface
      starts/stops. When the network interface is down, the LAPB protocol
      shouldn't be running and the LAPB module shouldn't be generating control
      frames.
      
      2.
      Move netif_start_queue/netif_stop_queue into the ndo_open/ndo_stop
      functions.
      This makes the TX queue start/stop when the network interface
      starts/stops.
      (netif_stop_queue was originally in the ndo_stop function. But to make
      the code look better, I created a new function to use as ndo_stop, and
      made it call the original ndo_stop function. I moved netif_stop_queue
      from the original ndo_stop function to the new ndo_stop function.)
      
      Cc: Martin Schiller <ms@dev.tdt.de>
      Signed-off-by: Xie He <xie.he.0141@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: always honour route mtu during forwarding · 02a1b175
      Maciej Żenczykowski authored
      Documentation/networking/ip-sysctl.txt:46 says:
        ip_forward_use_pmtu - BOOLEAN
          By default we don't trust protocol path MTUs while forwarding
          because they could be easily forged and can lead to unwanted
          fragmentation by the router.
          You only need to enable this if you have user-space software
          which tries to discover path mtus by itself and depends on the
          kernel honoring this information. This is normally not the case.
          Default: 0 (disabled)
          Possible values:
          0 - disabled
          1 - enabled
      
      Which makes it pretty clear that setting it to 1 is a potential
      security/safety/DoS issue, and yet it is entirely reasonable to want
      forwarded traffic to honour explicitly administrator configured
      route mtus (instead of defaulting to device mtu).
      
      Indeed, I can't think of a single reason why you wouldn't want to.
      Since you configured a route mtu you probably know better...
      
      It is pretty common to have a higher device mtu to allow receiving
      large (jumbo) frames, while having some routes via that interface
      (potentially including the default route to the internet) specify
      a lower mtu.
      
      Note that ipv6 forwarding uses device mtu unless the route is locked
      (in which case it will use the route mtu).
      
      This approach is not usable for IPv4 where an 'mtu lock' on a route
      also has the side effect of disabling TCP path mtu discovery via
      disabling the IPv4 DF (don't frag) bit on all outgoing frames.
      
      I'm not aware of a way to lock a route from an IPv6 RA, so that also
      potentially seems wrong.

      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
      Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
      Cc: Tyler Wear <twear@quicinc.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
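
      The forwarding behaviour argued for in this commit message can be
      sketched as a small MTU selection function. This is an illustration
      under assumptions, not the kernel's implementation: toy_forward_mtu()
      and its parameters are hypothetical, a route_mtu of 0 is assumed to
      mean "no MTU configured on the route", and the 65535 cap is assumed
      as the largest IPv4 packet size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper bound for an IPv4 MTU (16-bit total length field). */
#define TOY_IP_MAX_MTU 0xFFFFu

/*
 * Pick the MTU used when forwarding an IPv4 packet.
 *  fwd_use_pmtu: ip_forward_use_pmtu is enabled
 *  mtu_locked:   the route has its MTU locked
 *  path_mtu:     MTU learned for the path (may be forged, hence untrusted)
 *  route_mtu:    MTU explicitly configured on the route, 0 if none
 *  dev_mtu:      MTU of the outgoing device
 */
static uint32_t toy_forward_mtu(bool fwd_use_pmtu, bool mtu_locked,
                                uint32_t path_mtu, uint32_t route_mtu,
                                uint32_t dev_mtu)
{
    /* The administrator opted in to trusting path MTUs. */
    if (fwd_use_pmtu || mtu_locked)
        return path_mtu;

    /* Always honour an explicitly configured route MTU. */
    if (route_mtu != 0)
        return route_mtu;

    /* Otherwise fall back to the device MTU, capped. */
    return dev_mtu < TOY_IP_MAX_MTU ? dev_mtu : TOY_IP_MAX_MTU;
}
```

      For the jumbo-frame case described above, a 9000-byte device MTU with
      a 1500-byte route MTU yields 1500 for forwarded traffic, even with
      ip_forward_use_pmtu left at 0.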
    • David S. Miller's avatar
      Merge branch 'net_sched-fix-a-UAF-in-tcf_action_init' · 6d889996
      David S. Miller authored
      Cong Wang says:
      
      ====================
      net_sched: fix a UAF in tcf_action_init()
      
      This patchset fixes a use-after-free triggered by syzbot. Please
      find more details in each patch description.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Cong Wang's avatar
      net_sched: commit action insertions together · 0fedc63f
      Cong Wang authored
      syzbot is able to trigger a failure case inside the loop in
      tcf_action_init(), and when this happens we clean up with
      tcf_action_destroy(). But as these actions are already inserted
      into the global IDR, another parallel process could free them
      before tcf_action_destroy() runs, which triggers a use-after-free.
      
      Fix this by deferring the insertions even later, after the loop,
      and committing all the insertions in a separate loop, so we will
      never fail in the middle of the insertions any more.
      
      One side effect is that the window between allocation and final
      insertion becomes larger, so it is now more likely that the loop in
      tcf_del_walker() sees the placeholder -EBUSY pointer. So we have
      to check for an error pointer in tcf_del_walker().
      
      Reported-and-tested-by: syzbot+2287853d392e4b42374a@syzkaller.appspotmail.com
      Fixes: 0190c1d4 ("net: sched: atomically check-allocate action")
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
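
      The "prepare in one loop, commit in a second loop" idea from this
      commit message can be sketched in user-space C. This is a toy model
      under assumptions, not the kernel code: toy_table stands in for the
      global IDR, TOY_BUSY for the placeholder -EBUSY pointer, and all
      toy_* names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_TABLE_SIZE 8

struct toy_action { int index; };

/* Stand-in for the global table that other contexts can walk. */
static struct toy_action *toy_table[TOY_TABLE_SIZE];

/* Placeholder marking a reserved slot whose action is not yet published. */
static struct toy_action toy_busy;
#define TOY_BUSY (&toy_busy)

/* A walker must skip empty slots and placeholders. */
static int toy_count_live(void)
{
    int i, n = 0;

    for (i = 0; i < TOY_TABLE_SIZE; i++)
        if (toy_table[i] && toy_table[i] != TOY_BUSY)
            n++;
    return n;
}

/* Create n actions at the given indexes; all are published, or none. */
static int toy_actions_init(const int *indexes, size_t n)
{
    struct toy_action *pending[TOY_TABLE_SIZE];
    size_t i;

    if (n > TOY_TABLE_SIZE)
        return -1;

    /* Phase 1: may fail. Nothing real is visible in the table yet. */
    for (i = 0; i < n; i++) {
        int idx = indexes[i];

        if (idx < 0 || idx >= TOY_TABLE_SIZE || toy_table[idx])
            goto err;
        pending[i] = malloc(sizeof(*pending[i]));
        if (!pending[i])
            goto err;
        pending[i]->index = idx;
        toy_table[idx] = TOY_BUSY;  /* reserve the slot only */
    }

    /* Phase 2: cannot fail. Publish every action. */
    for (i = 0; i < n; i++)
        toy_table[pending[i]->index] = pending[i];
    return 0;

err:
    /* Only private objects and placeholders exist, so cleanup is safe. */
    while (i-- > 0) {
        toy_table[pending[i]->index] = NULL;
        free(pending[i]);
    }
    return -1;
}
```

      Because a failure can only happen before anything is published, the
      error path never touches an object that another context could
      already have freed. The cost is the one named above: walkers can see
      the placeholder for longer and must skip it.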