1. 30 Sep, 2020 7 commits
    • David S. Miller's avatar
      Merge branch '100GbE' of https://github.com/anguy11/net-queue · 03e7e72c
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2020-09-30
      
      This series contains updates to ice driver only.
      
      Jake increases the wait time for firmware response as it can take longer
      than the current wait time. Preserves the NVM capabilities of the device in
      safe mode so the device reports its NVM update capabilities properly
      when in this state.
      
      v2: Added cover letter
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      03e7e72c
    • Jacob Keller's avatar
      ice: preserve NVM capabilities in safe mode · be49b1ad
      Jacob Keller authored
      If the driver initializes in safe mode, it will call
      ice_set_safe_mode_caps. This results in clearing the capabilities
      structures, in order to set them up for operating in safe mode, ensuring
      many features are disabled.
      
      This has a side effect of also clearing the capability bits that relate
      to NVM update. The result is that the device driver will not indicate
      support for unified update, even if the firmware is capable.
      
      Fix this by adding the relevant capability fields to the list of values
      we preserve. To simplify the code, use a common_cap structure instead of
      a handful of local variables. To reduce some duplication of the
      capability name, introduce a couple of macros used to restore the
      capabilities values from the cached copy.
      
      Fixes: de9b277e ("ice: Add support for unified NVM update flow capability")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarBrijesh Behera <brijeshx.behera@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      be49b1ad
    • Jacob Keller's avatar
      ice: increase maximum wait time for flash write commands · 0ec86e8e
      Jacob Keller authored
      The ice driver needs to wait for a firmware response to each command to
      write a block of data to the scratch area used to update the device
      firmware. The driver currently waits for up to 1 second for this to be
      returned.
      
      It turns out that firmware might take longer than 1 second to return
      a completion in some cases. If this happens, the flash update will fail
      to complete.
      
      Fix this by increasing the maximum time that the driver will wait for
      both writing a block of data, and for activating the new NVM bank. The
      timeout for an erase command is already several minutes, as the firmware
      had to erase the entire bank which was already expected to take a minute
      or more in the worst case.
      
      In the case where firmware really won't respond, we will now take longer
      to fail. However, this ensures that if the firmware is simply slow to
      respond, the flash update can still complete. This new maximum timeout
      should not adversely increase the update time, as the implementation for
      wait_event_interruptible_timeout, and should wake very soon after we get
      a completion event. It is better for a flash update be slow but still
      succeed than to fail because we gave up too quickly.
      
      Fixes: d69ea414 ("ice: implement device flash update via devlink")
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Tested-by: default avatarBrijesh Behera <brijeshx.behera@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      0ec86e8e
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1f25c9bb
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2020-09-29
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 7 non-merge commits during the last 14 day(s) which contain
      a total of 7 files changed, 28 insertions(+), 8 deletions(-).
      
      The main changes are:
      
      1) fix xdp loading regression in libbpf for old kernels, from Andrii.
      
      2) Do not discard packet when NETDEV_TX_BUSY, from Magnus.
      
      3) Fix corner cases in libbpf related to endianness and kconfig, from Tony.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1f25c9bb
    • David S. Miller's avatar
      Merge branch 'mptcp-Fix-for-32-bit-DATA_FIN' · 2b3e981a
      David S. Miller authored
      Mat Martineau says:
      
      ====================
      mptcp: Fix for 32-bit DATA_FIN
      
      The main fix is contained in patch 2, and that commit message explains
      the issue with not properly converting truncated DATA_FIN sequence
      numbers sent by the peer.
      
      With patch 2 adding an unlocked read of msk->ack_seq, patch 1 cleans up
      access to that data with READ_ONCE/WRITE_ONCE.
      
      This does introduce two merge conflicts with net-next, but both have
      straightforward resolution. Patch 1 modifies a line that got removed in
      net-next so the modification can be dropped when merging. Patch 2 will
      require a trivial conflict resolution for a modified function
      declaration.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2b3e981a
    • Mat Martineau's avatar
      mptcp: Handle incoming 32-bit DATA_FIN values · 1a49b2c2
      Mat Martineau authored
      The peer may send a DATA_FIN mapping with either a 32-bit or 64-bit
      sequence number. When a 32-bit sequence number is received for the
      DATA_FIN, it must be expanded to 64 bits before comparing it to the
      last acked sequence number. This expansion was missing.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/93
      Fixes: 3721b9b6 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1a49b2c2
    • Mat Martineau's avatar
      mptcp: Consistently use READ_ONCE/WRITE_ONCE with msk->ack_seq · 917944da
      Mat Martineau authored
      The msk->ack_seq value is sometimes read without the msk lock held, so
      make proper use of READ_ONCE and WRITE_ONCE.
      Signed-off-by: default avatarMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      917944da
  2. 29 Sep, 2020 16 commits
  3. 28 Sep, 2020 8 commits
    • Manivannan Sadhasivam's avatar
      net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks · a7809ff9
      Manivannan Sadhasivam authored
      The rcu read locks are needed to avoid potential race condition while
      dereferencing radix tree from multiple threads. The issue was identified
      by syzbot. Below is the crash report:
      
      =============================
      WARNING: suspicious RCU usage
      5.7.0-syzkaller #0 Not tainted
      -----------------------------
      include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by kworker/u4:1/21:
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline]
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241
       #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243
      
      stack backtrace:
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: qrtr_ns_handler qrtr_ns_worker
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1e9/0x30e lib/dump_stack.c:118
       radix_tree_deref_slot include/linux/radix-tree.h:176 [inline]
       ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline]
       qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674
       process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268
       worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414
       kthread+0x353/0x380 kernel/kthread.c:268
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com
      Signed-off-by: default avatarManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7809ff9
    • David S. Miller's avatar
      Merge branch 'net-core-fix-a-lockdep-splat-in-the-dev_addr_list' · 0ba56b89
      David S. Miller authored
      Taehee Yoo says:
      
      ====================
      net: core: fix a lockdep splat in the dev_addr_list.
      
      This patchset is to avoid lockdep splat.
      
      When a stacked interface graph is changed, netif_addr_lock() is called
      recursively and it internally calls spin_lock_nested().
      The parameter of spin_lock_nested() is 'dev->lower_level',
      this is called subclass.
      The problem of 'dev->lower_level' is that while 'dev->lower_level' is
      being used as a subclass of spin_lock_nested(), its value can be changed.
      So, spin_lock_nested() would be called recursively with the same
      subclass value, the lockdep understands a deadlock.
      In order to avoid this, a new variable is needed and it is going to be
      used as a parameter of spin_lock_nested().
      The first and second patch is a preparation patch for the third patch.
      In the third patch, the problem will be fixed.
      
      The first patch is to add __netdev_upper_dev_unlink().
      An existed netdev_upper_dev_unlink() is renamed to
      __netdev_upper_dev_unlink(). and netdev_upper_dev_unlink()
      is added as an wrapper of this function.
      
      The second patch is to add the netdev_nested_priv structure.
      netdev_walk_all_{ upper | lower }_dev() pass both private functions
      and "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      In order to make it easier to expand common variables and functions,
      this new netdev_nested_priv structure is added.
      
      The third patch is to add a new variable 'nested_level'
      into the net_device structure.
      This variable will be used as a parameter of spin_lock_nested() of
      dev->addr_list_lock.
      Due to this variable, it can avoid lockdep splat.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ba56b89
    • Taehee Yoo's avatar
      net: core: add nested_level variable in net_device · 1fc70edb
      Taehee Yoo authored
      This patch is to add a new variable 'nested_level' into the net_device
      structure.
      This variable will be used as a parameter of spin_lock_nested() of
      dev->addr_list_lock.
      
      netif_addr_lock() can be called recursively so spin_lock_nested() is
      used instead of spin_lock() and dev->lower_level is used as a parameter
      of spin_lock_nested().
      But, dev->lower_level value can be updated while it is being used.
      So, lockdep would warn a possible deadlock scenario.
      
      When a stacked interface is deleted, netif_{uc | mc}_sync() is
      called recursively.
      So, spin_lock_nested() is called recursively too.
      At this moment, the dev->lower_level variable is used as a parameter of it.
      dev->lower_level value is updated when interfaces are being unlinked/linked
      immediately.
      Thus, After unlinking, dev->lower_level shouldn't be a parameter of
      spin_lock_nested().
      
          A (macvlan)
          |
          B (vlan)
          |
          C (bridge)
          |
          D (macvlan)
          |
          E (vlan)
          |
          F (bridge)
      
          A->lower_level : 6
          B->lower_level : 5
          C->lower_level : 4
          D->lower_level : 3
          E->lower_level : 2
          F->lower_level : 1
      
      When an interface 'A' is removed, it releases resources.
      At this moment, netif_addr_lock() would be called.
      Then, netdev_upper_dev_unlink() is called recursively.
      Then dev->lower_level is updated.
      There is no problem.
      
      But, when the bridge module is removed, 'C' and 'F' interfaces
      are removed at once.
      If 'F' is removed first, a lower_level value is like below.
          A->lower_level : 5
          B->lower_level : 4
          C->lower_level : 3
          D->lower_level : 2
          E->lower_level : 1
          F->lower_level : 1
      
      Then, 'C' is removed. at this moment, netif_addr_lock() is called
      recursively.
      The ordering is like this.
      C(3)->D(2)->E(1)->F(1)
      At this moment, the lower_level value of 'E' and 'F' are the same.
      So, lockdep warns a possible deadlock scenario.
      
      In order to avoid this problem, a new variable 'nested_level' is added.
      This value is the same as dev->lower_level - 1.
      But this value is updated in rtnl_unlock().
      So, this variable can be used as a parameter of spin_lock_nested() safely
      in the rtnl context.
      
      Test commands:
         ip link add br0 type bridge vlan_filtering 1
         ip link add vlan1 link br0 type vlan id 10
         ip link add macvlan2 link vlan1 type macvlan
         ip link add br3 type bridge vlan_filtering 1
         ip link set macvlan2 master br3
         ip link add vlan4 link br3 type vlan id 10
         ip link add macvlan5 link vlan4 type macvlan
         ip link add br6 type bridge vlan_filtering 1
         ip link set macvlan5 master br6
         ip link add vlan7 link br6 type vlan id 10
         ip link add macvlan8 link vlan7 type macvlan
      
         ip link set br0 up
         ip link set vlan1 up
         ip link set macvlan2 up
         ip link set br3 up
         ip link set vlan4 up
         ip link set macvlan5 up
         ip link set br6 up
         ip link set vlan7 up
         ip link set macvlan8 up
         modprobe -rv bridge
      
      Splat looks like:
      [   36.057436][  T744] WARNING: possible recursive locking detected
      [   36.058848][  T744] 5.9.0-rc6+ #728 Not tainted
      [   36.059959][  T744] --------------------------------------------
      [   36.061391][  T744] ip/744 is trying to acquire lock:
      [   36.062590][  T744] ffff8c4767509280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_set_rx_mode+0x19/0x30
      [   36.064922][  T744]
      [   36.064922][  T744] but task is already holding lock:
      [   36.066626][  T744] ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.068851][  T744]
      [   36.068851][  T744] other info that might help us debug this:
      [   36.070731][  T744]  Possible unsafe locking scenario:
      [   36.070731][  T744]
      [   36.072497][  T744]        CPU0
      [   36.073238][  T744]        ----
      [   36.074007][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.075290][  T744]   lock(&vlan_netdev_addr_lock_key);
      [   36.076590][  T744]
      [   36.076590][  T744]  *** DEADLOCK ***
      [   36.076590][  T744]
      [   36.078515][  T744]  May be due to missing lock nesting notation
      [   36.078515][  T744]
      [   36.080491][  T744] 3 locks held by ip/744:
      [   36.081471][  T744]  #0: ffffffff98571df0 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x236/0x490
      [   36.083614][  T744]  #1: ffff8c4767769280 (&vlan_netdev_addr_lock_key){+...}-{2:2}, at: dev_uc_add+0x1e/0x60
      [   36.085942][  T744]  #2: ffff8c476c8da280 (&bridge_netdev_addr_lock_key/4){+...}-{2:2}, at: dev_uc_sync+0x39/0x80
      [   36.088400][  T744]
      [   36.088400][  T744] stack backtrace:
      [   36.089772][  T744] CPU: 6 PID: 744 Comm: ip Not tainted 5.9.0-rc6+ #728
      [   36.091364][  T744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   36.093630][  T744] Call Trace:
      [   36.094416][  T744]  dump_stack+0x77/0x9b
      [   36.095385][  T744]  __lock_acquire+0xbc3/0x1f40
      [   36.096522][  T744]  lock_acquire+0xb4/0x3b0
      [   36.097540][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.098657][  T744]  ? rtmsg_ifinfo+0x1f/0x30
      [   36.099711][  T744]  ? __dev_notify_flags+0xa5/0xf0
      [   36.100874][  T744]  ? rtnl_is_locked+0x11/0x20
      [   36.101967][  T744]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   36.103230][  T744]  _raw_spin_lock_bh+0x38/0x70
      [   36.104348][  T744]  ? dev_set_rx_mode+0x19/0x30
      [   36.105461][  T744]  dev_set_rx_mode+0x19/0x30
      [   36.106532][  T744]  dev_set_promiscuity+0x36/0x50
      [   36.107692][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.108929][  T744]  dev_set_promiscuity+0x1e/0x50
      [   36.110093][  T744]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   36.111415][  T744]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   36.112728][  T744]  __dev_set_promiscuity+0x123/0x1a0
      [   36.113967][  T744]  ? __hw_addr_sync_one+0x23/0x50
      [   36.115135][  T744]  __dev_set_rx_mode+0x68/0x90
      [   36.116249][  T744]  dev_uc_sync+0x70/0x80
      [   36.117244][  T744]  dev_uc_add+0x50/0x60
      [   36.118223][  T744]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   36.119470][  T744]  __dev_open+0xd6/0x170
      [   36.120470][  T744]  __dev_change_flags+0x181/0x1d0
      [   36.121644][  T744]  dev_change_flags+0x23/0x60
      [   36.122741][  T744]  do_setlink+0x30a/0x11e0
      [   36.123778][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.124929][  T744]  ? __nla_validate_parse.part.6+0x45/0x8e0
      [   36.126309][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.127457][  T744]  __rtnl_newlink+0x546/0x8e0
      [   36.128560][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.129623][  T744]  ? deactivate_slab.isra.85+0x6a1/0x850
      [   36.130946][  T744]  ? __lock_acquire+0x92c/0x1f40
      [   36.132102][  T744]  ? lock_acquire+0xb4/0x3b0
      [   36.133176][  T744]  ? is_bpf_text_address+0x5/0xe0
      [   36.134364][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.135445][  T744]  ? rcu_read_lock_sched_held+0x32/0x60
      [   36.136771][  T744]  ? kmem_cache_alloc_trace+0x2d8/0x380
      [   36.138070][  T744]  ? rtnl_newlink+0x2e/0x70
      [   36.139164][  T744]  rtnl_newlink+0x47/0x70
      [ ... ]
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1fc70edb
    • Taehee Yoo's avatar
      net: core: introduce struct netdev_nested_priv for nested interface infrastructure · eff74233
      Taehee Yoo authored
      Functions related to nested interface infrastructure such as
      netdev_walk_all_{ upper | lower }_dev() pass both private functions
      and "data" pointer to handle their own things.
      At this point, the data pointer type is void *.
      In order to make it easier to expand common variables and functions,
      this new netdev_nested_priv structure is added.
      
      In the following patch, a new member variable will be added into this
      struct to fix the lockdep issue.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eff74233
    • Taehee Yoo's avatar
      net: core: add __netdev_upper_dev_unlink() · fe8300fd
      Taehee Yoo authored
      The netdev_upper_dev_unlink() has to work differently according to flags.
      This idea is the same with __netdev_upper_dev_link().
      
      In the following patches, new flags will be added.
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe8300fd
    • Cong Wang's avatar
      net_sched: remove a redundant goto chain check · 1aad8049
      Cong Wang authored
      All TC actions call tcf_action_check_ctrlact() to validate
      goto chain, so this check in tcf_action_init_1() is actually
      redundant. Remove it to save troubles of leaking memory.
      
      Fixes: e49d8c22 ("net_sched: defer tcf_idr_insert() in tcf_action_init_1()")
      Reported-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Suggested-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1aad8049
    • Nikolay Aleksandrov's avatar
      net: bridge: fdb: don't flush ext_learn entries · f2f3729f
      Nikolay Aleksandrov authored
      When a user-space software manages fdb entries externally it should
      set the ext_learn flag which marks the fdb entry as externally managed
      and avoids expiring it (they're treated as static fdbs). Unfortunately
      on events where fdb entries are flushed (STP down, netlink fdb flush
      etc) these fdbs are also deleted automatically by the bridge. That in turn
      causes trouble for the managing user-space software (e.g. in MLAG setups
      we lose remote fdb entries on port flaps).
      These entries are completely externally managed so we should avoid
      automatically deleting them, the only exception are offloaded entries
      (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as
      before.
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f2f3729f
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · a4be47af
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2020-09-28
      
      1) Fix a build warning in ip_vti if CONFIG_IPV6 is not set.
         From YueHaibing.
      
      2) Restore IPCB on espintcp before handing the packet to xfrm
         as the information there is still needed.
         From Sabrina Dubroca.
      
      3) Fix pmtu updating for xfrm interfaces.
         From Sabrina Dubroca.
      
      4) Some xfrm state information was not cloned with xfrm_do_migrate.
         Fixes to clone the full xfrm state, from Antony Antony.
      
      5) Use the correct address family in xfrm_state_find. The struct
         flowi must always be interpreted along with the original
         address family. This got lost over the years.
         Fix from Herbert Xu.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4be47af
  4. 27 Sep, 2020 4 commits
  5. 26 Sep, 2020 5 commits