1. 25 Feb, 2022 19 commits
    • Merge branch 'ibmvnic-fixes' · 5a83dd14
      David S. Miller authored
      Sukadev Bhattiprolu says:
      
      ====================
      ibmvnic: Fix a race in ibmvnic_probe()
      
      If we get a transport (reset) event right after a successful CRQ_INIT
      during ibmvnic_probe() but before we set the adapter state to VNIC_PROBED,
      we will throw away the reset assuming that the adapter is still in the
      probing state. But since the adapter has completed the CRQ_INIT any
      subsequent CRQs that we send will be ignored by the vnicserver until
      we release/init the CRQ again. This can leave the adapter unconfigured.
      
      While here fix a couple of other bugs that were observed (Patches 1,2,4).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Allow queueing resets during probe · fd98693c
      Sukadev Bhattiprolu authored
      We currently don't allow queuing resets when adapter is in VNIC_PROBING
      state - instead we throw away the reset and return EBUSY. The reasoning
      is probably that during ibmvnic_probe() the ibmvnic_adapter itself is
      being initialized so performing a reset during this time can lead us to
      accessing fields in the ibmvnic_adapter that are not fully initialized.
      A review of the code shows that all the adapter state needed to process a
      reset is initialized before registering the CRQ so that should no longer
      be a concern.
      
      Further the expectation is that if we do get a reset (transport event)
      during probe, the do..while() loop in ibmvnic_probe() will handle this
      by reinitializing the CRQ.
      
      While that is true to some extent, it is possible that the reset might
      occur _after_ the CRQ is registered and CRQ_INIT message was exchanged
      but _before_ the adapter state is set to VNIC_PROBED. As mentioned above,
      such a reset will be thrown away. While the client assumes that the
      adapter is functional, the vnic server will wait for the client to reinit
      the adapter. This disconnect between the two leaves the adapter down
      needing manual intervention.
      
      Because ibmvnic_probe() has other work to do after initializing the CRQ
      (such as registering the netdev at a minimum) and because the reset event
      can occur at any instant after the CRQ is initialized, there will always
      be a window between initializing the CRQ and considering the adapter
      ready for resets (ie state == PROBED).
      
      So rather than discarding resets during this window, allow queueing them
      - but only process them after the adapter is fully initialized.
      
      To do this, introduce a new completion state ->probe_done and have the
      reset worker thread wait on this before processing resets.
      
      This change brings up two new situations in or just after ibmvnic_probe().
      First after one or more resets were queued, we encounter an error and
      decide to retry the initialization.  At that point the queued resets are
      no longer relevant since we could be talking to a new vnic server. So we
      must purge/flush the queued resets before restarting the initialization.
      As a side note, since we are still in the probing stage and we have not
      registered the netdev, it will not be a CHANGE_PARAM reset.
      
      Second this change opens up a potential race between the worker thread
      in __ibmvnic_reset(), the tasklet and the ibmvnic_open() due to the
      following sequence of events:
      
      	1. Register CRQ
      	2. Get transport event before CRQ_INIT completes.
      	3. Tasklet schedules reset:
      		a) add rwi to list
      		b) schedule_work() to start worker thread which runs
      		   and waits for ->probe_done.
      	4. ibmvnic_probe() decides to retry, purges rwi_list
      	5. Re-register crq and this time rest of probe succeeds - register
      	   netdev and complete(->probe_done).
      	6. Worker thread resumes in __ibmvnic_reset() from 3b.
      	7. Worker thread sets ->resetting bit
      	8. ibmvnic_open() comes in, notices ->resetting bit, sets state
      	   to IBMVNIC_OPEN and returns early expecting worker thread to
      	   finish the open.
      	9. Worker thread finds rwi_list empty and returns without
      	   opening the interface.
      
      If this happens, the ->ndo_open() call is effectively lost and the
      interface remains down. To address this, ensure that ->rwi_list is
      not empty before setting the ->resetting bit. See also comments in
      __ibmvnic_reset().
      
      Fixes: 6a2fb0e9 ("ibmvnic: driver initialization for kdump/kexec")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
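The lost-open scenario in steps 7 to 9 above comes down to claiming the ->resetting flag while nothing is queued. The following is a minimal, single-threaded C sketch of the ordering the commit message describes. `struct adapter_model`, `worker_begin()` and `model_open()` are hypothetical names invented for illustration; the real driver performs this check under a lock in __ibmvnic_reset(), and this model leaves locking out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the adapter state. */
struct adapter_model {
	size_t queued_resets;	/* models the length of ->rwi_list */
	bool resetting;		/* models the ->resetting bit */
	bool is_open;		/* true once the interface was really opened */
};

/*
 * Worker entry: claim ->resetting only when there is queued work.
 * Returns true if the worker now owns the reset (and any pending open).
 */
static bool worker_begin(struct adapter_model *a)
{
	if (a->queued_resets == 0)
		return false;	/* nothing to do, so never set ->resetting */
	a->resetting = true;
	return true;
}

/*
 * Open path: defer to the worker only if a reset is really in flight.
 * Returns true if the open was performed here.
 */
static bool model_open(struct adapter_model *a)
{
	if (a->resetting)
		return false;	/* the worker is expected to finish the open */
	a->is_open = true;
	return true;
}
```

With an empty queue, `worker_begin()` leaves the flag clear, so a later `model_open()` performs the open itself instead of waiting for a worker that has nothing to do.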
    • ibmvnic: clear fop when retrying probe · f628ad53
      Sukadev Bhattiprolu authored
      Clear ->failover_pending flag that may have been set in the previous
      pass of registering CRQ. If we don't clear, a subsequent ibmvnic_open()
      call would be misled into thinking a failover is pending and assuming
      that the reset worker thread would open the adapter. If this pass of
      registering the CRQ succeeds (i.e., there is no transport event), there
      wouldn't be a reset worker thread.
      
      This would leave the adapter unconfigured and require manual intervention
      to bring it up during boot.
      
      Fixes: 5a18e1e0 ("ibmvnic: Fix failover case for non-redundant configuration")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: init init_done_rc earlier · ae16bf15
      Sukadev Bhattiprolu authored
      We currently initialize the ->init_done completion/return code fields
      before issuing a CRQ_INIT command. But if we get a transport event soon
      after registering the CRQ, the tasklet may already have recorded the
      completion and error code. If we initialize here, we might overwrite or
      lose that and end up issuing the CRQ_INIT only to time out later.
      
      If that timeout happens during probe, we will leave the adapter in the
      DOWN state rather than retrying to register/init the CRQ.
      
      Initialize the completion before registering the CRQ so we don't lose
      the notification.
      
      Fixes: 032c5e82 ("Driver for IBM System i/p VNIC protocol")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: register netdev after init of adapter · 570425f8
      Sukadev Bhattiprolu authored
      Finish initializing the adapter before registering netdev so state
      is consistent.
      
      Fixes: c26eba03 ("ibmvnic: Update reset infrastructure to support tunable parameters")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: complete init_done on transport events · 36491f2d
      Sukadev Bhattiprolu authored
      If we get a transport event, set the error and mark the init as
      complete so that the attempt to send crq-init or login fails sooner
      rather than waiting for the timeout.
      
      Fixes: bbd669a8 ("ibmvnic: Fix completion structure initialization")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: define flush_reset_queue helper · 83da53f7
      Sukadev Bhattiprolu authored
      Define and use a helper to flush the reset queue.
      
      Fixes: 2770a798 ("ibmvnic: Introduce hard reset recovery")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: initialize rc before completing wait · 765559b1
      Sukadev Bhattiprolu authored
      We should initialize ->init_done_rc before calling complete(). Otherwise
      the waiting thread may see ->init_done_rc as 0 before we have updated it
      and may assume that the CRQ was successful.
      
      Fixes: 6b278c0c ("ibmvnic delay complete()")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
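The ordering rule in this commit (write the result, then signal) can be illustrated outside the kernel. The sketch below is a userspace analogue using C11 atomics, not the driver code: `struct init_wait_model`, `init_wait_complete()` and `init_wait_poll()` are hypothetical names, and the atomic flag stands in for the kernel's completion.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the ->init_done / ->init_done_rc pair. */
struct init_wait_model {
	int rc;			/* result, written before 'done' is published */
	atomic_bool done;	/* models the completion */
};

/* Signaller: record the return code first, then publish completion. */
static void init_wait_complete(struct init_wait_model *w, int rc)
{
	w->rc = rc;
	atomic_store_explicit(&w->done, true, memory_order_release);
}

/*
 * Waiter (polling form): returns false while not complete. Once it
 * returns true, *rc_out holds the value written by the signaller.
 */
static bool init_wait_poll(struct init_wait_model *w, int *rc_out)
{
	if (!atomic_load_explicit(&w->done, memory_order_acquire))
		return false;
	*rc_out = w->rc;
	return true;
}
```

If the two statements in `init_wait_complete()` were swapped, a waiter could observe `done` and still read the stale `rc` of 0, which is the misreading the commit message describes.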
    • ibmvnic: free reset-work-item when flushing · 8d0657f3
      Sukadev Bhattiprolu authored
      Fix a tiny memory leak when flushing the reset work queue.
      
      Fixes: 2770a798 ("ibmvnic: Introduce hard reset recovery")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
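A flush that unlinks queued items without freeing them leaks one allocation per item. As a rough userspace illustration of unlinking and freeing in the same pass, assuming a plain singly linked list rather than the kernel's list_head, and with `struct rwi_model`, `rwi_enqueue()` and `flush_queue_model()` as hypothetical names:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a queued reset work item. */
struct rwi_model {
	int reset_reason;
	struct rwi_model *next;
};

/* Push a new item at the head; returns 0 on success, -1 on allocation failure. */
static int rwi_enqueue(struct rwi_model **head, int reset_reason)
{
	struct rwi_model *item = malloc(sizeof(*item));

	if (!item)
		return -1;
	item->reset_reason = reset_reason;
	item->next = *head;
	*head = item;
	return 0;
}

/* Unlink AND free every queued item; returns how many were released. */
static size_t flush_queue_model(struct rwi_model **head)
{
	size_t freed = 0;

	while (*head) {
		struct rwi_model *item = *head;

		*head = item->next;	/* unlink from the queue */
		free(item);		/* release it, so nothing leaks */
		freed++;
	}
	return freed;
}
```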
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 31372fe9
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      1) Fix PMTU for IPv6 if the reported MTU minus the ESP overhead is
         smaller than 1280. From Jiri Bohac.
      
      2) Fix xfrm interface ID and inter address family tunneling when
         migrating xfrm states. From Yan Yan.
      
      3) Add missing xfrm interface ID initialization on xfrmi_changelink.
         From Antony Antony.
      
      4) Enforce validity of xfrm offload input flags so that userspace can't
         send undefined flags to the offload driver.
         From Leon Romanovsky.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dcb: flush lingering app table entries for unregistered devices · 91b0383f
      Vladimir Oltean authored
      If I'm not mistaken (and I don't think I am), the way in which the
      dcbnl_ops work is that drivers call dcb_ieee_setapp() and this populates
      the application table with dynamically allocated struct dcb_app_type
      entries that are kept in the module-global dcb_app_list.
      
      However, nobody keeps exact track of these entries, and although
      dcb_ieee_delapp() is supposed to remove them, nobody does so when the
      interface goes away (example: driver unbinds from device). So the
      dcb_app_list will contain lingering entries with an ifindex that no
      longer matches any device in dcb_app_lookup().
      
      Reclaim the lost memory by listening for the NETDEV_UNREGISTER event and
      flushing the app table entries of interfaces that are now gone.
      
      In fact something like this used to be done as part of the initial
      commit (blamed below), but it was done in dcbnl_exit() -> dcb_flushapp(),
      essentially at module_exit time. That became dead code after commit
      7a6b6f51 ("DCB: fix kconfig option") which essentially merged
      "tristate config DCB" and "bool config DCBNL" into a single "bool config
      DCB", so net/dcb/dcbnl.c could not be built as a module anymore.
      
      Commit 36b9ad80 ("net/dcb: make dcbnl.c explicitly non-modular")
      recognized this and deleted dcbnl_exit() and dcb_flushapp() altogether,
      leaving us with the version we have today.
      
      Since flushing application table entries can and should be done as soon
      as the netdevice disappears, fundamentally the commit that is to blame
      is the one that introduced the design of this API.
      
      Fixes: 9ab933ab ("dcbnl: add appliction tlv handlers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
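The core of the fix, dropping every app table entry whose ifindex belongs to a device that has gone away, can be sketched in userspace C. This is an illustration under simplifying assumptions (a singly linked list, no locking, no notifier plumbing); `struct app_entry_model`, `app_add()` and `app_flush_ifindex()` are hypothetical names, not the dcbnl API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one entry in a global app table. */
struct app_entry_model {
	int ifindex;
	unsigned char priority;
	struct app_entry_model *next;
};

/* Add an entry at the head; returns 0 on success, -1 on allocation failure. */
static int app_add(struct app_entry_model **head, int ifindex,
		   unsigned char priority)
{
	struct app_entry_model *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ifindex = ifindex;
	e->priority = priority;
	e->next = *head;
	*head = e;
	return 0;
}

/*
 * Remove and free all entries for one ifindex, as a handler for a
 * "device unregistered" event might. Returns the number removed.
 */
static size_t app_flush_ifindex(struct app_entry_model **head, int ifindex)
{
	struct app_entry_model **link = head;
	size_t removed = 0;

	while (*link) {
		struct app_entry_model *e = *link;

		if (e->ifindex == ifindex) {
			*link = e->next;	/* unlink, keep 'link' in place */
			free(e);
			removed++;
		} else {
			link = &e->next;	/* advance past a kept entry */
		}
	}
	return removed;
}
```

Walking with a pointer to the link, rather than to the node, lets the same loop remove head and interior entries without special cases.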
    • net/smc: fix connection leak · 9f1c50cf
      D. Wythe authored
      There's a potential leak under the following execution sequence:
      
      smc_release  				smc_connect_work
      if (sk->sk_state == SMC_INIT)
      					send_clc_confirim
      	tcp_abort();
      					...
      					sk.sk_state = SMC_ACTIVE
      smc_close_active
      switch(sk->sk_state) {
      ...
      case SMC_ACTIVE:
      	smc_close_final()
      	// then wait peer closed
      
      Unfortunately, tcp_abort() may discard CLC CONFIRM messages that are
      still in the tcp send buffer, in which case our connection token cannot
      be delivered to the server side, which means that we cannot get a
      passive close message at all. Therefore, the connection can never be
      disconnected.
      
      This patch takes a simple approach to avoid the issue: once the state
      has changed to SMC_ACTIVE after tcp_abort(), we actively abort the
      smc connection. Since the state was SMC_INIT before tcp_abort(),
      abandoning the complete disconnection process should not cause much
      of a problem.

      In fact, this problem may exist as long as the CLC CONFIRM message is
      not received by the server. Whether a timer should be added after
      smc_close_final() needs to be discussed in the future. Even so, this
      patch releases the connection faster in the case above, which is
      valuable on its own.
      
      Fixes: 39f41f36 ("net/smc: common release code for non-accepted sockets")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: only enable DMA interrupts when ready · 087a7b94
      Vincent Whitchurch authored
      In this driver's ->ndo_open() callback, it enables DMA interrupts,
      starts the DMA channels, then requests interrupts with request_irq(),
      and then finally enables napi.
      
      If RX DMA interrupts are received before napi is enabled, no processing
      is done because napi_schedule_prep() will return false.  If the network
      has a lot of broadcast/multicast traffic, then the RX ring could fill up
      completely before napi is enabled.  When this happens, no further RX
      interrupts will be delivered, and the driver will fail to receive any
      packets.
      
      Fix this by only enabling DMA interrupts after all other initialization
      is complete.
      
      Fixes: 523f11b5 ("net: stmmac: move hardware setup for stmmac_open to new function")
      Reported-by: Lars Persson <larper@axis.com>
      Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen/netfront: destroy queues before real_num_tx_queues is zeroed · dcf4ff7a
      Marek Marczykowski-Górecki authored
      xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
      delete queues. Since d7dac083
      ("net-sysfs: update the queue counts in the unregistration path"),
      unregister_netdev() indirectly sets real_num_tx_queues to 0. Those two
      facts together mean that xennet_destroy_queues(), called from
      xennet_remove(), cannot do its job because it runs after
      unregister_netdev(). This results in kfree-ing queues that are still
      linked in napi, which ultimately crashes:
      
          BUG: kernel NULL pointer dereference, address: 0000000000000000
          #PF: supervisor read access in kernel mode
          #PF: error_code(0x0000) - not-present page
          PGD 0 P4D 0
          Oops: 0000 [#1] PREEMPT SMP PTI
          CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W         5.16.10-1.32.fc32.qubes.x86_64+ #226
          RIP: 0010:free_netdev+0xa3/0x1a0
          Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
          RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
          RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
          RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
          RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
          R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
          FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
          Call Trace:
           <TASK>
           xennet_remove+0x13d/0x300 [xen_netfront]
           xenbus_dev_remove+0x6d/0xf0
           __device_release_driver+0x17a/0x240
           device_release_driver+0x24/0x30
           bus_remove_device+0xd8/0x140
           device_del+0x18b/0x410
           ? _raw_spin_unlock+0x16/0x30
           ? klist_iter_exit+0x14/0x20
           ? xenbus_dev_request_and_reply+0x80/0x80
           device_unregister+0x13/0x60
           xenbus_dev_changed+0x18e/0x1f0
           xenwatch_thread+0xc0/0x1a0
           ? do_wait_intr_irq+0xa0/0xa0
           kthread+0x16b/0x190
           ? set_kthread_struct+0x40/0x40
           ret_from_fork+0x22/0x30
           </TASK>
      
      Fix this by calling xennet_destroy_queues() from xennet_uninit(),
      when real_num_tx_queues is still available. This ensures that queues are
      destroyed before real_num_tx_queues is set to 0, regardless of how
      unregister_netdev() was called.
      
      Originally reported at
      https://github.com/QubesOS/qubes-issues/issues/7257
      
      Fixes: d7dac083 ("net-sysfs: update the queue counts in the unregistration path")
      Cc: stable@vger.kernel.org
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-fixes-for-5-17' · a6df953f
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for 5.17
      
      Patch 1 fixes an issue with the SIOCOUTQ ioctl in MPTCP sockets that
      have performed a fallback to TCP.
      
      Patch 2 is a selftest fix to correctly remove temp files.
      
      Patch 3 fixes a shift-out-of-bounds issue found by syzkaller.
      ====================
      
      Link: https://lore.kernel.org/r/20220225005259.318898-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: Correctly set DATA_FIN timeout when number of retransmits is large · 877d11f0
      Mat Martineau authored
      Syzkaller with UBSAN uncovered a scenario where a large number of
      DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
      timeout calculation:
      
      ================================================================================
      UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
      shift exponent 32 is too large for 32-bit type 'unsigned int'
      CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted 5.17.0-rc2-00630-g5fbf21c90c60 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      Workqueue: events mptcp_worker
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
       __ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
       mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
       __mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
       mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
       process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
       worker_thread+0x95/0xe10 kernel/workqueue.c:2454
       kthread+0x2f4/0x3b0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
       </TASK>
      ================================================================================
      
      This change limits the maximum timeout by limiting the size of the
      shift, which keeps all intermediate values in-bounds.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
      Fixes: 6477dd39 ("mptcp: Retransmit DATA_FIN")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
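The UBSAN report above arises when a 32-bit value is shifted by 32 or more. Limiting the shift count before shifting, as the commit message describes, can be sketched as follows. The constants `RTO_MIN_TICKS` and `RTO_MAX_TICKS` and the helpers `ilog2_u32()` and `backoff_timeout()` are hypothetical, chosen only for illustration; they are not the kernel's names or values.

```c
#include <stdint.h>

/* Hypothetical bounds in arbitrary tick units; not the kernel's values. */
#define RTO_MIN_TICKS 200u
#define RTO_MAX_TICKS 120000u

/* Floor of log2(v) for v > 0. */
static unsigned int ilog2_u32(uint32_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/*
 * Exponential backoff: RTO_MIN_TICKS << retransmits, with the shift
 * count clamped so the shift is always defined and the result never
 * exceeds RTO_MAX_TICKS.
 */
static uint32_t backoff_timeout(uint32_t retransmits)
{
	uint32_t max_shift = ilog2_u32(RTO_MAX_TICKS / RTO_MIN_TICKS);

	if (retransmits > max_shift)
		retransmits = max_shift;
	return RTO_MIN_TICKS << retransmits;
}
```

With these example bounds the largest permitted shift is 9, so a retransmit count of 32 (the value in the report) yields the same timeout as a count of 9 instead of an out-of-range shift.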
    • selftests: mptcp: do complete cleanup at exit · 63bb8239
      Paolo Abeni authored
      After commit 05be5e27 ("selftests: mptcp: add disconnect tests")
      the mptcp selftests leave behind a couple of tmp files after
      each run. run_tests_disconnect() misnames a few variables used to
      track them. Address the issue by setting the appropriate global variables.
      
      Fixes: 05be5e27 ("selftests: mptcp: add disconnect tests")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: accurate SIOCOUTQ for fallback socket · 07c2c7a3
      Paolo Abeni authored
      The MPTCP SIOCOUTQ implementation is not very accurate in
      case of fallback: it only measures the data in the MPTCP-level
      write queue, but it does not take into account the subflow
      write queue utilization. In case of fallback the former can be
      empty, while the latter is not.

      The above produces sporadic self-test failures and can fool
      legitimate user-space applications.

      Fix the issue by additionally querying the subflow in case of fallback.
      
      Fixes: 644807e3 ("mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/260
      Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 8a727100
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix regression with RFCOMM
       - Fix regression with LE devices using Privacy (RPA)
       - Fix regression with LE devices not waiting proper timeout to
         establish connections
       - Fix race in smp
      
      * tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: hci_sync: Fix not using conn_timeout
        Bluetooth: hci_sync: Fix hci_update_accept_list_sync
        Bluetooth: assign len after null check
        Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
        Bluetooth: fix data races in smp_unregister(), smp_del_chan()
        Bluetooth: hci_core: Fix leaking sent_cmd skb
      ====================
      
      Link: https://lore.kernel.org/r/20220224210838.197787-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 24 Feb, 2022 21 commits
    • Merge tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · d8152cfe
      Linus Torvalds authored
      Pull pci fixes from Bjorn Helgaas:
      
       - Fix a merge error that broke PCI device enumeration on mvebu
         platforms, including Turris Omnia (Armada 385) (Pali Rohár)
      
       - Avoid using ATS on all AMD Navi10 and Navi14 GPUs because some
         VBIOSes don't account for "harvested" (disabled) parts of the chip
         when initializing caches (Alex Deucher)
      
      * tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
        PCI: mvebu: Fix device enumeration regression
    • Merge tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f672ff91
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bpf and netfilter.
      
        Current release - regressions:
      
         - bpf: fix crash due to out of bounds access into reg2btf_ids
      
         - mvpp2: always set port pcs ops, avoid null-deref
      
         - eth: marvell: fix driver load from initrd
      
         - eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
      
        Current release - new code bugs:
      
         - mptcp: fix race in overlapping signal events
      
        Previous releases - regressions:
      
         - xen-netback: revert hotplug-status changes causing devices to not
           be configured
      
         - dsa:
            - avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
              held
            - fix panic when removing unoffloaded port from bridge
      
         - dsa: microchip: fix bridging with more than two member ports
      
        Previous releases - always broken:
      
         - bpf:
            - fix crash due to incorrect copy_map_value when both spin lock
              and timer are present in a single value
            - fix a bpf_timer initialization issue with clang
            - do not try bpf_msg_push_data with len 0
            - add schedule points in batch ops
      
         - nf_tables:
            - unregister flowtable hooks on netns exit
            - correct flow offload action array size
            - fix a couple of memory leaks
      
         - vsock: don't check owner in vhost_vsock_stop() while releasing
      
         - gso: do not skip outer ip header in case of ipip and net_failover
      
         - smc: use a mutex for locking "struct smc_pnettable"
      
         - openvswitch: fix setting ipv6 fields causing hw csum failure
      
         - mptcp: fix race in incoming ADD_ADDR option processing
      
         - sysfs: add check for netdevice being present to speed_show
      
         - sched: act_ct: fix flow table lookup after ct clear or switching
           zones
      
         - eth: intel: fixes for SR-IOV forwarding offloads
      
         - eth: broadcom: fixes for selftests and error recovery
      
         - eth: mellanox: flow steering and SR-IOV forwarding fixes
      
        Misc:
      
         - make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
           friends not report freed skbs as drops
      
         - force inlining of checksum functions in net/checksum.h"
      
      * tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
        net: mv643xx_eth: process retval from of_get_mac_address
        ping: remove pr_err from ping_lookup
        Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
        openvswitch: Fix setting ipv6 fields causing hw csum failure
        ipv6: prevent a possible race condition with lifetimes
        net/smc: Use a mutex for locking "struct smc_pnettable"
        bnx2x: fix driver load from initrd
        Revert "xen-netback: Check for hotplug-status existence before watching"
        Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
        net/mlx5e: Fix VF min/max rate parameters interchange mistake
        net/mlx5e: Add missing increment of count
        net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
        net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
        net/mlx5e: Add feature check for set fec counters
        net/mlx5e: TC, Skip redundant ct clear actions
        net/mlx5e: TC, Reject rules with forward and drop actions
        net/mlx5e: TC, Reject rules with drop and modify hdr action
        net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
        net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
        net/mlx5: Fix possible deadlock on rule deletion
        ...
    • Bluetooth: hci_sync: Fix not using conn_timeout · a56a1138
      Luiz Augusto von Dentz authored
      When using hci_le_create_conn_sync it shall wait for the conn_timeout
      since the connection complete may take longer than just 2 seconds.
      
      Also fix the masking of HCI_EV_LE_ENHANCED_CONN_COMPLETE and
      HCI_EV_LE_CONN_COMPLETE so they are never both set so we can predict
      which one the controller will use in case of HCI_OP_LE_CREATE_CONN.
      
      Fixes: 6cd29ec6 ("Bluetooth: hci_sync: Wait for proper events when connecting LE")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Bluetooth: hci_sync: Fix hci_update_accept_list_sync · 80740ebb
      Luiz Augusto von Dentz authored
      hci_update_accept_list_sync is meant to return the filter based on the
      result of calls such as hci_le_add_accept_list_sync, but that result
      gets overwritten by the return value of
      hci_le_set_addr_resolution_enable_sync.
      
      Fixes: ad383c2c ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Bluetooth: assign len after null check · 2e8ecb4b
      Wang Qing authored
      len should be assigned after a null check
      Signed-off-by: Wang Qing <wangqing@vivo.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
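The pattern named in this commit title is to read a field through a pointer only after the pointer has been checked. A generic C sketch of that pattern follows; `struct msg_model` and `msg_len_checked()` are hypothetical and are not taken from the Bluetooth code.

```c
#include <stddef.h>

/* Hypothetical message type with a length field. */
struct msg_model {
	size_t len;
};

/* Returns the message length, or 0 when no message is given. */
static size_t msg_len_checked(const struct msg_model *msg)
{
	size_t len;

	if (!msg)
		return 0;	/* check the pointer first ... */

	len = msg->len;		/* ... and dereference only afterwards */
	return len;
}
```

Initializing `len` from `msg->len` at its declaration, before the check, would dereference a NULL pointer on the error path.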
    • Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks · 29fb6083
      Luiz Augusto von Dentz authored
      Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM it
      shall return the partial chunks it could allocate instead of freeing
      everything, as otherwise it can cause problems like the ones reported below.
      
      Fixes: 81be03e0 ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg")
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Link: https://lore.kernel.org/r/d7206e12-1b99-c3be-84f4-df22af427ef5@molgen.mpg.de
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Nokia N9 (MeeGo/Harmattan)
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
    • Bluetooth: fix data races in smp_unregister(), smp_del_chan() · fa78d2d1
      Lin Ma authored
      Previous commit e0448092 ("Bluetooth: defer cleanup of resources
      in hci_unregister_dev()") defers all destructive actions to
      hci_release_dev() to prevent concurrency problems like NPD, UAF.
      
      However, there are still some exceptions that are ignored.
      
      The smp_unregister() in hci_dev_close_sync() (previously in
      hci_dev_do_close) will release resources like the sensitive channel
      and the smp_dev objects. Consider the situation where the device is
      being detached or powered down while the kernel is still operating on
      it; the following data race could take place.
      
      thread-A  hci_dev_close_sync  | thread-B  read_local_oob_ext_data
                                    |
      hci_dev_unlock()              |
      ...                           | hci_dev_lock()
      if (hdev->smp_data)           |
        chan = hdev->smp_data       |
                                    | chan = hdev->smp_data (3)
                                    |
        hdev->smp_data = NULL (1)   | if (!chan || !chan->data) (4)
        ...                         |
        smp = chan->data            | smp = chan->data
        if (smp)                    |
          chan->data = NULL (2)     |
          ...                       |
          kfree_sensitive(smp)      |
                                    | // dereferencing smp triggers UAF
      
      That is, the objects hdev->smp_data and chan->data are both subject to
      data races. In a preemption-enabled kernel, the above schedule (when (3)
      is before (1) and (4) is before (2)) leads to UAF bugs. It can be
      reproduced in the latest kernel and below is part of the report:
      
      [   49.097146] ================================================================
      [   49.097611] BUG: KASAN: use-after-free in smp_generate_oob+0x2dd/0x570
      [   49.097611] Read of size 8 at addr ffff888006528360 by task generate_oob/155
      [   49.097611]
      [   49.097611] Call Trace:
      [   49.097611]  <TASK>
      [   49.097611]  dump_stack_lvl+0x34/0x44
      [   49.097611]  print_address_description.constprop.0+0x1f/0x150
      [   49.097611]  ? smp_generate_oob+0x2dd/0x570
      [   49.097611]  ? smp_generate_oob+0x2dd/0x570
      [   49.097611]  kasan_report.cold+0x7f/0x11b
      [   49.097611]  ? smp_generate_oob+0x2dd/0x570
      [   49.097611]  smp_generate_oob+0x2dd/0x570
      [   49.097611]  read_local_oob_ext_data+0x689/0xc30
      [   49.097611]  ? hci_event_packet+0xc80/0xc80
      [   49.097611]  ? sysvec_apic_timer_interrupt+0x9b/0xc0
      [   49.097611]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [   49.097611]  ? mgmt_init_hdev+0x1c/0x240
      [   49.097611]  ? mgmt_init_hdev+0x28/0x240
      [   49.097611]  hci_sock_sendmsg+0x1880/0x1e70
      [   49.097611]  ? create_monitor_event+0x890/0x890
      [   49.097611]  ? create_monitor_event+0x890/0x890
      [   49.097611]  sock_sendmsg+0xdf/0x110
      [   49.097611]  __sys_sendto+0x19e/0x270
      [   49.097611]  ? __ia32_sys_getpeername+0xa0/0xa0
      [   49.097611]  ? kernel_fpu_begin_mask+0x1c0/0x1c0
      [   49.097611]  __x64_sys_sendto+0xd8/0x1b0
      [   49.097611]  ? syscall_exit_to_user_mode+0x1d/0x40
      [   49.097611]  do_syscall_64+0x3b/0x90
      [   49.097611]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   49.097611] RIP: 0033:0x7f5a59f51f64
      ...
      [   49.097611] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5a59f51f64
      [   49.097611] RDX: 0000000000000007 RSI: 00007f5a59d6ac70 RDI: 0000000000000006
      [   49.097611] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      [   49.097611] R10: 0000000000000040 R11: 0000000000000246 R12: 00007ffec26916ee
      [   49.097611] R13: 00007ffec26916ef R14: 00007f5a59d6afc0 R15: 00007f5a59d6b700
      
      To solve these data races, this patch moves the smp_unregister() call
      into the region protected by hci_dev_lock(). That is, smp_unregister()
      cannot execute concurrently with operating functions (most of them
      mgmt operations in mgmt.c) that hold the device lock.

      This patch is tested with kernel LOCK DEBUGGING enabled. The cost of
      holding the device lock longer is expected to be low, as the
      smp_unregister() function is fairly short and efficient.
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      fa78d2d1
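      The locking idea in "Bluetooth: fix data races in smp_unregister(),
      smp_del_chan()" can be sketched as a user-space analogy with pthreads.
      This is not the kernel code: struct dev, struct chan, struct smp_state
      and the dev_*() helpers are hypothetical names, and a pthread mutex
      stands in for the device lock. The point is only that the teardown path
      clears and frees the objects while holding the same lock the reader
      takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct smp_state { int oob_value; };
struct chan { struct smp_state *data; };
struct dev {
    pthread_mutex_t lock;   /* plays the role of the device lock */
    struct chan *smp_chan;
};

/* Allocates the channel and its state. Returns 0 on success, -1 on failure. */
int dev_setup(struct dev *d, int value)
{
    assert(d);
    pthread_mutex_init(&d->lock, NULL);
    d->smp_chan = malloc(sizeof(*d->smp_chan));
    if (!d->smp_chan)
        return -1;
    d->smp_chan->data = malloc(sizeof(*d->smp_chan->data));
    if (!d->smp_chan->data) {
        free(d->smp_chan);
        d->smp_chan = NULL;
        return -1;
    }
    d->smp_chan->data->oob_value = value;
    return 0;
}

/* Reader: copies the value out under the lock, or returns -1 after teardown. */
int dev_read_oob(struct dev *d, int *out)
{
    int ret = -1;

    assert(d && out);
    pthread_mutex_lock(&d->lock);
    if (d->smp_chan && d->smp_chan->data) {
        *out = d->smp_chan->data->oob_value;
        ret = 0;
    }
    pthread_mutex_unlock(&d->lock);
    return ret;
}

/* Teardown: runs under the same lock, so it cannot interleave with a reader
 * that has already fetched the pointers but not yet dereferenced them. */
void dev_teardown(struct dev *d)
{
    assert(d);
    pthread_mutex_lock(&d->lock);
    if (d->smp_chan) {
        free(d->smp_chan->data);
        free(d->smp_chan);
        d->smp_chan = NULL;
    }
    pthread_mutex_unlock(&d->lock);
}
```

      With both paths serialized on one lock, a reader either sees the fully
      valid objects or sees NULL; it never holds a pointer that is freed
      underneath it.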
    • Luiz Augusto von Dentz's avatar
      Bluetooth: hci_core: Fix leaking sent_cmd skb · dd3b1dc3
      Luiz Augusto von Dentz authored
      sent_cmd memory is not freed before freeing hci_dev causing it to leak
      its contents.
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      dd3b1dc3
    • Linus Torvalds's avatar
      Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block · 73878e5e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - NVMe pull request:
          - send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
          - fix passthrough to namespaces with unsupported features (Christoph
            Hellwig)
      
       - Clear iocb->private at poll completion (Stefano)
      
      * tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
        nvme-tcp: send H2CData PDUs based on MAXH2CDATA
        nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
        nvme: don't return an error from nvme_configure_metadata
        block: clear iocb->private in blkdev_bio_end_io_async()
      73878e5e
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block · 3a5f59b1
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
      
       - Add a conditional schedule point in io_add_buffers() (Eric)
      
       - Fix for a quiesce speedup merged in this release (Dylan)
      
       - Don't convert to jiffies for event timeout waiting, it's way too
         coarse when we accept a timespec as input (me)
      
      * tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block:
        io_uring: disallow modification of rsrc_data during quiesce
        io_uring: don't convert to jiffies for waiting on timeouts
        io_uring: add a schedule point in io_add_buffers()
      3a5f59b1
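      The coarseness mentioned in the io_uring pull request above can be
      illustrated with a little arithmetic. The sketch below assumes a tick
      rate of 250 Hz and rounding up to whole ticks; both are assumptions made
      for illustration (the real tick rate depends on kernel configuration),
      and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Converts a duration in nanoseconds to whole ticks at `hz` ticks per
 * second, rounding up so that a non-zero timeout never becomes zero. */
uint64_t ns_to_ticks_round_up(uint64_t ns, unsigned int hz)
{
    uint64_t tick_ns;

    assert(hz > 0 && hz <= NSEC_PER_SEC);
    tick_ns = NSEC_PER_SEC / hz;
    return (ns + tick_ns - 1) / tick_ns;
}

/* Converts whole ticks back to nanoseconds at the same tick rate. */
uint64_t ticks_to_ns(uint64_t ticks, unsigned int hz)
{
    assert(hz > 0 && hz <= NSEC_PER_SEC);
    return ticks * (NSEC_PER_SEC / hz);
}
```

      Under these assumptions a requested timeout of 100 microseconds
      (100000 ns) becomes one tick, which is 4 milliseconds: forty times
      longer than what the caller asked for.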
    • Linus Torvalds's avatar
      Merge tag 'platform-drivers-x86-v5.17-4' of... · 6c528f34
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull more x86 platform driver fixes from Hans de Goede:
       "Two more fixes:
      
         - Fix suspend/resume regression on AMD Cezanne APUs in >= 5.16
      
         - Fix Microsoft Surface 3 battery readings"
      
      * tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        surface: surface3_power: Fix battery readings on batteries without a serial number
        platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
      6c528f34
    • Mauri Sandberg's avatar
      net: mv643xx_eth: process retval from of_get_mac_address · 42404d8f
      Mauri Sandberg authored
      Obtaining a MAC address may be deferred, for example when the MAC is
      stored in an NVMEM block that is not ready upon the first retrieval
      attempt, in which case EPROBE_DEFER is returned.

      It is also possible that a port that does not rely on NVMEM has
      already been created when the defer request arrives. Thus, the resources
      allocated previously must also be freed when rolling back.
      
      Fixes: 76723bca ("net: mv643xx_eth: add DT parsing support")
      Signed-off-by: Mauri Sandberg <maukka@ext.kapsi.fi>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fi
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      42404d8f
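      The roll-back requirement described in "net: mv643xx_eth: process retval
      from of_get_mac_address" can be sketched in user space as follows. This
      is not the driver code: struct port, get_mac() and probe_ports() are
      hypothetical, and EPROBE_DEFER is defined locally with the value the
      kernel uses. The sketch only shows that a deferral on a later port must
      undo the ports created before it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define EPROBE_DEFER 517   /* defined locally for this user-space sketch */

/* Hypothetical port object; not the driver's data structure. */
struct port { int id; };

/* Hypothetical MAC lookup: port `defer_at` reports that its MAC source is
 * not ready yet. Pass -1 so that no port defers. */
static int get_mac(int port_id, int defer_at)
{
    return port_id == defer_at ? -EPROBE_DEFER : 0;
}

/* Creates n ports. On any failure, frees the ports created so far and
 * returns the negative error code; returns 0 on success. */
int probe_ports(struct port **ports, int n, int defer_at)
{
    int i, err;

    assert(ports && n >= 0);
    for (i = 0; i < n; i++) {
        err = get_mac(i, defer_at);
        if (err)
            goto rollback;

        ports[i] = malloc(sizeof(*ports[i]));
        if (!ports[i]) {
            err = -ENOMEM;
            goto rollback;
        }
        ports[i]->id = i;
    }
    return 0;

rollback:
    /* Undo in reverse order everything created before the failure. */
    while (--i >= 0) {
        free(ports[i]);
        ports[i] = NULL;
    }
    return err;
}
```

      If the second port defers, the first port has already been created, so
      the roll-back loop releases it before the error is propagated and the
      probe can be retried later from a clean state.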
    • Xin Long's avatar
      ping: remove pr_err from ping_lookup · cd33bdcb
      Xin Long authored
      As Jakub noticed, prints should be avoided on the datapath.
      Also, as packets would never come to the else branch in
      ping_lookup(), remove pr_err() from ping_lookup().
      
      Fixes: 35a79e64 ("ping: fix the dif and sdif check in ping_lookup")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Link: https://lore.kernel.org/r/1ef3f2fcd31bd681a193b1fcf235eee1603819bd.1645674068.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cd33bdcb
    • Mateusz Palczewski's avatar
      Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC" · fe203715
      Mateusz Palczewski authored
      Revert a patch that, instead of fixing an AQ error when trying
      to reset the BW limit, introduced several regressions related to
      creating and managing TCs. Currently there are errors when creating
      a TC on both PF and VF.
      
      Error log:
      [17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
      [17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
      [17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
      [17428.783259] i40e 0000:3b:00.1: Unable to  configure TC map 0 for VSI 391
      
      This reverts commit 3d250466.
      
      Fixes: 3d250466 (i40e: Fix reset bw limit when DCB enabled with 1 TC)
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fe203715
    • Paul Blakey's avatar
      openvswitch: Fix setting ipv6 fields causing hw csum failure · d9b5ae5c
      Paul Blakey authored
      IPv6 ttl, label and tos fields are modified without first
      pulling/pushing the IPv6 header, which would have updated
      the hw csum (if available). This might cause csum validation
      failures when sending the packet to the stack, as can be seen in
      the trace below.

      Fix this by updating skb->csum if available.

      Trace resulting from an IPv6 ttl decrement followed by sending the
      packet to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
      [295241.900063] s_pf0vf2: hw csum failure
      [295241.923191] Call Trace:
      [295241.925728]  <IRQ>
      [295241.927836]  dump_stack+0x5c/0x80
      [295241.931240]  __skb_checksum_complete+0xac/0xc0
      [295241.935778]  nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
      [295241.953030]  nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
      [295241.958344]  __ovs_ct_lookup+0xac/0x860 [openvswitch]
      [295241.968532]  ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
      [295241.979167]  do_execute_actions+0x54a/0xaa0 [openvswitch]
      [295242.001482]  ovs_execute_actions+0x48/0x100 [openvswitch]
      [295242.006966]  ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
      [295242.012626]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
      [295242.028763]  netdev_frame_hook+0xc0/0x180 [openvswitch]
      [295242.034074]  __netif_receive_skb_core+0x2ca/0xcb0
      [295242.047498]  netif_receive_skb_internal+0x3e/0xc0
      [295242.052291]  napi_gro_receive+0xba/0xe0
      [295242.056231]  mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
      [295242.062513]  mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
      [295242.067669]  mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
      [295242.077958]  net_rx_action+0x149/0x3b0
      [295242.086762]  __do_softirq+0xd7/0x2d6
      [295242.090427]  irq_exit+0xf7/0x100
      [295242.093748]  do_IRQ+0x7f/0xd0
      [295242.096806]  common_interrupt+0xf/0xf
      [295242.100559]  </IRQ>
      [295242.102750] RIP: 0033:0x7f9022e88cbd
      [295242.125246] RSP: 002b:00007f9022282b20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
      [295242.132900] RAX: 0000000000000005 RBX: 0000000000000010 RCX: 0000000000000000
      [295242.140120] RDX: 00007f9022282ba8 RSI: 00007f9022282a30 RDI: 00007f9014005c30
      [295242.147337] RBP: 00007f9014014d60 R08: 0000000000000020 R09: 00007f90254a8340
      [295242.154557] R10: 00007f9022282a28 R11: 0000000000000246 R12: 0000000000000000
      [295242.161775] R13: 00007f902308c000 R14: 000000000000002b R15: 00007f9022b71f40
      
      Fixes: 3fdbd1ce ("openvswitch: add ipv6 'set' action")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d9b5ae5c
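      Why a full-packet checksum goes stale when a header field changes, and
      how it can be patched without re-summing the packet, can be shown with a
      small user-space sketch of one's-complement arithmetic. This is not the
      openvswitch code: csum16() and csum16_replace() are hypothetical
      helpers, and the four 16-bit words are an illustrative IPv6 header
      prefix (version/class/flow label, payload length, next header plus hop
      limit).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folds a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement sum over n 16-bit words (n is assumed small enough
 * that the 32-bit accumulator cannot overflow). */
uint16_t csum16(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    size_t i;

    assert(words || n == 0);
    for (i = 0; i < n; i++)
        sum += words[i];
    return fold32(sum);
}

/* Patches an existing sum after one word changed from old_word to new_word:
 * adding the complement of the old word removes it, then the new word is
 * added in. */
uint16_t csum16_replace(uint16_t sum, uint16_t old_word, uint16_t new_word)
{
    uint32_t s = sum;

    s += (uint16_t)~old_word;
    s += new_word;
    return fold32(s);
}
```

      For the words { 0x6000, 0x0000, 0x0014, 0x0640 }, lowering the hop
      limit from 64 to 63 turns the last word into 0x063f. The sum computed
      before the change no longer matches the data, while
      csum16_replace(old_sum, 0x0640, 0x063f) agrees with a full recomputation.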
    • Niels Dossche's avatar
      ipv6: prevent a possible race condition with lifetimes · 6c0d8833
      Niels Dossche authored
      valid_lft, prefered_lft and tstamp are always accessed under the lock
      "lock" in other places. Reading these without taking the lock may result
      in inconsistent values for the valid and preferred variables, since
      those variables are computed from these fields.
      Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
      Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6c0d8833
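      The consistency concern in "ipv6: prevent a possible race condition with
      lifetimes" can be sketched in user space as a snapshot taken under a
      lock. This is an analogy, not the kernel code: struct addr_entry,
      struct lifetimes and addr_remaining() are hypothetical, a pthread mutex
      stands in for the per-address lock, times are plain seconds, and
      infinite lifetimes are ignored for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical user-space model of an address entry; not the kernel struct. */
struct addr_entry {
    pthread_mutex_t lock;
    uint32_t valid_lft;     /* seconds */
    uint32_t prefered_lft;  /* seconds */
    uint64_t tstamp;        /* seconds, time of the last update */
};

struct lifetimes {
    uint32_t valid;
    uint32_t preferred;
};

/* Remaining lifetime after `age` seconds, clamped at zero. */
static uint32_t remaining(uint32_t lft, uint64_t age)
{
    return lft > age ? (uint32_t)(lft - age) : 0;
}

/* Copies all three fields in one critical section, then computes the
 * remaining lifetimes from that consistent snapshot. */
struct lifetimes addr_remaining(struct addr_entry *e, uint64_t now)
{
    struct lifetimes out;
    uint32_t valid_lft, prefered_lft;
    uint64_t tstamp, age;

    assert(e);
    pthread_mutex_lock(&e->lock);
    valid_lft = e->valid_lft;
    prefered_lft = e->prefered_lft;
    tstamp = e->tstamp;
    pthread_mutex_unlock(&e->lock);

    age = now > tstamp ? now - tstamp : 0;
    out.valid = remaining(valid_lft, age);
    out.preferred = remaining(prefered_lft, age);
    return out;
}
```

      Without the lock, a concurrent update could change tstamp between the
      reads, so the lifetimes would be combined with a timestamp they do not
      belong to.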
    • Fabio M. De Francesco's avatar
      net/smc: Use a mutex for locking "struct smc_pnettable" · 7ff57e98
      Fabio M. De Francesco authored
      smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
      which, in turn, calls mutex_lock(&smc_ib_devices.mutex).
      
      read_lock() disables preemption. Therefore, the code acquires a mutex while in
      atomic context, which leads to a sleep-in-atomic-context (SAC) bug.
      
      Fix this bug by replacing the rwlock with a mutex.
      
      Reported-and-tested-by: syzbot+4f322a6d84e991c38775@syzkaller.appspotmail.com
      Fixes: 64e28b52 ("net/smc: add pnet table namespace support")
      Confirmed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220223100252.22562-1-fmdefrancesco@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7ff57e98
    • Manish Chopra's avatar
      bnx2x: fix driver load from initrd · e13ad144
      Manish Chopra authored
      Commit b7a49f73 ("bnx2x: Utilize firmware 7.13.21.0") added
      new firmware support in the driver while maintaining compatibility with
      older firmware. However, the older firmware was not added in
      MODULE_FIRMWARE(), so its files were missing from the initrd image,
      leading to driver load failure from initrd. This patch adds
      MODULE_FIRMWARE() for the older firmware version so that its files are
      included in initrd.
      
      Fixes: b7a49f73 ("bnx2x: Utilize firmware 7.13.21.0")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215627
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Alok Prasad <palok@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Link: https://lore.kernel.org/r/20220223085720.12021-1-manishc@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e13ad144
    • Marek Marczykowski-Górecki's avatar
      Revert "xen-netback: Check for hotplug-status existence before watching" · e8240add
      Marek Marczykowski-Górecki authored
      This reverts commit 2afeec08.
      
      The reasoning in the commit was wrong - the code expected to set up the
      watch even if 'hotplug-status' didn't exist. In fact, it relied on the
      watch being fired the first time - to check if maybe 'hotplug-status' is
      already set to 'connected'. Not registering a watch for a non-existing
      path (which is the case if the hotplug script hasn't been executed yet)
      made the backend not wait for the hotplug script to execute. This, in
      turn, made the netfront think the interface is fully operational, while
      in fact it was not (the vif interface on xen-netback side might not be
      configured yet).
      
      This was a workaround for 'hotplug-status' erroneously being removed.
      But since that is reverted now, the workaround is not necessary either.
      
      More discussion at
      https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
      Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e8240add
    • Marek Marczykowski-Górecki's avatar
      Revert "xen-netback: remove 'hotplug-status' once it has served its purpose" · 0f4558ae
      Marek Marczykowski-Górecki authored
      This reverts commit 1f256578.
      
      The 'hotplug-status' node should not be removed as long as the vif
      device remains configured. Otherwise the xen-netback would wait for
      re-running the network script even if it was already called (in case of
      the frontend re-connecting). But also, it _should_ be removed when the
      vif device is destroyed (for example when unbinding the driver) -
      otherwise the hotplug script would not configure the device whenever it
      re-appears.
      
      Moving removal of the 'hotplug-status' node was a workaround for nothing
      calling the network script after the xen-netback module is reloaded. But
      when the vif interface is re-created (on xen-netback unbind/bind for
      example), the script should be called, regardless of who does that -
      currently this case is not handled by the toolstack, and requires a
      manual script call. Keeping hotplug-status=connected to skip the call is
      wrong and leads to an unconfigured interface.
      
      More discussion at
      https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u
      Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0f4558ae
    • Jens Axboe's avatar
      Merge tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme into block-5.17 · b2750f14
      Jens Axboe authored
      Pull NVMe fixes from Christoph:
      
      "nvme fixes for Linux 5.17
      
       - send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
       - fix passthrough to namespaces with unsupported features (me)"
      
      * tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme:
        nvme-tcp: send H2CData PDUs based on MAXH2CDATA
        nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
        nvme: don't return an error from nvme_configure_metadata
      b2750f14