Commits · 3f96d644976825986a93b7b9fe6a9900a80f2e11 · Kirill Smelkov / linux

28 Jan, 2021 2 commits

net: decnet: fix netdev refcount leaking on error path · 3f96d644

Vadim Fedorenko authored Jan 26, 2021

On building the route there is an assumption that the destination
could be local. In this case loopback_dev is used to get the address.
If the address is still cannot be retrieved dn_route_output_slow
returns EADDRNOTAVAIL with loopback_dev reference taken.

Cannot find hash for the fixes tag because this code was introduced
long time ago. I don't think that this bug has ever fired but the
patch is done just to have a consistent code base.
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1611619334-20955-1-git-send-email-vfedorenko@novek.ruSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3f96d644

net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP · 20776b46

Rasmus Villemoes authored Jan 25, 2021

It's not true that switchdev_port_obj_notify() only inspects the
->handled field of "struct switchdev_notifier_port_obj_info" if
call_switchdev_blocking_notifiers() returns 0 - there's a WARN_ON()
triggering for a non-zero return combined with ->handled not being
true. But the real problem here is that -EOPNOTSUPP is not being
properly handled.

The wrapper functions switchdev_handle_port_obj_add() et al change a
return value of -EOPNOTSUPP to 0, and the treatment of ->handled in
switchdev_port_obj_notify() seems to be designed to change that back
to -EOPNOTSUPP in case nobody actually acted on the notifier (i.e.,
everybody returned -EOPNOTSUPP).

Currently, as soon as some device down the stack passes the check_cb()
check, ->handled gets set to true, which means that
switchdev_port_obj_notify() cannot actually ever return -EOPNOTSUPP.

This, for example, means that the detection of hardware offload
support in the MRP code is broken: switchdev_port_obj_add() used by
br_mrp_switchdev_send_ring_test() always returns 0, so since the MRP
code thinks the generation of MRP test frames has been offloaded, no
such frames are actually put on the wire. Similarly,
br_mrp_switchdev_set_ring_role() also always returns 0, causing
mrp->ring_role_offloaded to be set to 1.

To fix this, continue to set ->handled true if any callback returns
success or any error distinct from -EOPNOTSUPP. But if all the
callbacks return -EOPNOTSUPP, make sure that ->handled stays false, so
the logic in switchdev_port_obj_notify() can propagate that
information.

Fixes: 9a9f26e8 ("bridge: mrp: Connect MRP API with the switchdev API")
Fixes: f30f0601 ("switchdev: Add helpers to aid traversal through lower devices")
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Link: https://lore.kernel.org/r/20210125124116.102928-1-rasmus.villemoes@prevas.dkSigned-off-by: Jakub Kicinski <kuba@kernel.org>

20776b46

27 Jan, 2021 5 commits

Merge branch 'net-fec-fix-temporary-rmii-clock-reset-on-link-up' · 2bd29748

Jakub Kicinski authored Jan 26, 2021

Laurent Badel says:

====================
net: fec: Fix temporary RMII clock reset on link up

v2: fixed a compilation warning

The FEC drivers performs a "hardware reset" of the MAC module when the
link is reported to be up. This causes a short glitch in the RMII clock
due to the hardware reset clearing the receive control register which
controls the MII mode. It seems that some link partners do not tolerate
this glitch, and invalidate the link, which leads to a never-ending loop
of negotiation-link up-link down events.

This was observed with the iMX28 Soc and LAN8720/LAN8742 PHYs, with two
Intel adapters I218-LM and X722-DA2 as link partners, though a number of
other link partners do not seem to mind the clock glitch. Changing the
hardware reset to a software reset (clearing bit 1 of the ECR register)
cured the issue.

Attempts to optimize fec_restart() in order to minimize the duration of
the glitch were unsuccessful. Furthermore manually producing the glitch by
setting MII mode and then back to RMII in two consecutive instructions,
resulting in a clock glitch <10us in duration, was enough to cause the
partner to invalidate the link. This strongly suggests that the root cause
of the link being dropped is indeed the change in clock frequency.

In an effort to minimize changes to driver, the patch proposes to use
soft reset only for tested SoCs (iMX28) and only if the link is up. This
preserves hardware reset in other situations, which might be required for
proper setup of the MAC.
====================

Link: https://lore.kernel.org/r/20210125100745.5090-1-laurentbadel@eaton.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2bd29748

net: fec: Fix temporary RMII clock reset on link up · c730ab42

Laurent Badel authored Jan 25, 2021

fec_restart() does a hard reset of the MAC module when the link status
changes to up. This temporarily resets the R_CNTRL register which controls
the MII mode of the ENET_OUT clock. In the case of RMII, the clock
frequency momentarily drops from 50MHz to 25MHz until the register is
reconfigured. Some link partners do not tolerate this glitch and
invalidate the link causing failure to establish a stable link when using
PHY polling mode. Since as per IEEE802.3 the criteria for link validity
are PHY-specific, what the partner should tolerate cannot be assumed, so
avoid resetting the MII clock by using software reset instead of hardware
reset when the link is up. This is generally relevant only if the SoC
provides the clock to an external PHY and the PHY is configured for RMII.
Signed-off-by: Laurent Badel <laurentbadel@eaton.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c730ab42

net: lapb: Add locking to the lapb module · b491e6a7

Xie He authored Jan 25, 2021

In the lapb module, the timers may run concurrently with other code in
this module, and there is currently no locking to prevent the code from
racing on "struct lapb_cb". This patch adds locking to prevent racing.

1. Add "spinlock_t lock" to "struct lapb_cb"; Add "spin_lock_bh" and
"spin_unlock_bh" to APIs, timer functions and notifier functions.

2. Add "bool t1timer_stop, t2timer_stop" to "struct lapb_cb" to make us
able to ask running timers to abort; Modify "lapb_stop_t1timer" and
"lapb_stop_t2timer" to make them able to abort running timers;
Modify "lapb_t2timer_expiry" and "lapb_t1timer_expiry" to make them
abort after they are stopped by "lapb_stop_t1timer", "lapb_stop_t2timer",
and "lapb_start_t1timer", "lapb_start_t2timer".

3. Let lapb_unregister wait for other API functions and running timers
to stop.

4. The lapb_device_event function calls lapb_disconnect_request. In
order to avoid trying to hold the lock twice, add a new function named
"__lapb_disconnect_request" which assumes the lock is held, and make
it called by lapb_disconnect_request and lapb_device_event.

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Cc: Martin Schiller <ms@dev.tdt.de>
Signed-off-by: Xie He <xie.he.0141@gmail.com>
Link: https://lore.kernel.org/r/20210126040939.69995-1-xie.he.0141@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b491e6a7

team: protect features update by RCU to avoid deadlock · f0947d0d

Ivan Vecera authored Jan 25, 2021

Function __team_compute_features() is protected by team->lock
mutex when it is called from team_compute_features() used when
features of an underlying device is changed. This causes
a deadlock when NETDEV_FEAT_CHANGE notifier for underlying device
is fired due to change propagated from team driver (e.g. MTU
change). It's because callbacks like team_change_mtu() or
team_vlan_rx_{add,del}_vid() protect their port list traversal
by team->lock mutex.

Example (r8169 case where this driver disables TSO for certain MTU
values):
...
[ 6391.348202]  __mutex_lock.isra.6+0x2d0/0x4a0
[ 6391.358602]  team_device_event+0x9d/0x160 [team]
[ 6391.363756]  notifier_call_chain+0x47/0x70
[ 6391.368329]  netdev_update_features+0x56/0x60
[ 6391.373207]  rtl8169_change_mtu+0x14/0x50 [r8169]
[ 6391.378457]  dev_set_mtu_ext+0xe1/0x1d0
[ 6391.387022]  dev_set_mtu+0x52/0x90
[ 6391.390820]  team_change_mtu+0x64/0xf0 [team]
[ 6391.395683]  dev_set_mtu_ext+0xe1/0x1d0
[ 6391.399963]  do_setlink+0x231/0xf50
...

In fact team_compute_features() called from team_device_event()
does not need to be protected by team->lock mutex and rcu_read_lock()
is sufficient there for port list traversal.

Fixes: 3d249d4c ("net: introduce ethernet teaming device")
Cc: Saeed Mahameed <saeed@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20210125074416.4056484-1-ivecera@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f0947d0d

MAINTAINERS: add David Ahern to IPv4/IPv6 maintainers · 5cfeb562

Jakub Kicinski authored Jan 22, 2021

David has been the de-facto maintainer for much of the IP code
for the last couple of years, let's make it official.

Link: https://lore.kernel.org/r/20210122173220.3579491-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5cfeb562

26 Jan, 2021 6 commits

Merge tag 'mac80211-for-net-2021-01-26' of... · c5e9e8d4

Jakub Kicinski authored Jan 26, 2021

Merge tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A couple of fixes:
 * fix 160 MHz channel switch in mac80211
 * fix a staging driver to not deadlock due to some
   recent cfg80211 changes
 * fix NULL-ptr deref if cfg80211 returns -EINPROGRESS
   to wext (syzbot)
 * pause TX in mac80211 in type change to prevent crashes
   (syzbot)

* tag 'mac80211-for-net-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
  staging: rtl8723bs: fix wireless regulatory API misuse
  mac80211: pause TX while changing interface type
  wext: fix NULL-ptr-dereference with cfg80211's lack of commit()
  mac80211: 160MHz with extended NSS BW in CSA
====================

Link: https://lore.kernel.org/r/20210126130529.75225-1-johannes@sipsolutions.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c5e9e8d4

Merge tag 'wireless-drivers-2021-01-26' of... · db22ce68

Jakub Kicinski authored Jan 26, 2021

Merge tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for v5.11

Second set of fixes for v5.11. Like in last time we again have more
fixes than usual Actually a bit too much for my liking in this state
of the cycle, but due to unrelated challenges I was only able to
submit them now.

We have few important crash fixes, iwlwifi modifying read-only data
being the most reported issue, and also smaller fixes to iwlwifi.

mt76
 * fix a clang warning about enum usage
 * fix rx buffer refcounting crash

mt7601u
 * fix rx buffer refcounting crash
 * fix crash when unbplugging the device

iwlwifi
 * fix a crash where we were modifying read-only firmware data
 * lots of smaller fixes all over the driver

* tag 'wireless-drivers-2021-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers: (24 commits)
  mt7601u: fix kernel crash unplugging the device
  iwlwifi: queue: bail out on invalid freeing
  iwlwifi: mvm: guard against device removal in reprobe
  iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
  iwlwifi: mvm: clear IN_D3 after wowlan status cmd
  iwlwifi: pcie: add rules to match Qu with Hr2
  iwlwifi: mvm: invalidate IDs of internal stations at mvm start
  iwlwifi: mvm: fix the return type for DSM functions 1 and 2
  iwlwifi: pcie: reschedule in long-running memory reads
  iwlwifi: pcie: use jiffies for memory read spin time limit
  iwlwifi: pcie: fix context info memory leak
  iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
  iwlwifi: pcie: set LTR on more devices
  iwlwifi: queue: don't crash if txq->entries is NULL
  iwlwifi: fix the NMI flow for old devices
  iwlwifi: pnvm: don't try to load after failures
  iwlwifi: pnvm: don't skip everything when not reloading
  iwlwifi: pcie: avoid potential PNVM leaks
  iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
  iwlwifi: mvm: skip power command when unbinding vif during CSA
  ...
====================

Link: https://lore.kernel.org/r/20210126092202.6A367C433CA@smtp.codeaurora.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

db22ce68

iwlwifi: provide gso_type to GSO packets · 81a86e1b

Eric Dumazet authored Jan 25, 2021

net/core/tso.c got recent support for USO, and this broke iwlfifi
because the driver implemented a limited form of GSO.

Providing ->gso_type allows for skb_is_gso_tcp() to provide
a correct result.

Fixes: 3d5b459b ("net: tso: add UDP segmentation support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ben Greear <greearb@candelatech.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209913
Link: https://lore.kernel.org/r/20210125150949.619309-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

81a86e1b

staging: rtl8723bs: fix wireless regulatory API misuse · 81f153fa

Johannes Berg authored Jan 26, 2021

This code ends up calling wiphy_apply_custom_regulatory(), for which
we document that it should be called before wiphy_register(). This
driver doesn't do that, but calls it from ndo_open() with the RTNL
held, which caused deadlocks.

Since the driver just registers static regdomain data and then the
notifier applies the channel changes if any, there's no reason for
it to call this in ndo_open(), move it earlier to fix the deadlock.
Reported-and-tested-by: Hans de Goede <hdegoede@redhat.com>
Fixes: 51d62f2f ("cfg80211: Save the regulatory domain with a lock")
Link: https://lore.kernel.org/r/20210126115409.d5fd6f8fe042.Ib5823a6feb2e2aa01ca1a565d2505367f38ad246@changeidAcked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

81f153fa

mac80211: pause TX while changing interface type · 054c9939

Johannes Berg authored Jan 22, 2021

syzbot reported a crash that happened when changing the interface
type around a lot, and while it might have been easy to fix just
the symptom there, a little deeper investigation found that really
the reason is that we allowed packets to be transmitted while in
the middle of changing the interface type.

Disallow TX by stopping the queues while changing the type.

Fixes: 34d4bc4d ("mac80211: support runtime interface type changes")
Reported-by: syzbot+d7a3b15976bf7de2238a@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20210122171115.b321f98f4d4f.I6997841933c17b093535c31d29355be3c0c39628@changeidSigned-off-by: Johannes Berg <johannes.berg@intel.com>

054c9939

wext: fix NULL-ptr-dereference with cfg80211's lack of commit() · 51225651

Johannes Berg authored Jan 21, 2021

Since cfg80211 doesn't implement commit, we never really cared about
that code there (and it's configured out w/o CONFIG_WIRELESS_EXT).
After all, since it has no commit, it shouldn't return -EIWCOMMIT to
indicate commit is needed.

However, EIWCOMMIT is actually an alias for EINPROGRESS, which _can_
happen if e.g. we try to change the frequency but we're already in
the process of connecting to some network, and drivers could return
that value (or even cfg80211 itself might).

This then causes us to crash because dev->wireless_handlers is NULL
but we try to check dev->wireless_handlers->standard[0].

Fix this by also checking dev->wireless_handlers. Also simplify the
code a little bit.

Cc: stable@vger.kernel.org
Reported-by: syzbot+444248c79e117bc99f46@syzkaller.appspotmail.com
Reported-by: syzbot+8b2a88a09653d4084179@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20210121171621.2076e4a37d5a.I5d9c72220fe7bb133fb718751da0180a57ecba4e@changeidSigned-off-by: Johannes Berg <johannes.berg@intel.com>

51225651

25 Jan, 2021 21 commits

uapi: fix big endian definition of ipv6_rpl_sr_hdr · 07d46d93

Justin Iurman authored Jan 21, 2021

Following RFC 6554 [1], the current order of fields is wrong for big
endian definition. Indeed, here is how the header looks like:

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Next Header  |  Hdr Ext Len  | Routing Type  | Segments Left |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CmprI | CmprE |  Pad  |               Reserved                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This patch reorders fields so that big endian definition is now correct.

  [1] https://tools.ietf.org/html/rfc6554#section-3

Fixes: cfa933d9 ("include: uapi: linux: add rpl sr header definition")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

07d46d93

mt7601u: fix kernel crash unplugging the device · 0acb20a5

Lorenzo Bianconi authored Jan 17, 2021

The following crash log can occur unplugging the usb dongle since,
after the urb poison in mt7601u_free_tx_queue(), usb_submit_urb() will
always fail resulting in a skb kfree while the skb has been already
queued.

Fix the issue enqueuing the skb only if usb_submit_urb() succeed.

Hardware name: Hewlett-Packard 500-539ng/2B2C, BIOS 80.06 04/01/2015
Workqueue: usb_hub_wq hub_event
RIP: 0010:skb_trim+0x2c/0x30
RSP: 0000:ffffb4c88005bba8 EFLAGS: 00010206
RAX: 000000004ad483ee RBX: ffff9a236625dee0 RCX: 000000000000662f
RDX: 000000000000000c RSI: 0000000000000000 RDI: ffff9a2343179300
RBP: ffff9a2343179300 R08: 0000000000000001 R09: 0000000000000000
R10: ffff9a23748f7840 R11: 0000000000000001 R12: ffff9a236625e4d4
R13: ffff9a236625dee0 R14: 0000000000001080 R15: 0000000000000008
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd410a34ef8 CR3: 00000001416ee001 CR4: 00000000001706f0
Call Trace:
 mt7601u_tx_status+0x3e/0xa0 [mt7601u]
 mt7601u_dma_cleanup+0xca/0x110 [mt7601u]
 mt7601u_cleanup+0x22/0x30 [mt7601u]
 mt7601u_disconnect+0x22/0x60 [mt7601u]
 usb_unbind_interface+0x8a/0x270
 ? kernfs_find_ns+0x35/0xd0
 __device_release_driver+0x17a/0x230
 device_release_driver+0x24/0x30
 bus_remove_device+0xdb/0x140
 device_del+0x18b/0x430
 ? kobject_put+0x98/0x1d0
 usb_disable_device+0xc6/0x1f0
 usb_disconnect.cold+0x7e/0x20a
 hub_event+0xbf3/0x1870
 process_one_work+0x1b6/0x350
 worker_thread+0x53/0x3e0
 ? process_one_work+0x350/0x350
 kthread+0x11b/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30

Fixes: 23377c20 ("mt7601u: fix possible memory leak when the device is disconnected")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/3b85219f669a63a8ced1f43686de05915a580489.1610919247.git.lorenzo@kernel.org

0acb20a5

iwlwifi: queue: bail out on invalid freeing · 0bed6a2a

Johannes Berg authored Jan 22, 2021

If we find an entry without an SKB, we currently continue, but
that will just result in an infinite loop since we won't increment
the read pointer, and will try the same thing over and over again.
Fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.abe2dedcc3ac.Ia6b03f9eeb617fd819e56dd5376f4bb8edc7b98a@changeid

0bed6a2a

iwlwifi: mvm: guard against device removal in reprobe · 7a21b1d4

Johannes Berg authored Jan 22, 2021

If we get into a problem severe enough to attempt a reprobe,
we schedule a worker to do that. However, if the problem gets
more severe and the device is actually destroyed before this
worker has a chance to run, we use a free device. Bump up the
reference count of the device until the worker runs to avoid
this situation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.871f0892e4b2.I94819e11afd68d875f3e242b98bef724b8236f1e@changeid

7a21b1d4

iwlwifi: Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit. · 4886460c

Matti Gottlieb authored Jan 22, 2021

The bit that indicates if the device supports 160MHZ
is bit #9. The macro checks bit #8.

Fix IWL_SUBDEVICE_NO_160 macro to use the correct bit.
Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com>
Fixes: d6f2134a ("iwlwifi: add mac/rf types and 160MHz to the device tables")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.bddbf9b57a75.I16e09e2b1404b16bfff70852a5a654aa468579e2@changeid

4886460c

iwlwifi: mvm: clear IN_D3 after wowlan status cmd · 96d2bfb7

Shaul Triebitz authored Jan 22, 2021

In D3 resume flow, avoid the following race where sending
packets before updating the sequence number (sequence
number received from the wowlan status command response):
Thread 1:
__iwl_mvm_resume clears IWL_MVM_STATUS_IN_D3 and is cut
by thread 2 before reaching iwl_mvm_query_wakeup_reasons.
Thread 2:
iwl_mvm_mac_itxq_xmit calls iwl_mvm_tx_skb since
IWL_MVM_STATUS_IN_D3 is not set using a wrong sequence number.
Thread 1:
__iwl_mvm_resume continues and calls iwl_mvm_query_wakeup_reasons
updating the sequence number received from the firmware.

The next packet that will be sent now will cause sysassert 0x1096.

Fix the bug by moving 'clear IWL_MVM_STATUS_IN_D3' to after
sending the wowlan status command and updating the sequence
number.
Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.fe927ec939c6.I103d3321fb55da7e6c6c51582cfadf94eb8b6c58@changeid

96d2bfb7

iwlwifi: pcie: add rules to match Qu with Hr2 · 16062c12

Luca Coelho authored Jan 22, 2021

Until now we have been relying on matching the PCI ID and subsystem
device ID in order to recognize Qu devices with Hr2. Add rules to
match these devices, so that we don't have to add a new rule for every
new ID we get.
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.591ce253ddd8.Ia4b9cc2c535625890c6d6b560db97ee9f2d5ca3b@changeid

16062c12

iwlwifi: mvm: invalidate IDs of internal stations at mvm start · e223e42a

Gregory Greenman authored Jan 22, 2021

Having sta_id not set for aux_sta and snif_sta can potentially lead to a
hard to debug issue in case remove station is called without an add. In
this case sta_id 0, an unrelated regular station, will be removed.

In fact, we do have a FW assert that occures rarely and from the debug
data analysis it looks like sta_id 0 is removed by mistake, though it's
hard to pinpoint the exact flow. The WARN_ON in this patch should help
to find it.
Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.5dc6dd9b22d5.I2add1b5ad24d0d0a221de79d439c09f88fcaf15d@changeid

e223e42a

iwlwifi: mvm: fix the return type for DSM functions 1 and 2 · aefbe5c4

Matt Chen authored Jan 22, 2021

The return type value of functions 1 and 2 were considered to be an
integer inside a buffer, but they can also be only an integer, without
the buffer. Fix the code in iwl_acpi_get_dsm_u8() to handle it as a
single integer value, as well as packed inside a buffer.
Signed-off-by: Matt Chen <matt.chen@intel.com>
Fixes: 9db93491 ("iwlwifi: acpi: support device specific method (DSM)")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210122144849.5757092adcd6.Ic24524627b899c9a01af38107a62a626bdf5ae3a@changeid

aefbe5c4

iwlwifi: pcie: reschedule in long-running memory reads · 3d372c4e

Johannes Berg authored Jan 15, 2021

If we spin for a long time in memory reads that (for some reason in
hardware) take a long time, then we'll eventually get messages such
as

  watchdog: BUG: soft lockup - CPU#2 stuck for 24s! [kworker/2:2:272]

This is because the reading really does take a very long time, and
we don't schedule, so we're hogging the CPU with this task, at least
if CONFIG_PREEMPT is not set, e.g. with CONFIG_PREEMPT_VOLUNTARY=y.

Previously I misinterpreted the situation and thought that this was
only going to happen if we had interrupts disabled, and then fixed
this (which is good anyway, however), but that didn't always help;
looking at it again now I realized that the spin unlock will only
reschedule if CONFIG_PREEMPT is used.

In order to avoid this issue, change the code to cond_resched() if
we've been spinning for too long here.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 04516706 ("iwlwifi: pcie: limit memory read spin time")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130253.217a9d6a6a12.If964cb582ab0aaa94e81c4ff3b279eaafda0fd3f@changeid

3d372c4e

iwlwifi: pcie: use jiffies for memory read spin time limit · 67013174

Johannes Berg authored Jan 15, 2021

There's no reason to use ktime_get() since we don't need any better
precision than jiffies, and since we no longer disable interrupts
around this code (when grabbing NIC access), jiffies will work fine.
Use jiffies instead of ktime_get().

This cleanup is preparation for the following patch "iwlwifi: pcie: reschedule
in long-running memory reads". The code gets simpler with the weird clock use
etc. removed before we add cond_resched().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130253.621c948b1fad.I3ee9f4bc4e74a0c9125d42fb7c35cd80df4698a1@changeid

67013174

iwlwifi: pcie: fix context info memory leak · 2d6bc752

Johannes Berg authored Jan 15, 2021

If the image loader allocation fails, we leak all the previously
allocated memory. Fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.97172cbaa67c.I3473233d0ad01a71aa9400832fb2b9f494d88a11@changeid

2d6bc752

iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap · 98c7d21f

Emmanuel Grumbach authored Jan 15, 2021

I hit a NULL pointer exception in this function when the
init flow went really bad.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.2e8da9f2c132.I0234d4b8ddaf70aaa5028a20c863255e05bc1f84@changeid

98c7d21f

iwlwifi: pcie: set LTR on more devices · ed0022da

Johannes Berg authored Jan 15, 2021

To avoid completion timeouts during device boot, set up the
LTR timeouts on more devices - similar to what we had before
for AX210.

This also corrects the AX210 workaround to be done only on
discrete (non-integrated) devices, otherwise the registers
have no effect.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: edb62520 ("iwlwifi: pcie: set LTR to avoid completion timeout")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.fb819e19530b.I0396f82922db66426f52fbb70d32a29c8fd66951@changeid

ed0022da

iwlwifi: queue: don't crash if txq->entries is NULL · 0f8d5656

Emmanuel Grumbach authored Jan 15, 2021

The code was really awkward, we would first dereference
txq->entries when calling iwl_txq_genX_tfd_unmap and then
we would check that txq->entries is non-NULL.
Fix that by exiting if txq->entries is NULL.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.173359fc236d.I75c7c2397d20df8d7fbc24cb16a5232d5c551889@changeid

0f8d5656

iwlwifi: fix the NMI flow for old devices · a800f958

Emmanuel Grumbach authored Jan 15, 2021

I noticed that the flow that triggers an NMI on the firmware
for old devices (tested on 7265) doesn't work.
Apparently, the firmware / device is still in low power when
we write the register that triggers the NMI. We call the
"grab_nic_access" function to make sure the device is awake
but that wasn't enough. I played with this and noticed that
if we wait 1 ms after the device reports it is awake before
we write to the NMI register, the device always sees our
write and the firmware gets properly asserted.

Triggering an NMI to the firmware can be done with the
debugfs hook:
echo 1 > /sys/kernel/debug/iwlwifi/0000\:00\:03.0/iwlmvm/fw_nmi

What happened before is that the firmware would just stall
without running its NMI routine. Because of that the driver
wouldn't get the "firmware crashed" interrupt. After a while
the driver would notice that the firmware is not responding
to some command and it would read the error data from the
firmware, but this data is populated in the NMI service
routine in the firmware which was not called. So in the logs
it looked like:

iwlwifi 0000:00:03.0: Error sending REPLY_ERROR: time out after 2000ms.
iwlwifi 0000:00:03.0: Current CMD queue read_ptr 33 write_ptr 34
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000000 | ADVANCED_SYSASSERT
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00000000 | branchlink2
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink1
iwlwifi 0000:00:03.0: 0x00000000 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000000 | data2
iwlwifi 0000:00:03.0: 0x00000000 | data3
iwlwifi 0000:00:03.0: 0x00000000 | beacon time
iwlwifi 0000:00:03.0: 0x00000000 | tsf low
...

With this fix, immediately after we trigger the NMI to the
firmware, we get the expected:
iwlwifi 0000:00:03.0: Microcode SW error detected.  Restarting 0x2000000.
iwlwifi 0000:00:03.0: Start IWL Error Log Dump:
iwlwifi 0000:00:03.0: Status: 0x00000040, count: 6
iwlwifi 0000:00:03.0: Loaded firmware version: 29.09bd31e1.0 7265D-29.ucode
iwlwifi 0000:00:03.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
iwlwifi 0000:00:03.0: 0x000002F1 | trm_hw_status0
iwlwifi 0000:00:03.0: 0x00000000 | trm_hw_status1
iwlwifi 0000:00:03.0: 0x00043D6C | branchlink2
iwlwifi 0000:00:03.0: 0x0004AFD6 | interruptlink1
iwlwifi 0000:00:03.0: 0x000008C4 | interruptlink2
iwlwifi 0000:00:03.0: 0x00000000 | data1
iwlwifi 0000:00:03.0: 0x00000080 | data2
iwlwifi 0000:00:03.0: 0x07030000 | data3
iwlwifi 0000:00:03.0: 0x003FD4C3 | beacon time
iwlwifi 0000:00:03.0: 0x00C22AC3 | tsf low
iwlwifi 0000:00:03.0: 0x00000000 | tsf hi
iwlwifi 0000:00:03.0: 0x00000000 | time gp1
iwlwifi 0000:00:03.0: 0x00C22AC3 | time gp2
iwlwifi 0000:00:03.0: 0x00000001 | uCode revision type
iwlwifi 0000:00:03.0: 0x0000001D | uCode version major

Notice the first line: "Microcode SW error detected:" which
is printed in the driver's ISR, which means that the driver
actually got an interrupt from the firmware saying that it
crashed. And then we have the properly populated error data.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.70e67cc75d88.I6615cad4361862e7f3c9f2d3cafb6a8c61e16781@changeid

a800f958

iwlwifi: pnvm: don't try to load after failures · 82a08d0c

Johannes Berg authored Jan 15, 2021

If loading the PNVM file failed on the first try during the
interface up, the file is unlikely to show up later, and we
already don't try to reload it if it changes, so just don't
try loading it again and again.

This also fixes some issues where we may try to load it at
resume time, which may not be possible yet.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 69725928 ("iwlwifi: read and parse PNVM file")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.5ac6828a0bbe.I7d308358b21d3c0c84b1086999dbc7267f86e219@changeid

82a08d0c

iwlwifi: pnvm: don't skip everything when not reloading · 1c58bed4

Johannes Berg authored Jan 15, 2021

Even if we don't reload the file from disk, we still need to
trigger the PNVM load flow with the device; fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 69725928 ("iwlwifi: read and parse PNVM file")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.85ef56c4ef8c.I3b853ce041a0755d45e448035bef1837995d191b@changeid

1c58bed4

iwlwifi: pcie: avoid potential PNVM leaks · 34b9434c

Johannes Berg authored Jan 15, 2021

If we erroneously try to set the PNVM data again after it has
already been set, we could leak the old DMA memory. Avoid that
and warn, we shouldn't be doing this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Fixes: 69725928 ("iwlwifi: read and parse PNVM file")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.929c2d680429.I086b9490e6c005f3bcaa881b617e9f61908160f3@changeid

34b9434c

iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time() · 5c56d862

Johannes Berg authored Jan 15, 2021

We need to take the mutex to call iwl_mvm_get_sync_time(), do it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.4bb5ccf881a6.I62973cbb081e80aa5b0447a5c3b9c3251a65cf6b@changeid

5c56d862

iwlwifi: mvm: skip power command when unbinding vif during CSA · bf544e9a

Sara Sharon authored Jan 15, 2021

In the new CSA flow, we remain associated during CSA, but
still do a unbind-bind to the vif. However, sending the power
command right after when vif is unbound but still associated
causes FW to assert (0x3400) since it cannot tell the LMAC id.

Just skip this command, we will send it again in a bit, when
assigning the new context.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/iwlwifi.20210115130252.64a2254ac5c3.Iaa3a9050bf3d7c9cd5beaf561e932e6defc12ec3@changeid

bf544e9a

24 Jan, 2021 2 commits

tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN · 62d9f1a6

Pengcheng Yang authored Jan 24, 2021

Upon receiving a cumulative ACK that changes the congestion state from
Disorder to Open, the TLP timer is not set. If the sender is app-limited,
it can only wait for the RTO timer to expire and retransmit.

The reason for this is that the TLP timer is set before the congestion
state changes in tcp_ack(), so we delay the time point of calling
tcp_set_xmit_timer() until after tcp_fastretrans_alert() returns and
remove the FLAG_SET_XMIT_TIMER from ack_flag when the RACK reorder timer
is set.

This commit has two additional benefits:
1) Make sure to reset RTO according to RFC6298 when receiving ACK, to
avoid spurious RTO caused by RTO timer early expires.
2) Reduce the xmit timer reschedule once per ACK when the RACK reorder
timer is set.

Fixes: df92c839 ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
Link: https://lore.kernel.org/netdev/1611311242-6675-1-git-send-email-yangpc@wangsu.comSigned-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1611464834-23030-1-git-send-email-yangpc@wangsu.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

62d9f1a6

tcp: make TCP_USER_TIMEOUT accurate for zero window probes · 344db93a

Enke Chen authored Jan 22, 2021

The TCP_USER_TIMEOUT is checked by the 0-window probe timer. As the
timer has backoff with a max interval of about two minutes, the
actual timeout for TCP_USER_TIMEOUT can be off by up to two minutes.

In this patch the TCP_USER_TIMEOUT is made more accurate by taking it
into account when computing the timer value for the 0-window probes.

This patch is similar to and builds on top of the one that made
TCP_USER_TIMEOUT accurate for RTOs in commit b701a99e ("tcp: Add
tcp_clamp_rto_to_user_timeout() helper to improve accuracy").

Fixes: 9721e709 ("tcp: simplify window probe aborting on USER_TIMEOUT")
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210122191306.GA99540@localhost.localdomainSigned-off-by: Jakub Kicinski <kuba@kernel.org>

344db93a

23 Jan, 2021 4 commits

NFC: fix resource leak when target index is invalid · 3a30537c

Pan Bian authored Jan 21, 2021

Goto to the label put_dev instead of the label error to fix potential
resource leak on path that the target index is invalid.

Fixes: c4fbb651 ("NFC: The core part should generate the target index")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210121152748.98409-1-bianpan2016@163.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3a30537c

NFC: fix possible resource leak · d8f923c3

Pan Bian authored Jan 21, 2021

Put the device to avoid resource leak on path that the polling flag is
invalid.

Fixes: a831b913 ("NFC: Do not return EBUSY when stopping a poll that's already stopped")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210121153745.122184-1-bianpan2016@163.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

d8f923c3

doc: networking: ip-sysctl: Document conf/all/disable_ipv6 and conf/default/disable_ipv6 · fc024c5c

Pali Rohár authored Jan 21, 2021

This patch adds documentation for sysctl conf/all/disable_ipv6 and
conf/default/disable_ipv6 settings which is currently missing.
Signed-off-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20210121150244.20483-1-pali@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

fc024c5c

chtls: Fix potential resource leak · b6011966

Pan Bian authored Jan 21, 2021

The dst entry should be released if no neighbour is found. Goto label
free_dst to fix the issue. Besides, the check of ndev against NULL is
redundant.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210121145738.51091-1-bianpan2016@163.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b6011966