Commits · 4373a5e2606b4eda14fa096caf93dc2efc22689f · Kirill Smelkov / linux

15 Jun, 2019 12 commits

David S. Miller authored Jun 14, 2019

Eric Dumazet says:

====================
net/packet: better behavior under DDOS

Using tcpdump (or other af_packet user) on a busy host can lead to
catastrophic consequences, because suddenly, potentially all cpus
are spinning on a contended spinlock.

Both packet_rcv() and tpacket_rcv() grab the spinlock
to eventually find there is no room for an additional packet.

This patch series aligns packet_rcv() and tpacket_rcv() so that both
check if the queue is full before grabbing the spinlock.

If the queue is full, they both increment a new atomic counter
placed on a separate cache line to let readers drain the queue faster.

There is still false sharing on this new atomic counter;
we might make it per-cpu in the future if there is interest.
====================
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4373a5e2
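
The idea described in this series can be sketched in userspace C11 as follows. This is only an illustration under stated assumptions, not the kernel code: `struct rx_queue`, `rx_queue_init()` and `rx_enqueue()` are hypothetical names, an `atomic_flag` stands in for the socket spinlock, and `drops` plays the role of the atomic drop counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a packet socket's receive state. */
struct rx_queue {
    atomic_flag  lock;      /* stands in for the contended spinlock */
    atomic_uint  queued;    /* packets currently sitting in the queue */
    unsigned int capacity;  /* fixed queue size */
    atomic_uint  drops;     /* drop counter, bumped without the lock */
};

static void rx_queue_init(struct rx_queue *q, unsigned int capacity)
{
    atomic_flag_clear(&q->lock);
    atomic_init(&q->queued, 0);
    q->capacity = capacity;
    atomic_init(&q->drops, 0);
}

static bool rx_queue_full(struct rx_queue *q)
{
    return atomic_load_explicit(&q->queued, memory_order_relaxed) >= q->capacity;
}

/* Returns true if the packet was queued, false if it was dropped. */
static bool rx_enqueue(struct rx_queue *q)
{
    /* Fast path: unlocked check, so a full queue never touches the lock. */
    if (rx_queue_full(q)) {
        atomic_fetch_add_explicit(&q->drops, 1, memory_order_relaxed);
        return false;
    }

    /* Slow path: take the lock (simple test-and-set spin). */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;

    /* The unlocked check was only a hint, so re-check under the lock. */
    bool ok = !rx_queue_full(q);
    if (ok)
        atomic_fetch_add_explicit(&q->queued, 1, memory_order_relaxed);

    atomic_flag_clear_explicit(&q->lock, memory_order_release);

    if (!ok)
        atomic_fetch_add_explicit(&q->drops, 1, memory_order_relaxed);
    return ok;
}
```

Once the queue is full, every further `rx_enqueue()` call returns from the fast path, leaving the lock free for whoever is draining the queue.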

net/packet: introduce packet_rcv_try_clear_pressure() helper · 9bb6cd65

Eric Dumazet authored Jun 12, 2019

There are two places where we want to clear the pressure
if possible; add a helper to make it more obvious.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Willem de Bruijn <willemb@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9bb6cd65

net/packet: remove locking from packet_rcv_has_room() · 3a2bb84e

Eric Dumazet authored Jun 12, 2019

__packet_rcv_has_room() can now be run without the lock being held.

po->pressure is only a non-persistent hint, so we can mark
all read/write accesses with READ_ONCE()/WRITE_ONCE()
to document the fact that the field could be written
without any synchronization.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a2bb84e
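
A rough userspace sketch of treating a field as an unsynchronized hint is shown below. The `READ_ONCE()`/`WRITE_ONCE()` macros here are simplified stand-ins built on a volatile cast (they assume GCC or Clang for `__typeof__`), and `struct sock_hint`, `has_room()`, `update_pressure()` and `try_clear_pressure()` are hypothetical names for illustration, not the kernel's definitions.

```c
/* Simplified stand-ins: force exactly one load or store of a scalar. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical socket state; 'pressure' is a hint, not protected by a lock. */
struct sock_hint {
    int          pressure;
    unsigned int queued;
    unsigned int capacity;
};

static int has_room(const struct sock_hint *s)
{
    return s->queued < s->capacity;
}

/* Recompute the hint, writing only when the value actually changes. */
static void update_pressure(struct sock_hint *s)
{
    int pressure = !has_room(s);

    if (READ_ONCE(s->pressure) != pressure)
        WRITE_ONCE(s->pressure, pressure);
}

/* Clear the hint once there is room again; a no-op when it is not set. */
static void try_clear_pressure(struct sock_hint *s)
{
    if (READ_ONCE(s->pressure) && has_room(s))
        WRITE_ONCE(s->pressure, 0);
}
```

Skipping the store when the value is unchanged keeps the cache line clean in the common case, which matters when many CPUs read the same hint.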

net/packet: implement shortcut in tpacket_rcv() · 2c51c627

Eric Dumazet authored Jun 12, 2019

tpacket_rcv() can be hit under DDOS quite hard, since
it will always grab a socket spinlock, to eventually find
there is no room for an additional packet.

Using tcpdump [1] on a busy host can lead to catastrophic consequences,
because of all cpus spinning on a contended spinlock.

This replicates a similar strategy used in packet_rcv()

[1] Also some applications mistakenly use af_packet socket
bound to ETH_P_ALL only to send packets.
Receive queue is never drained and immediately full.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c51c627

net/packet: make tp_drops atomic · 8e8e2951

Eric Dumazet authored Jun 12, 2019

Under DDOS, we want to be able to increment tp_drops without
touching the spinlock. This will help readers to drain
the receive queue slightly faster :/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e8e2951

net/packet: constify __packet_rcv_has_room() · 0338a145

Eric Dumazet authored Jun 12, 2019

Goal is to use the helper without the lock being held.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0338a145

net/packet: constify prb_lookup_block() and __tpacket_v3_has_room() · dcf70cef

Eric Dumazet authored Jun 12, 2019

Goal is to be able to use __tpacket_v3_has_room() without holding
a lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcf70cef

net/packet: constify packet_lookup_frame() and __tpacket_has_room() · d4b5bd98

Eric Dumazet authored Jun 12, 2019

Goal is to be able to use __tpacket_has_room() without holding a lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d4b5bd98

net/packet: constify __packet_get_status() argument · 96f657e6

Eric Dumazet authored Jun 12, 2019

struct packet_sock is only read.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96f657e6

net: phy: Add more 1000BaseX support detection · f30e33bc

Robert Hancock authored Jun 11, 2019

Commit "net: phy: Add detection of 1000BaseX link mode support" added
support for not filtering out 1000BaseX mode from the PHY's supported
modes in genphy_config_init, but we have to make a similar change in
genphy_read_abilities in order to actually detect it as a supported mode
in the first place. Add this in.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f30e33bc

net: ethernet: ti: cpsw_ethtool: simplify slave loops · 9126e75e

Ivan Khoronzhuk authored Jun 12, 2019

Purely for consistency, do it as in the main cpsw.c module
and use the ndev reference directly rather than going through the slave.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9126e75e

net: ethernet: ti: cpsw: use cpsw as drv data · bfe59032

Ivan Khoronzhuk authored Jun 12, 2019

No need to set ndev as drvdata when it is mainly the cpsw reference
that is needed, so correct this legacy decision.
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfe59032

14 Jun, 2019 28 commits

Merge branch 'net-mlx5-use-indirect-call-wrappers' · eea9e3a4

David S. Miller authored Jun 14, 2019

Paolo Abeni says:

====================
net/mlx5: use indirect call wrappers

The mlx5_core driver uses several indirect calls in fast-path, some of them
are invoked on each ingress packet, even for the XDP-only traffic.

This series leverages the indirect call wrappers infrastructure to avoid
the expensive RETPOLINE overhead for 2 indirect calls in fast-path.

Each call is addressed on a different patch, plus we need to introduce a couple
of additional helpers to cope with the higher number of possible direct-call
alternatives.

v2 -> v3:
 - do not add more INDIRECT_CALL_* macros
 - use only the direct calls always available regardless of
   the mlx5 build options in the last patch

v1 -> v2:
 - update the direct call list and use a macro to define it,
   as per Saeed's suggestion. An additional intermediate
   macro is needed to allow arg list expansion
 - patch 2/3 is unchanged, as the generated code looks better this way than
   with possible alternative (dropping BP hits)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

eea9e3a4

net/mlx5e: use indirect calls wrapper for the rx packet handler · 55f96872

Paolo Abeni authored Jun 12, 2019

We can avoid another indirect call per packet wrapping the rx
handler call with the proper helper.

To ensure that even the last listed direct call experiences a
measurable gain, despite the additional conditionals we must
traverse before reaching it, I tested reversing the order of the
listed options, with performance differences below noise level.

Together with the previous indirect call patch, this gives
~6% performance improvement in raw UDP tput.

v2 -> v3:
 - use only the direct calls always available regardless of
   the mlx5 build options
 - drop the direct call list macro, to keep the code as simple
   as possible for future rework

v1 -> v2:
 - update the direct call list and use a macro to define it,
   as per Saeed's suggestion. An additional intermediate
   macro is needed to allow arg list expansion
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

55f96872
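
The indirect call wrapper technique can be illustrated with the small userspace sketch below. The macros are simplified stand-ins written for this example (they assume GCC or Clang for `__builtin_expect`), and the handler names are hypothetical rather than the mlx5 ones: the function pointer is compared against a short list of likely targets, and a direct call is made on a match, so the retpoline-protected indirect call is only needed as a fallback.

```c
#define likely(x) __builtin_expect(!!(x), 1)

/* Simplified stand-ins: try the listed targets first, then fall back. */
#define INDIRECT_CALL_1(f, f1, ...) \
    (likely((f) == (f1)) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))
#define INDIRECT_CALL_2(f, f2, f1, ...) \
    (likely((f) == (f2)) ? f2(__VA_ARGS__) : INDIRECT_CALL_1(f, f1, __VA_ARGS__))

typedef int (*rx_handler_t)(int len);

/* Hypothetical rx handlers; the bodies only make the calls distinguishable. */
static int handle_plain(int len)    { return len; }
static int handle_striding(int len) { return len * 2; }
static int handle_other(int len)    { return -len; }

static int dispatch(rx_handler_t handler, int len)
{
    /* Direct call for the two listed handlers, indirect call otherwise. */
    return INDIRECT_CALL_2(handler, handle_striding, handle_plain, len);
}
```

A handler that is not in the list, such as `handle_other` here, still works; it simply pays for the extra comparisons before the indirect call.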

net/mlx5e: use indirect calls wrapper for skb allocation · b3c04e83

Paolo Abeni authored Jun 12, 2019

We can avoid an indirect call per packet wrapping the skb creation
with the appropriate helper.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3c04e83

Merge tag 'mac80211-next-for-davem-2019-06-14' of... · d96ec975

David S. Miller authored Jun 14, 2019

Merge tag 'mac80211-next-for-davem-2019-06-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Many changes all over:
 * HE (802.11ax) work continues
 * WPA3 offloads
 * work on extended key ID handling continues
 * fixes to honour AP supported rates with auth/assoc frames
 * nl80211 netlink policy improvements to fix some issues
   with strict validation on new commands with old attrs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d96ec975

sched: act_ctinfo: use extack error reporting · 733f0766

Kevin Darbyshire-Bryant authored Jun 14, 2019

Use extack error reporting mechanism in addition to returning -EINVAL

NL_SET_ERR_* code shamelessly copy/paste/adjusted from act_pedit &
sch_cake and used as reference as to what I should have done in the
first place.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

733f0766

l2tp: no need to check return value of debugfs_create functions · 3adcfa44

Greg Kroah-Hartman authored Jun 14, 2019

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Also, there is no need to store the individual debugfs file name, just
remove the whole directory all at once, saving a local variable.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3adcfa44

Merge branch 'r8169-add-and-use-helper-rtl_is_8168evl_up' · 0b55b630

David S. Miller authored Jun 14, 2019

Heiner Kallweit says:

====================
r8169: add and use helper rtl_is_8168evl_up

A few registers have been added or have changed their purpose with
version RTL8168e-vl, so create a helper for identifying chip versions
from RTL8168e-vl.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0b55b630

r8169: use helper rtl_is_8168evl_up for setting register MaxTxPacketSize · 272b2265

Heiner Kallweit authored Jun 14, 2019

From RTL8168e-vl the value in register MaxTxPacketSize is interpreted
differently, therefore use new helper rtl_is_8168evl_up to set this
register.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

272b2265

r8169: add helper rtl_is_8168evl_up · 9e9f33ba

Heiner Kallweit authored Jun 14, 2019

Add helper rtl_is_8168evl_up to make the code more readable and to
simplify it.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e9f33ba

mac80211: notify offchannel expire on mgmt_tx · ddb754aa

James Prestwood authored Jun 12, 2019

When the offchannel TX wait time expires, send the appropriate event.
Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

ddb754aa

nl80211: send event when CMD_FRAME duration expires · 1c38c7f2

James Prestwood authored Jun 12, 2019

cfg80211_remain_on_channel_expired is used to notify userspace when
the remain on channel duration expired by sending an event. There is
no equivalent for CMD_FRAME, where, if offchannel and a duration
are provided, the card will go offchannel for that duration. Currently
there is no way for userspace to tell when that duration expired
apart from setting an independent timeout. This timeout is quite
unreliable as the kernel may not immediately send out the frame
because of scheduling or work queue delays. In testing, it was found
this timeout had to be quite large to accommodate any potential delays.

A better solution is to have the kernel send an event when this
duration has expired. There is already NL80211_CMD_FRAME_WAIT_CANCEL
which can be used to cancel a NL80211_CMD_FRAME offchannel. Using this
command matches how NL80211_CMD_CANCEL_REMAIN_ON_CHANNEL works, where
it is used both to cancel and to notify that the duration has
expired.
Signed-off-by: James Prestwood <james.prestwood@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

1c38c7f2

mac80211: no need to check return value of debugfs_create functions · 5a7bb7ce

Greg Kroah-Hartman authored Jun 14, 2019

When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

5a7bb7ce

mac80211: extend __rate_control_send_low warning · 163a7cdd

Johannes Berg authored May 29, 2019

This appears to happen occasionally, and if it does we
really want even more information than we have now.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

163a7cdd

mac80211: fill low rate even for HAS_RATE_CONTROL · 583a7a34

Johannes Berg authored May 29, 2019

If HW advertises it has rate control, we skip all of the
rate control assignments, but sometimes the data we have
here is useful, especially so that we don't have to do
the lookups again on which rates are configured and are
supported.

So do the low rate assignment anyway to help out drivers
that might need it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

583a7a34

mac80211: use STA info in rate_control_send_low() · bd718fc1

Johannes Berg authored May 29, 2019

Even if we have a station, we currently call rate_control_send_low()
with the NULL station unless further rate control (driver, minstrel)
has been initialized.

Change this so we can use more information about the station to use
a better rate. For example, when we associate with an AP, we will
now use the lowest rate it advertised as supported (that we can)
rather than the lowest mandatory rate. This aligns our behaviour
with most other 802.11 implementations.

To make this possible, we need to also ensure that we have non-zero
rates at all times, so in case we really have *nothing* pre-fill
the supp_rates bitmap with the very lowest mandatory bitmap (11b
and 11a on 2.4 and 5 GHz respectively).

Additionally, hostapd appears to be giving us an empty supported
rates bitmap (it can and should do better, since the STA must have
support for at least the basic rates in the BSS), so ignore any
such bitmaps that would actually zero out the supp_rates, and in
that case just keep the pre-filled mandatory rates.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

bd718fc1

mac80211: call rate_control_send_low() internally · 1e87fec9

Johannes Berg authored May 16, 2019

There's no rate control algorithm that *doesn't* want to call
it internally, and calling it internally will let us modify
its behaviour in the future.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

1e87fec9

ieee80211: Add a missing extended capability flag definition · cd6f3411

Ilan Peer authored May 29, 2019

Add the "OBSS Narrow Bandwidth RU In OFDMA Tolerance Support" flag
definition to the definitions of the flags covered by the Extended
Capability IE.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cd6f3411

cfg80211: Add a function to iterate all BSS entries · 4770c8f9

Ilan Peer authored May 29, 2019

Add a function that iterates over the BSS entries associated with a
given wiphy and calls a callback for each iterated BSS. This can be
used by drivers in various ways, e.g., to evaluate some property for
all the BSSs in the medium.
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

4770c8f9
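
The callback-based iteration pattern described here can be sketched in plain C as below. All types and names (`struct bss_entry`, `struct bss_table`, `bss_table_iter()`, `track_strongest()`) are hypothetical and exist only for this illustration; they are not the cfg80211 API. The example evaluates one property across all entries, namely the strongest signal seen.

```c
#include <stddef.h>

/* Hypothetical BSS record and table, for illustration only. */
struct bss_entry {
    int signal_dbm;
    int channel;
};

struct bss_table {
    const struct bss_entry *entries;
    size_t count;
};

typedef void (*bss_iter_fn)(const struct bss_entry *bss, void *data);

/* Call 'iter' once per entry, passing the caller's context through. */
static void bss_table_iter(const struct bss_table *table,
                           bss_iter_fn iter, void *data)
{
    for (size_t i = 0; i < table->count; i++)
        iter(&table->entries[i], data);
}

/* Example context and callback: track the strongest signal. */
struct strongest {
    int best_dbm;
    int seen;
};

static void track_strongest(const struct bss_entry *bss, void *data)
{
    struct strongest *s = data;

    if (s->seen == 0 || bss->signal_dbm > s->best_dbm)
        s->best_dbm = bss->signal_dbm;
    s->seen++;
}
```

Passing an opaque `void *` context keeps the iterator generic, so each caller can accumulate whatever property it needs without changing the iteration code.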

mac80211: allow turning TWT responder support on and off via netlink · a0de1ca3

John Crispin authored May 28, 2019

Allow the userland daemon to en/disable TWT support for an AP.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
[simplify parsing code]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

a0de1ca3

mac80211: dynamically enable the TWT requester support on STA interfaces · c9d3245e

John Crispin authored May 28, 2019

Turn TWT for STA interfaces when they associate and/or receive a
beacon where the twt_responder bit has changed.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

c9d3245e

nl80211: require and validate vendor command policy · 901bb989

Johannes Berg authored May 28, 2019

Require that each vendor command give a policy of its sub-attributes
in NL80211_ATTR_VENDOR_DATA, and then (strictly) check the contents,
including the NLA_F_NESTED flag that we couldn't check on the outer
layer because there we don't know yet.

It is possible to use VENDOR_CMD_RAW_DATA for raw data, but then no
nested data can be given (NLA_F_NESTED flag must be clear) and the
data is just passed as is to the command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

901bb989

mac80211: add ieee80211_get_he_iftype_cap() helper · d7edf40c

John Crispin authored May 21, 2019

This function is similar to ieee80211_get_he_sta_cap() but allows passing
the iftype. Also make ieee80211_get_he_sta_cap() use the new helper
rather than duplicating the code.
Signed-off-by: Shashidhar Lakkavalli <slakkavalli@datto.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

d7edf40c

nl80211: add support for SAE authentication offload · 26f7044e

Chung-Hsien Hsu authored May 09, 2019

Let drivers advertise support for station-mode SAE authentication
offload with a new NL80211_EXT_FEATURE_SAE_OFFLOAD flag.
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

26f7044e

nl80211: add WPA3 definition for SAE authentication · cc3e14c2

Chung-Hsien Hsu authored May 09, 2019

Add definition of WPA version 3 for SAE authentication.
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

cc3e14c2

nl80211: add NL80211_ATTR_IFINDEX to port authorized event · f4d75993

Chung-Hsien Hsu authored May 09, 2019

Add NL80211_ATTR_IFINDEX attribute to port authorized event to indicate
the operating interface of the device. Also put NL80211_ATTR_WIPHY
attribute in it to be consistent with the other MLME notifications.
Signed-off-by: Chung-Hsien Hsu <stanley.hsu@cypress.com>
Signed-off-by: Chi-Hsien Lin <chi-hsien.lin@cypress.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

f4d75993

mac80211: AMPDU handling for Extended Key ID · 90cc4bd6

Alexander Wetzel authored May 06, 2019

IEEE 802.11-2016 forbids mixing MPDUs with different keyIDs in one
A-MPDU. Drivers supporting A-MPDUs and Extended Key ID must actively
enforce that requirement due to the available two unicast keyIDs.

Allow drivers to signal mac80211 that they will not check the keyID in
MPDUs when aggregating them and that they expect mac80211 to stop Tx
aggregation when rekeying a connection using Extended Key ID.
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

90cc4bd6

r8169: improve rtl_coalesce_info · 20023d3e

Heiner Kallweit authored Jun 11, 2019

tp->coalesce_info is used in rtl_coalesce_info() only, so we can
remove this member. In addition replace phy_ethtool_get_link_ksettings
with a direct access to tp->phydev->speed.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20023d3e

r8169: let mdio read functions return -ETIMEDOUT · 9b994b4a

Heiner Kallweit authored Jun 11, 2019

In case of a timeout currently ~0 is returned. Callers often just check
whether a certain bit is set and therefore may behave incorrectly.
So let's return -ETIMEDOUT in case of a timeout.

r8168_phy_ocp_read is used in r8168g_mdio_read only, therefore we can
apply the same change.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b994b4a
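
Why an all-ones return value is risky for callers that only test a bit can be seen in the sketch below. The function names and the `LINK_UP_BIT` value are hypothetical, chosen for this illustration rather than taken from the driver: an all-ones result makes every bit look set, whereas a negative errno gives the caller an unambiguous error to check before it looks at any bits.

```c
#include <errno.h>

#define LINK_UP_BIT 0x0004  /* hypothetical status bit */

/* Old style: a timeout yields all bits set. */
static int mdio_read_old(int timed_out, int reg_value)
{
    return timed_out ? ~0 : reg_value;
}

/* New style: a timeout yields a negative errno. */
static int mdio_read_new(int timed_out, int reg_value)
{
    return timed_out ? -ETIMEDOUT : reg_value;
}

/* Caller that only tests the bit: fooled by the all-ones value. */
static int link_is_up_naive(int val)
{
    return (val & LINK_UP_BIT) != 0;
}

/* Caller that rejects errors before testing the bit. */
static int link_is_up_checked(int val)
{
    return val >= 0 && (val & LINK_UP_BIT) != 0;
}
```

With the old return value, `link_is_up_naive()` reports the link as up after a timeout; with the errno-style return, a caller can tell a timeout apart from a real register value.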