Commits · f9e06c45cb28beb30a6a474952ead7da2b8940f3 · nexedi / linux

17 Nov, 2018 39 commits

tuntap: free XDP dropped packets in a batch · f9e06c45

Jason Wang authored Nov 15, 2018

Thanks to the batched XDP buffs passed through msg_control, instead of
calling put_page() for each page, which involves an atomic operation,
we can batch them by recording the last page that needs to be freed and
its reference count, then freeing them in a batch.
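
The batching idea can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct fake_page`, `put_page_many()` and `free_dropped_batch()` are hypothetical names, and the "atomic" update is simulated with a plain counter rather than taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct page: only the reference count matters here. */
struct fake_page {
    int refcount;
    int atomic_ops;   /* counts simulated atomic updates, for illustration */
};

/* Drop `count` references with a single simulated atomic operation. */
static void put_page_many(struct fake_page *page, int count)
{
    assert(count > 0 && page->refcount >= count);
    page->refcount -= count;
    page->atomic_ops++;
}

/* Release one reference per entry, batching runs that share a page. */
static void free_dropped_batch(struct fake_page **pages, size_t n)
{
    struct fake_page *last = NULL;
    int pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i] != last) {
            if (last)
                put_page_many(last, pending);  /* flush the previous run */
            last = pages[i];
            pending = 0;
        }
        pending++;
    }
    if (last)
        put_page_many(last, pending);          /* flush the final run */
}
```

With five dropped buffers spread over two pages, the sketch performs two counter updates instead of five.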

Testpmd(virtio-user + vhost_net) + XDP_DROP shows 3.8% improvement.

Before: 4.71Mpps
After : 4.89Mpps
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9e06c45

vhost_net: mitigate page reference counting during page frag refill · e4dab1e6

Jason Wang authored Nov 15, 2018

We do a get_page(), which involves an atomic operation. This patch tries
to mitigate the per-packet atomic operation by maintaining a reference
bias which is initially USHRT_MAX. Each time a page is taken, instead of
calling get_page() we decrease the bias, and when it is time to
use a new page we drop the remaining bias in one go through
__page_frag_cache_drain().
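
A minimal user-space sketch of the reference-bias technique, assuming a page whose count starts at zero. The names `struct fake_page`, `struct frag_cache`, `frag_refill()` and `frag_get()` are hypothetical, and plain arithmetic stands in for the kernel's atomic page reference helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct fake_page { long refcount; };

struct frag_cache {
    struct fake_page *page;
    unsigned int bias;   /* references charged up front but not yet handed out */
};

/* Switch to a new page: settle the old page and pre-charge the new one. */
static void frag_refill(struct frag_cache *c, struct fake_page *page)
{
    if (c->page)
        c->page->refcount -= c->bias;  /* return unused references at once */
    page->refcount += USHRT_MAX;       /* one update instead of one per packet */
    c->page = page;
    c->bias = USHRT_MAX;
}

/* Hand out one reference: only the local bias changes, no shared counter. */
static void frag_get(struct frag_cache *c)
{
    assert(c->page != NULL && c->bias > 0);
    c->bias--;
}
```

After three `frag_get()` calls and a refill, the old page is left holding exactly the three references that were handed out.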

Testpmd(virtio_user + vhost_net) + XDP_DROP on TAP shows about 1.6%
improvement.

Before: 4.63Mpps
After:  4.71Mpps
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4dab1e6

Merge branch 'net-sched-gred-introduce-per-virtual-queue-attributes' · b8b9618a

David S. Miller authored Nov 16, 2018

Jakub Kicinski says:

====================
net: sched: gred: introduce per-virtual queue attributes

This series updates the GRED Qdisc.  The Qdisc matches nfp offload very
well, but before we can offload it there are a number of improvements
to make.

The first few patches add extack messages to the Qdisc and pass extack
to netlink validation.

Next a new netlink attribute group is added, to allow GRED to be
extended more easily.  Currently GRED passes C structures as attributes,
and even an array of C structs for virtual queue configuration.  User
space has hard coded the expected length of that array, so adding new
fields is not possible.

A new two-level attribute group is added:

  [TCA_GRED_VQ_LIST]
    [TCA_GRED_VQ_ENTRY]
      [TCA_GRED_VQ_DP]
      [TCA_GRED_VQ_FLAGS]
      [TCA_GRED_VQ_STAT_*]
    [TCA_GRED_VQ_ENTRY]
      [TCA_GRED_VQ_DP]
      [TCA_GRED_VQ_FLAGS]
      [TCA_GRED_VQ_STAT_*]
    [TCA_GRED_VQ_ENTRY]
       ...
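
To make the nesting concrete, here is a small self-contained sketch of a two-level length/type/value encoding in the same shape. It is not the kernel's netlink API: the helper names and the attribute numbers in the enum are made up for illustration, and only 32-bit payloads are used so no padding is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute numbers, for illustration only. */
enum { VQ_LIST = 1, VQ_ENTRY = 2, VQ_DP = 3, VQ_FLAGS = 4 };

struct tlv_buf { uint8_t data[256]; size_t len; };

/* Open an attribute; returns the header offset to pass to tlv_end(). */
static size_t tlv_begin(struct tlv_buf *b, uint16_t type)
{
    size_t start = b->len;
    uint16_t zero = 0;

    assert(b->len + 4 <= sizeof(b->data));
    memcpy(b->data + b->len, &zero, 2);      /* length, patched by tlv_end() */
    memcpy(b->data + b->len + 2, &type, 2);
    b->len += 4;
    return start;
}

/* Close an attribute: its length covers the header plus everything nested. */
static void tlv_end(struct tlv_buf *b, size_t start)
{
    uint16_t len = (uint16_t)(b->len - start);
    memcpy(b->data + start, &len, 2);
}

/* Append a leaf attribute carrying one 32-bit value. */
static void tlv_put_u32(struct tlv_buf *b, uint16_t type, uint32_t v)
{
    size_t start = tlv_begin(b, type);

    assert(b->len + 4 <= sizeof(b->data));
    memcpy(b->data + b->len, &v, 4);
    b->len += 4;
    tlv_end(b, start);
}

/* Encode LIST { ENTRY { DP, FLAGS } ... } for n virtual queues. */
static void encode_vq_list(struct tlv_buf *b, const uint32_t *flags, uint32_t n)
{
    size_t list = tlv_begin(b, VQ_LIST);

    for (uint32_t dp = 0; dp < n; dp++) {
        size_t entry = tlv_begin(b, VQ_ENTRY);
        tlv_put_u32(b, VQ_DP, dp);
        tlv_put_u32(b, VQ_FLAGS, flags[dp]);
        tlv_end(b, entry);
    }
    tlv_end(b, list);
}
```

Because every entry carries its own length, a reader can skip attributes it does not know, which is what lets new per-queue fields be added later without breaking a fixed array layout.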

Statistics are dump only. Patch 4 switches the byte counts to be 64 bit,
and patch 5 introduces the new stats attributes for dump.  Patch 6
switches RED flags to be per-virtual queue, and patch 7 allows them
to be dumped and set at virtual queue granularity.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b8b9618a

net: sched: gred: allow manipulating per-DP RED flags · 72111015

Jakub Kicinski authored Nov 14, 2018

Allow users to set and dump RED flags (ECN enabled and harddrop)
on a per-virtual-queue basis.  Validation of attributes is split
from changes to make sure we won't have to undo previous operations
when we find out configuration is invalid.

The objective is to allow changing per-Qdisc parameters without
overwriting the per-vq configured flags.

Old user space will not pass the TCA_GRED_VQ_FLAGS attribute and
per-Qdisc flags will always get propagated to the virtual queues.

New user space which wants to make use of per-vq flags should set
per-Qdisc flags to 0 and then configure per-vq flags as it
sees fit.  Once per-vq flags are set per-Qdisc flags can't be
changed to non-zero.  Vice versa - if the per-Qdisc flags are
non-zero the TCA_GRED_VQ_FLAGS attribute has to either be omitted
or set to the same value as per-Qdisc flags.

Update per-Qdisc parameters:
per-Qdisc | per-VQ | result
        0 |      0 | all vq flags updated
        0 |  non-0 | error (vq flags in use)
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | all vq flags updated

Update per-VQ state (flags parameter not specified):
   no change to flags

Update per-VQ state (flags parameter set):
per-Qdisc | per-VQ | result
        0 |   any  | per-vq flags updated
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | error (per-Qdisc flags in use)
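
The compatibility rules stated in the prose above can be sketched as two pure validation checks that run before any state is changed. This is an assumption-laden illustration, not the code from the patch: the function names are hypothetical, and the checks follow the prose description (a non-zero per-Qdisc value is refused once per-vq flags are in use; a per-vq value must match non-zero per-Qdisc flags).

```c
#include <stdbool.h>
#include <stdint.h>

/* Changing per-Qdisc flags: a non-zero value is refused once per-vq
 * flags are in use. */
static bool qdisc_flags_change_ok(uint32_t new_qdisc_flags, bool vq_flags_in_use)
{
    return !(new_qdisc_flags != 0 && vq_flags_in_use);
}

/* Changing one virtual queue; has_attr tells whether a per-vq flags
 * attribute was supplied at all. */
static bool vq_flags_change_ok(uint32_t qdisc_flags, bool has_attr,
                               uint32_t vq_flags)
{
    if (!has_attr)
        return true;                     /* omitted: flags stay unchanged */
    if (qdisc_flags != 0)
        return vq_flags == qdisc_flags;  /* must match per-Qdisc flags */
    return true;                         /* per-Qdisc flags are 0: any value */
}
```

Keeping the checks free of side effects mirrors the point made in the message: everything is validated first, so nothing has to be undone when a later attribute turns out to be invalid.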
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72111015

net: sched: gred: store red flags per virtual queue · 25fc1989

Jakub Kicinski authored Nov 14, 2018

Right now ECN marking and HARD drop (the common RED flags) can only
be configured for the entire Qdisc. In preparation for per-vq flags
store the values in the virtual queue structure. Setting per-vq
flags will only be allowed when no flags are set for the entire Qdisc.
For the new flags we will also make sure undefined bits are 0.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25fc1989

net: sched: gred: provide a better structured dump and expose stats · 80e22e96

Jakub Kicinski authored Nov 14, 2018

Currently all GRED's virtual queue data is dumped in a single
array in a single attribute. This makes it pretty much impossible
to add new fields. In order to expose more detailed stats add a
new set of attributes. We can now expose the 64 bit value of bytesin
and all the mark stats which were not part of the original design.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80e22e96

net: sched: gred: store bytesin as a 64 bit value · 9f5cd0c8

Jakub Kicinski authored Nov 14, 2018

32 bit counters for bytes are not really going to last long in the modern
world.  Make sch_gred count bytes on a 64 bit counter.  It will still
get truncated during dump, but a follow-up patch will add a set of new
stat dump attributes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f5cd0c8

net: sched: gred: use extack to provide more details on configuration errors · 4777be08

Jakub Kicinski authored Nov 14, 2018

Add extack messages to -EINVAL errors, to help users identify
their mistakes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4777be08

net: sched: gred: pass extack to nla_parse_nested() · 79c59fe0

Jakub Kicinski authored Nov 14, 2018

In case netlink wants to report a parsing error, pass extack
to nla_parse_nested().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

79c59fe0

net: sched: gred: separate error and non-error path in gred_change() · 255f4803

Jakub Kicinski authored Nov 14, 2018

We will soon want to add more code to the non-error path, so separate
it from the error handling flow.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

255f4803

selftests: add explicit test for multiple concurrent GRO sockets · 9c549a6b

Paolo Abeni authored Nov 15, 2018

This covers proper accounting of the encap needed static keys.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c549a6b

isdn/hisax: remove set but not used variable 'total' · b24b767f

YueHaibing authored Nov 15, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/isdn/hisax/hfc_pci.c:277:6: warning:
 variable ‘total’ set but not used [-Wunused-but-set-variable]

It has never been used since the start of the git history.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b24b767f

udp: fix jump label misuse · 9c480601

Paolo Abeni authored Nov 15, 2018

The commit 60fb9567 ("udp: implement complete book-keeping for
encap_needed") introduced a severe misuse of jump label APIs, which
syzbot, as reported by Eric, was able to exploit.

When multiple sockets/processes can concurrently request (and then
disable) the udp encap, we need to track the activation counter with
*_inc()/*_dec() jump label variants, or we can experience bad things
at disable time.
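
The difference between a boolean switch and a counted one can be shown with a small user-space model. The types and functions below are hypothetical stand-ins, not the jump label API itself; they only mirror the enable/disable versus inc/dec semantics described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Boolean switch: the first disable turns the feature off for everyone. */
struct flag_key { bool on; };
static void flag_enable(struct flag_key *k)  { k->on = true; }
static void flag_disable(struct flag_key *k) { k->on = false; }
static bool flag_enabled(const struct flag_key *k) { return k->on; }

/* Counted switch: stays on until the last user has released it. */
struct counted_key { int users; };
static void key_inc(struct counted_key *k) { k->users++; }
static void key_dec(struct counted_key *k)
{
    assert(k->users > 0);   /* an unbalanced dec would be a bug */
    k->users--;
}
static bool key_enabled(const struct counted_key *k) { return k->users > 0; }
```

With two sockets requesting encap and one of them closing, the boolean variant reports the feature as off while a user remains, whereas the counted variant stays enabled until the second release.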

Fixes: 60fb9567 ("udp: implement complete book-keeping for encap_needed")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c480601

etf: Drop all expired packets · 37342bda

Jesus Sanchez-Palencia authored Nov 14, 2018

Currently on dequeue() ETF only drops the first expired packet, which
causes a problem if the next packet is already expired. When this
happens, the watchdog will be configured with a time in the past, fire
straight away and the packet will finally be dropped once the dequeue()
function of the qdisc is called again.

We can save quite a few cycles and improve the overall behavior of the
qdisc if we drop all expired packets if the next packet is expired.
This should allow ETF to recover faster from bad situations. But
packet drops are still a very serious warning that the requirements
imposed on the system aren't reasonable.
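
As a rough sketch of the new behaviour, the loop below drops every expired entry from the head of a time-sorted queue in one pass. A sorted array stands in for the rb-tree, and `struct txq` and `drop_expired()` are hypothetical names, not the qdisc's own.

```c
#include <stddef.h>
#include <stdint.h>

/* Transmit times in ascending order; a stand-in for the time-sorted tree. */
struct txq {
    const int64_t *txtime;
    size_t head;   /* index of the earliest pending packet */
    size_t len;
};

/* Drop every packet whose txtime is already in the past; return the count. */
static size_t drop_expired(struct txq *q, int64_t now)
{
    size_t dropped = 0;

    while (q->head < q->len && q->txtime[q->head] < now) {
        q->head++;
        dropped++;
    }
    return dropped;
}
```

After the loop, the head of the queue is either empty or a packet with a transmit time that is not in the past, so a timer armed for it does not fire immediately.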

This was inspired by how the implementation of hrtimers uses the
rb_tree inside the kernel.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

37342bda

etf: Split timersortedlist_erase() · cbeeb8ef

Jesus Sanchez-Palencia authored Nov 14, 2018

This is just a refactor that will simplify the implementation of the
next patch in this series which will drop all expired packets on the
dequeue flow.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbeeb8ef

etf: Use cached rb_root · 09fd4860

Jesus Sanchez-Palencia authored Nov 14, 2018

ETF's peek() operation is heavily used so use an rb_root_cached instead
and leverage rb_first_cached() which will run in O(1) instead of
O(log n).
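
The idea behind a cached root can be sketched with a plain, unbalanced binary search tree that remembers its minimum. This is an illustrative model only: `struct root_cached`, `insert_cached()` and `first_cached()` are hypothetical names, and no rebalancing is done.

```c
#include <stddef.h>

struct node { long key; struct node *left, *right; };

struct root_cached {
    struct node *root;
    struct node *leftmost;   /* points at the minimum key, or NULL if empty */
};

/* Insert while keeping the leftmost cache up to date. */
static void insert_cached(struct root_cached *t, struct node *n)
{
    struct node **link = &t->root;
    int is_leftmost = 1;

    n->left = n->right = NULL;
    while (*link) {
        if (n->key < (*link)->key) {
            link = &(*link)->left;
        } else {
            link = &(*link)->right;
            is_leftmost = 0;   /* went right once: cannot be the minimum */
        }
    }
    *link = n;
    if (is_leftmost)
        t->leftmost = n;
}

/* O(1) peek: no walk down the left spine of the tree. */
static struct node *first_cached(const struct root_cached *t)
{
    return t->leftmost;
}
```

The cost moves to insertion and removal, which must maintain the cached pointer, and that is also why erasing through the cached variant matters for keeping the pointer valid.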

Even if on 'timesortedlist_clear()' we could be using rb_erase(), we
choose to use rb_erase_cached(), because if in the future we allow
runtime changes to ETF parameters, and need to do a '_clear()', this
might cause some hard to debug issues.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09fd4860

etf: Cancel timer if there are no pending skbs · 3fcbdaee

Jesus Sanchez-Palencia authored Nov 14, 2018

There is no point in firing the qdisc watchdog if there are no future
skbs pending in the queue and the watchdog had been set previously.
Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fcbdaee

tcp: clean up STATE_TRACE · 213d7767

Yafang Shao authored Nov 14, 2018

Currently we can use bpf or the tcp tracepoints to conveniently trace tcp
state transitions at run time, so we don't need to do this at compile
time anymore.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

213d7767

Merge branch 'SMSC95xx-driver-updates' · e119a369

David S. Miller authored Nov 16, 2018

Ben Dooks says:

====================
SMSC95xx driver updates (round 1)

This is a series of a few driver cleanups and some fixups of the code
for the SMSC95XX driver. There have been a few reviews, and the issues
have been fixed so this should be ready for merging.

I will work on the tx-alignment and the other bits of usbnet changes
and produce at least two more patch series for this later.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e119a369

usbnet: smsc95xx: check for csum being in last four bytes · 75938f77

Ben Dooks authored Nov 14, 2018

The manual states that the checksum cannot lie in the last DWORD of the
transmission, so add a basic check for this and fall back to software
checksumming the packet.
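
A minimal sketch of such a check, assuming a 2-byte checksum field and treating any overlap with the final four bytes as "in the last DWORD". The function name and parameters are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when a 2-byte checksum field starting at csum_pos overlaps the
 * last four bytes of a packet of pkt_len bytes. */
static bool csum_in_last_dword(size_t pkt_len, size_t csum_pos)
{
    if (pkt_len < 4)
        return true;   /* the whole packet is the last DWORD */
    return csum_pos + 2 > pkt_len - 4;
}
```

A caller would hand the packet to hardware checksumming only when this returns false and compute the checksum in software otherwise.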

This only seems to trigger for ACK packets with no options or data to
return to the other end, and the use of the tx-alignment option makes
it more likely to happen.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

75938f77

usbnet: smsc95xx: fix memcpy for accessing rx-data · 6809d216

Ben Dooks authored Nov 14, 2018

Change the RX code to use get_unaligned_le32() instead of the combo
of memcpy and cpu_to_le32s(&var).
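
For readers unfamiliar with the helper, the following user-space function shows what an unaligned little-endian 32-bit load amounts to. `load_le32()` is a hypothetical name; it only illustrates the byte-wise technique.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from any address, one byte at a
 * time, so neither alignment nor host byte order matters. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

This replaces a copy into a temporary followed by an in-place byte swap with a single expression.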
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

6809d216

usbnet: smsc95xx: simplify tx_fixup code · 0c8b2655

Ben Dooks authored Nov 14, 2018

The smsc95xx_tx_fixup is doing multiple calls to skb_push() to
put an 8-byte command header onto the packet. It would be easier
to do one skb_push() and then copy the data in once the push is
done.

We also make the code smaller by using proper unaligned puts for
the header. This merges in the CPU to LE32 conversion as well and
makes the whole sequence easier to understand hopefully.
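
The single-push approach might look roughly like this in a user-space model. `struct pkt`, `pkt_push()`, `store_le32()` and `add_tx_header()` are hypothetical stand-ins for the skb and unaligned-put helpers, and the two command words are arbitrary example values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy packet buffer with headroom; `data` is the offset of the first byte. */
struct pkt {
    uint8_t buf[64];
    size_t data;
    size_t len;
};

/* Grow the packet at the front by n bytes and return the new start. */
static uint8_t *pkt_push(struct pkt *p, size_t n)
{
    assert(p->data >= n);   /* there must be enough headroom */
    p->data -= n;
    p->len += n;
    return p->buf + p->data;
}

/* Store a 32-bit value little-endian at any alignment. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* One push for the whole 8-byte header, then two little-endian stores. */
static void add_tx_header(struct pkt *p, uint32_t cmd_a, uint32_t cmd_b)
{
    uint8_t *hdr = pkt_push(p, 8);

    store_le32(hdr, cmd_a);
    store_le32(hdr + 4, cmd_b);
}
```

Folding the byte-order conversion into the store removes the separate conversion step and keeps the header layout visible in one place.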
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c8b2655

usbnet: smsc95xx: fix rx packet alignment · 810eeb1f

Ben Dooks authored Nov 14, 2018

The smsc95xx driver already takes into account the NET_IP_ALIGN
parameter when setting up the receive packet data, which means
we do not need to worry about aligning the packets in the usbnet
driver.

Adding EVENT_NO_IP_ALIGN means that the IPv4 header is now
passed to the ip_rcv() routine with the start on an aligned address.

Tested on Raspberry Pi B3.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

810eeb1f

Merge branch 'dpaa2-eth-add-bql-support' · 9cd821b7

David S. Miller authored Nov 16, 2018

Ioana Ciocoi Radulescu says:

====================
dpaa2-eth: add bql support

The first two patches make minor tweaks to the driver to
simplify bql implementation. The third patch adds the actual
bql support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9cd821b7

dpaa2-eth: bql support · 569dac6a

Ioana Ciocoi Radulescu authored Nov 14, 2018

Add support for byte queue limit.

On NAPI poll, we save the total number of Tx confirmed frames/bytes
and register them with bql at the end of the poll function.
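
The accumulate-then-report pattern can be sketched as below. Everything here is a hypothetical model: `struct bql_stub` and `report_completed()` stand in for the queue and for whatever completion call the driver uses, and the poll function takes an array of frame lengths instead of real Tx confirmation frames.

```c
#include <stddef.h>

/* Stub that records what was reported and how often. */
struct bql_stub {
    unsigned int calls;
    unsigned int pkts;
    unsigned int bytes;
};

static void report_completed(struct bql_stub *q, unsigned int pkts,
                             unsigned int bytes)
{
    q->calls++;
    q->pkts += pkts;
    q->bytes += bytes;
}

/* Process n confirmed frames, then report the totals once at the end. */
static void poll_tx_conf(struct bql_stub *q, const unsigned int *frame_len,
                         size_t n)
{
    unsigned int pkts = 0, bytes = 0;

    for (size_t i = 0; i < n; i++) {
        pkts++;
        bytes += frame_len[i];
    }
    if (pkts)
        report_completed(q, pkts, bytes);
}
```

Reporting once per poll keeps the per-frame path free of accounting calls.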
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

569dac6a

dpaa2-eth: Update callback signature · dbcdf728

Ioana Ciocoi Radulescu authored Nov 14, 2018

Change the frame consume callback signature:
* the entire FQ structure is passed to the callback instead
of just the queue index
* the NAPI structure can be easily obtained from the channel
it is associated to, so we don't need to pass it explicitly
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dbcdf728

dpaa2-eth: Don't use multiple queues per channel · b0e4f37b

Ioana Ciocoi Radulescu authored Nov 14, 2018

The DPNI object on which we build a network interface has a
certain number of {Rx, Tx, Tx confirmation} frame queues as
resources. The default hardware setup offers one queue of each
type, as well as one DPCON channel, for each core available
in the system.

There are however cases where the number of queues is greater
than the number of cores or channels. Until now, we configured
and used all the frame queues associated with a DPNI, even if it
meant assigning multiple queues of one type to the same channel.

Update the driver to only use a number of queues equal to the
number of channels, ensuring each channel will contain exactly
one Rx and one Tx confirmation queue.

From the user viewpoint, this change is completely transparent.
Performance-wise there is no impact in most scenarios. In case
the number of queues is larger than and not a multiple of the
number of channels, Rx hash distribution now offers better load
balancing between cores, which can have a positive impact on
overall system performance.
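
The load-balancing remark can be checked with a little arithmetic. The sketch below assumes queues are assigned to channels round-robin, which the message does not state; `queues_per_channel()` is a hypothetical helper.

```c
/* Count how many queues land on each channel under round-robin
 * assignment (an assumption made for this illustration). */
static void queues_per_channel(unsigned int queues, unsigned int channels,
                               unsigned int *count)
{
    for (unsigned int c = 0; c < channels; c++)
        count[c] = 0;
    for (unsigned int q = 0; q < queues; q++)
        count[q % channels]++;
}
```

With 10 queues on 8 channels, two channels serve two queues each while the rest serve one, so a hash that spreads flows evenly over queues loads those two cores twice as heavily. Using only 8 queues gives every channel exactly one.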
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0e4f37b

net: 8021q: move vlan offload registrations into vlan_core · 32764c66

Jiri Pirko authored Nov 13, 2018

Currently, the vlan packet offloads are registered only upon 8021q module
load. However, even without this module loaded, the offloads could be
utilized, for example by openvswitch datapath. As reported by Michael,
that causes 2x to 5x performance improvement, depending on a testcase.

So move the vlan offload registrations into vlan_core and make this
available even without 8021q module loaded.
Reported-by: Michael Shteinbok <michaelsh86@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Michael Shteinbok <michaelsh86@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

32764c66

net/decnet: add missing indentation · 99310e73

Colin Ian King authored Nov 13, 2018

There is a missing indentation before the declaration of port. Add
it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

99310e73

net: hns3: fix spelling mistake "failded" -> "failed" · 790cd1a8

Colin Ian King authored Nov 13, 2018

Trivial fix, the spelling of "failded" is incorrect in dev_err and
dev_warn messages. Fix this.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

790cd1a8

net: remove unused skb_send_sock() · 7f600f14

Cong Wang authored Nov 12, 2018

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f600f14

net: phy: check for implementation of both callbacks in phy_drv_supports_irq · a21ff3c8

Heiner Kallweit authored Nov 12, 2018

Now that the icplus driver has been fixed all PHY drivers supporting
interrupts have both callbacks (config_intr and ack_interrupt)
implemented - as it should be. Therefore phy_drv_supports_irq()
can be changed now to check for both callbacks being implemented.
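
In sketch form, the stricter check is a logical AND over the two callbacks. The struct and function names below are hypothetical reductions for illustration; only the two callback names come from the message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-in for a PHY driver with its two interrupt callbacks. */
struct phy_drv_sketch {
    int (*config_intr)(void);
    int (*ack_interrupt)(void);
};

/* Placeholder callback so a populated driver can be built for testing. */
static int noop_cb(void) { return 0; }

/* Interrupts are considered supported only if both callbacks exist. */
static bool drv_supports_irq(const struct phy_drv_sketch *d)
{
    return d->config_intr != NULL && d->ack_interrupt != NULL;
}
```

A driver providing only one of the two callbacks is then treated as not supporting interrupts.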
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a21ff3c8

Merge branch 'Remove-VLAN-CFI-overload' · 6551971e

David S. Miller authored Nov 16, 2018

Michał Mirosław says:

====================
Remove VLAN.CFI overload

Fix BPF code/JITs to allow for separate VLAN_PRESENT flag
storage and finally move the flag to separate storage in skbuff.

This is the final step to make VLAN.CFI transparent to the core Linux
networking stack.

An #ifdef is introduced temporarily to mark fragments masking
VLAN_TAG_PRESENT. This is removed altogether in the final patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6551971e

net: remove VLAN_TAG_PRESENT · 0c4b2d37

Michał Mirosław authored Nov 10, 2018

Replace VLAN_TAG_PRESENT with a single-bit flag and free up the
VLAN.CFI overload. Now VLAN.CFI is visible in the networking stack
and can be passed around intact.
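
The effect of the overload can be illustrated with two toy structures. The names are hypothetical; the only fact relied on is that CFI occupies bit 12 (0x1000) of the 16-bit TCI.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFI_BIT 0x1000u   /* bit 12 of the 16-bit TCI */

/* Overloaded scheme: bit 12 of the stored TCI doubles as "tag present". */
struct skb_old { uint16_t vlan_tci; };

static void old_set(struct skb_old *s, uint16_t tci)
{
    s->vlan_tci = (uint16_t)(tci | CFI_BIT);
}

static bool old_present(const struct skb_old *s)
{
    return (s->vlan_tci & CFI_BIT) != 0;
}

static uint16_t old_get(const struct skb_old *s)
{
    return (uint16_t)(s->vlan_tci & ~CFI_BIT);   /* CFI is always lost */
}

/* Separate flag: the full TCI, CFI included, is stored untouched. */
struct skb_new { uint16_t vlan_tci; bool vlan_present; };

static void new_set(struct skb_new *s, uint16_t tci)
{
    s->vlan_tci = tci;
    s->vlan_present = true;
}

static uint16_t new_get(const struct skb_new *s)
{
    return s->vlan_tci;
}
```

Storing a TCI of 0x1005 and reading it back yields 0x0005 in the overloaded scheme and 0x1005 with the separate flag.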
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c4b2d37

net/bpf_jit: SPARC: split VLAN_PRESENT bit handling from VLAN_TCI · 4b50d231

Michał Mirosław authored Nov 10, 2018

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b50d231

net/bpf_jit: MIPS: split VLAN_PRESENT bit handling from VLAN_TCI · 3955dec5

Michał Mirosław authored Nov 10, 2018

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

3955dec5

net/bpf_jit: PPC: split VLAN_PRESENT bit handling from VLAN_TCI · 4ef3a142

Michał Mirosław authored Nov 10, 2018

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

4ef3a142

net/bpf: split VLAN_PRESENT bit handling from VLAN_TCI · 9c212255

Michał Mirosław authored Nov 10, 2018

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c212255

net/skbuff: add macros for VLAN_PRESENT bit · 5109f9fd

Michał Mirosław authored Nov 10, 2018

Wrap the VLAN_PRESENT bit using macros like PKT_TYPE_* and CLONED_*,
as used by BPF code.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

5109f9fd

16 Nov, 2018 1 commit

Merge tag 'batadv-next-for-davem-20181114' of git://git.open-mesh.org/linux-merge · 5aa25c05

David S. Miller authored Nov 15, 2018

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - Bump version strings, by Simon Wunderlich

 - Fixup includes, by Sven Eckelmann (3 patches)

 - Separate BATMAN_ADV_DEBUG from DEBUGFS, by Sven Eckelmann

 - Fixup tracing log documentation, by Sven Eckelmann

 - Use exclusive locks to secure netlink information dump transfers,
   by Sven Eckelmann (8 patches)

 - Move CRC16 dependency, by Sven Eckelmann

 - Enable MCAST by default, by Linus Luessing
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5aa25c05