Commits · ca1b17b7e843123f5a1e4c8bd2d7b6596ffe6e93 · nexedi / linux

08 Nov, 2017 5 commits

Merge branch 'hv_netvsc-fix-a-hang-on-channel-mtu-changes' · ca1b17b7

David S. Miller authored Nov 08, 2017

Vitaly Kuznetsov says:

====================
hv_netvsc: fix a hang on channel/mtu changes

It was found that netvsc driver doesn't survive e.g.

test. I was able to identify a hang in guest/host communication, it is
fixed by PATCH1 of this series. PATCH2 is a cosmetic change masking
unneeded messages.

Changes since v1:
- Throw away patches 2 and 3 of the original series as one is unneeded and
  the other is not justified [Eric Dumazet, Stephen Hemminger] so I'm only
  fixing the hang now, the crash doesn't reproduce. Will keep an eye on it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ca1b17b7

hv_netvsc: hide warnings about uninitialized/missing rndis device · b5eb819d

Vitaly Kuznetsov authored Nov 02, 2017

Hyper-V hosts are known to send RNDIS messages even after we halt the
device in rndis_filter_halt_device(). Remove user visible messages
as they are not really useful.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5eb819d

hv_netvsc: netvsc_teardown_gpadl() split · 0cf73780

Vitaly Kuznetsov authored Nov 02, 2017

It was found that in some cases host refuses to teardown GPADL for send/
receive buffers (probably when some work with these buffere is scheduled or
ongoing). Change the teardown logic to be:
1) Send NVSP_MSG1_TYPE_REVOKE_* messages
2) Close the channel
3) Teardown GPADLs.
This seems to work reliably.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cf73780

net: phy: leds: Add support for "link" trigger · 3928ee64

Maciej S. Szmigiero authored Nov 02, 2017

Currently, we create a LED trigger for any link speed known to a PHY.
These triggers only fire when their exact link speed had been negotiated
(they aren't cumulative, that is, they don't fire for "their or any higher"
link speed).

What we are missing, however, is a trigger which will fire on any link
speed known to the PHY. Such trigger can then be used for implementing a
poor man's substitute of the "link" LED on boards that lack it.
Let's add it.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

3928ee64

net: phy: leds: Refactor "no link" handler into a separate function · ff198cdb

Maciej S. Szmigiero authored Nov 02, 2017

Currently, phy_led_trigger_change_speed() is handling a "no link" condition
like it was some kind of an error (using "goto" to a code at the function
end).

However, having no link at PHY is an ordinary operational state, so let's
handle it in an appropriately named separate function so it is more obvious
what the code is doing.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff198cdb

07 Nov, 2017 1 commit

ipv6: addrconf: fix a lockdep splat · fffcefe9

Eric Dumazet authored Nov 06, 2017

Fixes a case where GFP_ATOMIC allocation must be used instead of
GFP_KERNEL one.

[   54.891146]  lock_acquire+0xb3/0x2f0
[   54.891153]  ? fs_reclaim_acquire.part.60+0x5/0x30
[   54.891165]  fs_reclaim_acquire.part.60+0x29/0x30
[   54.891170]  ? fs_reclaim_acquire.part.60+0x5/0x30
[   54.891178]  kmem_cache_alloc_trace+0x3f/0x500
[   54.891186]  ? cyc2ns_read_end+0x1e/0x30
[   54.891196]  ipv6_add_addr+0x15a/0xc30
[   54.891217]  ? ipv6_create_tempaddr+0x2ea/0x5d0
[   54.891223]  ipv6_create_tempaddr+0x2ea/0x5d0
[   54.891238]  ? manage_tempaddrs+0x195/0x220
[   54.891249]  ? addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
[   54.891255]  addrconf_prefix_rcv_add_addr+0x1c0/0x4f0
[   54.891268]  addrconf_prefix_rcv+0x2e5/0x9b0
[   54.891279]  ? neigh_update+0x446/0xb90
[   54.891298]  ? ndisc_router_discovery+0x5ab/0xf00
[   54.891303]  ndisc_router_discovery+0x5ab/0xf00
[   54.891311]  ? retint_kernel+0x2d/0x2d
[   54.891331]  ndisc_rcv+0x1b6/0x270
[   54.891340]  icmpv6_rcv+0x6aa/0x9f0
[   54.891345]  ? ipv6_chk_mcast_addr+0x176/0x530
[   54.891351]  ? do_csum+0x17b/0x260
[   54.891360]  ip6_input_finish+0x194/0xb20
[   54.891372]  ip6_input+0x5b/0x2c0
[   54.891380]  ? ip6_rcv_finish+0x320/0x320
[   54.891389]  ip6_mc_input+0x15a/0x250
[   54.891396]  ipv6_rcv+0x772/0x1050
[   54.891403]  ? consume_skb+0xbe/0x2d0
[   54.891412]  ? ip6_make_skb+0x2a0/0x2a0
[   54.891418]  ? ip6_input+0x2c0/0x2c0
[   54.891425]  __netif_receive_skb_core+0xa0f/0x1600
[   54.891436]  ? process_backlog+0xac/0x400
[   54.891441]  process_backlog+0xfa/0x400
[   54.891448]  ? net_rx_action+0x145/0x1130
[   54.891456]  net_rx_action+0x310/0x1130
[   54.891524]  ? RTUSBBulkReceive+0x11d/0x190 [mt7610u_sta]
[   54.891538]  __do_softirq+0x140/0xaba
[   54.891553]  irq_exit+0x10b/0x160
[   54.891561]  do_IRQ+0xbb/0x1b0

Fixes: f3d9832e ("ipv6: addrconf: cleanup locking in ipv6_add_addr")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fffcefe9

05 Nov, 2017 34 commits

Merge branch 'eBPF-based-device-cgroup-controller' · 2798b80b

David S. Miller authored Nov 05, 2017

Roman Gushchin says:

====================
eBPF-based device cgroup controller

This patchset introduces an eBPF-based device controller for cgroup v2.

Patches (1) and (2) are a preparational work required to share some code
  with the existing device controller implementation.
Patch (3) is the main patch, which introduces a new bpf prog type
  and all necessary infrastructure.
Patch (4) moves cgroup_helpers.c/h to use them by patch (4).
Patch (5) implements an example of eBPF program which controls access
  to device files and corresponding userspace test.

v3:
  Renamed constants introduced by patch (3) to BPF_DEVCG_*

v2:
  Added patch (1).

v1:
  https://lkml.org/lkml/2017/11/1/363
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

2798b80b

selftests/bpf: add a test for device cgroup controller · 37f1ba09

Roman Gushchin authored Nov 05, 2017

Add a test for device cgroup controller.

The test loads a simple bpf program which logs all
device access attempts using trace_printk() and forbids
all operations except operations with /dev/zero and
/dev/urandom.

Then the test creates and joins a test cgroup, and attaches
the bpf program to it.

Then it tries to perform some simple device operations
and checks the result:

  create /dev/null (should fail)
  create /dev/zero (should pass)
  copy data from /dev/urandom to /dev/zero (should pass)
  copy data from /dev/urandom to /dev/full (should fail)
  copy data from /dev/random to /dev/zero (should fail)
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

37f1ba09

bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/ · 9d1f1594

Roman Gushchin authored Nov 05, 2017

The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d1f1594

bpf, cgroup: implement eBPF-based device controller for cgroup v2 · ebc614f6

Roman Gushchin authored Nov 05, 2017

Cgroup v2 lacks the device controller, provided by cgroup v1.
This patch adds a new eBPF program type, which in combination
of previously added ability to attach multiple eBPF programs
to a cgroup, will provide a similar functionality, but with some
additional flexibility.

This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebc614f6

device_cgroup: prepare code for bpf-based device controller · ecf8fecb

Roman Gushchin authored Nov 05, 2017

This is non-functional change to prepare the device cgroup code
for adding eBPF-based controller for cgroups v2.

The patch performs the following changes:
1) __devcgroup_inode_permission() and devcgroup_inode_mknod()
   are moving to the device-cgroup.h and converting into static inline.
2) __devcgroup_check_permission() is exported.
3) devcgroup_check_permission() wrapper is introduced to be used
   by both existing and new bpf-based implementations.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecf8fecb

device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants · 67e306fd

Roman Gushchin authored Nov 05, 2017

Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.

The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

67e306fd

Merge tag 'mlx5-updates-2017-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 488e5b30

David S. Miller authored Nov 05, 2017

Saeed Mahameed says:

====================
mlx5-updates-2017-11-04

This series includes:

From Huy: dscp to priority mapping for Ethernet packet.

===================================================
First six patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.

Firmware interface:
Mellanox firmware provides two control knobs for this feature:
  QPTS register allow changing the trust state between dscp and
  pcp mode. The default is pcp mode. Once in dscp mode, firmware will
  route the packet based on its dscp value if the dscp field exists.

  QPDPM register allow mapping a specific dscp (0 to 63) to a
  specific priority (0 to 7). By default, all the dscps are mapped to
  priority zero.

Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In mlx5 drivers, we add the support for net
dcb's getapp and setapp call back. Mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If user sends multiple dscp to priority APP TLV entries on the same
dscp, the last sent one will take effect. All the previous sent will be
deleted.

The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.

When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.

When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
===================================================

Plus to the dscp series, some small misc changes are include as well:

From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic
From Or Gerlitz, Enlarge the NIC TC offload table size
From Rabie, Initialize destination_flow struct to 0
From Feras, Add inner TTC table to IPoIB flow steering
From Tal, Enable CQE based moderation on TX CQ
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

488e5b30

Merge branch 'nfp-ethtool-and-related-improvements' · bfe26ba9

David S. Miller authored Nov 05, 2017

Simon Horman says:

====================
nfp: ethtool and related improvements

Dirk van der Merwe says:

This patch series throws a couple of loosely related items into a single
series.

Patch 1: Clang compilation fix reported by
  Matthias Kaehlcke <mka@chromium.org>

Patch 2: Driver can now do MAC reinit on load when there has been a
  media override set in the NSP.

Patch 3: Refactor the nfp_app_reprs_set API.

Patch 4: Similar to vNICs, representors must be able to deal with media
  override changes in the NSP.

Patch 5: Since representors can now handle media overrides, we can
  allocate the get/set link ndo's to them.

Patch 6 & 7: Add support for FEC mode modification.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bfe26ba9

nfp: implement ethtool FEC mode settings · 0d087093

Dirk van der Merwe authored Nov 04, 2017

Add support in the driver ethtool ops to modify the NFP FEC modes.

The FEC modes can be set for vNIC associated with physical ports or
for MAC representor netdevs.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d087093

nfp: add helpers for FEC support · b471232e

Dirk van der Merwe authored Nov 04, 2017

Implement helpers to determine and modify FEC modes via the NSP.
The NSP advertises FEC capabilities on a per port basis and provides
support for:
* Auto mode selection
* Reed Solomon
* BaseR
* None/Off
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b471232e

nfp: add get/set link settings ndos to representors · a564d30e

Dirk van der Merwe authored Nov 04, 2017

Since it is now safe to modify link settings for representors, we can
attach the get/set link settings ndos to it. The get/set link settings
are nfp_port based operations.

If a port becomes invalid, the representor will be removed in the same
way a vnic would be.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a564d30e

nfp: resync repr state when port table sync · 5fa27d59

Dirk van der Merwe authored Nov 04, 2017

If the NSP port table has been refreshed, resync the representor state
with the new port information. At the moment, this only entails looking
for invalid ports and killing off representors associated with them.

The repr instance becomes NULL which is safe since the app accessor
function for reprs returns NULL when it cannot access a repr.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5fa27d59

nfp: refactor nfp_app_reprs_set · 51ccc37d

Dirk van der Merwe authored Nov 04, 2017

The criteria that reprs cannot be replaced with another new set of reprs
has been removed. This check is not needed since the only use case that
could exercise this at the moment, would be to modify the number of
SRIOV VFs without first disabling them. This case is explicitly
disallowed in any case and subsequent patches in this series
need to be able to replace the running set of reprs.

All cases where the return code used to be checked for the
nfp_app_reprs_set function have been removed.
As stated above, it is not possible for the current code to encounter a
case where reprs exist and need to be replaced.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

51ccc37d

nfp: make use of MAC reinit · 7717c319

Jakub Kicinski authored Nov 04, 2017

Recent management FW images can perform full reinit of MAC cores
without requiring a reboot.  When loading the driver check if there
are changes pending and if so call NSP MAC reinit.  Full application
FW reload is still required, and all MACs need to be reinited at the
same time (not only the ones which have been reconfigured, and thus
potentially causing disruption to unrelated netdevs) therefore for
now changing MAC config without reloading the driver still remains
future work.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7717c319

nfp: don't depend on compiler constant propagation · 4e595325

Jakub Kicinski authored Nov 04, 2017

Matthias reports:

  nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
  identify the 'mask' parameter as known to be constant at compile time,
  which is required to use the FIELD_GET() macro.

  The forced inlining does the trick for gcc, but for kernel builds with
  clang it results in undefined symbols:

  drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function
    `__nfp_eth_set_aneg':

drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787):
    undefined reference to `__compiletime_assert_492'

drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1):
    undefined reference to `__compiletime_assert_496'

  These __compiletime_assert_xyx() calls would have been optimized away
if
  the compiler had seen 'mask' as a constant.

Add a macro to extract the mask and shift and pass those to
nfp_eth_set_bit_config() separately.
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e595325

tcp: higher throughput under reordering with adaptive RACK reordering wnd · 1f255691

Priyaranjan Jha authored Nov 03, 2017

Currently TCP RACK loss detection does not work well if packets are
being reordered beyond its static reordering window (min_rtt/4).Under
such reordering it may falsely trigger loss recoveries and reduce TCP
throughput significantly.

This patch improves that by increasing and reducing the reordering
window based on DSACK, which is now supported in major TCP implementations.
It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries.

- If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded
  by srtt), since there is possibility that spurious retransmission was
  due to reordering delay longer than reo_wnd.

- Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16)
  no. of successful recoveries (accounts for full DSACK-based loss
  recovery undo). After that, reset it to default (min_rtt/4).

- At max, reo_wnd is incremented only once per rtt. So that the new
  DSACK on which we are reacting, is due to the spurious retx (approx)
  after the reo_wnd has been updated last time.

- reo_wnd is tracked in terms of steps (of min_rtt/4), rather than
  absolute value to account for change in rtt.

In our internal testing, we observed significant increase in throughput,
in scenarios where reordering exceeds min_rtt/4 (previous static value).
Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f255691

Merge branch 'dsa-parsing-stage' · 6c49b5e2

David S. Miller authored Nov 05, 2017

Vivien Didelot says:

====================
net: dsa: parsing stage

When registering a DSA switch, there is basically two stages.

The first stage is the parsing of the switch device, from either device
tree or platform data. It fetches the DSA tree to which it belongs, and
validates its ports. The switch device is then added to the tree, and
the second stage is called if this was the last switch of the tree.

The second stage is the setup of the tree, which validates that the tree
is complete, sets up the routing tables, the default CPU port for user
ports, sets up the switch drivers and finally the master interfaces,
which makes the whole switch fabric functional.

This patch series covers the first parsing stage. It fixes the type of
the switch and tree indexes to unsigned int, simplifies the tree
reference counting and the switch and CPU ports parsing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6c49b5e2

net: dsa: resolve tagging protocol at parse time · 7354fcb0

Vivien Didelot authored Nov 03, 2017

Extend the dsa_port_parse_cpu() function to resolve the tagging protocol
at port parsing time, instead of waiting for the whole tree to be
complete.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7354fcb0

net: dsa: add one port parsing function per type · 06e24d08

Vivien Didelot authored Nov 03, 2017

Add dsa_port_parse_user, dsa_port_parse_dsa and dsa_port_parse_cpu
functions to factorize the code shared by both OF and pdata parsing.

They don't do much for the moment but will be extended later to support
tagging protocol resolution for example.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06e24d08

net: dsa: only check presence of link property · 54df6fa9

Vivien Didelot authored Nov 03, 2017

When parsing a port, simply use of_property_read_bool which checks the
presence of a given property, instead of parsing the link phandle.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

54df6fa9

net: dsa: rework switch parsing · 975e6e32

Vivien Didelot authored Nov 03, 2017

When parsing a switch, we have to identify to which tree it belongs and
parse its ports. Provide two functions to separate the OF and platform
data specific paths.

Also use the of_property_read_variable_u32_array function to parse the
OF member array instead of calling of_property_read_u32_index twice.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

975e6e32

net: dsa: get tree before parsing ports · 0eefe2c1

Vivien Didelot authored Nov 03, 2017

We will need a reference to the dsa_switch_tree when parsing a CPU port,
so fetch it right after parsing the member and before parsing ports.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0eefe2c1

net: dsa: rework switch addition and removal · 6da2a940

Vivien Didelot authored Nov 03, 2017

This patch removes the unnecessary index argument from the
dsa_dst_add_ds and dsa_dst_del_ds functions and renames them to
dsa_tree_add_switch and dsa_tree_remove_switch respectively.

In addition to a more explicit scope, we now check the presence of an
existing switch with the same index directly within dsa_tree_add_switch.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6da2a940

net: dsa: provide a find or new tree helper · 1ca28ec9

Vivien Didelot authored Nov 03, 2017

Rename dsa_get_dst to dsa_tree_find since it doesn't increment the
reference counter, rename dsa_add_dst to dsa_tree_alloc for symmetry
with dsa_tree_free, and provide a convenient dsa_tree_touch function to
find or allocate a new tree.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ca28ec9

net: dsa: get and put tree reference counting · 65254108

Vivien Didelot authored Nov 03, 2017

Provide convenient dsa_tree_get and dsa_tree_put functions scoping a DSA
tree used to increment and decrement its reference counter, instead of
poking directly its kref structure.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65254108

net: dsa: simplify tree reference counting · 8e5bf975

Vivien Didelot authored Nov 03, 2017

DSA trees have a refcount used to automatically free the dsa_switch_tree
structure once there is no switch devices inside of it.

The refcount is incremented when a switch is added to the tree, and
decremented when it is removed from it.

But because of kref_init, the refcount is also incremented at
initialization, and when looking up the tree from the list for symmetry.

Thus the current code stores the number of switches plus one, and makes
the switch registration more complex.

To simplify the switch registration function, we reset the refcount to
zero after initialization and don't increment it when looking up a tree.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e5bf975

net: dsa: make tree index unsigned · 49463b7f

Vivien Didelot authored Nov 03, 2017

Similarly to a DSA switch and port, rename the tree index from "tree" to
"index" and make it an unsigned int because it isn't supposed to be less
than 0.

u32 is an OF specific data used to retrieve the value and has no need to
be propagated up to the tree index.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

49463b7f

net: dsa: make switch index unsigned · 99feaafc

Vivien Didelot authored Nov 03, 2017

Define the DSA switch index as an unsigned int, because it will never be
less than 0.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

99feaafc

liquidio: do not consider packets dropped by network stack as driver Rx dropped · 95248461

Intiyaz Basha authored Nov 03, 2017

netdev->rx_dropped was including packets dropped by napi_gro_receive.
If a packet is dropped by network stack, it should not be counted under
driver Rx dropped.

Made necessary changes to not include network stack drops under
netdev->rx_dropped.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

95248461

tools: bpftool: move p_err() and p_info() from main.h to common.c · 0b1c27db

Quentin Monnet authored Nov 03, 2017

The two functions were declared as static inline in a header file. There
is no particular reason why they should be inlined, they just happened to
remain in the same header file when they were turned from macros to
functions in a precious commit.

Make them non-inlined functions and move them to common.c file instead.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b1c27db

Merge branch 'bpf-add-offload-as-a-first-class-citizen' · 8a3b718a

David S. Miller authored Nov 05, 2017

Jakub Kicinski says:

====================
bpf: add offload as a first class citizen

This series is my stab at what was discussed at a recent IOvisor
bi-weekly call.  The idea is to make the device translator run at
the program load time.  This makes the offload more explicit to
the user space.  It also makes it easy for the device translator
to insert information into the original verifier log.

v2:
 - include linux/bug.h instead of asm/bug.h;
 - rebased on top of Craig's verifier fix (no changes, the last patch
   just removes more code now).  I checked the set doesn't conflict
   with Jiri's, Josef's or Roman's patches, but missed Craig's fix :(
v1:
 - rename the ifindex member on load;
 - improve commit messages;
 - split nfp patches more.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8a3b718a

bpf: remove old offload/analyzer · b37a5306

Jakub Kicinski authored Nov 03, 2017

Thanks to the ability to load a program for a specific device,
running verifier twice is no longer needed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b37a5306

nfp: bpf: move to new BPF program offload infrastructure · c6c580d7

Jakub Kicinski authored Nov 03, 2017

Following steps are taken in the driver to offload an XDP program:

XDP_SETUP_PROG:
 * prepare:
   - allocate program state;
   - run verifier (bpf_analyzer());
   - run translation;
 * load:
   - stop old program if needed;
   - load program;
   - enable BPF if not enabled;
 * clean up:
   - free program image.

With new infrastructure the flow will look like this:

BPF_OFFLOAD_VERIFIER_PREP:
  - allocate program state;
BPF_OFFLOAD_TRANSLATE:
   - run translation;
XDP_SETUP_PROG:
   - stop old program if needed;
   - load program;
   - enable BPF if not enabled;
BPF_OFFLOAD_DESTROY:
   - free program image.

Take advantage of the new infrastructure.  Allocation of driver
metadata has to be moved from jit.c to offload.c since it's now
done at a different stage.  Since there is no separate driver
private data for verification step, move temporary nfp_meta
pointer into nfp_prog.  We will now use user space context
offsets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6c580d7

nfp: bpf: move translation prepare to offload.c · 9314c442

Jakub Kicinski authored Nov 03, 2017

struct nfp_prog is currently only used internally by the translator.
This means there is a lot of parameter passing going on, between
the translator and different stages of offload.  Simplify things
by allocating nfp_prog in offload.c already.

We will now use kmalloc() to allocate the program area and only
DMA map it for the time of loading (instead of allocating DMA
coherent memory upfront).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9314c442