Commits · 4607f6d26950ffb3c4c8e5b2db42f364f19dd26c · Kirill Smelkov / linux

04 Sep, 2017 22 commits

mlxsw: spectrum_router: Support IPv4 underlay decap · 4607f6d2

Petr Machata authored Sep 02, 2017

Unlike encapsulation, which is represented by a next hop forwarding to
an IPIP tunnel, decapsulation is a type of local route. It is created
for local routes whose prefix corresponds to the local address of one of
the offloaded IPIP tunnels. When the tunnel is removed (i.e. all the encap
next hops are removed), the decap offload is migrated back to a trap for
resolution in slow path.

This patch assumes that the decap route is already present when the encap
route is added. A follow-up patch will fix this issue.

Note that this patch only supports IPv4 underlay. Support for IPv6
underlay will be subject to follow-up work apart from this patchset.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Support IPv6 overlay encap · 8f28a309

Petr Machata authored Sep 02, 2017

Add the missing bits to recognize IPv6 next hops as IPIP ones to enable
offloading of IPv6 overlay encapsulation.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Support IPv4 overlay encap · 1012b9ac

Petr Machata authored Sep 02, 2017

This introduces some common code for tracking of offloaded IP-in-IP
tunnels, and support for offloading IPv4 overlay encapsulating routes in
particular. A follow-up patch will introduce IPv6 overlay as well.

Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry
objects hooked up in mlxsw_sp_router. A network device that represents
the tunnel is used as a key to look up the corresponding IPIP entry.
Note that in the future, a more general keying mechanism will be needed,
because parts of the tunnel information can be provided by the route.

IPIP entries are reference counted, because several next hops may end up
using the same tunnel, and we only want to offload it once.

Encapsulation path hooks into next hop handling. Routes that forward to
a tunnel are now considered gateway routes, thus giving them the same
treatment that other remote routes get. An IPIP next hop type is
introduced.

Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the
known tunnel types, the next-hop is not considered an IPIP next hop.

The list of IPIP tunnel types is currently empty; follow-up patches will
add support for GRE. Traffic to IPIP tunnel types that are not
explicitly recognized by the driver is trapped and handled in slow path.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
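
The reference-counted tunnel list described in 1012b9ac above can be sketched in user space as follows. This is only an illustration under stated assumptions: struct tunnel_dev, struct router, ipip_entry_get() and ipip_entry_put() are hypothetical names, not the driver's actual types or functions. The point is that the first next hop using a tunnel creates the entry and later ones only bump its count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tunnel's network device. */
struct tunnel_dev {
	int ifindex;
};

/* One offloaded tunnel, shared by every next hop that uses it. */
struct ipip_entry {
	const struct tunnel_dev *ol_dev;	/* lookup key */
	unsigned int ref_count;
	struct ipip_entry *next;
};

struct router {
	struct ipip_entry *ipip_list;
};

static struct ipip_entry *ipip_entry_find(const struct router *router,
					  const struct tunnel_dev *ol_dev)
{
	struct ipip_entry *entry;

	for (entry = router->ipip_list; entry; entry = entry->next)
		if (entry->ol_dev == ol_dev)
			return entry;
	return NULL;
}

/* Take a reference, creating the entry on first use. */
struct ipip_entry *ipip_entry_get(struct router *router,
				  const struct tunnel_dev *ol_dev)
{
	struct ipip_entry *entry = ipip_entry_find(router, ol_dev);

	if (!entry) {
		entry = calloc(1, sizeof(*entry));
		if (!entry)
			return NULL;
		entry->ol_dev = ol_dev;
		entry->next = router->ipip_list;
		router->ipip_list = entry;
	}
	entry->ref_count++;
	return entry;
}

/* Drop a reference; the last put unlinks and frees the entry. */
void ipip_entry_put(struct router *router, struct ipip_entry *entry)
{
	struct ipip_entry **pp;

	assert(entry->ref_count > 0);
	if (--entry->ref_count)
		return;
	for (pp = &router->ipip_list; *pp; pp = &(*pp)->next) {
		if (*pp == entry) {
			*pp = entry->next;
			break;
		}
	}
	free(entry);
}
```

Two next hops that resolve to the same tunnel device receive the same entry, and the entry disappears from the list only after both have released it.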

mlxsw: spectrum_router: Make nexthops typed · 35225e47

Petr Machata authored Sep 02, 2017

In the router, some next hops may reference an encapsulating netdevice,
such as GRE or IPIP. To properly offload these next hops, mlxsw needs to
keep track of whether a given next hop is a regular Ethernet entry, or
an IP-in-IP tunneling entry.

To facilitate this book-keeping, add a type field to struct
mlxsw_sp_nexthop. There is, as of this patch, only one next hop type:
MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP
variant.

There are several places where next hops are initialized in the IPv4
path. Instead of replicating the logic at every one of them, factor it
out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini
is actually protocol-neutral, so put it to mlxsw_sp_nexthop_type_fini(),
but create a corresponding protocoled _fini function that dispatches to
the protocol-neutral one.

The IPv6 path is simpler, but for symmetry with IPv4, create the same
suite of functions with corresponding logic.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
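
A rough sketch of the typed next hop layout from 35225e47 above: a protocol-specific init chooses the type, a protocol-neutral fini undoes it, and thin per-protocol fini wrappers dispatch to the neutral one. All names here (struct nh, nh4_type_init() and so on) are hypothetical simplifications, and the IP-in-IP case is included only to show why a type field is useful; as of that patch the driver has only the Ethernet type.

```c
#include <assert.h>
#include <stdbool.h>

enum nh_type {
	NH_TYPE_ETH,
	NH_TYPE_IPIP,	/* assumed follow-up variant, shown for illustration */
};

struct nh {
	enum nh_type type;
	bool neigh_bound;	/* set for Ethernet next hops */
	bool tunnel_bound;	/* set for IP-in-IP next hops */
};

/* IPv4-specific: decide the type from the kind of egress device. */
int nh4_type_init(struct nh *nh, bool dev_is_tunnel)
{
	if (dev_is_tunnel) {
		nh->type = NH_TYPE_IPIP;
		nh->tunnel_bound = true;
	} else {
		nh->type = NH_TYPE_ETH;
		nh->neigh_bound = true;
	}
	return 0;
}

/* Protocol-neutral teardown: undo whatever init did for this type. */
void nh_type_fini(struct nh *nh)
{
	switch (nh->type) {
	case NH_TYPE_ETH:
		nh->neigh_bound = false;
		break;
	case NH_TYPE_IPIP:
		nh->tunnel_bound = false;
		break;
	}
}

/* Per-protocol wrappers keep the IPv4 and IPv6 call sites symmetric. */
void nh4_type_fini(struct nh *nh)
{
	nh_type_fini(nh);
}

void nh6_type_fini(struct nh *nh)
{
	nh_type_fini(nh);
}
```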

mlxsw: spectrum_router: Extract mlxsw_sp_rt6_is_gateway() · f6050ee6

Petr Machata authored Sep 02, 2017

IPv6 counterpart of the previous patch: introduce a function to
determine whether a given route is a gateway route.

The new function takes a mlxsw_sp argument which follow-up patches will
use. Thus mlxsw_sp_fib6_entry_type_set() got that argument as well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Extract mlxsw_sp_fi_is_gateway() · 9b01451a

Petr Machata authored Sep 02, 2017

For IPv4 IP-in-IP offload, routes that direct traffic to IP-in-IP
devices need to be considered gateway routes as well. That involves a
bit more logic, so extract the current test to a separate function,
where the logic can be later added.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Introduce loopback RIFs · 6ddb7426

Petr Machata authored Sep 02, 2017

When offloading L3 tunnels, an adjacency entry is created that loops the
packet back into the underlay router. Loopback interfaces then hold the
corresponding information and are created for IP-in-IP netdevices.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Support FID-less RIFs · 010cadf9

Petr Machata authored Sep 02, 2017

Loopback RIFs, which will be introduced in a follow-up patch, differ
from other RIFs in that they do not have a FID associated with them.

To support this, demote FID allocation from mlxsw_sp_rif_create to
configure op of the existing RIF types, and likewise the FID release
from mlxsw_sp_rif_destroy to deconfigure op.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Add mlxsw_sp_ipip_ops · 38ebc0f4

Petr Machata authored Sep 02, 2017

Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. Follow-up patches will use the list to
determine whether a constructed RIF should be a loopback, and to decide
whether a next hop references a tunnel.

The list is currently empty; follow-up patches will add support for GRE.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
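
The idea of an ops array from 38ebc0f4 above, consulted to decide whether a device is a recognized tunnel type, might look roughly like this. The enum, the can_offload callback and the single "gre" entry are assumptions made for illustration; in the commit itself the list is still empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device classification used only by this sketch. */
enum dev_kind {
	DEV_KIND_ETH,
	DEV_KIND_GRE,
	DEV_KIND_OTHER_TUNNEL,
};

struct ipip_ops {
	const char *name;
	bool (*can_offload)(enum dev_kind kind);
};

static bool gre_can_offload(enum dev_kind kind)
{
	return kind == DEV_KIND_GRE;
}

/* One entry per supported tunnel type. */
static const struct ipip_ops ipip_ops_arr[] = {
	{ "gre", gre_can_offload },
};

/* Return the matching ops, or NULL if no known tunnel type matches. */
const struct ipip_ops *ipip_ops_lookup(enum dev_kind kind)
{
	size_t i;

	for (i = 0; i < sizeof(ipip_ops_arr) / sizeof(ipip_ops_arr[0]); i++)
		if (ipip_ops_arr[i].can_offload(kind))
			return &ipip_ops_arr[i];
	return NULL;
}
```

A NULL result corresponds to the case where the next hop is not treated as an IPIP next hop and the traffic is left to the slow path.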

mlxsw: spectrum_router: Publish mlxsw_sp_l3proto · ff1f06ce

Petr Machata authored Sep 02, 2017

The spectrum_ipip module that will be introduced in the follow-up
patches needs to know the data type.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Give mlxsw_reg_ratr_pack a type parameter · 89e41982

Petr Machata authored Sep 02, 2017

To support IPIP, the driver needs to be able to construct an IPIP
adjacency. Change mlxsw_reg_ratr_pack to take an adjacency type as an
argument. Adjust the one existing caller.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Extract mlxsw_reg_ritr_mac_pack() · 9571e828

Petr Machata authored Sep 02, 2017

Unlike other interface types, loopback RIFs do not have a MAC address. So
drop the corresponding argument from mlxsw_reg_ritr_pack() and move it
to a new function. Call that from callers of mlxsw_reg_ritr_pack.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Add Routing Tunnel Decap Properties Register · 1e659ebf

Petr Machata authored Sep 02, 2017

The RTDP register is used for configuring the tunnel decap properties of
NVE and IPinIP.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Add mlxsw_reg_ralue_act_ip2me_tun_pack() · a43da820

Petr Machata authored Sep 02, 2017

To implement IP-in-IP decapsulation, Spectrum uses LPM entries of type
IP2ME with tunnel validity bit and tunnel pointer set. The necessary
register fields are already available, so add a function to pack the
RALUE as appropriate.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Move enum mlxsw_reg_ratr_trap_id · 6c4153b1

Petr Machata authored Sep 02, 2017

This enum is used with reg_ratr_trap_id, so move it next to the register
definition.

While at it, drop the enumerator initializers.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Update RATR to support IP-in-IP tunnels · 7c819de4

Petr Machata authored Sep 02, 2017

So far, adjacencies have always been of type Ethernet (with value of 0),
and thus there was no need to explicitly support RATR type. However to
support IP-in-IP adjacencies, this type and a suite of IP-in-IP-specific
attributes need to be added.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: reg: Update RITR to support loopback device · 99ae8e3e

Petr Machata authored Sep 02, 2017

Update the register so that loopback RIFs can be created and loopback
properties specified.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvpp2-improve-the-mac-address-retrieval-logic' · 45f79291

David S. Miller authored Sep 03, 2017

Antoine Tenart says:

====================
net: mvpp2: improve the mac address retrieval logic

This series aims at fixing the logic behind the MAC address retrieval in the
PPv2 driver. A possible issue is also fixed in patch 3/3 to introduce fallbacks
when the address given in the device tree isn't valid.

Thanks!
Antoine

Since v2:
  - Patch 1/4 from v2 was applied on net (and net was merged in net-next).
  - Rebased on net-next.

Since v1:
  - Rebased onto net (was on net-next).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: fallback using h/w and random mac if the dt one isn't valid · 688cbaf2

Antoine Tenart authored Sep 02, 2017

When using a mac address described in the device tree, a check is made
to see if it is valid. When it's not, no fallback is defined. This
patch tries to get the mac address from h/w (or use a random one if
the h/w one isn't valid) when the dt mac address isn't valid.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
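
The fallback order described in 688cbaf2 above (device tree first, then hardware, then a random address) can be sketched in user space like this. The function names are hypothetical, the validity test assumes the usual rule that a usable unicast address is neither all-zero nor multicast, and rand() stands in for whatever random source the driver really uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

enum mac_source {
	MAC_FROM_DT,
	MAC_FROM_HW,
	MAC_RANDOM,
};

/* Valid here means: not all-zero and not multicast (bit 0 of octet 0). */
bool mac_is_valid(const uint8_t *mac)
{
	static const uint8_t zero[MAC_LEN];

	return !(mac[0] & 0x01) && memcmp(mac, zero, MAC_LEN) != 0;
}

/* Random unicast, locally administered address (rand() is unseeded here). */
void mac_random(uint8_t *mac)
{
	int i;

	for (i = 0; i < MAC_LEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] &= 0xfe;	/* clear the multicast bit */
	mac[0] |= 0x02;	/* set the locally administered bit */
}

/* dt_mac may be NULL when the device tree provides no address. */
enum mac_source mac_select(const uint8_t *dt_mac, const uint8_t *hw_mac,
			   uint8_t *out)
{
	if (dt_mac && mac_is_valid(dt_mac)) {
		memcpy(out, dt_mac, MAC_LEN);
		return MAC_FROM_DT;
	}
	if (hw_mac && mac_is_valid(hw_mac)) {
		memcpy(out, hw_mac, MAC_LEN);
		return MAC_FROM_HW;
	}
	mac_random(out);
	return MAC_RANDOM;
}
```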

net: mvpp2: fix use of the random mac address for PPv2.2 · d2a6e48e

Antoine Tenart authored Sep 02, 2017

The MAC retrieval logic is using a variable to store an h/w stored mac
address and checks this mac against invalid ones before using it. But
the mac address is only read from h/w when using PPv2.1. So when using
PPv2.2 it defaults to its init state.

This patch fixes the logic to only check if the h/w mac is valid when
actually retrieving a mac from h/w.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: move the mac retrieval/copy logic into its own function · 3ba8c81e

Antoine Tenart authored Sep 02, 2017

The MAC retrieval logic is quite complicated (and broken). Move
it to its own function to prepare for patches fixing its logic, so that
reviews are easier.
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · b63f6044

David S. Miller authored Sep 03, 2017

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree. Basically, updates to the conntrack core, enhancements for
nf_tables, conversion of netfilter hooks from linked list to array to
improve memory locality and assorted improvements for the Netfilter
codebase. More specifically, they are:

1) Add expectation to hashes after timer initialization to prevent
   access from another CPU that walks on the hashes and calls
   del_timer(), from Florian Westphal.

2) Don't update nf_tables chain counters from hot path, this is only
   used by the x_tables compatibility layer.

3) Get rid of nested rcu_read_lock() calls from netfilter hook path.
   Hooks are always guaranteed to run from rcu read side, so remove
   nested rcu_read_lock() where possible. Patch from Taehee Yoo.

4) nf_tables new ruleset generation notifications include PID and name
   of the process that has updated the ruleset, from Phil Sutter.

5) Use skb_header_pointer() from nft_fib, so we can reuse this code from
   the nf_family netdev family. Patch from Pablo M. Bermudo.

6) Add support for nft_fib in nf_tables netdev family, also from Pablo.

7) Use deferrable workqueue for conntrack garbage collection, to reduce
   power consumption, patch from Subash Abhinov Kasiviswanathan.

8) Add nf_ct_expect_iterate_net() helper and use it. From Florian
   Westphal.

9) Call nf_ct_unconfirmed_destroy only from cttimeout, from Florian.

10) Drop references on conntrack removal path when skbuffs have escaped via
    nfqueue, from Florian.

11) Don't queue packets to nfqueue with dying conntrack, from Florian.

12) Constify nf_hook_ops structure, from Florian.

13) Remove needless branch in nf_tables trace code, from Phil Sutter.

14) Add nla_strdup(), from Phil Sutter.

15) Raise nf_tables objects name size up to 255 chars, people want to use
    DNS names, so increase this according to what RFC 1035 specifies.
    Patch series from Phil Sutter.

16) Kill nf_conntrack_default_on, it's broken. Default on conntrack hook
    registration on demand, suggested by Eric Dumazet, patch from Florian.

17) Remove unused variables in compat_copy_entry_from_user both in
    ip_tables and arp_tables code. Patch from Taehee Yoo.

18) Constify struct nf_conntrack_l4proto, from Julia Lawall.

19) Constify nf_loginfo structure, also from Julia.

20) Use a single rb root in connlimit, from Taehee Yoo.

21) Remove unused netfilter_queue_init() prototype, from Taehee Yoo.

22) Use audit_log() instead of open-coding it, from Geliang Tang.

23) Allow to mangle tcp options via nft_exthdr, from Florian.

24) Allow to fetch TCP MSS from nft_rt, from Florian. This includes
    a fix for a miscalculation of the minimal length.

25) Simplify branch logic in h323 helper, from Nick Desaulniers.

26) Calculate netlink attribute size for conntrack tuple at compile
    time, from Florian.

27) Remove protocol name field from nf_conntrack_{l3,l4}proto structure.
    From Florian.

28) Remove holes in nf_conntrack_l4proto structure, so it becomes
    smaller. From Florian.

29) Get rid of print_tuple() indirection for /proc conntrack listing.
    Place all the code in net/netfilter/nf_conntrack_standalone.c.
    Patch from Florian.

30) Do not build in print_conntrack() if CONFIG_NF_CONNTRACK_PROCFS is
    off. From Florian.

31) Constify most nf_conntrack_{l3,l4}proto helper functions, from
    Florian.

32) Fix broken indentation in ebtables extensions, from Colin Ian King.

33) Fix several harmless sparse warnings, from Florian.

34) Convert netfilter hook infrastructure to use array for better memory
    locality, joint work done by Florian and Aaron Conole. Moreover, add
    some instrumentation to debug this.

35) Batch nf_unregister_net_hooks() calls, to call synchronize_net once
    per batch, from Florian.

36) Get rid of noisy logging in ICMPv6 conntrack helper, from Florian.

37) Get rid of obsolete NFDEBUG() instrumentation, from Varsha Rao.

38) Remove unused code in the generic protocol tracker, from Davide
    Caratti.

I think I will have material for a second Netfilter batch in my queue if
time allows to make it fit in this merge window.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Sep, 2017 2 commits

net/mlx4_core: fix incorrect size allocation for dev->caps.spec_qps · 942e7e5f

Colin Ian King authored Aug 31, 2017

The current allocation for dev->caps.spec_qps is for the size of the
pointer and not the size of the actual mlx4_spec_qps structure. Fix
this by using the correct size. Also split the allocation over a few
lines to make it cppcheck clean on overly wide lines.

Detected by CoverityScan, CID#1455222 ("Wrong sizeof argument")

Fixes: c73c8b1e ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
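
The class of bug fixed in 942e7e5f above (sizing an array by the pointer instead of the pointed-to structure) is easy to reproduce in a small user-space sketch. struct port_qps and its fields are hypothetical stand-ins, not the real mlx4_spec_qps layout; only the sizeof pattern matters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-port structure, larger than a pointer. */
struct port_qps {
	int qp0_qkey;
	int qp0_proxy;
	int qp0_tunnel;
	int qp1_proxy;
	int qp1_tunnel;
};

struct caps {
	struct port_qps *spec_qps;
	size_t num_ports;
};

int caps_alloc(struct caps *caps, size_t num_ports)
{
	/*
	 * The buggy form would be:
	 *	calloc(num_ports, sizeof(caps->spec_qps))
	 * which sizes every element as a pointer. Dereferencing inside
	 * sizeof yields the size of the structure itself.
	 */
	caps->spec_qps = calloc(num_ports, sizeof(*caps->spec_qps));
	if (!caps->spec_qps)
		return -1;
	caps->num_ports = num_ports;
	return 0;
}

void caps_free(struct caps *caps)
{
	free(caps->spec_qps);
	caps->spec_qps = NULL;
	caps->num_ports = 0;
}
```

With the pointer-sized allocation, writing to the last field of the last element would run past the end of the buffer; with the corrected size it is in bounds.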

net/mlx4_core: fix memory leaks on error exit path · 542deb88

Colin Ian King authored Aug 31, 2017

The structures hca_param and func_cap are not being kfree'd on an error
exit path causing two memory leaks. Fix this by jumping to the existing
free memory error exit path.

Detected by CoverityScan, CID#1455219, CID#1455224 ("Resource Leak")

Fixes: c73c8b1e ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
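
The fix pattern in 542deb88 above, jumping to a shared cleanup label instead of returning early, can be sketched as follows. The allocation counter, the 64-byte buffer size and the query_fails flag are assumptions added so that a leak would be observable; the function is not the driver's mlx4_slave_cap().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Counts outstanding allocations so that a leak is observable. */
int live_allocs;

static void *tracked_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* query_fails simulates a firmware query returning an error. */
int slave_cap_sketch(bool query_fails)
{
	void *hca_param, *func_cap;
	int err = 0;

	hca_param = tracked_alloc(64);
	func_cap = tracked_alloc(64);
	if (!hca_param || !func_cap) {
		err = -ENOMEM;
		goto free_mem;
	}

	if (query_fails) {
		/* A bare "return -EIO;" here would leak both buffers. */
		err = -EIO;
		goto free_mem;
	}

free_mem:
	tracked_free(hca_param);
	tracked_free(func_cap);
	return err;
}
```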

02 Sep, 2017 16 commits

Merge branch 'hv_netvsc-channel-settings-cleanups-and-fixes' · 32d9b70a

David S. Miller authored Sep 01, 2017

Haiyang Zhang says:

====================
hv_netvsc: cleanups and fixes of channel settings

This patch set cleans up some unused variables, unnecessary checks.
Also fixed some limit checking of channel number.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Fix the channel limit in netvsc_set_rxfh() · db3cd7af

Haiyang Zhang authored Sep 01, 2017

The limit of setting receive indirection table value should be
the current number of channels, not the VRSS_CHANNEL_MAX.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
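
The check described in db3cd7af above amounts to validating each indirection table entry against the number of channels currently in use rather than the compile-time maximum. A minimal sketch, with a hypothetical function name and a caller-supplied table length:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a requested indirection table into the device table, rejecting
 * any entry that names a channel that is not currently active.
 */
int set_indir_table(uint16_t *table, const uint32_t *indir, size_t len,
		    uint32_t num_chn)
{
	size_t i;

	/* Validate everything first so a bad request leaves the table intact. */
	for (i = 0; i < len; i++)
		if (indir[i] >= num_chn)
			return -EINVAL;

	for (i = 0; i < len; i++)
		table[i] = (uint16_t)indir[i];
	return 0;
}
```

Checking against the maximum instead would accept an entry pointing at a queue that exists in principle but is not open, which is the situation the commit guards against.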

hv_netvsc: Simplify the limit check in netvsc_set_channels() · 06be580a

Haiyang Zhang authored Sep 01, 2017

Because of the following code, net->num_tx_queues equals
VRSS_CHANNEL_MAX, and max_chn is less than or equal to VRSS_CHANNEL_MAX.

netvsc_drv.c:
alloc_etherdev_mq(sizeof(struct net_device_context),
                                VRSS_CHANNEL_MAX);
rndis_filter.c:
net_device->max_chn = min_t(u32, VRSS_CHANNEL_MAX, num_possible_rss_qs);

So this patch removes the unnecessary limit check before comparing
with "max_chn".
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
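
The argument in 06be580a above is that once max_chn is clamped by min_t(), a separate comparison against the global cap can never reject anything the max_chn comparison would accept. The sketch below illustrates that reasoning; CHANNEL_MAX and its value, the two check functions and the zero-count test are assumptions for illustration and do not reproduce the driver's exact code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define CHANNEL_MAX 64u	/* stand-in for VRSS_CHANNEL_MAX; value assumed */

/* Mirrors the min_t() clamp quoted in the commit message. */
uint32_t compute_max_chn(uint32_t num_possible_rss_qs)
{
	return num_possible_rss_qs < CHANNEL_MAX ?
	       num_possible_rss_qs : CHANNEL_MAX;
}

/* Hypothetical "before" shape: test the global cap, then max_chn. */
int check_count_before(uint32_t count, uint32_t max_chn)
{
	if (count == 0 || count > CHANNEL_MAX)
		return -EINVAL;
	if (count > max_chn)
		return -EINVAL;
	return 0;
}

/* "After" shape: max_chn <= CHANNEL_MAX makes the cap test redundant. */
int check_count_after(uint32_t count, uint32_t max_chn)
{
	if (count == 0 || count > max_chn)
		return -EINVAL;
	return 0;
}
```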

hv_netvsc: Simplify num_chn checking in rndis_filter_device_add() · 5c4217d0

Haiyang Zhang authored Sep 01, 2017

The minus one and the assignment to a local variable are not necessary.
This patch simplifies it.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Clean up an unused parameter in rndis_filter_set_rss_param() · 715e2ec5

Haiyang Zhang authored Sep 01, 2017

This patch removes the parameter, num_queue in
rndis_filter_set_rss_param(), which is no longer in use.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add module reference to FIB notifiers · 864150df

Ido Schimmel authored Sep 01, 2017

When a listener registers to the FIB notification chain it receives a
dump of the FIB entries and rules from existing address families by
invoking their dump operations.

While we call into these modules we need to make sure they aren't
removed. Do that by increasing their reference count before invoking
their dump operations and decrease it afterwards.

Fixes: 04b1d4e5 ("net: core: Make the FIB notification chain generic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
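
The pattern in 864150df above is to pin the owning module for the duration of the dump callback. The kernel does this with module reference counting; the sketch below is a user-space model with hypothetical types (struct module_model, struct fib_ops_model), and the choice to silently skip an owner that is already unloading is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct module_model {
	int refcnt;
	bool going;	/* set once unload has started */
};

/* Fails if the module is on its way out, otherwise takes a reference. */
bool module_model_try_get(struct module_model *mod)
{
	if (mod->going)
		return false;
	mod->refcnt++;
	return true;
}

void module_model_put(struct module_model *mod)
{
	assert(mod->refcnt > 0);
	mod->refcnt--;
}

struct fib_ops_model {
	struct module_model *owner;
	int (*dump)(const struct fib_ops_model *ops, void *ctx);
};

/* Demo callback: records the owner's refcount as seen during the dump. */
int demo_dump(const struct fib_ops_model *ops, void *ctx)
{
	*(int *)ctx = ops->owner->refcnt;
	return 0;
}

/* Hold a reference across the call; skip owners that are unloading. */
int fib_dump_one(const struct fib_ops_model *ops, void *ctx)
{
	int err;

	if (!module_model_try_get(ops->owner))
		return 0;
	err = ops->dump(ops, ctx);
	module_model_put(ops->owner);
	return err;
}
```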

Merge branch 'netvsc-vf-cleanups' · 9e2cf36d

David S. Miller authored Sep 01, 2017

Stephen Hemminger says:

====================
netvsc: transparent VF related cleanups

The first gets rid of unnecessary ref counting, and second
allows removing hv_netvsc driver even if VF present.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: allow driver to be removed even if VF is present · ec158f77

Stephen Hemminger authored Aug 31, 2017

If a VF is attached, the netvsc driver module can still be
removed; we just have to make sure to do the cleanup.

Also, avoid extra rtnl round trip when calling unregister.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netvsc: cleanup datapath switch · 9a0c48df

Stephen Hemminger authored Aug 31, 2017

Use one routine for datapath up/down. Don't need to reopen
the rndis layer.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: sockmap update/simplify memory accounting scheme · 90a9631c

John Fastabend authored Sep 01, 2017

Instead of tracking wmem_queued and sk_mem_charge by incrementing
in the verdict SK_REDIRECT paths and decrementing in the tx work
path use skb_set_owner_w and sock_writeable helpers. This solves
a few issues with the current code. First, in SK_REDIRECT inc on
sk_wmem_queued and sk_mem_charge were being done without the peer's
sock lock being held. Under stress this can result in accounting
errors when tx work and/or multiple verdict decisions are working
on the peer psock.

Additionally, this cleans up the code because we can rely on the
default destructor to decrement memory accounting on kfree_skb. Also
this will trigger sk_write_space when space becomes available on
kfree_skb() which wasn't happening before and prevent __sk_free
from being called until all in-flight packets are completed.

Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ubuf_info-refcnt-conversion' · 250b0f78

David S. Miller authored Sep 01, 2017

Eric Dumazet says:

====================
net: ubuf_info.refcnt conversion

Yet another atomic_t -> refcount_t conversion, split in two patches.

First patch prepares the automatic conversion done in the second patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert (struct ubuf_info)->refcnt to refcount_t · c1d1b437

Eric Dumazet authored Aug 31, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.

v2: added the change in drivers/vhost/net.c as spotted
by Willem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: prepare (struct ubuf_info)->refcnt conversion · db5bce32

Eric Dumazet authored Aug 31, 2017

In order to convert this atomic_t refcnt to refcount_t,
we need to init the refcount to one to not trigger
a 0 -> 1 transition.

This also removes one atomic operation in fast path.

v2: removed dead code in sock_zerocopy_put_abort()
as suggested by Willem.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
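
Why db5bce32 above initializes the count to one can be seen in a small model of a reference counter that refuses increments from zero, which is how the kernel's refcount_t treats a likely use-after-free. The types and helpers below are hypothetical user-space stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

struct refcount_model {
	unsigned int refs;
};

/* Returns false and leaves the counter untouched on a 0 -> 1 attempt. */
bool refcount_model_inc(struct refcount_model *r)
{
	if (r->refs == 0)
		return false;
	r->refs++;
	return true;
}

/* Returns true when the caller dropped the last reference. */
bool refcount_model_dec_and_test(struct refcount_model *r)
{
	assert(r->refs > 0);
	return --r->refs == 0;
}

struct ubuf_model {
	struct refcount_model refcnt;
};

/* The creator's reference is accounted for at init time. */
void ubuf_model_init(struct ubuf_model *u)
{
	u->refcnt.refs = 1;
}
```

Starting at one also means the creator does not need a separate increment to take its own reference, which matches the commit's note about saving an atomic operation in the fast path.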

net: systemport: Correctly set TSB endian for host · 487234cc

Florian Fainelli authored Sep 01, 2017

Similarly to how we configure the RSB (Receive Status Block) we also
need to set the TSB (Transmit Status Block) based on the host endian.
This was missing from the commit indicated below.

Fixes: 389a06bc ("net: systemport: Set correct RSB endian bits based on host")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'inet_diag-TCP-MD5' · d49e3a9f

David S. Miller authored Sep 01, 2017

Ivan Delalande says:

====================
inet_diag: report TCP MD5 signing keys and addresses

Allow userspace to retrieve MD5 signature keys and addresses configured
on TCP sockets through inet_diag.

Thanks to Eric Dumazet and Stephen Hemminger for their useful
explanations and feedback.

v5: - memset the whole netlink payload after it has been nla_reserve-d
      in tcp_diag_put_md5sig (a third memset had to be added for
      tcpm_key so we might as well have just one for entire region).
    - move the nla_total_size call from inet_sk_attr_size to the
      idiag_get_aux_size defined by protocols as they could add multiple
      netlink attributes,
    - add check for net_admin in tcp_diag_get_aux_size.

v4: - add new struct tcp_diag_md5sig to report the data instead of
      tcp_md5sig to avoid wasting 112 bytes on every tcpm_addr,
    - memset tcpm_addr on IPv4 addresses to avoid leaks,
    - style fix in inet_diag_dump_one_icsk.

v3: - rename inet_diag_*md5sig in tcp_diag.c to tcp_diag_* for
      consistency,
    - don't lock the socket in tcp_diag_put_md5sig,
    - add checks on md5sig_count in tcp_diag_put_md5sig to not create
      the netlink attribute if the list is empty, and to avoid overflows
      or memory leaks if the list has changed in the meantime.

v2: - move changes to tcp_diag.c and extend inet_diag_handler to allow
      protocols to provide additional data on INET_DIAG_INFO,
    - lock socket before calling tcp_diag_put_md5sig.

I also have a patch for iproute2/ss to test this change, making it print
this new attribute. I'm planning to polish and send it if this series
gets applied.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_diag: report TCP MD5 signing keys and addresses · c03fa9bc

Ivan Delalande authored Aug 31, 2017

Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
not possible to retrieve these from the kernel once they have been
configured on sockets.
Signed-off-by: Ivan Delalande <colona@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
