Commits · 3fd22af808f4d7455ba91596d334438c7ee0f889 · Kirill Smelkov / linux

16 Jun, 2015 17 commits

sock_diag: specify info_size per inet protocol · 3fd22af8

Craig Gallek authored Jun 15, 2015

Previously, there was no clear distinction between the inet protocols
that used struct tcp_info to report information and those that didn't.
This change adds a specific size attribute to the inet_diag_handler
struct which defines these interfaces.  This will make dispatching
sock_diag get_info requests identical for all inet protocols in a
following patch.

Tested: ss -au
Tested: ss -at
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fd22af8

sock_diag: define destruction multicast groups · eb4cb008

Craig Gallek authored Jun 15, 2015

These groups will contain socket-destruction events for
AF_INET/AF_INET6, IPPROTO_TCP/IPPROTO_UDP.

Near the end of socket destruction, a check for listeners is
performed.  In the presence of a listener, rather than completely
cleanup the socket, a unit of work will be added to a private
work queue which will first broadcast information about the socket
and then finish the cleanup operation.
Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eb4cb008

Merge branch 'mlx4-vf-counters' · 916035dd

David S. Miller authored Jun 15, 2015

Or Gerlitz says:

====================
mlx4 driver update (+ new VF ndo)

This series from Eran and Hadar is further dealing with traffic
counters in the mlx4 driver, this time mostly around SRIOV.

We added a new ndo to read the VF counters through the PF netdev
netlink infrastructure plus mlx4 implementation for that ndo.

changes from V0:
  - applied feedback from John to use nested netlink encoding
    for the VF counters so we can extend it later
  - add handling of single ported VFs in the mlx4_en driver new ndo
  - avoid chopping the FW counters from 64 to 32 bits in mlx4_en PF flow
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

916035dd

net/mlx4_en: Support ndo_get_vf_stats · 62a89055

Eran Ben Elisha authored Jun 15, 2015

Implement the ndo to gather VF statistics through the PF.

All counters related to this VF are stored in a per slave
list, run over the slave's list and collect all statistics.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62a89055

net/core: Add reading VF statistics through the PF netdevice · 3b766cd8

Eran Ben Elisha authored Jun 15, 2015

Add ndo_get_vf_stats where the PF retrieves and fills the VFs traffic
statistics. We encode the VF stats in a nested manner to allow for
future extensions.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b766cd8

net/mlx4_en: Show PF own statistics via ethtool · b42de4d0

Eran Ben Elisha authored Jun 15, 2015

Allow the user to observe the PF own statistics using ethtool with pf_
prefixed counter names.

Those counters are the PF statistics out of the overall port statistics.
Every PF QP is attached to a counter and the summary of those counters
is the PF statistics.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b42de4d0

net/mlx4_core: Add helper to query counters · 9616982f

Eran Ben Elisha authored Jun 15, 2015

This is an infrastructure step for querying VF and PF counters.

This code was in the IB driver, move it to the mlx4 core driver
so it will be accessible for more use cases.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9616982f

IB/mlx4: Set VF to read from QP counters · 7193a141

Eran Ben Elisha authored Jun 15, 2015

As IB VFs are not capable to read the port counters through MADs,
move there to read their own QP counters to gather statistics.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7193a141

IB/mlx4: Add RoCE/IB dedicated counters · c3abb51b

Eran Ben Elisha authored Jun 15, 2015

This is an infrastructure step to attach all the QPs opened from the
IB driver to a counter in order to collect VF stats from the PF using
those counters.

If the port's type is Ethernet, the counter policy demands two counters
per port (one for RoCE and one for Ethernet). The port default counter
(allocated in mlx4_core) is used for the Ethernet netdev QPs and we
allocate another counter for RoCE.

If the port's traffic is Infiniband, the counter policy demands
one counter per port, so it can use the port's default counter.

Also, Add 'allocated' flag for each counter in order to clean it at
unload.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3abb51b

net/mlx4_core: Allocate default counter per port · 6de5f7f6

Eran Ben Elisha authored Jun 15, 2015

Default counter per port will be allocated at the mlx4 core driver load.

Every QP opened by the Ethernet driver will be attached to the port's default
counter. This is an infrastructure step to collect VF statistics from the PF.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6de5f7f6

net/mlx4_core: Add port attribute when tracking counters · 68230242

Eran Ben Elisha authored Jun 15, 2015

Counter will get its port attribute within the resource tracker when
the first QP attached to it is modified to RTR. If a QP is counter-less,
an attempt to create a new counter with assigned port will be made.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68230242

net/mlx4_core: Adjust counter grant policy in the resource tracker · 9de92c60

Eran Ben Elisha authored Jun 15, 2015

Each physical function has a guarantee of two counters per port, one
for a default counter and one for the IB driver.

Each virtual function has a guarantee of one counter per port.
All other counters are free and can be obtained on demand.

This is a preparation step for supporting a get_vf_stats ndo call,
so we can promise a counter for every VF in order to collect their
statistics from the PF context.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9de92c60

net/mlx4_core: Remove counters table allocation from VF flow · 2632d18d

Eran Ben Elisha authored Jun 15, 2015

Since virtual functions get their counters indices allocation from the PF,
allocate counters indices bitmap only in case the function isn't virtual.

Also, check that the device has counters to allocate before creating the
indices bitmap table.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2632d18d

net/mlx4_core: Add sink counter · 47d8417f

Eran Ben Elisha authored Jun 15, 2015

Reserve the last valid counter index for "sink" counter, when a
new counter cannot be allocated, the driver will use this counter.

In order to avoid allocating this counter on any other flow, fix the
indices bitmap allocation range, and reserve the sink counter index.

Add macro for the sink counter index and replace all appearences of the
index with the macro.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47d8417f

net/mlx4_core: Reset counters data when freed · b72ca7e9

Eran Ben Elisha authored Jun 15, 2015

Add resetting the counter data to the free counter flow, so the counter's
data won't be accessible anymore if querying the counter. Also, on next
counter allocation (to another VM for example), it will be fresh and clear.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b72ca7e9

net/mlx4_core: Check before cleaning counters bitmap · efa6bc91

Eran Ben Elisha authored Jun 15, 2015

If counters are not supported by the device. The indices bitmap table is not
allocated during initialization. Add the symmetrical check before cleaning
the counters bitmap table or freeing a counter.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efa6bc91

bridge: del external_learned fdbs from device on flush or ageout · b4ad7baa

Scott Feldman authored Jun 14, 2015

We need to delete from offload the device externally learnded fdbs when any
one of these events happen:

1) Bridge ages out fdb.  (When bridge is doing ageing vs. device doing
ageing.  If device is doing ageing, it would send SWITCHDEV_FDB_DEL
directly).

2) STP state change flushes fdbs on port.

3) User uses sysfs interface to flush fdbs from bridge or bridge port:

	echo 1 >/sys/class/net/BR_DEV/bridge/flush
	echo 1 >/sys/class/net/BR_PORT/brport/flush

4) Offload driver send event SWITCHDEV_FDB_DEL to delete fdb entry.

For rocker, we can now get called to delete fdb entry in wait and nowait
contexts, so set NOWAIT flag when deleting fdb entry.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4ad7baa

15 Jun, 2015 23 commits

Merge tag 'nfc-next-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next · 023033b1

David S. Miller authored Jun 15, 2015

Samuel Ortiz says:

====================
NFC 4.2 pull request

This is the NFC pull request for 4.2.

- NCI drivers can now define their own handlers for processing
  proprietary NCI responses and notifications.

- NFC vendors can use a dedicated netlink API to send their own
  proprietary commands, like e.g. all commands needed to implement
  vendor specific manufacturing tools.

- A new generic NCI over UART driver against which any NCI chipset
  running on top of a serial interface can register.

- The st21nfcb driver is renamed to st-nci as it can and will support
  most of ST Microelectronics NCI chipsets.

- The st21nfcb driver can put its CLF in hibernate mode and save
  significant amount of power.

- A few st21nfcb minor fixes.

- The NXP NCI driver now supports ACPI enumeration.

- The Marvell NCI driver now supports both USB and serial
  physical interfaces.

- The Marvell NCI drivers also supports NCI frames being muxed
  over HCI. This is a setting that can be defined by a DT property.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

023033b1

Merge branch 'bond-netlink-3ad-attrs' · cadfaf43

David S. Miller authored Jun 15, 2015

Nikolay Aleksandrov says:

====================
bonding: extend the 3ad exported attributes

These are two small patches that export actor_oper_port_state and
partner_oper_port_state via netlink and sysfs, until now they were only
exported via bond's proc entry. If this set gets accepted I have an iproute2
patch prepared that will export them with which I tested these changes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cadfaf43

bonding: export slave's partner_oper_port_state via sysfs and netlink · 46ea297e

Nikolay Aleksandrov authored Jun 14, 2015

Export the partner_oper_port_state of each port via sysfs and netlink.
In 802.3ad mode it is valuable for the user to be able to check the
partner_oper state, it is already exported via bond's proc entry.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46ea297e

bonding: export slave's actor_oper_port_state via sysfs and netlink · 254cb6db

Nikolay Aleksandrov authored Jun 14, 2015

Export the actor_oper_port_state of each port via sysfs and netlink.
In 802.3ad mode it is valuable for the user to be able to check the
actor_oper state, it is already exported via bond's proc entry.
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

254cb6db

Merge branch 'rocker-no-wait' · 4d367963

David S. Miller authored Jun 15, 2015

Scott Feldman says:

====================
rocker: revert back to support for nowait processes

One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as "no wait" for
those contexts where we cannot sleep.  Turns out, we have "no wait"
contexts where we want to program the device and we don't want to defer the
processing to a process context.  So re-add the ROCKER_OP_FLAG_NOWAIT flag
to mark such processes, and propagate flags to mem allocator and to the
device cmd executor.  With NOWAIT, mem allocs are GFP_ATOMIC and device
cmds are queued to the device, but the driver will not wait (sleep) for the
response back from the device.

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for
example, but there is push-back to keep processing in original context.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4d367963

rocker: move port stop to 'no wait' processing · f66feaa9

Scott Feldman authored Jun 12, 2015

rocker_port_stop can be called from atomic and non-atomic contexts.  Since
we can't test what context we're getting called in, do the processing as
'no wait', which will cover all cases.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f66feaa9

rocker: move MAC learn event back to 'no wait' processing · 92014b97
Scott Feldman authored Jun 12, 2015
```
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
92014b97

rocker: mark STP update as 'no wait' processing · ac28393e

Scott Feldman authored Jun 12, 2015

We can get STP updates from the bridge driver in atomic and non-atomic
contexts.  Since we can't test what context we're getting called in,
do the STP processing as 'no wait', which will cover all cases.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac28393e

rocker: mark neigh update event processing as 'no wait' · 02a9fbfc

Scott Feldman authored Jun 12, 2015

Neigh update event handler runs in a context where we can't sleep, so mark
processing in driver with ROCKER_OP_FLAG_NOWAIT. NOWAIT will use
GFP_ATOMIC for allocations and will queue cmds to the device's cmd ring but
will not wait (sleep) for cmd response back from device.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02a9fbfc

rocker: revert back to support for nowait processes · 179f9a25

Scott Feldman authored Jun 12, 2015

One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as "no wait" for
those contexts where we cannot sleep. Turns out, we have "no wait"
contexts where we want to program the device. So re-add the
ROCKER_OP_FLAG_NOWAIT flag to mark such processes, and propagate flags to
mem allocator and to the device cmd executor. With NOWAIT, mem allocs are
GFP_ATOMIC and device cmds are queued to the device, but the driver will
not wait (sleep) for the response back from the device.

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for
example, but there is push-back to keep processing in original context.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

179f9a25

rocker: fix neigh tbl index increment race · 4d81db41

Scott Feldman authored Jun 12, 2015

rocker->neigh_tbl_next_index is used to generate unique indices for neigh
entries programmed into the device.  The way new indices were generated was
racy with the new prepare-commit transaction model.  A simple fix here
removes the race.  The race was with two processes getting the same index,
one process using prepare-commit, the other not:

Proc A					Proc B

PREPARE phase
get neigh_tbl_next_index

					NONE phase
					get neigh_tbl_next_index
					neigh_tbl_next_index++

COMMIT phase
neigh_tbl_next_index++

Both A and B got the same index.  The fix is to store and increment
neigh_tbl_next_index in the PREPARE (or NONE) phase and use value in COMMIT
phase:

Proc A					Proc B

PREPARE phase
get neigh_tbl_next_index
neigh_tbl_next_index++

					NONE phase
					get neigh_tbl_next_index
					neigh_tbl_next_index++

COMMIT phase
// use value stashed in PREPARE phase
Reported-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4d81db41

rocker: gaurd against NULL rocker_port when removing ports · a0720310

Scott Feldman authored Jun 12, 2015

The ports array is filled in as ports are probed, but if probing doesn't
finish, we need to stop only those ports that where probed successfully.
Check the ports array for NULL to skip un-probed ports when stopping.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0720310

net: make u64_stats_init() a function · 9464ca65

Eric Dumazet authored Jun 12, 2015

Using a function instead of a macro is cleaner and remove
following W=1 warnings (extract)

In file included from net/ipv6/ip6_vti.c:29:0:
net/ipv6/ip6_vti.c: In function ‘vti6_dev_init_gen’:
include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not
used [-Wunused-but-set-variable]
    typeof(type) *stat;   \
                  ^
net/ipv6/ip6_vti.c:862:16: note: in expansion of macro
‘netdev_alloc_pcpu_stats’
  dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
                ^
  CC [M]  net/ipv6/sit.o
In file included from net/ipv6/sit.c:30:0:
net/ipv6/sit.c: In function ‘ipip6_tunnel_init’:
include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not
used [-Wunused-but-set-variable]
    typeof(type) *stat;   \
                  ^
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9464ca65

bridge: use either ndo VLAN ops or switchdev VLAN ops to install MASTER vlans · 7f109539

Scott Feldman authored Jun 12, 2015

v2:

Move struct switchdev_obj automatics to inner scope where there used.

v1:

To maintain backward compatibility with the existing iproute2 "bridge vlan"
command, let bridge's setlink/dellink handler call into either the port
driver's 8021q ndo ops or the port driver's bridge_setlink/dellink ops.

This allows port driver to choose 8021q ops or the newer
bridge_setlink/dellink ops when implementing VLAN add/del filtering on the
device.  The iproute "bridge vlan" command does not need to be modified.

To summarize using the "bridge vlan" command examples, we have:

1) bridge vlan add|del vid VID dev DEV

Here iproute2 sets MASTER flag.  Bridge's bridge_setlink/dellink is called.
Vlan is set on bridge for port.  If port driver implements ndo 8021q ops,
call those to port driver can install vlan filter on device.  Otherwise, if
port driver implements bridge_setlink/dellink ops, call those to install
vlan filter to device.  This option only works if port is bridged.

2) bridge vlan add|del vid VID dev DEV master

Same as 1)

3) bridge vlan add|del vid VID dev DEV self

Bridge's bridge_setlink/dellink isn't called.  Port driver's
bridge_setlink/dellink is called, if implemented.  This option works if
port is bridged or not.  If port is not bridged, a VLAN can still be
added/deleted to device filter using this variant.

4) bridge vlan add|del vid VID dev DEV master self

This is a combination of 1) and 3), but will only work if port is bridged.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7f109539

Merge branch 'bpf-share-helpers' · 9f42c8b3

David S. Miller authored Jun 15, 2015

Alexei Starovoitov says:

====================
v1->v2: switched to init_user_ns from current_user_ns as suggested by Andy

Introduce new helpers to access 'struct task_struct'->pid, tgid, uid, gid, comm
fields in tracing and networking.

Share bpf_trace_printk() and bpf_get_smp_processor_id() helpers between
tracing and networking.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9f42c8b3

bpf: let kprobe programs use bpf_get_smp_processor_id() helper · ab1973d3

Alexei Starovoitov authored Jun 12, 2015

It's useful to do per-cpu histograms.
Suggested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab1973d3

bpf: allow networking programs to use bpf_trace_printk() for debugging · 0756ea3e

Alexei Starovoitov authored Jun 12, 2015

bpf_trace_printk() is a helper function used to debug eBPF programs.
Let socket and TC programs use it as well.
Note, it's DEBUG ONLY helper. If it's used in the program,
the kernel will print warning banner to make sure users don't use
it in production.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0756ea3e

bpf: introduce current->pid, tgid, uid, gid, comm accessors · ffeedafb

Alexei Starovoitov authored Jun 12, 2015

eBPF programs attached to kprobes need to filter based on
current->pid, uid and other fields, so introduce helper functions:

u64 bpf_get_current_pid_tgid(void)
Return: current->tgid << 32 | current->pid

u64 bpf_get_current_uid_gid(void)
Return: current_gid << 32 | current_uid

bpf_get_current_comm(char *buf, int size_of_buf)
stores current->comm into buf

They can be used from the programs attached to TC as well to classify packets
based on current task fields.

Update tracex2 example to print histogram of write syscalls for each process
instead of aggregated for all.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ffeedafb

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · ada6c1de

David S. Miller authored Jun 15, 2015

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

This a bit large (and late) patchset that contains Netfilter updates for
net-next. Most relevantly br_netfilter fixes, ipset RCU support, removal of
x_tables percpu ruleset copy and rework of the nf_tables netdev support. More
specifically, they are:

1) Warn the user when there is a better protocol conntracker available, from
   Marcelo Ricardo Leitner.

2) Fix forwarding of IPv6 fragmented traffic in br_netfilter, from Bernhard
   Thaler. This comes with several patches to prepare the change in first place.

3) Get rid of special mtu handling of PPPoE/VLAN frames for br_netfilter. This
   is not needed anymore since now we use the largest fragment size to
   refragment, from Florian Westphal.

4) Restore vlan tag when refragmenting in br_netfilter, also from Florian.

5) Get rid of the percpu ruleset copy in x_tables, from Florian. Plus another
   follow up patch to refine it from Eric Dumazet.

6) Several ipset cleanups, fixes and finally RCU support, from Jozsef Kadlecsik.

7) Get rid of parens in Netfilter Kconfig files.

8) Attach the net_device to the basechain as opposed to the initial per table
   approach in the nf_tables netdev family.

9) Subscribe to netdev events to detect the removal and registration of a
   device that is referenced by a basechain.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ada6c1de

netfilter: nf_tables_netdev: unregister hooks on net_device removal · 835b8033

Pablo Neira Ayuso authored Jun 15, 2015

In case the net_device is gone, we have to unregister the hooks and put back
the reference on the net_device object. Once it comes back, register them
again. This also covers the device rename case.

This patch also adds a new flag to indicate that the basechain is disabled, so
their hooks are not registered. This flag is used by the netdev family to
handle the case where the net_device object is gone. Currently this flag is not
exposed to userspace.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

835b8033

netfilter: nf_tables: add nft_register_basechain() and nft_unregister_basechain() · d8ee8f7c
Pablo Neira Ayuso authored Jun 15, 2015
```
This wrapper functions take care of hook registration for basechains.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
```
d8ee8f7c

netfilter: nf_tables: attach net_device to basechain · 2cbce139

Pablo Neira Ayuso authored Jun 12, 2015

The device is part of the hook configuration, so instead of a global
configuration per table, set it to each of the basechain that we create.

This patch reworks ebddf1a8 ("netfilter: nf_tables: allow to bind table to
net_device").

Note that this adds a dev_name field in the nft_base_chain structure which is
required the netdev notification subscription that follows up in a patch to
handle gone net_devices.
Suggested-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

2cbce139

netfilter: x_tables: remove XT_TABLE_INFO_SZ and a dereference. · 711bdde6

Eric Dumazet authored Jun 15, 2015

After Florian patches, there is no need for XT_TABLE_INFO_SZ anymore :
Only one copy of table is kept, instead of one copy per cpu.

We also can avoid a dereference if we put table data right after
xt_table_info. It reduces register pressure and helps compiler.

Then, we attempt a kmalloc() if total size is under order-3 allocation,
to reduce TLB pressure, as in many cases, rules fit in 32 KB.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

711bdde6