Commits · 9888faefe1327909f3acf34d1feda87a368bb858 · nexedi / linux

13 Sep, 2014 3 commits

net: sched: cls_basic use RCU · 9888faef

John Fastabend authored Sep 12, 2014

Enable basic classifier for RCU.

Dereferencing tp->root may look a bit strange here but it is needed
by my accounting because it is allocated at init time and needs to
be kfree'd at destroy time. However because it may be referenced in
the classify() path we must wait an RCU grace period before free'ing
it. We use kfree_rcu() and rcu_ APIs to enforce this. This pattern
is used in all the classifiers.

Also the hgenerator can be incremented without concern because it
is always incremented under RTNL.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9888faef

net: rcu-ify tcf_proto · 25d8c0d5

John Fastabend authored Sep 12, 2014

rcu'ify tcf_proto this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastracture for running
the classifier/action chains without holding the qdisc lock however
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fall out.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25d8c0d5

net: qdisc: use rcu prefix and silence sparse warnings · 46e5da40

John Fastabend authored Sep 12, 2014

Add __rcu notation to qdisc handling by doing this we can make
smatch output more legible. And anyways some of the cases should
be using rcu_dereference() see qdisc_all_tx_empty(),
qdisc_tx_chainging(), and so on.

Also *wake_queue() API is commonly called from driver timer routines
without rcu lock or rtnl lock. So I added rcu_read_lock() blocks
around netif_wake_subqueue and netif_tx_wake_queue.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46e5da40

12 Sep, 2014 8 commits

sunvnet: Avoid sending superfluous LDC messages. · d1015645

Sowmini Varadhan authored Sep 11, 2014

When sending out a burst of packets across multiple descriptors,
it is sufficient to send one LDC "start" trigger for
the first descriptor, so do not send an LDC "start" for every
pass through vnet_start_xmit. Similarly, it is sufficient to send
one "DRING_STOPPED" trigger for the last dring (and if that
fails, hold off and send the trigger later).

Optimizations to the number of LDC messages helps avoid
filling up the LDC channel with superfluous LDC messages
that risk triggering flow-control on the channel,
and also boosts performance.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Raghuram Kothakota <raghuram.kothakota@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1015645

net: axienet: remove unnecessary ether_setup after alloc_etherdev · c706471b

Subbaraya Sundeep Bhatta authored Sep 11, 2014

calling ether_setup is redundant since alloc_etherdev calls
it.
Signed-off-by: Subbaraya Sundeep Bhatta <sbhatta@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c706471b

ethernet: amd: use pr_info_once() · e9c3f99f

Varka Bhadram authored Sep 11, 2014

It will use pr_info_one() to print the version info of the
driver in probe function only once. No need to use the static
variable here.
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9c3f99f

udp: Fix inverted NAPI_GRO_CB(skb)->flush test · 2d8f7e2c

Scott Wood authored Sep 10, 2014

Commit 2abb7cdc ("udp: Add support for doing checksum unnecessary
conversion") caused napi_gro_cb structs with the "flush" field zero to
take the "udp_gro_receive" path rather than the "set flush to 1" path
that they would previously take.  As a result I saw booting from an NFS
root hang shortly after starting userspace, with "server not
responding" messages.

This change to the handling of "flush == 0" packets appears to be
incidental to the goal of adding new code in the case where
skb_gro_checksum_validate_zero_check() returns zero.  Based on that and
the fact that it breaks things, I'm assuming that it is unintentional.

Fixes: 2abb7cdc ("udp: Add support for doing checksum unnecessary conversion")
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d8f7e2c

Merge branch 'sock_queue_err_skb' · c5306726

David S. Miller authored Sep 12, 2014

Alexander Duyck says:

====================
Address reference counting issues with sock_queue_err_skb

After looking over the code for skb_clone_sk after some comments made by
Eric Dumazet I have come to the conclusion that skb_clone_sk is taking the
correct approach in how to handle the sk_refcnt when creating a buffer that
is eventually meant to be returned to the socket via the sock_queue_err_skb
function.

However upon review of other callers I found what I believe to be a
possible reference count issue in the path for handling "wifi ack" packets.
To address this I have applied the same logic that is currently in place so
that the sk_refcnt will be forced to stay at least 1, or we will not
provide an skb to return in the sk_error_queue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c5306726

mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path · bf7fa551

Alexander Duyck authored Sep 10, 2014

There is a possible issue with the use, or lack thereof of sk_refcnt and
sk_wmem_alloc in the wifi ack status functionality.

Specifically if a socket were to request acknowledgements, and the socket
were to have sk_refcnt drop to 0 resulting in it waiting on sk_wmem_alloc
to reach 0 it would be possible to have sock_queue_err_skb orphan the last
buffer, resulting in __sk_free being called on the socket. After this the
buffer is enqueued on sk_error_queue, however the queue has already been
flushed resulting in at least a memory leak, if not a data corruption.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf7fa551

skb: Add documentation for skb_clone_sk · cab41c47

Alexander Duyck authored Sep 10, 2014

This change adds some documentation to the call skb_clone_sk. This is
meant to help clarify the purpose of the function for other developers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cab41c47

Revert "ipv4: Clarify in docs that accept_local requires rp_filter." · 72b126a4

Sébastien Barré authored Sep 10, 2014

This reverts commit c801e3cc ("ipv4: Clarify in docs that accept_local requires rp_filter.").
It is not needed anymore since commit 1dced6a8 ("ipv4: Restore accept_local behaviour in fib_validate_source()").
Suggested-by: Julian Anastasov <ja@ssi.bg>
Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

72b126a4

10 Sep, 2014 29 commits

net: bpf: only build bpf_jit_binary_{alloc, free}() when jit selected · b954d834

Daniel Borkmann authored Sep 10, 2014

Since BPF JIT depends on the availability of module_alloc() and
module_free() helpers (HAVE_BPF_JIT and MODULES), we better build
that code only in case we have BPF_JIT in our config enabled, just
like with other JIT code. Fixes builds for arm/marzen_defconfig
and sh/rsk7269_defconfig.

====================
kernel/built-in.o: In function `bpf_jit_binary_alloc':
/home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc'
kernel/built-in.o: In function `bpf_jit_binary_free':
/home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free'
make: *** [vmlinux] Error 1
====================
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 738cbe72 ("net: bpf: consolidate JIT binary allocator")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b954d834

Merge branch 'cxgb4-next' · 17fa1f98

David S. Miller authored Sep 10, 2014

Hariprasad Shenai says:

====================
cxgb4: Allow FW size upto 1MB, support for S25FL032P flash and misc. fixes

This patch series adds support to allow FW size upto 1MB, support for S25FL032P
flash. Fix t4_flash_erase_sectors to throw an error, when erase sector aren't in
the flash and also warning message when adapters have flashes less than 2Mb.
Adds device id of new adapter and removes device id of debug adapter.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

17fa1f98

cxgb4/cxgb4vf: Add device ID for new adapter and remove for dbg adapter · 56e03e51
Hariprasad Shenai authored Sep 10, 2014
```
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
```
56e03e51

cxgb4: Add warning msg when attaching to adapters which have FLASHes smaller than 2Mb · c290607e

Hariprasad Shenai authored Sep 10, 2014

Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c290607e

cxgb4: Fix t4_flash_erase_sectors() to throw an error when requested to erase... · c0d5b8cf

Hariprasad Shenai authored Sep 10, 2014

cxgb4: Fix t4_flash_erase_sectors() to throw an error when requested to erase sectors which aren't in the FLASH

Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0d5b8cf

cxgb4: Add support to S25FL032P flash · fe2ee139

Hariprasad Shenai authored Sep 10, 2014

Add support for Spansion S25FL032P flash
Based on original work by Dimitris Michailidis
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe2ee139

cxgb4: Allow T4/T5 firmware sizes up to 1MB · 60d42bf6

Hariprasad Shenai authored Sep 10, 2014

Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60d42bf6

tipc: fix sparse warnings · 0fc4dffa

Erik Hugne authored Sep 10, 2014

This fixes the following sparse warnings:
sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static?
Also, the function is changed to return bool upon success, rather than a
potentially freed pointer.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fc4dffa

net: ethernet: arc: Don't free Rockchip resources before disconnect from phy · cf98192d

Romain Perier authored Sep 10, 2014

Free resources before being disconnected from phy and calling core driver is
wrong and should not happen. It avoids a delay of 4-5s caused by the timeout of
phy_disconnect().
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf98192d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 0aac3833

David S. Miller authored Sep 10, 2014

Pablo Neira Ayuso says:

====================
nf-next pull request

The following patchset contains Netfilter/IPVS updates for your
net-next tree. Regarding nf_tables, most updates focus on consolidating
the NAT infrastructure and adding support for masquerading. More
specifically, they are:

1) use __u8 instead of u_int8_t in arptables header, from
   Mike Frysinger.

2) Add support to match by skb->pkttype to the meta expression, from
   Ana Rey.

3) Add support to match by cpu to the meta expression, also from
   Ana Rey.

4) A smatch warning about IPSET_ATTR_MARKMASK validation, patch from
   Vytas Dauksa.

5) Fix netnet and netportnet hash types the range support for IPv4,
   from Sergey Popovich.

6) Fix missing-field-initializer warnings resolved, from Mark Rustad.

7) Dan Carperter reported possible integer overflows in ipset, from
   Jozsef Kadlecsick.

8) Filter out accounting objects in nfacct by type, so you can
   selectively reset quotas, from Alexey Perevalov.

9) Move specific NAT IPv4 functions to the core so x_tables and
   nf_tables can share the same NAT IPv4 engine.

10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4.

11) Move specific NAT IPv6 functions to the core so x_tables and
    nf_tables can share the same NAT IPv4 engine.

12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6.

13) Refactor code to add nft_delrule(), which can be reused in the
    enhancement of the NFT_MSG_DELTABLE to remove a table and its
    content, from Arturo Borrero.

14) Add a helper function to unregister chain hooks, from
    Arturo Borrero.

15) A cleanup to rename to nft_delrule_by_chain for consistency with
    the new nft_*() functions, also from Arturo.

16) Add support to match devgroup to the meta expression, from Ana Rey.

17) Reduce stack usage for IPVS socket option, from Julian Anastasov.

18) Remove unnecessary textsearch state initialization in xt_string,
    from Bojan Prtvar.

19) Add several helper functions to nf_tables, more work to prepare
    the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero.

20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from
    Arturo Borrero.

21) Support NAT flags in the nat expression to indicate the flavour,
    eg. random fully, from Arturo.

22) Add missing audit code to ebtables when replacing tables, from
    Nicolas Dichtel.

23) Generalize the IPv4 masquerading code to allow its re-use from
    nf_tables, from Arturo.

24) Generalize the IPv6 masquerading code, also from Arturo.

25) Add the new masq expression to support IPv4/IPv6 masquerading
    from nf_tables, also from Arturo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0aac3833

netfilter: Convert pr_warning to pr_warn · b167a37c

Joe Perches authored Sep 09, 2014

Use the more common pr_warn.

Other miscellanea:

o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b167a37c

iucv: Convert pr_warning to pr_warn · 47c4cfc3

Joe Perches authored Sep 09, 2014

Use the more common pr_warn.
Coalesce formats.
Realign arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47c4cfc3

pktgen: Convert pr_warning to pr_warn · 294a0b7f

Joe Perches authored Sep 09, 2014

Use the more common pr_warn.
Realign arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

294a0b7f

atm: Convert pr_warning to pr_warn · ef423a41

Joe Perches authored Sep 09, 2014

Use the more common pr_warn.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef423a41

Merge branch 'ipip_sit_gro' · 6618ec6f

David S. Miller authored Sep 09, 2014

Tom Herbert says:

====================
net: enable GRO for IPIP and SIT

This patch sets populates the IPIP and SIT offload structures with
gro_receive and gro_complete functions. This enables use of GRO
for these. Also, fixed a problem in IPv6 where we were not properly
initializing flush_id.

Peformance results are below. Note that these tests were done on bnx2x
which doesn't provide RX checksum offload of IPIP or SIT (i.e. does
not give CHEKCSUM_COMPLETE). Also, we don't get 4-tuple hash for RSS
only 2-tuple in this case so all the packets between two hosts are
winding up on the same queue. Net result is the interrupting CPU is
the bottleneck in GRO (checksumming every packet there).

Testing:

netperf TCP_STREAM between two hosts using bnx2x.

* Before fix

IPIP
  1 connection
    6.53% CPU utilization
    6544.71 Mbps
  20 connections
    13.79% CPU utilization
    9284.54 Mbps

SIT
  1 connection
    6.68% CPU utilization
    5653.36 Mbps
  20 connections
    18.88% CPU utilization
    9154.61 Mbps

* After fix

IPIP
  1 connection
    5.73% CPU utilization
    9279.53 Mbps
  20 connections
    7.14% CPU utilization
    7279.35 Mbps

SIT
  1 connection
    2.95% CPU utilization
    9143.36 Mbps
  20 connections
    7.09% CPU utilization
    6255.3 Mbps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6618ec6f

sit: Add gro callbacks to sit_offload · 19424e05

Tom Herbert authored Sep 09, 2014

Add ipv6_gro_receive and ipv6_gro_complete to sit_offload to
support GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19424e05

ipip: Add gro callbacks to ipip offload · 9667e9bb

Tom Herbert authored Sep 09, 2014

Add inet_gro_receive and inet_gro_complete to ipip_offload to
support GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9667e9bb

ipv6: Clear flush_id to make GRO work · 03d56daa

Tom Herbert authored Sep 09, 2014

In TCP gro we check flush_id which is derived from the IP identifier.
In IPv4 gro path the flush_id is set with the expectation that every
matched packet increments IP identifier. In IPv6, the flush_id is
never set and thus is uinitialized. What's worse is that in IPv6
over IPv4 encapsulation, the IP identifier is taken from the outer
header which is currently not incremented on every packet for Linux
stack, so GRO in this case never matches packets (identifier is
not increasing).

This patch clears flush_id for every time for a matched packet in
IPv6 gro_receive. We need to do this each time to overwrite the
setting that would be done in IPv4 gro_receive per the outer
header in IPv6 over Ipv4 encapsulation.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03d56daa

drivers/net: Convert remaining uses of pr_warning to pr_warn · fe3881cf

Joe Perches authored Sep 09, 2014

Use the much more common pr_warn instead of pr_warning.

Other miscellanea:

o Typo fixes submiting/submitting
o Coalesce formats
o Realign arguments
o Add missing terminating '\n' to formats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe3881cf

net: use kfree_skb_list() helper in more places · 46cfd725

Florian Westphal authored Sep 10, 2014

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

46cfd725

ipv4: udp4_gro_complete() is static · 72bb17b3

Eric Dumazet authored Sep 09, 2014

net/ipv4/udp_offload.c:339:5: warning: symbol 'udp4_gro_complete' was
not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Fixes: 57c67ff4 ("udp: additional GRO support")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72bb17b3

netns: remove one sparse warning · 416c51e1

Eric Dumazet authored Sep 09, 2014

net/core/net_namespace.c:227:18: warning: incorrect type in argument 1
(different address spaces)
net/core/net_namespace.c:227:18:    expected void const *<noident>
net/core/net_namespace.c:227:18:    got struct net_generic [noderef]
<asn:4>*gen

We can use rcu_access_pointer() here as read-side access to the pointer
was removed at least one grace period ago.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

416c51e1

ipv6: udp6_gro_complete() is static · cc9c668a

Eric Dumazet authored Sep 09, 2014

net/ipv6/udp_offload.c:159:5: warning: symbol 'udp6_gro_complete' was
not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 57c67ff4 ("udp: additional GRO support")
Cc: Tom Herbert <therbert@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc9c668a

ipv4: rcu cleanup in ip_ra_control() · 8e380f00

Eric Dumazet authored Sep 09, 2014

Remove one sparse warning :
net/ipv4/ip_sockglue.c:328:22: warning: incorrect type in assignment (different address spaces)
net/ipv4/ip_sockglue.c:328:22: expected struct ip_ra_chain [noderef] <asn:4>*next
net/ipv4/ip_sockglue.c:328:22: got struct ip_ra_chain *[assigned] ra

And replace one rcu_assign_ptr() by RCU_INIT_POINTER() where applicable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e380f00

ipv6: mcast: remove dead debugging defines · cbeddd5d

Daniel Borkmann authored Sep 09, 2014

It's not used anywhere, so just remove these.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cbeddd5d

irda: vlsi_ir: use %*ph specifier · be07b79d

Andy Shevchenko authored Sep 09, 2014

Instead of looping in the code let's use kernel extension to dump small
buffers.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be07b79d

r8152: use usleep_range · 8ddfa077

hayeswang authored Sep 09, 2014

Replace mdelay with usleep_range to avoid busy loop.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ddfa077

net-timestamp: optimize sock_tx_timestamp default path · 67cc0d40

Willem de Bruijn authored Sep 08, 2014

Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67cc0d40

net_sched: sfq: remove unused macro · 17448e5f

Florian Westphal authored Sep 08, 2014

not used anymore since ddecf0f4
(net_sched: sfq: add optional RED on top of SFQ).
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17448e5f