Commits · c02b05011fadf8e409e41910217ca689f2fc9d91 · Kirill Smelkov / linux

28 Oct, 2015 11 commits

net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes · c02b0501

Carol L Soto authored Oct 27, 2015

When doing memcpy/memset of EQEs, we should use sizeof struct
mlx4_eqe as the base size and not caps.eqe_size which could be bigger.

If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt
data in the master context.

When using a 64 byte stride, the memcpy copied over 63 bytes to the
slave_eq structure.  This resulted in copying over the entire eqe of
interest, including its ownership bit -- and also 31 bytes of garbage
into the next WQE in the slave EQ -- which did NOT include the ownership
bit (and therefore had no impact).

However, once the stride is increased to 128, we are overwriting the
ownership bits of *three* eqes in the slave_eq struct.  This results
in an incorrect ownership bit for those eqes, which causes the eq to
seem to be full. The issue therefore surfaced only once 128-byte EQEs
started being used in SRIOV and (overarchitectures that have 128/256
byte cache-lines such as PPC) - e.g after commit 77507aa2
"net/mlx4_core: Enable CQE/EQE stride support".

Fixes: 08ff3235 ('mlx4: 64-byte CQE/EQE support')
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c02b0501

net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present · 092bf0fc

Jack Morgenstein authored Oct 27, 2015

We do not set the ins_vlan field to zero when no vlan id is present in the packet.

Since WQEs in the TX ring are not zeroed out between uses, this oversight
could result in having vlan flags present in the WQE ctrl segment when no
vlan is preset.

Fixes: e38af4fa ('net/mlx4_en: Add support for hardware accelerated 802.1ad vlan')
Reported-by: Gideon Naim <gideonn@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

092bf0fc

vhost: fix performance on LE hosts · e407f39a

Michael S. Tsirkin authored Oct 27, 2015

commit 2751c988 ("vhost: cross-endian
support for legacy devices") introduced a minor regression: even with
cross-endian disabled, and even on LE host, vhost_is_little_endian is
checking is_le flag so there's always a branch.

To fix, simply check virtio_legacy_is_little_endian first.

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e407f39a

bpf: sample: define aarch64 specific registers · 85ff8a43

Yang Shi authored Oct 26, 2015

Define aarch64 specific registers for building bpf samples correctly.
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

85ff8a43

amd-xgbe: Fix race between access of desc and desc index · 20986ed8

Lendacky, Thomas authored Oct 26, 2015

During Tx cleanup it's still possible for the descriptor data to be
read ahead of the descriptor index. A memory barrier is required between
the read of the descriptor index and the start of the Tx cleanup loop.
This allows a change to a lighter-weight barrier in the Tx transmit
routine just before updating the current descriptor index.

Since the memory barrier does result in extra overhead on arm64, keep
the previous change to not chase the current descriptor value. This
prevents the execution of the barrier for each loop performed.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20986ed8

RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv · 8ce675ff

Sowmini Varadhan authored Oct 26, 2015

Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.

Avoid this by handling pskb_pull/pskb_trim failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8ce675ff

forcedeth: fix unilateral interrupt disabling in netpoll path · 0b7c8743

Neil Horman authored Oct 26, 2015

Forcedeth currently uses disable_irq_lockdep and enable_irq_lockdep, which in
some configurations simply calls local_irq_disable.  This causes errant warnings
in the netpoll path as in netpoll_send_skb_on_dev, where we disable irqs using
local_irq_save, leading to the following warning:

WARNING: at net/core/netpoll.c:352 netpoll_send_skb_on_dev+0x243/0x250() (Not
tainted)
Hardware name:
netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll
(nv_start_xmit_optimized+0x0/0x860 [forcedeth])
Modules linked in: netconsole(+) configfs ipv6 iptable_filter ip_tables ppdev
parport_pc parport sg microcode serio_raw edac_core edac_mce_amd k8temp
snd_hda_codec_realtek snd_hda_codec_generic forcedeth snd_hda_intel
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore
snd_page_alloc i2c_nforce2 i2c_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod
crc_t10dif pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash
dm_log dm_mod [last unloaded: scsi_wait_scan]
Pid: 1940, comm: modprobe Not tainted 2.6.32-573.7.1.el6.x86_64.debug #1
Call Trace:
 [<ffffffff8107bbc1>] ? warn_slowpath_common+0x91/0xe0
 [<ffffffff8107bcc6>] ? warn_slowpath_fmt+0x46/0x60
 [<ffffffffa00fe5b0>] ? nv_start_xmit_optimized+0x0/0x860 [forcedeth]
 [<ffffffff814b3593>] ? netpoll_send_skb_on_dev+0x243/0x250
 [<ffffffff814b37c9>] ? netpoll_send_udp+0x229/0x270
 [<ffffffffa02e3299>] ? write_msg+0x39/0x110 [netconsole]
 [<ffffffffa02e331b>] ? write_msg+0xbb/0x110 [netconsole]
 [<ffffffff8107bd55>] ? __call_console_drivers+0x75/0x90
 [<ffffffff8107bdba>] ? _call_console_drivers+0x4a/0x80
 [<ffffffff8107c445>] ? release_console_sem+0xe5/0x250
 [<ffffffff8107d200>] ? register_console+0x190/0x3e0
 [<ffffffffa02e71a6>] ? init_netconsole+0x1a6/0x216 [netconsole]
 [<ffffffffa02e7000>] ? init_netconsole+0x0/0x216 [netconsole]
 [<ffffffff810020d0>] ? do_one_initcall+0xc0/0x280
 [<ffffffff810d4933>] ? sys_init_module+0xe3/0x260
 [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
---[ end trace f349c7af88e6a6d5 ]---
console [netcon0] enabled
netconsole: network logging started

Fix it by modifying the forcedeth code to use
disable_irq_nosync_lockdep_irqsavedisable_irq_nosync_lockdep_irqsave instead,
which saves and restores irq state properly.  This also saves us a little code
in the process

Tested by the reporter, with successful restuls

Patch applies to the head of the net tree
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Reported-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b7c8743

openvswitch: Fix skb leak using IPv6 defrag · 6f5cadee

Joe Stringer authored Oct 25, 2015

nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the
reassembly is successful, expects the caller to free all of the original
skbs using nf_ct_frag6_consume_orig(). This call was previously missing,
meaning that the original fragments were never freed (with the exception
of the last fragment to arrive).

Fix this by ensuring that all original fragments except for the last
fragment are freed via nf_ct_frag6_consume_orig(). The last fragment
will be morphed into the head, so it must not be freed yet. Furthermore,
retain the ->next pointer for the head after skb_morph().

Fixes: 7f8a436e ("openvswitch: Add conntrack action")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f5cadee

ipv6: Export nf_ct_frag6_consume_orig() · 190b8ffb

Joe Stringer authored Oct 25, 2015

This is needed in openvswitch to fix an skb leak in the next patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

190b8ffb

openvswitch: Fix double-free on ip_defrag() errors · 74c16618

Joe Stringer authored Oct 25, 2015

If ip_defrag() returns an error other than -EINPROGRESS, then the skb is
freed. When handle_fragments() passes this back up to
do_execute_actions(), it will be freed again. Prevent this double free
by never freeing the skb in do_execute_actions() for errors returned by
ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead.

Fixes: 7f8a436e ("openvswitch: Add conntrack action")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74c16618

fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key · c2229fe1

Alexander Duyck authored Oct 27, 2015

We were computing the child index in cases where the key value we were
looking for was actually less than the base key of the tnode. As a result
we were getting incorrect index values that would cause us to skip over
some children.

To fix this I have added a test that will force us to use child index 0 if
the key we are looking for is less than the key of the current tnode.

Fixes: 8be33e95 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: Brian Rak <brak@gameservers.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2229fe1

27 Oct, 2015 11 commits

Merge branch 'net_of_node_put' · 8fe01296

David S. Miller authored Oct 26, 2015

Julia Lawall says:

====================
add missing of_node_put

The various for_each device_node iterators performs an of_node_get on each
iteration, so a break out of the loop requires an of_node_put.

The complete semantic patch that fixes this problem is
(http://coccinelle.lip6.fr):

// <smpl>
@r@
local idexpression n;
expression e1,e2;
iterator name for_each_node_by_name, for_each_node_by_type,
for_each_compatible_node, for_each_matching_node,
for_each_matching_node_and_match, for_each_child_of_node,
for_each_available_child_of_node, for_each_node_with_property;
iterator i;
statement S;
expression list [n1] es;
@@

(
(
for_each_node_by_name(n,e1) S
|
for_each_node_by_type(n,e1) S
|
for_each_compatible_node(n,e1,e2) S
|
for_each_matching_node(n,e1) S
|
for_each_matching_node_and_match(n,e1,e2) S
|
for_each_child_of_node(e1,n) S
|
for_each_available_child_of_node(e1,n) S
|
for_each_node_with_property(n,e1) S
)
&
i(es,n,...) S
)

@@
local idexpression r.n;
iterator r.i;
expression e;
expression list [r.n1] es;
@@

 i(es,n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
   return n;
|
+  of_node_put(n);
?  return ...;
)
   ...
 }

@@
local idexpression r.n;
iterator r.i;
expression e;
expression list [r.n1] es;
@@

 i(es,n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n

@@
local idexpression r.n;
iterator r.i;
expression e;
identifier l;
expression list [r.n1] es;
@@

 i(es,n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  goto l;
)
   ...
 }
...
l: ... when != n// </smpl>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

8fe01296

net: mv643xx_eth: add missing of_node_put · 26b7974d

Julia Lawall authored Oct 25, 2015

for_each_available_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
expression root,e;
local idexpression child;
@@

 for_each_available_child_of_node(root, child) {
   ... when != of_node_put(child)
       when != e = child
(
   return child;
|
+  of_node_put(child);
?  return ...;
)
   ...
 }
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

26b7974d

ath6kl: add missing of_node_put · 81a57703

Julia Lawall authored Oct 25, 2015

for_each_compatible_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
expression e;
local idexpression n;
@@

 for_each_compatible_node(n,...) {
   ... when != of_node_put(n)
       when != e = n
(
   return n;
|
+  of_node_put(n);
?  return ...;
)
   ...
 }
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

81a57703

net: phy: mdio: add missing of_node_put · 02862341

Julia Lawall authored Oct 25, 2015

for_each_available_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
expression root,e;
local idexpression child;
@@

 for_each_available_child_of_node(root, child) {
   ... when != of_node_put(child)
       when != e = child
(
   return child;
|
+  of_node_put(child);
?  return ...;
)
   ...
 }
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

02862341

netdev/phy: add missing of_node_put · 447ed736

Julia Lawall authored Oct 25, 2015

for_each_available_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression r.n;
expression r,e;
@@

 for_each_available_child_of_node(r,n) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

447ed736

net: netcp: add missing of_node_put · bd252796

Julia Lawall authored Oct 25, 2015

for_each_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression r.n;
expression r,e;
@@

 for_each_child_of_node(r,n) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd252796

net: thunderx: add missing of_node_put · 8c387ebb

Julia Lawall authored Oct 25, 2015

for_each_child_of_node performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression r.n;
expression r,e;
@@

 for_each_child_of_node(r,n) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c387ebb

ipv6: gre: support SIT encapsulation · 7e3b6e74

Eric Dumazet authored Oct 24, 2015

gre_gso_segment() chokes if SIT frames were aggregated by GRO engine.

Fixes: 61c1db7f ("ipv6: sit: add GSO/TSO support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7e3b6e74

Merge branch 'sh_eth-fixes' · 3242c9ee

David S. Miller authored Oct 26, 2015

Sergei Shtylyov says:

====================
sh_eth: RX buffer alignment fixes

Here's a set of 2 patches against DaveM's 'net.git' repo which are the
fixes to the RX buffer size calculation.

[1/2] sh_eth: fix RX buffer size alignment
[2/2] sh_eth: fix RX buffer size calculation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3242c9ee

sh_eth: fix RX buffer size calculation · cb368595

Sergei Shtylyov authored Oct 24, 2015

The RX buffer size calulation failed to account for the length granularity
(which is now 32 bytes)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cb368595

sh_eth: fix RX buffer size alignment · ab857916

Sergei Shtylyov authored Oct 24, 2015

Both Renesas R-Car and RZ/A1 manuals state that RX buffer length must be
a multiple of 32 bytes, while the driver only uses 16 byte granularity...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab857916

26 Oct, 2015 10 commits

Merge branch 'gianfar-fixes' · f7e1b37e

David S. Miller authored Oct 25, 2015

Claudiu Manoil says:

====================
gianfar: Misc. fixes and updates

Various fixes for some older issues, including having a
MAINTAINERS entry for this driver.
I'd recommend applying them on top of net, thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

f7e1b37e

MAINTAINERS: Add entry for gianfar ethernet driver · abb1ed7b

Claudiu Manoil authored Oct 23, 2015

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abb1ed7b

gianfar: Fix Rx BSY error handling · 1de65a5e

Claudiu Manoil authored Oct 23, 2015

The Rx BSY error interrupt indicates that a frame was
received and discarded due to lack of buffers, so it's
a rx ring overflow condition and has nothing to do with
with bad rx packets.  Use the right counter.

BSY conditions happen when the SoC is under performance
stress.  Doing *more* work in stress situations by trying
to schedule NAPI is not a good idea as the stressed system
becomes still more stressed.  The Rx interrupt is already
at work making sure the NAPI is scheduled.
So calling gfar_receive() here does not help.  This issue
was present since day 1.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1de65a5e

gianfar: Don't enable the Filer w/o the Parser · 15bf176d

Claudiu Manoil authored Oct 23, 2015

Under one unusual circumstance it's possible to wrongly set
FILREN without enabling PRSDEP as well in the RCTRL register,
against the hardware specifications.  With the default config
this does not happen because the default Rx offloads (Rx csum
and Rx VLAN) properly enable PRSDEP.  But if anyone disables
all these offloads (via ethtool), we get a wrong configuration
were the Rx flow classification and hashing, and other Filer
based features (e.g. wake-on-filer interrupt) won't work.
This patch fixes the issue.
Also, account for Rx FCB insertion which happens every time
PRSDEP is set.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15bf176d

gianfar: Remove duplicated argument to bitwise OR · 5188f7e5

Claudiu Manoil authored Oct 23, 2015

RQFCR_AND is duplicated.
Add missing space as well.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5188f7e5

Merge branch 'thunderx-fixes' · 30aa7b18

David S. Miller authored Oct 25, 2015

David Daney says:

====================
net: thunderx: Support pass-2 revision hardware.

With the availability of a new revision of the ThunderX NIC hardware a
few changes to the driver are required.  With these, the driver works
on all currently available hardware revisions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

30aa7b18

net: thunderx: Incorporate pass2 silicon CPI index configuration changes · 34411b68

Thanneeru Srinivasulu authored Oct 23, 2015

Add support for ThunderX pass2 CPI and MPI configuration changes.
MPI_ALG is not enabled i.e MCAM parsing is disabled.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

34411b68

net: thunderx: Rewrite silicon revision tests. · 88ed2377

David Daney authored Oct 23, 2015

The test for pass-1 silicon was incorrect, it should be for all
revisions less than 8.  Also the revision is already present in the
pci_dev, so there is no need to read and keep a private copy.

Remove rev_id and code to read it from struct nicpf.  Create new
static inline function pass1_silicon() to be used to testing the
silicon version.  Use pass1_silicon() for revision checks, this will
be more widely used in follow on patches.
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

88ed2377

net: thunderx: Fix incorrect subsystem devid of VF on pass2 silicon · 4e85777f

Sunil Goutham authored Oct 23, 2015

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e85777f

net: thunderx: Remove PF soft reset. · f9bf45e0

Sunil Goutham authored Oct 23, 2015

In some silicon revisions, the soft reset clobbers PCI config space,
so quit doing the reset.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9bf45e0

23 Oct, 2015 8 commits

net: sysctl: fix a kmemleak warning · ce9d9b8e

Li RongQing authored Oct 23, 2015

the returned buffer of register_sysctl() is stored into net_header
variable, but net_header is not used after, and compiler maybe
optimise the variable out, and lead kmemleak reported the below warning

	comm "swapper/0", pid 1, jiffies 4294937448 (age 267.270s)
	hex dump (first 32 bytes):
	90 38 8b 01 c0 ff ff ff 00 00 00 00 01 00 00 00 .8..............
	01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
	backtrace:
	[<ffffffc00020f134>] create_object+0x10c/0x2a0
	[<ffffffc00070ff44>] kmemleak_alloc+0x54/0xa0
	[<ffffffc0001fe378>] __kmalloc+0x1f8/0x4f8
	[<ffffffc00028e984>] __register_sysctl_table+0x64/0x5a0
	[<ffffffc00028eef0>] register_sysctl+0x30/0x40
	[<ffffffc00099c304>] net_sysctl_init+0x20/0x58
	[<ffffffc000994dd8>] sock_init+0x10/0xb0
	[<ffffffc0000842e0>] do_one_initcall+0x90/0x1b8
	[<ffffffc000966bac>] kernel_init_freeable+0x218/0x2f0
	[<ffffffc00070ed6c>] kernel_init+0x1c/0xe8
	[<ffffffc000083bfc>] ret_from_fork+0xc/0x50
	[<ffffffffffffffff>] 0xffffffffffffffff <<end check kmemleak>>

Before fix, the objdump result on ARM64:
0000000000000000 <net_sysctl_init>:
   0:   a9be7bfd        stp     x29, x30, [sp,#-32]!
   4:   90000001        adrp    x1, 0 <net_sysctl_init>
   8:   90000000        adrp    x0, 0 <net_sysctl_init>
   c:   910003fd        mov     x29, sp
  10:   91000021        add     x1, x1, #0x0
  14:   91000000        add     x0, x0, #0x0
  18:   a90153f3        stp     x19, x20, [sp,#16]
  1c:   12800174        mov     w20, #0xfffffff4                // #-12
  20:   94000000        bl      0 <register_sysctl>
  24:   b4000120        cbz     x0, 48 <net_sysctl_init+0x48>
  28:   90000013        adrp    x19, 0 <net_sysctl_init>
  2c:   91000273        add     x19, x19, #0x0
  30:   9101a260        add     x0, x19, #0x68
  34:   94000000        bl      0 <register_pernet_subsys>
  38:   2a0003f4        mov     w20, w0
  3c:   35000060        cbnz    w0, 48 <net_sysctl_init+0x48>
  40:   aa1303e0        mov     x0, x19
  44:   94000000        bl      0 <register_sysctl_root>
  48:   2a1403e0        mov     w0, w20
  4c:   a94153f3        ldp     x19, x20, [sp,#16]
  50:   a8c27bfd        ldp     x29, x30, [sp],#32
  54:   d65f03c0        ret
After:
0000000000000000 <net_sysctl_init>:
   0:   a9bd7bfd        stp     x29, x30, [sp,#-48]!
   4:   90000000        adrp    x0, 0 <net_sysctl_init>
   8:   910003fd        mov     x29, sp
   c:   a90153f3        stp     x19, x20, [sp,#16]
  10:   90000013        adrp    x19, 0 <net_sysctl_init>
  14:   91000000        add     x0, x0, #0x0
  18:   91000273        add     x19, x19, #0x0
  1c:   f90013f5        str     x21, [sp,#32]
  20:   aa1303e1        mov     x1, x19
  24:   12800175        mov     w21, #0xfffffff4                // #-12
  28:   94000000        bl      0 <register_sysctl>
  2c:   f9002260        str     x0, [x19,#64]
  30:   b40001a0        cbz     x0, 64 <net_sysctl_init+0x64>
  34:   90000014        adrp    x20, 0 <net_sysctl_init>
  38:   91000294        add     x20, x20, #0x0
  3c:   9101a280        add     x0, x20, #0x68
  40:   94000000        bl      0 <register_pernet_subsys>
  44:   2a0003f5        mov     w21, w0
  48:   35000080        cbnz    w0, 58 <net_sysctl_init+0x58>
  4c:   aa1403e0        mov     x0, x20
  50:   94000000        bl      0 <register_sysctl_root>
  54:   14000004        b       64 <net_sysctl_init+0x64>
  58:   f9402260        ldr     x0, [x19,#64]
  5c:   94000000        bl      0 <unregister_sysctl_table>
  60:   f900227f        str     xzr, [x19,#64]
  64:   2a1503e0        mov     w0, w21
  68:   f94013f5        ldr     x21, [sp,#32]
  6c:   a94153f3        ldp     x19, x20, [sp,#16]
  70:   a8c37bfd        ldp     x29, x30, [sp],#48
  74:   d65f03c0        ret

Add the possible error handle to free the net_header to remove the
kmemleak warning
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce9d9b8e

ppp: fix pppoe_dev deletion condition in pppoe_release() · 1acea4f6

Guillaume Nault authored Oct 22, 2015

We can't rely on PPPOX_ZOMBIE to decide whether to clear po->pppoe_dev.
PPPOX_ZOMBIE can be set by pppoe_disc_rcv() even when po->pppoe_dev is
NULL. So we have no guarantee that (sk->sk_state & PPPOX_ZOMBIE) implies
(po->pppoe_dev != NULL).
Since we're releasing a PPPoE socket, we want to release the pppoe_dev
if it exists and reset sk_state to PPPOX_DEAD, no matter the previous
value of sk_state. So we can just check for po->pppoe_dev and avoid any
assumption on sk->sk_state.

Fixes: 2b018d57 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

1acea4f6

af_key: fix two typos · f6b8dec9

Li RongQing authored Oct 22, 2015

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6b8dec9

amd-xgbe: Use wmb before updating current descriptor count · 20a41fba

Lendacky, Thomas authored Oct 21, 2015

The code currently uses the lightweight dma_wmb barrier before updating
the current descriptor count. Under heavy load, the Tx cleanup routine
was seeing the updated current descriptor count before the updated
descriptor information. As a result, the Tx descriptor was being cleaned
up before it was used because it was not "owned" by the hardware yet,
resulting in a Tx queue hang.

Using the wmb barrier insures that the descriptor is updated before the
descriptor counter preventing the Tx queue hang. For extra insurance,
the Tx cleanup routine is changed to grab the current decriptor count on
entry and uses that initial value in the processing loop rather than
trying to chase the current value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

20a41fba

net/phy: micrel: Add workaround for bad autoneg · d2fd719b

Nathan Sullivan authored Oct 21, 2015

Very rarely, the KSZ9031 will appear to complete autonegotiation, but
will drop all traffic afterwards.  When this happens, the idle error
count will read 0xFF after autonegotiation completes.  Reset the PHY
when in that state.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2fd719b

Merge branch 'ipv6-overflow-arith' · ec3661b4

David S. Miller authored Oct 23, 2015

Hannes Frederic Sowa says:

====================
overflow-arith: begin to add support for overflow builtins functions

I add a new header, linux/overflow-arith.h, as the central place to add
overflow and wrap-around checking functions. The reason I am doing so
is that it can make use of compiler supported builtin functions which
can leverage hardware.

As I need this for a fix in the ipv6 stack, which is also included in
this series, I propose to add it sooner than later over Davem's net
tree. This is also the reason why I start slowly with only the one
function I need at this time.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ec3661b4

ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues · b72a2b01

Hannes Frederic Sowa authored Oct 16, 2015

Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.

Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b72a2b01

overflow-arith: begin to add support for overflow builtin functions · 79907146

Hannes Frederic Sowa authored Oct 16, 2015

The idea of the overflow-arith.h header is to collect overflow checking
functions in one central place.

If gcc compiler supports the __builtin_overflow_* builtins we use them
because they might give better performance, otherwise the code falls
back to normal overflow checking functions.

The builtin_overflow functions are supported by gcc-5 and clang. The
matter of supporting clang is to just provide a corresponding
CC_HAVE_BUILTIN_OVERFLOW, because the specific overflow checking builtins
don't differ between gcc and clang.

I just provide overflow_usub function here as I intend this to get merged
into net, more functions will definitely follow as they are needed.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

79907146