Commits · 061c1a6e367855a9ed1110ba059bc2e7634fd429 · Kirill Smelkov / linux

27 Feb, 2015 12 commits

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · 061c1a6e

David S. Miller authored Feb 27, 2015

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-02-26

This series contains fixes for i40e and i40evf only.

Alexey Khoroshilov found a possible leak of 'cmd_buf' when copy_from_user()
failed in i40e_dbg_command_write(); this is resolved by calling kfree().

Shannon provides a fix to ensure the shift and bitwise precedences do not
work backwards for us by adding parens.  Fixed the driver so that it no
longer allows stray interrupts or causes system log messages from
un-handled interrupts, by combining the ICR0 shutdown with the standard
interrupt shutdown and adding the interrupt clearing to the PCI shutdown
path.  Fixed an issue where an NVM write times out before a transaction
can complete: Shannon added logic to reacquire the semaphore and retry
the write once; if that retry fails, we give up.  Adds checks to pointers
before their use to ensure we do not try to dereference NULL pointers when
returning values from the AdminQ calls.

Akeem adds a check to bail out if the device is already down when checking
for Tx hang subtask.

Anjali fixes TSO with more than 8 frags per segment issue.  The hardware
has some limitations which the driver needs to adhere to:
  1) no more than 8 descriptors per packet on the wire
  2) no header can span more than 3 descriptors
If one of these events happens, the hardware will generate an internal
error and freeze the Tx queue, so Anjali fixes this by linearizing the skb
to avoid these situations.  Fixed an issue where the per Traffic Class
queue count was higher than queues enabled, which will fix a warning
with multiple function mode where systems regularly have more cores than
vectors.  Fixed TCP/IPv6 over VXLAN Tx checksum offload, where we were
checking the outer protocol flags and deciding the flow for the inner
header.

Jesse fixes a race condition in the transmit hang detection.  Before, we
were having issues with false Tx hang detection; now the driver is more
direct in its checks for forward progress, checking the head write back
address and tail register when determining progress.  This avoids
reporting Tx hangs when the software merely gets behind, because we are
directly checking hardware state when determining a hang state.

Neerav fixes the transmit ring Qset handle when DCB reconfigures. The issue
was that when DCB was reconfigured to a single traffic class (TC), the driver
did not reset the Tx ring Qset handle to the correct mapping, which caused
Tx queue disable timeouts.  Also, as part of the DCB reconfiguration flow,
if the Tx queue disable times out, a PF reset is now issued to do some level
of recovery.

Mitch stops flow director on shutdown because, in some cases, the hardware
would continue to try to access the FDIR ring after entering D3Hot state,
which would cause either PCIe errors or NMIs, depending upon the system
configuration.

* NOTE * I have verified that this series of patches for net will not cause
any merge issues when you sync up your net tree with your net-next tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Request IRQs only after driver is fully setup · c30e76a7

Lendacky, Thomas authored Feb 25, 2015

It is possible that the hardware may not have been properly shut down
before this driver gets control, through use by firmware, for example.
Until the driver is loaded, interrupts associated with the hardware
could go pending. When the IRQs are requested, napi support has not
been initialized yet, but the ISR will get control and schedule napi
processing, resulting in a kernel panic because the poll routine has not
been set.

Adjust the code so that the driver is fully ready to handle and process
interrupts as soon as the IRQs are requested. This involves requesting
and freeing IRQs during start and stop processing and ordering the napi
add and delete calls appropriately.

Also adjust the powerup and powerdown routines to match the start and
stop routines in regards to the ordering of tasks, including napi
related calls.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: asix: add support for the Sitecom LN-028 USB adapter · 7488c3e3

Luca Ceresoli authored Feb 26, 2015

Just another AX88178-based 10/100/1000 USB-to-Ethernet dongle. This one
shows up in lsusb as: "Sitecom Europe B.V. LN-028 Network USB 2.0 Adapter".
Signed-off-by: Luca Ceresoli <luca@lucaceresoli.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-usb@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rhashtable' · c0eebfa3

David S. Miller authored Feb 27, 2015

Daniel Borkmann says:

====================
rhashtable updates

As discussed, I'm sending out rhashtable fixups for -net.

I have a couple more patches pending that I was working on last week,
i.e. to get rid of the ht->nelems and ht->shift atomic operations, which
speeds up pure insertions/deletions. E.g. on my laptop with 2 threads
inserting 7M entries each, that reduces insertion time from ~1,450 ms
to 865 ms (performance should be even better after removing the
grow/shrink indirections). I guess that, however, is rather something
for net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: remove indirection for grow/shrink decision functions · 4c4b52d9

Daniel Borkmann authored Feb 25, 2015

Currently, all real users of rhashtable default their grow and shrink
decision functions to rht_grow_above_75() and rht_shrink_below_30(),
so that there's currently no need to have this explicitly selectable.

It can/should be generic and private inside rhashtable until a real
use case pops up. Since we can make this private, we save ourselves this
additional indirection layer and can improve insertion/deletion time
as well.

Reference: http://patchwork.ozlabs.org/patch/443040/
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: unconditionally grow when max_shift is not specified · 8331de75

Daniel Borkmann authored Feb 25, 2015

While commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for
worker queue") rightfully moved part of the decision making of
whether we should expand or shrink from the expand/shrink functions
themselves into insert/delete functions in order to avoid unnecessary
worker wake-ups, it however introduced a regression by doing so.

Before that change, if no max_shift was specified (= 0) on rhashtable
initialization, rhashtable_expand() would just grow unconditionally
and let the available memory be the limiting factor. After that
change, if no max_shift was specified, there would be _no_ expansion
step at all.

Given that netlink and tipc have a max_shift specified, it was not
visible there, but Josh Hunt reported that nft, which starts out
with a default element hint of 3 if not otherwise provided, would
slow down e.g. inserts tremendously as the table cannot grow larger
to relax table occupancy.

Given that the test case verifies shrinks/expands manually, we also
must remove the pointers to the helper functions to explicitly avoid
parallel resizing on insertions/deletions. test_bucket_stats() and
test_rht_lookup() could also be wrapped around rhashtable mutex to
explicitly synchronize a walk from resizing, but I think that defeats
the actual test case which intended to have explicit test steps,
i.e. 1) inserts, 2) expands, 3) shrinks, 4) deletions, with object
verification after each stage.
Reported-by: Josh Hunt <johunt@akamai.com>
Fixes: c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ying Xue <ying.xue@windriver.com>
Cc: Josh Hunt <johunt@akamai.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
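
The grow rule described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct table_sketch` and `should_grow()` are hypothetical names, not the kernel's `struct rhashtable` or its helpers. The 75% threshold is taken from the `rht_grow_above_75()` name mentioned in this series, and a `max_shift` of 0 is treated as "no upper bound", which is the behaviour this commit restores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified table state; not the kernel's struct rhashtable. */
struct table_sketch {
	size_t nelems;          /* number of stored entries */
	unsigned int shift;     /* table has (1 << shift) buckets */
	unsigned int max_shift; /* 0 means "no upper bound configured" */
};

/*
 * Grow when occupancy exceeds 75% and either no limit was configured
 * (max_shift == 0) or the configured limit has not been reached yet.
 */
static inline bool should_grow(const struct table_sketch *t)
{
	size_t size = (size_t)1 << t->shift;

	return t->nelems > size / 4 * 3 &&
	       (t->max_shift == 0 || t->shift < t->max_shift);
}
```

With 8 buckets the threshold is 6 entries, so a table holding 7 entries and no `max_shift` grows, while the same table with `max_shift` equal to its current shift does not.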

vhost: drop hard-coded num_buffers size · 0d79a493

Michael S. Tsirkin authored Feb 25, 2015

The 2 that we use for copy_to_iter comes from sizeof(u16);
it used to be written that way before the iov iter update.
Fix it up, making it obvious that the size of the stack access
is right.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vhost: cleanup iterator update logic · 4c5a8442

Michael S. Tsirkin authored Feb 25, 2015

Recent iterator-related changes in vhost made it
harder to follow the logic fixing up the header.
In fact, the fixup always happens at the same
offset: sizeof(virtio_net_hdr). Sometimes the
fixup iterator is updated by copy_to_iter,
sometimes by iov_iter_advance.

Rearrange code to make this obvious.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: silence shift wrapping warning · 5f2ebfbe

Dan Carpenter authored Feb 25, 2015

"val" is declared as a u64 so static checkers complain that this shift
can wrap.  I don't have the hardware, but it probably doesn't have over
31 ports.  Still, we may as well silence the warning even if it's not a
real bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
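
As an illustration of the class of warning described above (not the actual rocker change, which is not shown on this page), one common way to keep a shift from wrapping is to make the shifted constant 64 bits wide. The helper name `port_bit()` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: build a 64-bit mask with bit 'port' set (port < 64). */
static inline uint64_t port_bit(unsigned int port)
{
	/*
	 * 1ULL forces the shift to be done in 64 bits. A plain '1 << port'
	 * is evaluated as int and cannot represent bits beyond the width
	 * of int, which is what static checkers complain about.
	 */
	return 1ULL << port;
}
```

With this form, `port_bit(40)` yields a value with only bit 40 set, which a 32-bit `int` shift could not produce.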

rocker: add a check for NULL in rocker_probe_ports() · e65ad3be

Dan Carpenter authored Feb 25, 2015

Make sure kmalloc() succeeds.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Fix PCI-E Memory window interface for big-endian systems · f01aa633

Hariprasad Shenai authored Feb 25, 2015

When doing reads and writes to adapter memory via the PCI-E Memory Window
interface, data gets swizzled on 4-byte boundaries on Big-Endian systems
because we need to account for the register read/write interface which
incorporates a swizzle onto the Little-Endian PCI-E Bus.

Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: do notify_check before returning credits · 2b0c2e2d

Sujith Sankar authored Feb 25, 2015

We should complete notify_check before returning the credits. Once we return the
credits, the adapter may access the notify data.
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Feb, 2015 13 commits

i40e: check pointers before use · 65d13461

Shannon Nelson authored Feb 21, 2015

Make sure we don't try to dereference NULL pointers when returning values
from the AdminQ calls.

Change-ID: Ia6694f2f415d50acf0aba063c863568742799aff
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: catch NVM write semaphore timeout and retry · 2c47e351

Shannon Nelson authored Feb 21, 2015

In some circumstances, a multi-write transaction takes longer than the
default 3 minute timeout on the write semaphore. If the write failed with
an EBUSY status, this is likely the problem, so here we try to reacquire
the semaphore then retry the write. We only do one retry, then give up.

Change-ID: I1c8be60688acc2f39573839579baf601207c4a36
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
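
The retry-once policy described above can be sketched as follows. This is a user-space illustration only: `nvm_write_with_retry()`, the callback type, and the `fake_*` test doubles are all hypothetical and do not correspond to the driver's real functions.

```c
#include <errno.h>

/* Hypothetical callbacks standing in for the semaphore and write routines. */
typedef int (*nvm_op_fn)(void *ctx);

/* Try the write; on -EBUSY reacquire the semaphore and retry exactly once. */
static inline int nvm_write_with_retry(nvm_op_fn write_op, nvm_op_fn reacquire,
				       void *ctx)
{
	int ret = write_op(ctx);

	if (ret == -EBUSY) {
		ret = reacquire(ctx);
		if (ret)
			return ret;  /* could not get the semaphore back */
		ret = write_op(ctx); /* single retry; its result is final */
	}
	return ret;
}

/* Test double: the write fails with -EBUSY for the first 'busy_left' calls. */
struct fake_nvm {
	int busy_left;
	int writes;
	int reacquires;
};

static inline int fake_write(void *ctx)
{
	struct fake_nvm *n = ctx;

	n->writes++;
	if (n->busy_left > 0) {
		n->busy_left--;
		return -EBUSY;
	}
	return 0;
}

static inline int fake_reacquire(void *ctx)
{
	struct fake_nvm *n = ctx;

	n->reacquires++;
	return 0;
}
```

With a fake device that is busy once, the call succeeds after two write attempts; with one that is busy twice, the second `-EBUSY` is returned to the caller rather than retried again.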

i40e: stop flow director on shutdown · 33c62b34

Mitch A Williams authored Feb 21, 2015

In some cases, the hardware would continue to try to access the FDIR
ring after entering D3Hot state, which would cause either PCIe errors or
NMIs, depending upon system configuration.

Explicitly stop FDIR in our shutdown routine to eliminate this
possibility.

Change-ID: I1bd9fc7fd8f151fe24cad132ac9adddab923e3af
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: disconnect irqs on shutdown · e147758d

Shannon Nelson authored Feb 21, 2015

Combine the ICR0 shutdown with the standard interrupt shutdown, and
add the interrupt clearing to the PCI shutdown path.

This prevents the driver from allowing stray interrupts or causing
system logs from un-handled interrupts.

Change-ID: I48f6ab95cad7f8ca77c1f26c92a51cc1034ced43
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: TCP/IPv6 over Vxlan Tx checksum offload fix · 85e76d03

Anjali Singhai authored Feb 21, 2015

We were checking the outer Protocol flags and deciding the flow for
inner header. This patch fixes that.
This fixes the Tx checksum offload for TCP/IPv6 over vxlan.

Change-ID: I837aaea921d34f71b24c2bc32aaadea5001ddf78
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Issue a PF reset if Tx queue disable timeout · 11e47708

Parikh, Neerav authored Feb 21, 2015

As part of DCB reconfiguration flow if the Tx queue disable times out
then issue a PF reset to do some level of recovery.

Change-ID: I7550021c55bff355351c0365e61e1f05fcaff46d
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix the Tx ring qset handle when DCB reconfigures · cd238a3e

Parikh, Neerav authored Feb 21, 2015

When DCB is reconfigured to a single TC, the driver did not reset the
Tx ring Qset handle to the correct mapping, which caused Tx queue
disable timeouts.

Change-ID: I4da5915ec92a83c281b478d653fae6ef1b72edfe
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix the case where per TC queue count was higher than queues enabled · 7f9ff476

Anjali Singhai authored Feb 21, 2015

When the driver or hardware gets fewer interrupt vectors than the actual
number of CPU cores, limit the queue count for the priority queue
traffic class (TC) queues.

This will fix a warning with multiple function mode where systems
regularly have more cores than vectors.

Also add extra comment for readability.

Change-ID: I4f02226263aa3995e1f5ee5503eac0cd6ee12fbd
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young  <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix race in hang check · a68de58d

Jesse Brandeburg authored Feb 24, 2015

The driver was having some issues with false Tx hang detection. This
makes the driver a little more direct with the checks for progress
forward by directly checking the head write back address and tail register
when determining progress.  This avoids Tx hangs where the software
gets behind, because we are directly checking hardware state when
determining hang state.

Change-ID: I774f0e861c9e8ab5ccb213634100fe15440ae24a
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix TSO with more than 8 frags per segment issue · 71da6197

Anjali Singhai authored Feb 21, 2015

The hardware has some limitations the driver needs to adhere to,
that we found in extended testing.
  1) no more than 8 descriptors per packet on the wire
  2) no header can span more than 3 descriptors

If one of these events occurs, the hardware will generate an internal
error and freeze the Tx queue.

This patch linearizes the skb to avoid these situations.

Change-ID: I37dab7d3966e14895a9663ec4d0aaa8eb0d9e115
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
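
The two hardware limits listed above can be expressed as a simple predicate. This is a sketch with hypothetical names; how the driver actually counts descriptors per packet and per header is not shown on this page, so both counts are taken as inputs here.

```c
#include <stdbool.h>

#define MAX_DESC_PER_PACKET 8 /* limit 1 from the commit message */
#define MAX_DESC_PER_HEADER 3 /* limit 2 from the commit message */

/*
 * Returns true when the buffer layout would violate either hardware
 * limit, i.e. when the skb should be linearized before transmission.
 */
static inline bool needs_linearize(unsigned int desc_per_packet,
				   unsigned int desc_for_header)
{
	return desc_per_packet > MAX_DESC_PER_PACKET ||
	       desc_for_header > MAX_DESC_PER_HEADER;
}
```

A packet using exactly 8 descriptors with a header spread over 3 of them is still acceptable; one more descriptor in either count triggers linearization.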

i40e: Don't check for Tx hang when PF down · b67a0335

Akeem G Abodunrin authored Feb 21, 2015

This patch adds a check to the Tx hang subtask to bail out if the device
is already down.

Change-ID: I3853fb7a6d11cb9a4c349b687cb25c15b19977a0
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix shift precedence issue · de78fc5a

Shannon Nelson authored Feb 21, 2015

Add parens to make sure the shift and bitwise precedences don't work backwards
for us.

Change-ID: I60c10ef4fad6bc654522b9d8a53da2e270a0f268
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
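
To illustrate the class of bug (this is not the affected i40e line, which is not shown on this page): in C, the shift operators bind tighter than bitwise AND, so an unparenthesized mask-and-shift does the shift first. The field layout and function names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical field layout, for illustration only: bits 8..11. */
#define FIELD_SHIFT 8
#define FIELD_MASK  (0xFu << FIELD_SHIFT)

/* Buggy: '>>' binds tighter than '&', so this is reg & (FIELD_MASK >> FIELD_SHIFT). */
static inline uint32_t field_buggy(uint32_t reg)
{
	return reg & FIELD_MASK >> FIELD_SHIFT;
}

/* Fixed: the parens force the mask to be applied before the shift. */
static inline uint32_t field_fixed(uint32_t reg)
{
	return (reg & FIELD_MASK) >> FIELD_SHIFT;
}
```

For a register value of 0xA5C, the fixed version extracts 0xA from bits 8..11, while the buggy version masks the low nibble instead and yields 0xC.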

i40e: Fix memory leak at failure path in i40e_dbg_command_write() · dda094a3

Alexey Khoroshilov authored Feb 21, 2015

The patch fixes a leak of 'cmd_buf' when copy_from_user() failed
in i40e_dbg_command_write().

Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
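
The pattern behind this fix can be sketched in user-space C. Everything here is an assumption made for illustration: `copy_in()` stands in for copy_from_user() (returning the number of bytes not copied), `handle_command()` is a hypothetical handler, and the return values are chosen for the sketch rather than taken from the driver.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied. */
static inline size_t copy_in(char *dst, const char *src, size_t n)
{
	if (!src)
		return n; /* simulate a faulting user pointer */
	memcpy(dst, src, n);
	return 0;
}

/* Hypothetical command handler: every exit after malloc() frees cmd_buf. */
static inline long handle_command(const char *user_buf, size_t count)
{
	char *cmd_buf = malloc(count + 1);

	if (!cmd_buf)
		return -ENOMEM;
	if (copy_in(cmd_buf, user_buf, count)) {
		free(cmd_buf); /* the fix: release the buffer on the failure path */
		return -EFAULT;
	}
	cmd_buf[count] = '\0';
	/* ... parse and act on cmd_buf here ... */
	free(cmd_buf);
	return (long)count;
}
```

The point of the sketch is only that the early return after a failed copy must release the buffer, just like the normal exit does.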

25 Feb, 2015 2 commits

MAINTAINERS: update my email address · 31639b94

Andy Gospodarek authored Feb 25, 2015

I have been signing off on patches with this address so I'll change it.
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe-phy: PHY KX/KR mode differences · 74ad7524

Tom Lendacky authored Feb 24, 2015

The PHY requires different settings for the Decision Feedback Analyzer
(DFE) when running in KX mode vs. KR mode. Update the code to change
these settings when changing modes in order to provide a more stable
link.

Additionally, adjust the 10GbE PQ skew default setting to a more sane
value.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Feb, 2015 4 commits

r8169: Fix trivial typo in rtl_check_firmware · 5c2d2b14

Yannick Guerrini authored Feb 24, 2015

Change 'firwmare' to 'firmware'
Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen-netback: release pending index before pushing Tx responses · 7fbb9d84

David Vrabel authored Feb 24, 2015

If the pending indexes are released /after/ pushing the Tx response
then a stale pending index may be used if a new Tx request is
immediately pushed by the frontend.  This may cause various WARNINGs or
BUGs if the stale pending index is actually still in use.

Fix this by releasing the pending index before pushing the Tx
response.

The full barrier for the pending ring update is not required since the
Tx response push already has a suitable write barrier.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: don't pass empty blocks for PACKET_V3 · 41a50d62

Alexander Drozdov authored Feb 24, 2015

Before da413eec ("packet: Fixed TPACKET V3 to signal poll when block is
closed rather than every packet") poll listening for an af_packet socket was
not signaled if there were no packets to process. After the patch poll is
signaled every time the block retire timer expires. That happens because
af_packet closes the current block on timeout even if the block is empty.

Passing empty blocks to the user not only wastes CPU but also wastes ring
buffer space, increasing the probability of packet drops on small timeouts.
Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
Cc: Dan Collins <dan@dcollins.co.nz>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Guy Harris <guy@alum.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: avoid 0 sized arrays · 4e10fd5b

Sasha Levin authored Feb 24, 2015

Arrays (when not in a struct) "shall have a value greater than zero".

GCC complains when it's not the case here.

Fixes: ba7d49b1 ("rtnetlink: provide api for getting and setting slave info")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Feb, 2015 9 commits

ipv6: addrconf: validate new MTU before applying it · 77751427

Marcelo Leitner authored Feb 23, 2015

Currently we don't check whether the new MTU is valid, and this allows
one to configure an MTU smaller than the minimum allowed by RFCs or even
bigger than the interface's own MTU, which is a problem as it may lead to
packet drops.

If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)

The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and the interface's MTU.

Note that a similar check is already performed at
ndisc_router_discovery(), for when the kernel itself parses the RA.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
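
The range check described above can be sketched as a small predicate. The function name and the `SKETCH_` macro are hypothetical; the value 1280 is the minimum IPv6 link MTU, which the kernel exposes as IPV6_MIN_MTU.

```c
#include <stdbool.h>

/* Minimum IPv6 link MTU; the kernel names this constant IPV6_MIN_MTU. */
#define SKETCH_IPV6_MIN_MTU 1280

/* A new MTU is valid only if it lies between the IPv6 minimum and the device MTU. */
static inline bool ipv6_mtu_is_valid(unsigned int new_mtu, unsigned int dev_mtu)
{
	return new_mtu >= SKETCH_IPV6_MIN_MTU && new_mtu <= dev_mtu;
}
```

On a 1500-byte interface this accepts 1280 and 1500 but rejects both 1279 and 1501.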

altera_tse: Fixes in NAPI and interrupt handling paths · 8d4ac39d

Vlastimil Setka authored Feb 23, 2015

Incorrect NAPI polling caused WARNING at net/core/dev.c net_rx_action.
Some stability issues were also seen at high throughput and system
load before this patch.

This patch contains several changes in altera_tse_main.c:

- tse_rx() is fixed to not process more than `limit` frames

- tse_poll() is refactored to match NAPI logic
  - only received frames are counted for return value
  - removed bogus condition `(rxcomplete >= budget || txcomplete > 0)`
  - replace by: if (rxcomplete < budget) -> call __napi_complete and enable irq

- altera_isr()
  - replace spin_lock_irqsave() by spin_lock() - we are in isr
  - use spinlocks just over irq manipulation, not over __napi_schedule
  - reset IRQ first, then disable and schedule napi

This is a cleaned up resubmission from Vlastimil's recent submission.
Signed-off-by: Vlastimil Setka <setka@vsis.cz>
Signed-off-by: Roman Pisl <rpisl@kky.zcu.cz>
Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
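
The poll logic listed above (process at most `budget` frames, count only received frames, and finish polling only when under budget) can be simulated in user space. The struct and function names are hypothetical, and setting `irq_enabled` merely stands in for calling __napi_complete() and re-enabling the interrupt in the real driver.

```c
#include <stdbool.h>

/* Hypothetical device state for a user-space simulation of the poll logic. */
struct dev_sketch {
	int pending_rx;   /* frames waiting in the RX ring */
	bool irq_enabled; /* whether the RX interrupt is unmasked */
};

/* Process at most 'limit' frames and report how many were handled. */
static inline int rx_sketch(struct dev_sketch *d, int limit)
{
	int done = 0;

	while (done < limit && d->pending_rx > 0) {
		d->pending_rx--;
		done++;
	}
	return done;
}

/* Poll: only received frames count; finish polling when under budget. */
static inline int poll_sketch(struct dev_sketch *d, int budget)
{
	int rxcomplete = rx_sketch(d, budget);

	if (rxcomplete < budget)
		d->irq_enabled = true; /* stands in for __napi_complete() + IRQ enable */
	return rxcomplete;
}
```

With 100 pending frames and a budget of 64, the first poll consumes the whole budget and stays in polling mode; the second handles the remaining 36 and re-enables the interrupt.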

altera_tse: Correct typo in obtaining tx_fifo_depth from devicetree · fe6e4081

Vlastimil Setka authored Feb 23, 2015

This patch corrects a typo in the way tx_fifo_depth is read from the
devicetree. This patch was submitted by Vlastimil about a week ago,
and is now cleaned up and resubmitted.
Signed-off-by: Vlastimil Setka <setka@vsis.cz>
Signed-off-by: Vince Bridgers <vbridger@opensource.altera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg · d720d8ce

Catalin Marinas authored Feb 23, 2015

With commit a7526eb5 (net: Unbreak compat_sys_{send,recv}msg), the
MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points,
changing the kernel compat behaviour from the one before the commit it
was trying to fix (1be374a0, net: Block MSG_CMSG_COMPAT in
send(m)msg and recv(m)msg).

On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native
32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by
the kernel). However, on a 64-bit kernel, the compat ABI is different
with commit a7526eb5.

This patch changes the compat_sys_{send,recv}msg behaviour to the one
prior to commit 1be374a0.

The problem was found running 32-bit LTP (sendmsg01) binary on an arm64
kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg()
but the general rule is not to break user ABI (even when the user
behaviour is not entirely sane).

Fixes: a7526eb5 (net: Unbreak compat_sys_{send,recv}msg)
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: replace current->state by set_current_state() · a948f8ce

Fabian Frederick authored Feb 23, 2015

Use helper functions to access current->state.
Direct assignments are prone to races and therefore buggy.

current->state = TASK_RUNNING can be replaced by __set_current_state()

Thanks to Peter Zijlstra for the exact definition of the problem.
Suggested-By: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: export tc_connmark.h so it is uapi accessible · 30ff5476

Jamal Hadi Salim authored Feb 23, 2015

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: fix possible null pointer dereference in team_handle_frame · 57e59563

Jiri Pirko authored Feb 23, 2015

Currently, the following race is possible in team:

CPU0                                        CPU1
                                            team_port_del
                                              team_upper_dev_unlink
                                                priv_flags &= ~IFF_TEAM_PORT
team_handle_frame
  team_port_get_rcu
    team_port_exists
      priv_flags & IFF_TEAM_PORT == 0
    return NULL (instead of port got
                 from rx_handler_data)
                                              netdev_rx_handler_unregister

The thing is that the flag is removed before rx_handler is unregistered.
If team_handle_frame is called in between, team_port_exists returns 0
and team_port_get_rcu will return NULL.
So do not check the flag here. It is guaranteed by netdev_rx_handler_unregister
that team_handle_frame will always see a valid rx_handler_data pointer.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Fixes: 3d249d4c ("net: introduce ethernet teaming device")
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: Fix obvious o/0 typo · 46b9e4bb

Rasmus Villemoes authored Feb 23, 2015

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

rhashtable: initialize all rhashtable walker members · 71bb0012

Sasha Levin authored Feb 23, 2015

Commit f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*") forgot to
initialize the members of struct rhashtable_walker after allocating it, which
caused an undefined value for 'resize' which is used later on.

Fixes: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
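
The general pattern can be sketched in user-space C: memory from an allocator such as malloc() is uninitialised, so every member that is read later has to be set explicitly. The struct below is hypothetical and only loosely modelled on the description above; it is not the kernel's struct rhashtable_walker.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical walker, loosely modelled on the description above. */
struct walker_sketch {
	void *list_next;
	bool resize;
};

static inline struct walker_sketch *walker_alloc(void)
{
	struct walker_sketch *w = malloc(sizeof(*w));

	if (!w)
		return NULL;
	/* malloc() returns uninitialised memory, so set every member. */
	w->list_next = NULL;
	w->resize = false;
	return w;
}
```

An alternative with the same effect is to allocate with calloc(), which zeroes the whole object at once.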
