Commits · a64f3f83505f4ed6662fd9dda998d4890aba58bd · Kirill Smelkov / linux

03 Dec, 2015 7 commits

David S. Miller authored Dec 02, 2015

Marcin Wojtas says:

====================
Marvell Armada XP/370/38X Neta fixes

I'm sending v4 with corrected commit log of the last patch, in order to
avoid possible conflicts between the branches as suggested by Gregory
Clement.

Best regards,
Marcin Wojtas

Changes from v4:
* Correct commit log of patch 6/6

Changes from v2:
* Style fixes in patch updating mbus protection
* Remove redundant stable notifications except for patch 4/6

Changes from v1:
* update MBUS windows access protection register once, after whole loop
* add fixing value of MVNETA_RXQ_INTR_ENABLE_ALL_MASK
* add fixing error path for skb_build()
* add possibility of setting custom TX IP checksum limit in DT property
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a64f3f83

mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0 · c4a25007

Marcin Wojtas authored Nov 30, 2015

The Ethernet controller found in the Armada 38x SoC's family support
TCP/IP checksumming with frame sizes larger than 1600 bytes, however
only on port 0.

This commit enables it by setting 'tx-csum-limit' to 9800B in
'ethernet@70000' node.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4a25007

net: mvneta: enable setting custom TX IP checksum limit · 9110ee07

Marcin Wojtas authored Nov 30, 2015

Since Armada 38x SoC can support IP checksum for jumbo frames only on
a single port, it means that this feature should be enabled per-port,
rather than for the whole SoC.

This patch enables setting custom TX IP checksum limit by adding new
optional property to the mvneta device tree node. If not used, by
default 1600B is set for "marvell,armada-370-neta" and 9800B for other
strings, which ensures backward compatibility. Binding documentation
is updated accordingly.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9110ee07

net: mvneta: fix error path for building skb · 26c17a17

Marcin Wojtas authored Nov 30, 2015

In the actual RX processing, there is same error path for both descriptor
ring refilling and building skb fails. This is not correct, because after
successful refill, the ring is already updated with newly allocated
buffer. Then, in case of build_skb() fail, hitherto code left the original
buffer unmapped.

This patch fixes above situation by swapping error check of skb build with
DMA-unmap of original buffer.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Simon Guinot <simon.guinot@sequanux.org>
Cc: <stable@vger.kernel.org> # v4.2+
Fixes a84e3289 ("net: mvneta: fix refilling for Rx DMA buffers")
Signed-off-by: David S. Miller <davem@davemloft.net>

26c17a17

net: mvneta: fix bit assignment for RX packet irq enable · dc1aadf6

Marcin Wojtas authored Nov 30, 2015

A value originally defined in the driver was inappropriate. Even though
the ingress was somehow working, writing MVNETA_RXQ_INTR_ENABLE_ALL_MASK
to MVNETA_INTR_ENABLE didn't make any effect, because the bits [31:16]
are reserved and read-only.

This commit updates MVNETA_RXQ_INTR_ENABLE_ALL_MASK to be compliant with
the controller's documentation.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>

Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
unit")
Signed-off-by: David S. Miller <davem@davemloft.net>

dc1aadf6

net: mvneta: fix bit assignment in MVNETA_RXQ_CONFIG_REG · e5bdf689

Marcin Wojtas authored Nov 30, 2015

MVNETA_RXQ_HW_BUF_ALLOC bit which controls enabling hardware buffer
allocation was mistakenly set as BIT(1). This commit fixes the assignment.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>

Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
unit")
Signed-off-by: David S. Miller <davem@davemloft.net>

e5bdf689

net: mvneta: add configuration for MBUS windows access protection · db6ba9a5

Marcin Wojtas authored Nov 30, 2015

This commit adds missing configuration of MBUS windows access protection
in mvneta_conf_mbus_windows function - a dedicated variable for that
purpose remained there unused since v3.8 initial mvneta support. Because
of that the register contents were inherited from the bootloader.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Reviewed-by: Gregory CLEMENT <gregory.clement@free-electrons.com>

Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
unit")
Signed-off-by: David S. Miller <davem@davemloft.net>

db6ba9a5

02 Dec, 2015 8 commits

Merge branch 'thunderx-fixes' · 9f7aec5f

David S. Miller authored Dec 02, 2015

Sunil Goutham says:

====================
thunderx: miscellaneous fixes

This patch series contains fixes for various issues observed
with BGX and NIC drivers.

Changes from v1:
- Fixed comment syle in the first patch of the series
- Removed 'Increase transmit queue length' patch from the series,
  will recheck if it's a driver or system issue and resubmit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9f7aec5f

net: thunderx: Enable BGX LMAC's RX/TX only after VF is up · bc69fdfc

Sunil Goutham authored Dec 02, 2015

Enable or disable BGX LMAC's RX/TX based on corresponding VF's
status. If otherwise, when multiple LMAC's physical link is up
then packets from all LMAC's whose corresponding VF is not yet
initialized will get forwarded to VF0. This is due to VNIC's default
configuration where CPI, RSSI e.t.c point to VF0/QSET0/RQ0.

This patch will prevent multiple copies of packets on VF0.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc69fdfc

net: thunderx: Switchon carrier only upon interface link up · 0b72a9a1

Sunil Goutham authored Dec 02, 2015

Call netif_carrier_on() only if interface's link is up. Switching this on
upon IFF_UP by default, is causing issues with ethernet channel bonding
in LACP mode. Initial NETDEV_CHANGE notification was being skipped.

Also fixed some issues with link/speed/duplex reporting via ethtool.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0b72a9a1

net: thunderx: Set CQ timer threshold properly · 006394a7

Sunil Goutham authored Dec 02, 2015

Properly set CQ timer threshold and also set it to 2us.
With previous incorrect settings it was set to 0.5us which is too less.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

006394a7

net: thunderx: Wait for delayed work to finish before destroying it · a7b1f535

Thanneeru Srinivasulu authored Dec 02, 2015

While VNIC or BGX driver teardown, wait for already scheduled delayed work to
finish before destroying it.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7b1f535

net: thunderx: Force to load octeon-mdio before bgx driver. · 723cda5b

Thanneeru Srinivasulu authored Dec 02, 2015

Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

723cda5b

openvswitch: properly refcount vport-vxlan module · 83e4bf7a

Paolo Abeni authored Nov 30, 2015

After 614732ea, no refcount is maintained for the vport-vxlan module.
This allows the userspace to remove such module while vport-vxlan
devices still exist, which leads to later oops.

v1 -> v2:
 - move vport 'owner' initialization in ovs_vport_ops_register()
   and make such function a macro

Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83e4bf7a

bpf, array: fix heap out-of-bounds access when updating elements · fbca9d2d

Daniel Borkmann authored Nov 30, 2015

During own review but also reported by Dmitry's syzkaller [1] it has been
noticed that we trigger a heap out-of-bounds access on eBPF array maps
when updating elements. This happens with each map whose map->value_size
(specified during map creation time) is not multiple of 8 bytes.

In array_map_alloc(), elem_size is round_up(attr->value_size, 8) and
used to align array map slots for faster access. However, in function
array_map_update_elem(), we update the element as ...

memcpy(array->value + array->elem_size * index, value, array->elem_size);

... where we access 'value' out-of-bounds, since it was allocated from
map_update_elem() from syscall side as kmalloc(map->value_size, GFP_USER)
and later on copied through copy_from_user(value, uvalue, map->value_size).
Thus, up to 7 bytes, we can access out-of-bounds.

Same could happen from within an eBPF program, where in worst case we
access beyond an eBPF program's designated stack.

Since 1be7f75d ("bpf: enable non-root eBPF programs") didn't hit an
official release yet, it only affects priviledged users.

In case of array_map_lookup_elem(), the verifier prevents eBPF programs
from accessing beyond map->value_size through check_map_access(). Also
from syscall side map_lookup_elem() only copies map->value_size back to
user, so nothing could leak.

  [1] http://github.com/google/syzkaller

Fixes: 28fbcfa0 ("bpf: add array type of eBPF maps")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbca9d2d

01 Dec, 2015 5 commits

net: fix sock_wake_async() rcu protection · ceb5d58b

Eric Dumazet authored Nov 29, 2015

Dmitry provided a syzkaller (http://github.com/google/syzkaller)
triggering a fault in sock_wake_async() when async IO is requested.

Said program stressed af_unix sockets, but the issue is generic
and should be addressed in core networking stack.

The problem is that by the time sock_wake_async() is called,
we should not access the @flags field of 'struct socket',
as the inode containing this socket might be freed without
further notice, and without RCU grace period.

We already maintain an RCU protected structure, "struct socket_wq"
so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
is the safe route.

It also reduces number of cache lines needing dirtying, so might
provide a performance improvement anyway.

In followup patches, we might move remaining flags (SOCK_NOSPACE,
SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
being mostly read and let it being shared between cpus.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ceb5d58b

net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072

Eric Dumazet authored Nov 29, 2015

This patch is a cleanup to make following patch easier to
review.

Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to a (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async()

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int net, struct sock *sk) are added so that
following patch can change their implementation.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9cd3e072

vmxnet3: fix checks for dma mapping errors · 5738a09d

Alexey Khoroshilov authored Nov 28, 2015

vmxnet3_drv does not check dma_addr with dma_mapping_error()
after mapping dma memory. The patch adds the checks and
tries to handle failures.

Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5738a09d

wan/x25: Fix use-after-free in x25_asy_open_tty() · ee9159dd

Peter Hurley authored Nov 27, 2015

The N_X25 line discipline may access the previous line discipline's closed
and already-freed private data on open [1].

The tty->disc_data field _never_ refers to valid data on entry to the
line discipline's open() method. Rather, the ldisc is expected to
initialize that field for its own use for the lifetime of the instance
(ie. from open() to close() only).

[1]
    [  634.336761] ==================================================================
    [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
    [  634.339558] Read of size 4 by task syzkaller_execu/8981
    [  634.340359] =============================================================================
    [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    ...
    [  634.405018] Call Trace:
    [  634.405277] dump_stack (lib/dump_stack.c:52)
    [  634.405775] print_trailer (mm/slub.c:655)
    [  634.406361] object_err (mm/slub.c:662)
    [  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
    [  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
    [  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
    [  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
    [  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
    [  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
    [  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
    [  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
    [  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)
Reported-and-tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee9159dd

Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" · 304d888b

Nicolas Dichtel authored Nov 27, 2015

This reverts commit ab450605.

In IPv6, we cannot inherit the dst of the original dst. ndisc packets
are IPv6 packets and may take another route than the original packet.

This patch breaks the following scenario: a packet comes from eth0 and
is forwarded through vxlan1. The encapsulated packet triggers an NS
which cannot be sent because of the wrong route.

CC: Jiri Benc <jbenc@redhat.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

304d888b

30 Nov, 2015 12 commits

tcp: initialize tp->copied_seq in case of cross SYN connection · 142a2e7e

Eric Dumazet authored Nov 26, 2015

Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :

WARN_ON(tp->copied_seq != tp->rcv_nxt &&
        !(flags & (MSG_PEEK | MSG_TRUNC)));

His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.

Thanks again Dmitry for your report and testings.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

142a2e7e

net: fsl: Fix error checking for platform_get_irq() · 0f2c0d32

Mark Brown authored Nov 26, 2015

The gianfar driver has recently been enabled on arm64 but fails to build
since it check the return value of platform_get_irq() against NO_IRQ. Fix
this by instead checking for a negative error code.

Even on ARM where this code was previously being built this check was
incorrect since platform_get_irq() returns a negative error code which
may not be exactly the (unsigned int)(-1) that NO_IRQ is defined to be.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0f2c0d32

net: fsl: Don't use NO_IRQ to check return value of irq_of_parse_and_map() · fea0f665

Mark Brown authored Nov 26, 2015

This driver can be built on arm64 but relies on NO_IRQ to check the return
value of irq_of_parse_and_map() which fails to build on arm64 because the
architecture does not provide a NO_IRQ. Fix this to correctly check the
return value of irq_of_parse_and_map().

Even on ARM systems where the driver was previously used the check was
broken since on ARM NO_IRQ is -1 but irq_of_parse_and_map() returns 0 on
error.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

fea0f665

af-unix: passcred support for sendpage · 9490f886

Hannes Frederic Sowa authored Nov 26, 2015

sendpage did not care about credentials at all. This could lead to
situations in which because of fd passing between processes we could
append data to skbs with different scm data. It is illegal to splice those
skbs together. Instead we have to allocate a new skb and if requested
fill out the scm details.

Fixes: 869e7c62 ("net: af_unix: implement stream sendpage support")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9490f886

Merge branch 'stmmac-fixes' · 6e0f0331

David S. Miller authored Nov 30, 2015

Giuseppe Cavallaro says:

====================
Spare stmmac fixes

These are some fixes for the stmmac d.d. tested on STi platforms.
They are for some part of the PM, STi glue and rx path when test
Jumbo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6e0f0331

stmmac: fix oversized frame reception · e527c4a7

Giuseppe CAVALLARO authored Nov 26, 2015

The receive skb buffers can be preallocated when the link is opened
according to mtu size.
While testing on a network environment with not standard MTU (e.g. 3000),
a panic occurred if an incoming packet had a length greater than rx skb
buffer size. This is because the HW is programmed to copy, from the DMA,
an Jumbo frame and the Sw must check if the allocated buffer is enough to
store the frame.
Signed-off-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e527c4a7

stmmac: fix PHY reset during resume · ae26c1c6

Giuseppe CAVALLARO authored Nov 26, 2015

When stmmac_mdio_reset, was called from stmmac_resume, it was not
resetting the PHY due to which MAC was not getting reset properly and
hence ethernet interface not was resumed properly.
The issue was currently only reproducible on stih301-b2204.
Signed-off-by: Pankaj Dev <pankaj.dev@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae26c1c6

stmmac: dwmac-sti: fix st,tx-retime-src check · 22407e13

Giuseppe CAVALLARO authored Nov 26, 2015

In case of the st,tx-retime-src is missing from device-tree
(it's an optional field) the driver will invoke the strcasecmp to check
which clock has been selected and this is a bug; the else condition
is needed.

In the dwmac_setup, the "rs" variable, passed to the strcasecmp, was not
initialized and the compiler, depending on the options adopted, could
take it in some different part of the stack generating the hang in such
configuration.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22407e13

stmmac: fix csr clock divisor for 300MHz · 61adcc03

Giuseppe CAVALLARO authored Nov 26, 2015

This patch is to fix the csr clock in case of 300MHz is provided.
Reported-by: Kent Borg <Kent.Borg@csr.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61adcc03

stmmac: fix a filter problem after resuming. · ac316c78

Giuseppe CAVALLARO authored Nov 26, 2015

When resume the HW is re-configured but some settings can be lost.
For example, the MAC Address_X High/Low Registers used for VLAN tagging..
So, while resuming, the set_filter callback needs to be invoked to
re-program perfect and hash-table registers.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac316c78

drivers: net: xgene: fix possible use after free · 9ffad80a

Eric Dumazet authored Nov 25, 2015

Once TX has been enabled on a NIC, it is illegal to access skb,
as this skb might have been freed by another cpu, from TX completion
handler.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Cc: Iyappan Subramanian <isubramanian@apm.com>
Acked-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9ffad80a

packet: Allow packets with only a header (but no payload) · 880621c2

Martin Blumenstingl authored Nov 22, 2015

Commit 9c707762 ("packet: make packet_snd fail on len smaller
than l2 header") added validation for the packet size in packet_snd.
This change enforces that every packet needs a header (with at least
hard_header_len bytes) plus a payload with at least one byte. Before
this change the payload was optional.

This fixes PPPoE connections which do not have a "Service" or
"Host-Uniq" configured (which is violating the spec, but is still
widely used in real-world setups). Those are currently failing with the
following message: "pppd: packet size is too short (24 <= 24)"
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

880621c2

25 Nov, 2015 2 commits

bpf: fix clearing on persistent program array maps · c9da161c

Daniel Borkmann authored Nov 24, 2015

Currently, when having map file descriptors pointing to program arrays,
there's still the issue that we unconditionally flush program array
contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens
when such a file descriptor is released and is independent of the map's
refcount.

Having this flush independent of the refcount is for a reason: there
can be arbitrary complex dependency chains among tail calls, also circular
ones (direct or indirect, nesting limit determined during runtime), and
we need to make sure that the map drops all references to eBPF programs
it holds, so that the map's refcount can eventually drop to zero and
initiate its freeing. Btw, a walk of the whole dependency graph would
not be possible for various reasons, one being complexity and another
one inconsistency, i.e. new programs can be added to parts of the graph
at any time, so there's no guaranteed consistent state for the time of
such a walk.

Now, the program array pinning itself works, but the issue is that each
derived file descriptor on close would nevertheless call unconditionally
into bpf_fd_array_map_clear(). Instead, keep track of users and postpone
this flush until the last reference to a user is dropped. As this only
concerns a subset of references (f.e. a prog array could hold a program
that itself has reference on the prog array holding it, etc), we need to
track them separately.

Short analysis on the refcounting: on map creation time usercnt will be
one, so there's no change in behaviour for bpf_map_release(), if unpinned.
If we already fail in map_create(), we are immediately freed, and no
file descriptor has been made public yet. In bpf_obj_pin_user(), we need
to probe for a possible map in bpf_fd_probe_obj() already with a usercnt
reference, so before we drop the reference on the fd with fdput().
Therefore, if actual pinning fails, we need to drop that reference again
in bpf_any_put(), otherwise we keep holding it. When last reference
drops on the inode, the bpf_any_put() in bpf_evict_inode() will take
care of dropping the usercnt again. In the bpf_obj_get_user() case, the
bpf_any_get() will grab a reference on the usercnt, still at a time when
we have the reference on the path. Should we later on fail to grab a new
file descriptor, bpf_any_put() will drop it, otherwise we hold it until
bpf_map_release() time.

Joint work with Alexei.

Fixes: b2197755 ("bpf: add support for persistent maps/progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9da161c

isdn: Partially revert debug format string usage clean up · 19cebbcb

Christoph Biedl authored Nov 25, 2015

Commit 35a4a573 ("isdn: clean up debug format string usage") introduced
a safeguard to avoid accidential format string interpolation of data
when calling debugl1 or HiSax_putstatus. This did however not take into
account VHiSax_putstatus (called by HiSax_putstatus) does *not* call
vsprintf if the head parameter is NULL - the format string is treated
as plain text then instead. As a result, the string "%s" is processed
literally, and the actual information is lost. This affects the isdnlog
userspace program which stopped logging information since that commit.

So revert the HiSax_putstatus invocations to the previous state.

Fixes: 35a4a573 ("isdn: clean up debug format string usage")
Cc: Kees Cook <keescook@chromium.org>
Cc: Karsten Keil <isdn@linux-pingi.de>
Signed-off-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

19cebbcb

24 Nov, 2015 6 commits

RDS: fix race condition when sending a message on unbound socket · 8c7188b2

Quentin Casasnovas authored Nov 24, 2015

Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket.  The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket.  This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().

Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.

I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.

Complete earlier incomplete fix to CVE-2015-6937:

  74e98eb0 ("RDS: verify the underlying transport exists before creating a connection")

Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c7188b2

net: openvswitch: Remove invalid comment · 20f79566

Aaron Conole authored Nov 24, 2015

During pre-upstream development, the openvswitch datapath used a custom
hashtable to store vports that could fail on delete due to lack of
memory. However, prior to upstream submission, this code was reworked to
use an hlist based hastable with flexible-array based buckets. As such
the failure condition was eliminated from the vport_del path, rendering
this comment invalid.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

20f79566

net: ipmr, ip6mr: fix vif/tunnel failure race condition · fbdd29bf

Nikolay Aleksandrov authored Nov 24, 2015

Since (at least) commit b17a7c17 ("[NET]: Do sysfs registration as
part of register_netdevice."), netdev_run_todo() deals only with
unregistration, so we don't need to do the rtnl_unlock/lock cycle to
finish registration when failing pimreg or dvmrp device creation. In
fact that opens a race condition where someone can delete the device
while rtnl is unlocked because it's fully registered. The problem gets
worse when netlink support is introduced as there are more points of entry
that can cause it and it also makes reusing that code correctly impossible.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbdd29bf

rxrpc: Correctly handle ack at end of client call transmit phase · 33c40e24

David Howells authored Nov 24, 2015

Normally, the transmit phase of a client call is implicitly ack'd by the
reception of the first data packet of the response being received.
However, if a security negotiation happens, the transmit phase, if it is
entirely contained in a single packet, may get an ack packet in response
and then may get aborted due to security negotiation failure.

Because the client has shifted state to RXRPC_CALL_CLIENT_AWAIT_REPLY due
to having transmitted all the data, the code that handles processing of the
received ack packet doesn't note the hard ack the data packet.

The following abort packet in the case of security negotiation failure then
incurs an assertion failure when it tries to drain the Tx queue because the
hard ack state is out of sync (hard ack means the packets have been
processed and can be discarded by the sender; a soft ack means that the
packets are received but could still be discarded and rerequested by the
receiver).

To fix this, we should record the hard ack we received for the ack packet.

The assertion failure looks like:

	RxRPC: Assertion failed
	1 <= 0 is false
	0x1 <= 0x0 is false
	------------[ cut here ]------------
	kernel BUG at ../net/rxrpc/ar-ack.c:431!
	...
	RIP: 0010:[<ffffffffa006857b>]  [<ffffffffa006857b>] rxrpc_rotate_tx_window+0xbc/0x131 [af_rxrpc]
	...
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

33c40e24

ipv6: distinguish frag queues by device for multicast and link-local packets · 264640fc

Michal Kubeček authored Nov 24, 2015

If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.

To resolve this, add device ifindex to the search key and require it to
match reassembling multicast packets and packets to link-local
addresses.

Note: similar patch has been already submitted by Yoshifuji Hideaki in

  http://patchwork.ozlabs.org/patch/220979/

but got lost and forgotten for some reason.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

264640fc

drivers: net: xgene: fix: ifconfig up/down crash · aeb20b6b

Iyappan Subramanian authored Nov 23, 2015

Fixing kernel crash when doing ifconfig down and up in a loop,

[ 124.028237] Call trace:
[ 124.030670] [<ffffffc000367ce0>] memcpy+0x20/0x180
[ 124.035436] [<ffffffc00053c250>] skb_clone+0x3c/0xa8
[ 124.040374] [<ffffffc00053ffa4>] __skb_tstamp_tx+0xc0/0x118
[ 124.045918] [<ffffffc00054000c>] skb_tstamp_tx+0x10/0x1c
[ 124.051203] [<ffffffc00049bc84>] xgene_enet_start_xmit+0x2e4/0x33c
[ 124.057352] [<ffffffc00054fc20>] dev_hard_start_xmit+0x2e8/0x400
[ 124.063327] [<ffffffc00056cb14>] sch_direct_xmit+0x90/0x1d4
[ 124.068870] [<ffffffc000550100>] __dev_queue_xmit+0x28c/0x498
[ 124.074585] [<ffffffc00055031c>] dev_queue_xmit_sk+0x10/0x1c
[ 124.080216] [<ffffffc0005c3f14>] ip_finish_output2+0x3d0/0x438
[ 124.086017] [<ffffffc0005c5794>] ip_finish_output+0x198/0x1ac
[ 124.091732] [<ffffffc0005c61d4>] ip_output+0xec/0x164
[ 124.096755] [<ffffffc0005c5910>] ip_local_out_sk+0x38/0x48
[ 124.102211] [<ffffffc0005c5d84>] ip_queue_xmit+0x288/0x330
[ 124.107668] [<ffffffc0005da8bc>] tcp_transmit_skb+0x908/0x964
[ 124.113383] [<ffffffc0005dc0d4>] tcp_send_ack+0x128/0x138
[ 124.118753] [<ffffffc0005d1580>] __tcp_ack_snd_check+0x5c/0x94
[ 124.124555] [<ffffffc0005d7a0c>] tcp_rcv_established+0x554/0x68c
[ 124.130530] [<ffffffc0005df0d4>] tcp_v4_do_rcv+0xa4/0x37c
[ 124.135900] [<ffffffc000539430>] release_sock+0xb4/0x150
[ 124.141184] [<ffffffc0005cdf88>] tcp_recvmsg+0x448/0x9e0
[ 124.146468] [<ffffffc0005f2f3c>] inet_recvmsg+0xa0/0xc0
[ 124.151666] [<ffffffc000533660>] sock_recvmsg+0x10/0x1c
[ 124.156863] [<ffffffc0005370d4>] SyS_recvfrom+0xa4/0xf8
[ 124.162061] Code: f2400c84 540001c0 cb040042 36000064 (38401423)
[ 124.168133] ---[ end trace 7ab2550372e8a65b ]---

The fix was to reorder napi_enable, napi_disable, request_irq and
free_irq calls, move register_netdev after dma_coerce_mask_and_coherent.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Tested-by: Khuong Dinh <kdinh@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aeb20b6b