Commits · c20a548792f15f8d8e38cd74356301c6db0d241f · nexedi / linux

05 Dec, 2017 14 commits

net: qualcomm: rmnet: Fix leak on transmit failure · c20a5487

Subash Abhinov Kasiviswanathan authored Dec 05, 2017

If a skb in transmit path does not have sufficient headroom to add
the map header, the skb is not sent out and is never freed.

Fixes: ceed73a2 ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c20a5487

VSOCK: fix outdated sk_state value in hvs_release() · c9d3fe9d

Stefan Hajnoczi authored Dec 05, 2017

Since commit 3b4477d2 ("VSOCK: use TCP
state constants for sk_state") VSOCK has used TCP_* constants for
sk_state.

Commit b4562ca7 ("hv_sock: add locking
in the open/close/release code paths") reintroduced the SS_DISCONNECTING
constant.

This patch replaces the old SS_DISCONNECTING with the new TCP_CLOSING
constant.

CC: Dexuan Cui <decui@microsoft.com>
CC: Cathy Avery <cavery@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9d3fe9d

tipc: fix memory leak in tipc_accept_from_sock() · a7d5f107

Jon Maloy authored Dec 04, 2017

When the function tipc_accept_from_sock() fails to create an instance of
struct tipc_subscriber it omits to free the already created instance of
struct tipc_conn instance before it returns.

We fix that with this commit.
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7d5f107

tipc: fix a null pointer deref on error path · 672ecbe1

Cong Wang authored Dec 04, 2017

In tipc_topsrv_kern_subscr() when s->tipc_conn_new() fails
we call tipc_close_conn() to clean up, but in this case
calling conn_put() is just enough.

This fixes the folllowing crash:

 kasan: GPF could be caused by NULL-ptr deref or user memory access
 general protection fault: 0000 [#1] SMP KASAN
 Dumping ftrace buffer:
    (ftrace buffer empty)
 Modules linked in:
 CPU: 0 PID: 3085 Comm: syzkaller064164 Not tainted 4.15.0-rc1+ #137
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 task: 00000000c24413a5 task.stack: 000000005e8160b5
 RIP: 0010:__lock_acquire+0xd55/0x47f0 kernel/locking/lockdep.c:3378
 RSP: 0018:ffff8801cb5474a8 EFLAGS: 00010002
 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff85ecb400
 RBP: ffff8801cb547830 R08: 0000000000000001 R09: 0000000000000000
 R10: 0000000000000000 R11: ffffffff87489d60 R12: ffff8801cd2980c0
 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000020
 FS:  00000000014ee880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007ffee2426e40 CR3: 00000001cb85a000 CR4: 00000000001406f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4004
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
  _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:175
  spin_lock_bh include/linux/spinlock.h:320 [inline]
  tipc_subscrb_subscrp_delete+0x8f/0x470 net/tipc/subscr.c:201
  tipc_subscrb_delete net/tipc/subscr.c:238 [inline]
  tipc_subscrb_release_cb+0x17/0x30 net/tipc/subscr.c:316
  tipc_close_conn+0x171/0x270 net/tipc/server.c:204
  tipc_topsrv_kern_subscr+0x724/0x810 net/tipc/server.c:514
  tipc_group_create+0x702/0x9c0 net/tipc/group.c:184
  tipc_sk_join net/tipc/socket.c:2747 [inline]
  tipc_setsockopt+0x249/0xc10 net/tipc/socket.c:2861
  SYSC_setsockopt net/socket.c:1851 [inline]
  SyS_setsockopt+0x189/0x360 net/socket.c:1830
  entry_SYSCALL_64_fastpath+0x1f/0x96

Fixes: 14c04493 ("tipc: add ability to order and receive topology events in driver")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

672ecbe1

Merge branch 'sh_eth-dma-mapping-fixes' · a6cec1f5

David S. Miller authored Dec 05, 2017

Thomas Petazzoni says:

====================
net: sh_eth: DMA mapping API fixes

Here are two patches that fix how the sh_eth driver is using the DMA
mapping API: a bogus struct device is used in some places, or a NULL
struct device is used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a6cec1f5

net: sh_eth: don't use NULL as "struct device" for the DMA mapping API · 573500db

Thomas Petazzoni authored Dec 04, 2017

Using NULL as argument for the DMA mapping API is bogus, as the DMA
mapping API may use information from the "struct device" to perform
the DMA mapping operation. Therefore, pass the appropriate "struct
device".
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

573500db

net: sh_eth: use correct "struct device" when calling DMA mapping functions · 22c1aed4

Thomas Petazzoni authored Dec 04, 2017

There are two types of "struct device": the one representing the
physical device on its physical bus (platform, SPI, PCI, etc.), and
the one representing the logical device in its device class (net,
etc.).

The DMA mapping API expects to receive as argument a "struct device"
representing the physical device, as the "struct device" contains
information about the bus that the DMA API needs.

However, the sh_eth driver mistakenly uses the "struct device"
representing the logical device (embedded in "struct net_device")
rather than the "struct device" representing the physical device on
its bus.

This commit fixes that by adjusting all calls to the DMA mapping API.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

22c1aed4

Merge branch 'RED-qdisc-fixes' · c1d69de9

David S. Miller authored Dec 05, 2017

Nogah Frankel says:

====================
RED qdisc fixes

Add some input validation checks to RED qdisc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c1d69de9

net_sched: red: Avoid illegal values · 8afa10cb

Nogah Frankel authored Dec 04, 2017

Check the qmin & qmax values doesn't overflow for the given Wlog value.
Check that qmin <= qmax.

Fixes: a7834745 ("[PKT_SCHED]: Generic RED layer")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8afa10cb

net_sched: red: Avoid devision by zero · 5c472203

Nogah Frankel authored Dec 04, 2017

Do not allow delta value to be zero since it is used as a divisor.

Fixes: 8af2a218 ("sch_red: Adaptative RED AQM")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c472203

gianfar: fix a flooded alignment reports because of padding issue. · 58117672

Zumeng Chen authored Dec 04, 2017

According to LS1021A RM, the value of PAL can be set so that the start of the
IP header in the receive data buffer is aligned to a 32-bit boundary. Normally,
setting PAL = 2 provides minimal padding to ensure such alignment of the IP
header.

However every incoming packet's 8-byte time stamp will be inserted into the
packet data buffer as padding alignment bytes when hardware time stamping is
enabled.

So we set the padding 8+2 here to avoid the flooded alignment faults:

root@128:~# cat /proc/cpu/alignment
User:           0
System:         17539 (inet_gro_receive+0x114/0x2c0)
Skipped:        0
Half:           0
Word:           0
DWord:          0
Multi:          17539
User faults:    2 (fixup)

Also shown when exception report enablement

CPU: 0 PID: 161 Comm: irq/66-eth1_g0_ Not tainted 4.1.21-rt13-WR8.0.0.0_preempt-rt #16
Hardware name: Freescale LS1021A
[<8001b420>] (unwind_backtrace) from [<8001476c>] (show_stack+0x20/0x24)
[<8001476c>] (show_stack) from [<807cfb48>] (dump_stack+0x94/0xac)
[<807cfb48>] (dump_stack) from [<80025d70>] (do_alignment+0x720/0x958)
[<80025d70>] (do_alignment) from [<80009224>] (do_DataAbort+0x40/0xbc)
[<80009224>] (do_DataAbort) from [<80015398>] (__dabt_svc+0x38/0x60)
Exception stack(0x86ad1cc0 to 0x86ad1d08)
1cc0: f9b3e080 86b3d072 2d78d287 00000000 866816c0 86b3d05e 86e785d0 00000000
1ce0: 00000011 0000000e 80840ab0 86ad1d3c 86ad1d08 86ad1d08 806d7fc0 806d806c
1d00: 40070013 ffffffff
[<80015398>] (__dabt_svc) from [<806d806c>] (inet_gro_receive+0x114/0x2c0)
[<806d806c>] (inet_gro_receive) from [<80660eec>] (dev_gro_receive+0x21c/0x3c0)
[<80660eec>] (dev_gro_receive) from [<8066133c>] (napi_gro_receive+0x44/0x17c)
[<8066133c>] (napi_gro_receive) from [<804f0538>] (gfar_clean_rx_ring+0x39c/0x7d4)
[<804f0538>] (gfar_clean_rx_ring) from [<804f0bf4>] (gfar_poll_rx_sq+0x58/0xe0)
[<804f0bf4>] (gfar_poll_rx_sq) from [<80660b10>] (net_rx_action+0x27c/0x43c)
[<80660b10>] (net_rx_action) from [<80033638>] (do_current_softirqs+0x1e0/0x3dc)
[<80033638>] (do_current_softirqs) from [<800338c4>] (__local_bh_enable+0x90/0xa8)
[<800338c4>] (__local_bh_enable) from [<8008025c>] (irq_forced_thread_fn+0x70/0x84)
[<8008025c>] (irq_forced_thread_fn) from [<800805e8>] (irq_thread+0x16c/0x244)
[<800805e8>] (irq_thread) from [<8004e490>] (kthread+0xe8/0x104)
[<8004e490>] (kthread) from [<8000fda8>] (ret_from_fork+0x14/0x2c)
Signed-off-by: Zumeng Chen <zumeng.chen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

58117672

Revert "net: core: maybe return -EEXIST in __dev_alloc_name" · 029b6d14

Johannes Berg authored Dec 02, 2017

This reverts commit d6f295e9; some userspace (in the case
we noticed it's wpa_supplicant), is relying on the current
error code to determine that a fixed name interface already
exists.
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

029b6d14

nfp: fix port stats for mac representors · 42d779ff

Pieter Jansen van Vuuren authored Dec 01, 2017

Previously we swapped the tx_packets, tx_bytes and tx_dropped counters
with rx_packets, rx_bytes and rx_dropped counters, respectively. This
behaviour is correct and expected for VF representors but it should not
be swapped for physical port mac representors.

Fixes: eadfa4c3 ("nfp: add stats and xmit helpers for representors")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

42d779ff

Revert "tcp: must block bh in __inet_twsk_hashdance()" · e599ea14

Eric Dumazet authored Dec 01, 2017

We had to disable BH _before_ calling __inet_twsk_hashdance() in commit
cfac7f83 ("tcp/dccp: block bh before arming time_wait timer").

This means we can revert 614bdd4d ("tcp: must block bh in
__inet_twsk_hashdance()").
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e599ea14

04 Dec, 2017 3 commits

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2391f0b4

Linus Torvalds authored Dec 04, 2017

Pull virtio fixes from Michael Tsirkin:
 "virtio and qemu bugfixes

  A couple of bugfixes that just became ready"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio_balloon: fix increment of vb->num_pfns in fill_balloon()
  virtio: release virtio index when fail to device_register
  fw_cfg: fix driver remove

2391f0b4

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 236fa078

Linus Torvalds authored Dec 04, 2017

Pull networking fixes from David Miller:

 1) Various TCP control block fixes, including one that crashes with
    SELinux, from David Ahern and Eric Dumazet.

 2) Fix ACK generation in rxrpc, from David Howells.

 3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
    from Gao Feng.

 4) SIT configuration doesn't take on the frag_off ipv4 field
    configuration properly, fix from Hangbin Liu.

 5) TSO can fail after device down/up on stmmac, fix from Lars Persson.

 6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.

 7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
    performance problems). From Wei Xu.

 8) mvpps's TX descriptors were not zero initialized, from Yan Markman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
  tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
  rxrpc: Fix the MAINTAINERS record
  rxrpc: Use correct netns source in rxrpc_release_sock()
  liquidio: fix incorrect indentation of assignment statement
  stmmac: reset last TSO segment size after device open
  ipvlan: Add the skb->mark as flow4's member to lookup route
  s390/qeth: build max size GSO skbs on L2 devices
  s390/qeth: fix GSO throughput regression
  s390/qeth: fix thinko in IPv4 multicast address tracking
  tap: free skb if flags error
  tun: free skb in early errors
  vhost: fix skb leak in handle_rx()
  bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
  bnxt_en: fix dst/src fid for vxlan encap/decap actions
  bnxt_en: wildcard smac while creating tunnel decap filter
  bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
  phylink: ensure we take the link down when phylink_stop() is called
  sfp: warn about modules requiring address change sequence
  sfp: improve RX_LOS handling
  ...

236fa078

arch/tile: mark as orphaned · 8ee5ad1d

Chris Metcalf authored Dec 04, 2017

The chip family of TILEPro and TILE-Gx was developed by Tilera, which
was eventually acquired by Mellanox.  The tile architecture was added to
the kernel in 2010 and first appeared in 2.6.36.

Now at Mellanox we are developing new chips based on the ARM64
architecture; our last TILE-Gx chip (the Gx72) was released in 2013, and
our customers using tile architecture products are not, as far as we
know, looking to upgrade to newer kernel releases.  In the absence of
someone in the community stepping up to take over maintainership, this
commit marks the architecture as orphaned.

Cc: Chris Metcalf <metcalf@alum.mit.edu>
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ee5ad1d

03 Dec, 2017 23 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c2eb6d07

David S. Miller authored Dec 03, 2017

Daniel Borkmann says:

====================
pull-request: bpf 2017-12-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a compilation warning in xdp redirect tracepoint due to
   missing bpf.h include that pulls in struct bpf_map, from Xie.

2) Limit the maximum number of attachable BPF progs for a given
   perf event as long as uabi is not frozen yet. The hard upper
   limit is now 64 and therefore the same as with BPF multi-prog
   for cgroups. Also add related error checking for the sample
   BPF loader when enabling and attaching to the perf event, from
   Yonghong.

3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
   case, so that the test case can always pass and not fail in
   some environments due to too low default limit, also from
   Yonghong.

4) Fix up a missing license header comment for kernel/bpf/offload.c,
   from Jakub.

5) Several fixes for bpftool, among others a crash on incorrect
   arguments when json output is used, error message handling
   fixes on unknown options and proper destruction of json writer
   for some exit cases, all from Quentin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c2eb6d07

Merge branch 'tcp-cb-selinux-corruption' · e4485c74

David S. Miller authored Dec 03, 2017

Eric Dumazet says:

====================
tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()

James Morris reported kernel stack corruption bug that
we tracked back to commit 971f10ec ("tcp: better TCP_SKB_CB
layout to reduce cache line misses")

First patch needs to be backported to kernels >= 3.18,
while second patch needs to be backported to kernels >= 4.9, since
this was the time when inet_exact_dif_match appeared.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

e4485c74

tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match() · b4d1605a

David Ahern authored Dec 03, 2017

After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
socket lookups happen while skb->cb[] has not been mangled yet by TCP.

Fixes: a04a480d ("net: Require exact match for TCP socket lookups if dif is l3mdev")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4d1605a

tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb() · eeea10b8

Eric Dumazet authored Dec 03, 2017

James Morris reported kernel stack corruption bug [1] while
running the SELinux testsuite, and bisected to a recent
commit bffa72cf ("net: sk_buff rbnode reorg")

We believe this commit is fine, but exposes an older bug.

SELinux code runs from tcp_filter() and might send an ICMP,
expecting IP options to be found in skb->cb[] using regular IPCB placement.

We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.

This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
similar way we added them for IPv6.

[1]
[  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
[  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
[  339.822505]
[  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
[  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
[  339.885060] Call Trace:
[  339.896875]  <IRQ>
[  339.908103]  dump_stack+0x63/0x87
[  339.920645]  panic+0xe8/0x248
[  339.932668]  ? ip_push_pending_frames+0x33/0x40
[  339.946328]  ? icmp_send+0x525/0x530
[  339.958861]  ? kfree_skbmem+0x60/0x70
[  339.971431]  __stack_chk_fail+0x1b/0x20
[  339.984049]  icmp_send+0x525/0x530
[  339.996205]  ? netlbl_skbuff_err+0x36/0x40
[  340.008997]  ? selinux_netlbl_err+0x11/0x20
[  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
[  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
[  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
[  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
[  340.074562]  ? tcp_filter+0x2c/0x40
[  340.086400]  ? tcp_v4_rcv+0x820/0xa20
[  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
[  340.111279]  ? ip_local_deliver+0x6f/0xe0
[  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
[  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
[  340.147442]  ? ip_rcv+0x27c/0x3c0
[  340.158668]  ? inet_del_offload+0x40/0x40
[  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
[  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
[  340.195282]  ? __netif_receive_skb+0x18/0x60
[  340.207288]  ? process_backlog+0x95/0x140
[  340.218948]  ? net_rx_action+0x26c/0x3b0
[  340.230416]  ? __do_softirq+0xc9/0x26a
[  340.241625]  ? do_softirq_own_stack+0x2a/0x40
[  340.253368]  </IRQ>
[  340.262673]  ? do_softirq+0x50/0x60
[  340.273450]  ? __local_bh_enable_ip+0x57/0x60
[  340.285045]  ? ip_finish_output2+0x175/0x350
[  340.296403]  ? ip_finish_output+0x127/0x1d0
[  340.307665]  ? nf_hook_slow+0x3c/0xb0
[  340.318230]  ? ip_output+0x72/0xe0
[  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
[  340.340070]  ? ip_local_out+0x35/0x40
[  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
[  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
[  340.372484]  ? __skb_clone+0x2e/0x130
[  340.382633]  ? tcp_transmit_skb+0x558/0xa10
[  340.393262]  ? tcp_connect+0x938/0xad0
[  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
[  340.414206]  ? tcp_v4_connect+0x457/0x4e0
[  340.424471]  ? __inet_stream_connect+0xb3/0x300
[  340.435195]  ? inet_stream_connect+0x3b/0x60
[  340.445607]  ? SYSC_connect+0xd9/0x110
[  340.455455]  ? __audit_syscall_entry+0xaf/0x100
[  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
[  340.476636]  ? __audit_syscall_exit+0x209/0x290
[  340.487151]  ? SyS_connect+0xe/0x10
[  340.496453]  ? do_syscall_64+0x67/0x1b0
[  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25

Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: James Morris <james.l.morris@oracle.com>
Tested-by: James Morris <james.l.morris@oracle.com>
Tested-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eeea10b8

Linux 4.15-rc2 · ae64f9bd
Linus Torvalds authored Dec 03, 2017

ae64f9bd

Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · 87fc5c68

Linus Torvalds authored Dec 03, 2017

Pull ARM fix from Russell King:
 "Just one fix this time around, for the late commit in the merge window
  that triggered a problem with qemu. Qemu is apparently also going to
  receive a fix for the discovered issue"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: avoid faulting on qemu

87fc5c68

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · ae4806a3

Linus Torvalds authored Dec 03, 2017

Pull i2c fixes from Wolfram Sang:
 "Here are two bugfixes for I2C, fixing a memleak in the core and irq
  allocation for i801.

  Also three bugfixes for the at24 eeprom driver which Bartosz collected
  while taking over maintainership for this driver"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  eeprom: at24: check at24_read/write arguments
  eeprom: at24: fix reading from 24MAC402/24MAC602
  eeprom: at24: correctly set the size for at24mac402
  i2c: i2c-boardinfo: fix memory leaks on devinfo
  i2c: i801: Fix Failed to allocate irq -2147483648 error

ae4806a3

Merge tag 'hwmon-for-linus-v4.15-rc2' of... · 49a418d7

Linus Torvalds authored Dec 03, 2017

Merge tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fixes:

   - Drop reference to obsolete maintainer tree

   - Fix overflow bug in pmbus driver

   - Fix SMBUS timeout problem in jc42 driver

  For the SMBUS timeout handling, we had a brief discussion if this
  should be considered a bug fix or a feature. Peter says "it fixes real
  problems where the application misbehave due to faulty content when
  reading from an eeprom", and he needs the patch in his company's v4.14
  images. This is good enough for me and warrants backport to stable
  kernels"

* tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (jc42) optionally try to disable the SMBUS timeout
  hwmon: (pmbus) Use 64bit math for DIRECT format values
  hwmon: Drop reference to Jean's tree

49a418d7

rxrpc: Fix the MAINTAINERS record · bcd1d601

David Howells authored Dec 01, 2017

Fix the MAINTAINERS record so that it's more obvious who the maintainer for
AF_RXRPC is.
Reported-by: Joe Perches <joe@perches.com>
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcd1d601

rxrpc: Use correct netns source in rxrpc_release_sock() · c5012564

David Howells authored Dec 01, 2017

In rxrpc_release_sock() there may be no rx->local value to access, so we
can't unconditionally follow it to the rxrpc network namespace information
to poke the connection reapers.

Instead, use the socket's namespace pointer to find the namespace.

This unfixed code causes the following static checker warning:

	net/rxrpc/af_rxrpc.c:898 rxrpc_release_sock()
	error: we previously assumed 'rx->local' could be null (see line 887)

Fixes: 3d18cbb7 ("rxrpc: Fix conn expiry timers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5012564

liquidio: fix incorrect indentation of assignment statement · 886afc1d

Colin Ian King authored Dec 01, 2017

Remove one extraneous level of indentation on assignment statement.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

886afc1d

Merge tag 'linux-can-fixes-for-4.15-20171201' of... · ed75e1ac

David S. Miller authored Dec 03, 2017

Merge tag 'linux-can-fixes-for-4.15-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2017-12-01

this is a pull for net consisting of nine patches.

The first three patches are by Jimmy Assarsson for the kvaser_usb driver
and add the missing free()s in some error path, a signed/unsigned
comparison and ratelimit the error messages in case of incomplete
messages. Oliver Stäbler's patch for the ti_hecc driver fix the napi
poll function's return value. The return values of the probe function of
the peak_canfd and peak_pci PCI drivers are fixed by Stephane Grosjean's
patch. Two patches by me for the flexcan driver update the
bugs/features/quirks overview table and fix the error state transition
for the VF610 SoC. The two patches by Martin Kelly for the mcba_usb
driver fix a typo and a device disconnect bug.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ed75e1ac

Merge tag 'at24-4.15-fixes-for-wolfram' of... · edef3098

Wolfram Sang authored Dec 02, 2017

Merge tag 'at24-4.15-fixes-for-wolfram' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current

Please consider pulling the following fixes for v4.15. While it doesn't
fix any regression introduced in the v4.15 merge window, we have a
feature in at24 since linux v4.8 - reading the mac address block from
at24mac series - which turned out to be not working.

This pull request contains changes that fix it together with a patch
that hardens the read and write argument sanitization with
out-of-bounds checks that were missing.

edef3098

stmmac: reset last TSO segment size after device open · 45ab4b13

Lars Persson authored Dec 01, 2017

The mss variable tracks the last max segment size sent to the TSO
engine. We do not update the hardware as long as we receive skb:s with
the same value in gso_size.

During a network device down/up cycle (mapped to stmmac_release() and
stmmac_open() callbacks) we issue a reset to the hardware and it
forgets the setting for mss. However we did not zero out our mss
variable so the next transmission of a gso packet happens with an
undefined hardware setting.

This triggers a hang in the TSO engine and eventuelly the netdev
watchdog will bark.

Fixes: f748be53 ("stmmac: support new GMAC4")
Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45ab4b13

ipvlan: Add the skb->mark as flow4's member to lookup route · a98a4ebc

Gao Feng authored Dec 01, 2017

Current codes don't use skb->mark to assign flowi4_mark, it would
make the policy route rule with fwmark doesn't work as expected.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a98a4ebc

Merge branch 's390-qeth-fixes' · af57b7ff

David S. Miller authored Dec 02, 2017

Julian Wiedmann says:

====================
s390/qeth: fixes 2017-12-01

please apply the following three fixes for 4.15. These should also go
back to stable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

af57b7ff

s390/qeth: build max size GSO skbs on L2 devices · 0cbff6d4

Julian Wiedmann authored Dec 01, 2017

The current GSO skb size limit was copy&pasted over from the L3 path,
where it is needed due to a TSO limitation.
As L2 devices don't offer TSO support (and thus all GSO skbs are
segmented before they reach the driver), there's no reason to restrict
the stack in how large it may build the GSO skbs.

Fixes: d52aec97 ("qeth: enable scatter/gather in layer 2 mode")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0cbff6d4

s390/qeth: fix GSO throughput regression · 6d69b1f1

Julian Wiedmann authored Dec 01, 2017

Using GSO with small MTUs currently results in a substantial throughput
regression - which is caused by how qeth needs to map non-linear skbs
into its IO buffer elements:
compared to a linear skb, each GSO-segmented skb effectively consumes
twice as many buffer elements (ie two instead of one) due to the
additional header-only part. This causes the Output Queue to be
congested with low-utilized IO buffers.

Fix this as follows:
If the MSS is low enough so that a non-SG GSO segmentation produces
order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is
where we anticipate the biggest savings, since an SG-enabled
GSO segmentation produces skbs that always consume at least two
buffer elements.

Larger MSS values continue to get a SG-enabled GSO segmentation, since
1) the relative overhead of the additional header-only buffer element
becomes less noticeable, and
2) the linearization overhead increases.

With the throughput regression fixed, re-enable NETIF_F_SG by default to
reap the significant CPU savings of GSO.

Fixes: 5722963a ("qeth: do not turn on SG per default")
Reported-by: Nils Hoppmann <niho@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d69b1f1

s390/qeth: fix thinko in IPv4 multicast address tracking · bc3ab705

Julian Wiedmann authored Dec 01, 2017

Commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
reworked how secondary addresses are managed for qeth devices.
Instead of dropping & subsequently re-adding all addresses on every
ndo_set_rx_mode() call, qeth now keeps track of the addresses that are
currently registered with the HW.
On a ndo_set_rx_mode(), we thus only need to do (de-)registration
requests for the addresses that have actually changed.

On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong
hashtable - and thus never finds a match. As a result, we first delete
*all* such addresses, and then re-add them again. So each set_rx_mode()
causes a short period where the IPv4 Multicast addresses are not
registered, and the card stops forwarding inbound traffic for them.

Fix this by setting the ->is_multicast flag on the lookup object, thus
enabling qeth_l3_ip_from_hash() to search the correct hashtable and
find a match there.

Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc3ab705

Merge branch 'vhost-skb-leaks' · 7344ba03

David S. Miller authored Dec 02, 2017

Wei Xu says:

====================
vhost: fix a few skb leaks

Matthew found a roughly 40% tcp throughput regression with commit
c67df11f(vhost_net: try batch dequing from skb array) as discussed
in the following thread:
https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html

v4:
- fix zero iov iterator count in tap/tap_do_read()(Jason)
- don't put tun in case of EBADFD(Jason)
- Replace msg->msg_control with new 'skb' when calling tun/tap_do_read()

v3:
- move freeing skb from vhost to tun/tap recvmsg() to not
  confuse the callers.

v2:
- add Matthew as the reporter, thanks matthew.
- moving zero headcount check ahead instead of defer consuming skb
  due to jason and mst's comment.
- add freeing skb in favor of recvmsg() fails.
====================
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7344ba03

tap: free skb if flags error · 61d78537

Wei Xu authored Dec 01, 2017

tap_recvmsg() supports accepting skb by msg_control after
commit 3b4ba04a ("tap: support receiving skb from msg_control"),
the skb if presented should be freed within the function, otherwise
it would be leaked.
Signed-off-by: Wei Xu <wexu@redhat.com>
Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

61d78537

tun: free skb in early errors · c33ee15b

Wei Xu authored Dec 01, 2017

tun_recvmsg() supports accepting skb by msg_control after
commit ac77cfd4 ("tun: support receiving skb through msg_control"),
the skb if presented should be freed no matter how far it can go
along, otherwise it would be leaked.

This patch fixes several missed cases.
Signed-off-by: Wei Xu <wexu@redhat.com>
Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c33ee15b

vhost: fix skb leak in handle_rx() · 6e474083

Wei Xu authored Dec 01, 2017

Matthew found a roughly 40% tcp throughput regression with commit
c67df11f(vhost_net: try batch dequing from skb array) as discussed
in the following thread:
https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html

Eventually we figured out that it was a skb leak in handle_rx()
when sending packets to the VM. This usually happens when a guest
can not drain out vq as fast as vhost fills in, afterwards it sets
off the traffic jam and leaks skb(s) which occurs as no headcount
to send on the vq from vhost side.

This can be avoided by making sure we have got enough headcount
before actually consuming a skb from the batched rx array while
transmitting, which is simply done by moving checking the zero
headcount a bit ahead.
Signed-off-by: Wei Xu <wexu@redhat.com>
Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e474083