1. 05 Dec, 2017 17 commits
    • Eric Dumazet's avatar
      net: remove hlist_nulls_add_tail_rcu() · d7efc6c1
      Eric Dumazet authored
      Alexander Potapenko reported use of uninitialized memory [1]
      
      This happens when inserting a request socket into TCP ehash,
      in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized.
      
      Bug was added by commit d894ba18 ("soreuseport: fix ordering for
      mixed v4/v6 sockets")
      
      Note that d296ba60 ("soreuseport: Resolve merge conflict for v4/v6
      ordering fix") missed the opportunity to get rid of
      hlist_nulls_add_tail_rcu() :
      
      Both UDP sockets and TCP/DCCP listeners no longer use
      __sk_nulls_add_node_rcu() for their hash insertion.
      
      Since all other sockets have unique 4-tuple, the reuseport status
      has no special meaning, so we can always use hlist_nulls_add_head_rcu()
      for them and save few cycles/instructions.
      
      [1]
      
      ==================================================================
      BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x185/0x1d0 lib/dump_stack.c:52
       kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016
       __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766
       __sk_nulls_add_node_rcu ./include/net/sock.h:684
       inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413
       reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754
       inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765
       tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414
       tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314
       tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917
       tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483
       tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763
       ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216
       NF_HOOK ./include/linux/netfilter.h:248
       ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257
       dst_input ./include/net/dst.h:477
       ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397
       NF_HOOK ./include/linux/netfilter.h:248
       ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488
       __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298
       __netif_receive_skb net/core/dev.c:4336
       netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497
       napi_skb_finish net/core/dev.c:4858
       napi_gro_receive+0x629/0xa50 net/core/dev.c:4889
       e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018
       e1000_clean_rx_irq+0x1492/0x1d30
      drivers/net/ethernet/intel/e1000/e1000_main.c:4474
       e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819
       napi_poll net/core/dev.c:5500
       net_rx_action+0x73c/0x1820 net/core/dev.c:5566
       __do_softirq+0x4b4/0x8dd kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364
       irq_exit+0x203/0x240 kernel/softirq.c:405
       exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638
       do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263
       common_interrupt+0x86/0x86
      
      Fixes: d894ba18 ("soreuseport: fix ordering for mixed v4/v6 sockets")
      Fixes: d296ba60 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAlexander Potapenko <glider@google.com>
      Acked-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7efc6c1
    • David S. Miller's avatar
      Merge branch 'rmnet-Fix-leaks-in-failure-scenarios' · a5266440
      David S. Miller authored
      Subash Abhinov Kasiviswanathan says:
      
      ====================
      net: qualcomm: rmnet: Fix leaks in failure scenarios
      
      Patch 1 fixes a leak in transmit path where a skb cannot be
      transmitted due to insufficient headroom to stamp the map header.
      Patch 2 fixes a leak in rmnet_newlink() failure because the
      rmnet endpoint was never freed
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a5266440
    • Subash Abhinov Kasiviswanathan's avatar
      net: qualcomm: rmnet: Fix leak in device creation failure · 6296928f
      Subash Abhinov Kasiviswanathan authored
      If the rmnet device creation fails in the newlink either while
      registering with the physical device or after subsequent
      operations, the rmnet endpoint information is never freed.
      
      Fixes: ceed73a2 ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
      Signed-off-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6296928f
    • Subash Abhinov Kasiviswanathan's avatar
      net: qualcomm: rmnet: Fix leak on transmit failure · c20a5487
      Subash Abhinov Kasiviswanathan authored
      If a skb in transmit path does not have sufficient headroom to add
      the map header, the skb is not sent out and is never freed.
      
      Fixes: ceed73a2 ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
      Signed-off-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c20a5487
    • Stefan Hajnoczi's avatar
      VSOCK: fix outdated sk_state value in hvs_release() · c9d3fe9d
      Stefan Hajnoczi authored
      Since commit 3b4477d2 ("VSOCK: use TCP
      state constants for sk_state") VSOCK has used TCP_* constants for
      sk_state.
      
      Commit b4562ca7 ("hv_sock: add locking
      in the open/close/release code paths") reintroduced the SS_DISCONNECTING
      constant.
      
      This patch replaces the old SS_DISCONNECTING with the new TCP_CLOSING
      constant.
      
      CC: Dexuan Cui <decui@microsoft.com>
      CC: Cathy Avery <cavery@redhat.com>
      Signed-off-by: default avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: default avatarJorgen Hansen <jhansen@vmware.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c9d3fe9d
    • Jon Maloy's avatar
      tipc: fix memory leak in tipc_accept_from_sock() · a7d5f107
      Jon Maloy authored
      When the function tipc_accept_from_sock() fails to create an instance of
      struct tipc_subscriber it omits to free the already created instance of
      struct tipc_conn instance before it returns.
      
      We fix that with this commit.
      Reported-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a7d5f107
    • Cong Wang's avatar
      tipc: fix a null pointer deref on error path · 672ecbe1
      Cong Wang authored
      In tipc_topsrv_kern_subscr() when s->tipc_conn_new() fails
      we call tipc_close_conn() to clean up, but in this case
      calling conn_put() is just enough.
      
      This fixes the folllowing crash:
      
       kasan: GPF could be caused by NULL-ptr deref or user memory access
       general protection fault: 0000 [#1] SMP KASAN
       Dumping ftrace buffer:
          (ftrace buffer empty)
       Modules linked in:
       CPU: 0 PID: 3085 Comm: syzkaller064164 Not tainted 4.15.0-rc1+ #137
       Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       task: 00000000c24413a5 task.stack: 000000005e8160b5
       RIP: 0010:__lock_acquire+0xd55/0x47f0 kernel/locking/lockdep.c:3378
       RSP: 0018:ffff8801cb5474a8 EFLAGS: 00010002
       RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
       RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff85ecb400
       RBP: ffff8801cb547830 R08: 0000000000000001 R09: 0000000000000000
       R10: 0000000000000000 R11: ffffffff87489d60 R12: ffff8801cd2980c0
       R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000020
       FS:  00000000014ee880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007ffee2426e40 CR3: 00000001cb85a000 CR4: 00000000001406f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4004
        __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
        _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:175
        spin_lock_bh include/linux/spinlock.h:320 [inline]
        tipc_subscrb_subscrp_delete+0x8f/0x470 net/tipc/subscr.c:201
        tipc_subscrb_delete net/tipc/subscr.c:238 [inline]
        tipc_subscrb_release_cb+0x17/0x30 net/tipc/subscr.c:316
        tipc_close_conn+0x171/0x270 net/tipc/server.c:204
        tipc_topsrv_kern_subscr+0x724/0x810 net/tipc/server.c:514
        tipc_group_create+0x702/0x9c0 net/tipc/group.c:184
        tipc_sk_join net/tipc/socket.c:2747 [inline]
        tipc_setsockopt+0x249/0xc10 net/tipc/socket.c:2861
        SYSC_setsockopt net/socket.c:1851 [inline]
        SyS_setsockopt+0x189/0x360 net/socket.c:1830
        entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Fixes: 14c04493 ("tipc: add ability to order and receive topology events in driver")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      672ecbe1
    • David S. Miller's avatar
      Merge branch 'sh_eth-dma-mapping-fixes' · a6cec1f5
      David S. Miller authored
      Thomas Petazzoni says:
      
      ====================
      net: sh_eth: DMA mapping API fixes
      
      Here are two patches that fix how the sh_eth driver is using the DMA
      mapping API: a bogus struct device is used in some places, or a NULL
      struct device is used.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a6cec1f5
    • Thomas Petazzoni's avatar
      net: sh_eth: don't use NULL as "struct device" for the DMA mapping API · 573500db
      Thomas Petazzoni authored
      Using NULL as argument for the DMA mapping API is bogus, as the DMA
      mapping API may use information from the "struct device" to perform
      the DMA mapping operation. Therefore, pass the appropriate "struct
      device".
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Acked-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      573500db
    • Thomas Petazzoni's avatar
      net: sh_eth: use correct "struct device" when calling DMA mapping functions · 22c1aed4
      Thomas Petazzoni authored
      There are two types of "struct device": the one representing the
      physical device on its physical bus (platform, SPI, PCI, etc.), and
      the one representing the logical device in its device class (net,
      etc.).
      
      The DMA mapping API expects to receive as argument a "struct device"
      representing the physical device, as the "struct device" contains
      information about the bus that the DMA API needs.
      
      However, the sh_eth driver mistakenly uses the "struct device"
      representing the logical device (embedded in "struct net_device")
      rather than the "struct device" representing the physical device on
      its bus.
      
      This commit fixes that by adjusting all calls to the DMA mapping API.
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Acked-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Reviewed-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      22c1aed4
    • David S. Miller's avatar
      Merge branch 'RED-qdisc-fixes' · c1d69de9
      David S. Miller authored
      Nogah Frankel says:
      
      ====================
      RED qdisc fixes
      
      Add some input validation checks to RED qdisc.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1d69de9
    • Nogah Frankel's avatar
      net_sched: red: Avoid illegal values · 8afa10cb
      Nogah Frankel authored
      Check the qmin & qmax values doesn't overflow for the given Wlog value.
      Check that qmin <= qmax.
      
      Fixes: a7834745 ("[PKT_SCHED]: Generic RED layer")
      Signed-off-by: default avatarNogah Frankel <nogahf@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8afa10cb
    • Nogah Frankel's avatar
      net_sched: red: Avoid devision by zero · 5c472203
      Nogah Frankel authored
      Do not allow delta value to be zero since it is used as a divisor.
      
      Fixes: 8af2a218 ("sch_red: Adaptative RED AQM")
      Signed-off-by: default avatarNogah Frankel <nogahf@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5c472203
    • Zumeng Chen's avatar
      gianfar: fix a flooded alignment reports because of padding issue. · 58117672
      Zumeng Chen authored
      According to LS1021A RM, the value of PAL can be set so that the start of the
      IP header in the receive data buffer is aligned to a 32-bit boundary. Normally,
      setting PAL = 2 provides minimal padding to ensure such alignment of the IP
      header.
      
      However every incoming packet's 8-byte time stamp will be inserted into the
      packet data buffer as padding alignment bytes when hardware time stamping is
      enabled.
      
      So we set the padding 8+2 here to avoid the flooded alignment faults:
      
      root@128:~# cat /proc/cpu/alignment
      User:           0
      System:         17539 (inet_gro_receive+0x114/0x2c0)
      Skipped:        0
      Half:           0
      Word:           0
      DWord:          0
      Multi:          17539
      User faults:    2 (fixup)
      
      Also shown when exception report enablement
      
      CPU: 0 PID: 161 Comm: irq/66-eth1_g0_ Not tainted 4.1.21-rt13-WR8.0.0.0_preempt-rt #16
      Hardware name: Freescale LS1021A
      [<8001b420>] (unwind_backtrace) from [<8001476c>] (show_stack+0x20/0x24)
      [<8001476c>] (show_stack) from [<807cfb48>] (dump_stack+0x94/0xac)
      [<807cfb48>] (dump_stack) from [<80025d70>] (do_alignment+0x720/0x958)
      [<80025d70>] (do_alignment) from [<80009224>] (do_DataAbort+0x40/0xbc)
      [<80009224>] (do_DataAbort) from [<80015398>] (__dabt_svc+0x38/0x60)
      Exception stack(0x86ad1cc0 to 0x86ad1d08)
      1cc0: f9b3e080 86b3d072 2d78d287 00000000 866816c0 86b3d05e 86e785d0 00000000
      1ce0: 00000011 0000000e 80840ab0 86ad1d3c 86ad1d08 86ad1d08 806d7fc0 806d806c
      1d00: 40070013 ffffffff
      [<80015398>] (__dabt_svc) from [<806d806c>] (inet_gro_receive+0x114/0x2c0)
      [<806d806c>] (inet_gro_receive) from [<80660eec>] (dev_gro_receive+0x21c/0x3c0)
      [<80660eec>] (dev_gro_receive) from [<8066133c>] (napi_gro_receive+0x44/0x17c)
      [<8066133c>] (napi_gro_receive) from [<804f0538>] (gfar_clean_rx_ring+0x39c/0x7d4)
      [<804f0538>] (gfar_clean_rx_ring) from [<804f0bf4>] (gfar_poll_rx_sq+0x58/0xe0)
      [<804f0bf4>] (gfar_poll_rx_sq) from [<80660b10>] (net_rx_action+0x27c/0x43c)
      [<80660b10>] (net_rx_action) from [<80033638>] (do_current_softirqs+0x1e0/0x3dc)
      [<80033638>] (do_current_softirqs) from [<800338c4>] (__local_bh_enable+0x90/0xa8)
      [<800338c4>] (__local_bh_enable) from [<8008025c>] (irq_forced_thread_fn+0x70/0x84)
      [<8008025c>] (irq_forced_thread_fn) from [<800805e8>] (irq_thread+0x16c/0x244)
      [<800805e8>] (irq_thread) from [<8004e490>] (kthread+0xe8/0x104)
      [<8004e490>] (kthread) from [<8000fda8>] (ret_from_fork+0x14/0x2c)
      Signed-off-by: default avatarZumeng Chen <zumeng.chen@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      58117672
    • Johannes Berg's avatar
      Revert "net: core: maybe return -EEXIST in __dev_alloc_name" · 029b6d14
      Johannes Berg authored
      This reverts commit d6f295e9; some userspace (in the case
      we noticed it's wpa_supplicant), is relying on the current
      error code to determine that a fixed name interface already
      exists.
      Reported-by: default avatarJouni Malinen <j@w1.fi>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      029b6d14
    • Pieter Jansen van Vuuren's avatar
      nfp: fix port stats for mac representors · 42d779ff
      Pieter Jansen van Vuuren authored
      Previously we swapped the tx_packets, tx_bytes and tx_dropped counters
      with rx_packets, rx_bytes and rx_dropped counters, respectively. This
      behaviour is correct and expected for VF representors but it should not
      be swapped for physical port mac representors.
      
      Fixes: eadfa4c3 ("nfp: add stats and xmit helpers for representors")
      Signed-off-by: default avatarPieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42d779ff
    • Eric Dumazet's avatar
      Revert "tcp: must block bh in __inet_twsk_hashdance()" · e599ea14
      Eric Dumazet authored
      We had to disable BH _before_ calling __inet_twsk_hashdance() in commit
      cfac7f83 ("tcp/dccp: block bh before arming time_wait timer").
      
      This means we can revert 614bdd4d ("tcp: must block bh in
      __inet_twsk_hashdance()").
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e599ea14
  2. 04 Dec, 2017 3 commits
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2391f0b4
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "virtio and qemu bugfixes
      
        A couple of bugfixes that just became ready"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_balloon: fix increment of vb->num_pfns in fill_balloon()
        virtio: release virtio index when fail to device_register
        fw_cfg: fix driver remove
      2391f0b4
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 236fa078
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various TCP control block fixes, including one that crashes with
          SELinux, from David Ahern and Eric Dumazet.
      
       2) Fix ACK generation in rxrpc, from David Howells.
      
       3) ipvlan doesn't set the mark properly in the ipv4 route lookup key,
          from Gao Feng.
      
       4) SIT configuration doesn't take on the frag_off ipv4 field
          configuration properly, fix from Hangbin Liu.
      
       5) TSO can fail after device down/up on stmmac, fix from Lars Persson.
      
       6) Various bpftool fixes (mostly in JSON handling) from Quentin Monnet.
      
       7) Various SKB leak fixes in vhost/tun/tap (mostly observed as
          performance problems). From Wei Xu.
      
       8) mvpps's TX descriptors were not zero initialized, from Yan Markman.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
        tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match()
        tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
        rxrpc: Fix the MAINTAINERS record
        rxrpc: Use correct netns source in rxrpc_release_sock()
        liquidio: fix incorrect indentation of assignment statement
        stmmac: reset last TSO segment size after device open
        ipvlan: Add the skb->mark as flow4's member to lookup route
        s390/qeth: build max size GSO skbs on L2 devices
        s390/qeth: fix GSO throughput regression
        s390/qeth: fix thinko in IPv4 multicast address tracking
        tap: free skb if flags error
        tun: free skb in early errors
        vhost: fix skb leak in handle_rx()
        bnxt_en: Fix a variable scoping in bnxt_hwrm_do_send_msg()
        bnxt_en: fix dst/src fid for vxlan encap/decap actions
        bnxt_en: wildcard smac while creating tunnel decap filter
        bnxt_en: Need to unconditionally shut down RoCE in bnxt_shutdown
        phylink: ensure we take the link down when phylink_stop() is called
        sfp: warn about modules requiring address change sequence
        sfp: improve RX_LOS handling
        ...
      236fa078
    • Chris Metcalf's avatar
      arch/tile: mark as orphaned · 8ee5ad1d
      Chris Metcalf authored
      The chip family of TILEPro and TILE-Gx was developed by Tilera, which
      was eventually acquired by Mellanox.  The tile architecture was added to
      the kernel in 2010 and first appeared in 2.6.36.
      
      Now at Mellanox we are developing new chips based on the ARM64
      architecture; our last TILE-Gx chip (the Gx72) was released in 2013, and
      our customers using tile architecture products are not, as far as we
      know, looking to upgrade to newer kernel releases.  In the absence of
      someone in the community stepping up to take over maintainership, this
      commit marks the architecture as orphaned.
      
      Cc: Chris Metcalf <metcalf@alum.mit.edu>
      Signed-off-by: default avatarChris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8ee5ad1d
  3. 03 Dec, 2017 20 commits
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · c2eb6d07
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2017-12-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a compilation warning in xdp redirect tracepoint due to
         missing bpf.h include that pulls in struct bpf_map, from Xie.
      
      2) Limit the maximum number of attachable BPF progs for a given
         perf event as long as uabi is not frozen yet. The hard upper
         limit is now 64 and therefore the same as with BPF multi-prog
         for cgroups. Also add related error checking for the sample
         BPF loader when enabling and attaching to the perf event, from
         Yonghong.
      
      3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
         case, so that the test case can always pass and not fail in
         some environments due to too low default limit, also from
         Yonghong.
      
      4) Fix up a missing license header comment for kernel/bpf/offload.c,
         from Jakub.
      
      5) Several fixes for bpftool, among others a crash on incorrect
         arguments when json output is used, error message handling
         fixes on unknown options and proper destruction of json writer
         for some exit cases, all from Quentin.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2eb6d07
    • David S. Miller's avatar
      Merge branch 'tcp-cb-selinux-corruption' · e4485c74
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()
      
      James Morris reported kernel stack corruption bug that
      we tracked back to commit 971f10ec ("tcp: better TCP_SKB_CB
      layout to reduce cache line misses")
      
      First patch needs to be backported to kernels >= 3.18,
      while second patch needs to be backported to kernels >= 4.9, since
      this was the time when inet_exact_dif_match appeared.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4485c74
    • David Ahern's avatar
      tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match() · b4d1605a
      David Ahern authored
      After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
      socket lookups happen while skb->cb[] has not been mangled yet by TCP.
      
      Fixes: a04a480d ("net: Require exact match for TCP socket lookups if dif is l3mdev")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4d1605a
    • Eric Dumazet's avatar
      tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb() · eeea10b8
      Eric Dumazet authored
      James Morris reported kernel stack corruption bug [1] while
      running the SELinux testsuite, and bisected to a recent
      commit bffa72cf ("net: sk_buff rbnode reorg")
      
      We believe this commit is fine, but exposes an older bug.
      
      SELinux code runs from tcp_filter() and might send an ICMP,
      expecting IP options to be found in skb->cb[] using regular IPCB placement.
      
      We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.
      
      This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
      similar way we added them for IPv6.
      
      [1]
      [  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
      [  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
      [  339.822505]
      [  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
      [  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
      [  339.885060] Call Trace:
      [  339.896875]  <IRQ>
      [  339.908103]  dump_stack+0x63/0x87
      [  339.920645]  panic+0xe8/0x248
      [  339.932668]  ? ip_push_pending_frames+0x33/0x40
      [  339.946328]  ? icmp_send+0x525/0x530
      [  339.958861]  ? kfree_skbmem+0x60/0x70
      [  339.971431]  __stack_chk_fail+0x1b/0x20
      [  339.984049]  icmp_send+0x525/0x530
      [  339.996205]  ? netlbl_skbuff_err+0x36/0x40
      [  340.008997]  ? selinux_netlbl_err+0x11/0x20
      [  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
      [  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
      [  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
      [  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
      [  340.074562]  ? tcp_filter+0x2c/0x40
      [  340.086400]  ? tcp_v4_rcv+0x820/0xa20
      [  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
      [  340.111279]  ? ip_local_deliver+0x6f/0xe0
      [  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
      [  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
      [  340.147442]  ? ip_rcv+0x27c/0x3c0
      [  340.158668]  ? inet_del_offload+0x40/0x40
      [  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
      [  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
      [  340.195282]  ? __netif_receive_skb+0x18/0x60
      [  340.207288]  ? process_backlog+0x95/0x140
      [  340.218948]  ? net_rx_action+0x26c/0x3b0
      [  340.230416]  ? __do_softirq+0xc9/0x26a
      [  340.241625]  ? do_softirq_own_stack+0x2a/0x40
      [  340.253368]  </IRQ>
      [  340.262673]  ? do_softirq+0x50/0x60
      [  340.273450]  ? __local_bh_enable_ip+0x57/0x60
      [  340.285045]  ? ip_finish_output2+0x175/0x350
      [  340.296403]  ? ip_finish_output+0x127/0x1d0
      [  340.307665]  ? nf_hook_slow+0x3c/0xb0
      [  340.318230]  ? ip_output+0x72/0xe0
      [  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
      [  340.340070]  ? ip_local_out+0x35/0x40
      [  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
      [  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
      [  340.372484]  ? __skb_clone+0x2e/0x130
      [  340.382633]  ? tcp_transmit_skb+0x558/0xa10
      [  340.393262]  ? tcp_connect+0x938/0xad0
      [  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
      [  340.414206]  ? tcp_v4_connect+0x457/0x4e0
      [  340.424471]  ? __inet_stream_connect+0xb3/0x300
      [  340.435195]  ? inet_stream_connect+0x3b/0x60
      [  340.445607]  ? SYSC_connect+0xd9/0x110
      [  340.455455]  ? __audit_syscall_entry+0xaf/0x100
      [  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
      [  340.476636]  ? __audit_syscall_exit+0x209/0x290
      [  340.487151]  ? SyS_connect+0xe/0x10
      [  340.496453]  ? do_syscall_64+0x67/0x1b0
      [  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarJames Morris <james.l.morris@oracle.com>
      Tested-by: default avatarJames Morris <james.l.morris@oracle.com>
      Tested-by: default avatarCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eeea10b8
    • Linus Torvalds's avatar
      Linux 4.15-rc2 · ae64f9bd
      Linus Torvalds authored
      ae64f9bd
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · 87fc5c68
      Linus Torvalds authored
      Pull ARM fix from Russell King:
       "Just one fix this time around, for the late commit in the merge window
        that triggered a problem with qemu. Qemu is apparently also going to
        receive a fix for the discovered issue"
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: avoid faulting on qemu
      87fc5c68
    • Linus Torvalds's avatar
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · ae4806a3
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Here are two bugfixes for I2C, fixing a memleak in the core and irq
        allocation for i801.
      
        Also three bugfixes for the at24 eeprom driver which Bartosz collected
        while taking over maintainership for this driver"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        eeprom: at24: check at24_read/write arguments
        eeprom: at24: fix reading from 24MAC402/24MAC602
        eeprom: at24: correctly set the size for at24mac402
        i2c: i2c-boardinfo: fix memory leaks on devinfo
        i2c: i801: Fix Failed to allocate irq -2147483648 error
      ae4806a3
    • Linus Torvalds's avatar
      Merge tag 'hwmon-for-linus-v4.15-rc2' of... · 49a418d7
      Linus Torvalds authored
      Merge tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
       "Fixes:
      
         - Drop reference to obsolete maintainer tree
      
         - Fix overflow bug in pmbus driver
      
         - Fix SMBUS timeout problem in jc42 driver
      
        For the SMBUS timeout handling, we had a brief discussion if this
        should be considered a bug fix or a feature. Peter says "it fixes real
        problems where the application misbehave due to faulty content when
        reading from an eeprom", and he needs the patch in his company's v4.14
        images. This is good enough for me and warrants backport to stable
        kernels"
      
      * tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (jc42) optionally try to disable the SMBUS timeout
        hwmon: (pmbus) Use 64bit math for DIRECT format values
        hwmon: Drop reference to Jean's tree
      49a418d7
    • David Howells's avatar
      rxrpc: Fix the MAINTAINERS record · bcd1d601
      David Howells authored
      Fix the MAINTAINERS record so that it's more obvious who the maintainer for
      AF_RXRPC is.
      Reported-by: default avatarJoe Perches <joe@perches.com>
      Reported-by: default avatarDavid Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bcd1d601
    • David Howells's avatar
      rxrpc: Use correct netns source in rxrpc_release_sock() · c5012564
      David Howells authored
      In rxrpc_release_sock() there may be no rx->local value to access, so we
      can't unconditionally follow it to the rxrpc network namespace information
      to poke the connection reapers.
      
      Instead, use the socket's namespace pointer to find the namespace.
      
      This unfixed code causes the following static checker warning:
      
      	net/rxrpc/af_rxrpc.c:898 rxrpc_release_sock()
      	error: we previously assumed 'rx->local' could be null (see line 887)
      
      Fixes: 3d18cbb7 ("rxrpc: Fix conn expiry timers")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5012564
    • Colin Ian King's avatar
      liquidio: fix incorrect indentation of assignment statement · 886afc1d
      Colin Ian King authored
      Remove one extraneous level of indentation on assignment statement.
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      886afc1d
    • David S. Miller's avatar
      Merge tag 'linux-can-fixes-for-4.15-20171201' of... · ed75e1ac
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-4.15-20171201' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2017-12-01
      
      this is a pull for net consisting of nine patches.
      
      The first three patches are by Jimmy Assarsson for the kvaser_usb driver
      and add the missing free()s in some error path, a signed/unsigned
      comparison and ratelimit the error messages in case of incomplete
      messages. Oliver Stäbler's patch for the ti_hecc driver fix the napi
      poll function's return value. The return values of the probe function of
      the peak_canfd and peak_pci PCI drivers are fixed by Stephane Grosjean's
      patch. Two patches by me for the flexcan driver update the
      bugs/features/quirks overview table and fix the error state transition
      for the VF610 SoC. The two patches by Martin Kelly for the mcba_usb
      driver fix a typo and a device disconnect bug.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ed75e1ac
    • Wolfram Sang's avatar
      Merge tag 'at24-4.15-fixes-for-wolfram' of... · edef3098
      Wolfram Sang authored
      Merge tag 'at24-4.15-fixes-for-wolfram' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current
      
      Please consider pulling the following fixes for v4.15. While it doesn't
      fix any regression introduced in the v4.15 merge window, we have a
      feature in at24 since linux v4.8 - reading the mac address block from
      at24mac series - which turned out to be not working.
      
      This pull request contains changes that fix it together with a patch
      that hardens the read and write argument sanitization with
      out-of-bounds checks that were missing.
      edef3098
    • Lars Persson's avatar
      stmmac: reset last TSO segment size after device open · 45ab4b13
      Lars Persson authored
      The mss variable tracks the last max segment size sent to the TSO
      engine. We do not update the hardware as long as we receive skb:s with
      the same value in gso_size.
      
      During a network device down/up cycle (mapped to stmmac_release() and
      stmmac_open() callbacks) we issue a reset to the hardware and it
      forgets the setting for mss. However we did not zero out our mss
      variable so the next transmission of a gso packet happens with an
      undefined hardware setting.
      
      This triggers a hang in the TSO engine and eventuelly the netdev
      watchdog will bark.
      
      Fixes: f748be53 ("stmmac: support new GMAC4")
      Signed-off-by: default avatarLars Persson <larper@axis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45ab4b13
    • Gao Feng's avatar
      ipvlan: Add the skb->mark as flow4's member to lookup route · a98a4ebc
      Gao Feng authored
      Current codes don't use skb->mark to assign flowi4_mark, it would
      make the policy route rule with fwmark doesn't work as expected.
      Signed-off-by: default avatarGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a98a4ebc
    • David S. Miller's avatar
      Merge branch 's390-qeth-fixes' · af57b7ff
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2017-12-01
      
      please apply the following three fixes for 4.15. These should also go
      back to stable.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af57b7ff
    • Julian Wiedmann's avatar
      s390/qeth: build max size GSO skbs on L2 devices · 0cbff6d4
      Julian Wiedmann authored
      The current GSO skb size limit was copy&pasted over from the L3 path,
      where it is needed due to a TSO limitation.
      As L2 devices don't offer TSO support (and thus all GSO skbs are
      segmented before they reach the driver), there's no reason to restrict
      the stack in how large it may build the GSO skbs.
      
      Fixes: d52aec97 ("qeth: enable scatter/gather in layer 2 mode")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0cbff6d4
    • Julian Wiedmann's avatar
      s390/qeth: fix GSO throughput regression · 6d69b1f1
      Julian Wiedmann authored
      Using GSO with small MTUs currently results in a substantial throughput
      regression - which is caused by how qeth needs to map non-linear skbs
      into its IO buffer elements:
      compared to a linear skb, each GSO-segmented skb effectively consumes
      twice as many buffer elements (ie two instead of one) due to the
      additional header-only part. This causes the Output Queue to be
      congested with low-utilized IO buffers.
      
      Fix this as follows:
      If the MSS is low enough so that a non-SG GSO segmentation produces
      order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is
      where we anticipate the biggest savings, since an SG-enabled
      GSO segmentation produces skbs that always consume at least two
      buffer elements.
      
      Larger MSS values continue to get a SG-enabled GSO segmentation, since
      1) the relative overhead of the additional header-only buffer element
      becomes less noticeable, and
      2) the linearization overhead increases.
      
      With the throughput regression fixed, re-enable NETIF_F_SG by default to
      reap the significant CPU savings of GSO.
      
      Fixes: 5722963a ("qeth: do not turn on SG per default")
      Reported-by: default avatarNils Hoppmann <niho@de.ibm.com>
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6d69b1f1
    • Julian Wiedmann's avatar
      s390/qeth: fix thinko in IPv4 multicast address tracking · bc3ab705
      Julian Wiedmann authored
      Commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      reworked how secondary addresses are managed for qeth devices.
      Instead of dropping & subsequently re-adding all addresses on every
      ndo_set_rx_mode() call, qeth now keeps track of the addresses that are
      currently registered with the HW.
      On a ndo_set_rx_mode(), we thus only need to do (de-)registration
      requests for the addresses that have actually changed.
      
      On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong
      hashtable - and thus never finds a match. As a result, we first delete
      *all* such addresses, and then re-add them again. So each set_rx_mode()
      causes a short period where the IPv4 Multicast addresses are not
      registered, and the card stops forwarding inbound traffic for them.
      
      Fix this by setting the ->is_multicast flag on the lookup object, thus
      enabling qeth_l3_ip_from_hash() to search the correct hashtable and
      find a match there.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bc3ab705
    • David S. Miller's avatar
      Merge branch 'vhost-skb-leaks' · 7344ba03
      David S. Miller authored
      Wei Xu says:
      
      ====================
      vhost: fix a few skb leaks
      
      Matthew found a roughly 40% tcp throughput regression with commit
      c67df11f(vhost_net: try batch dequing from skb array) as discussed
      in the following thread:
      https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
      
      v4:
      - fix zero iov iterator count in tap/tap_do_read()(Jason)
      - don't put tun in case of EBADFD(Jason)
      - Replace msg->msg_control with new 'skb' when calling tun/tap_do_read()
      
      v3:
      - move freeing skb from vhost to tun/tap recvmsg() to not
        confuse the callers.
      
      v2:
      - add Matthew as the reporter, thanks matthew.
      - moving zero headcount check ahead instead of defer consuming skb
        due to jason and mst's comment.
      - add freeing skb in favor of recvmsg() fails.
      ====================
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Tested-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7344ba03