1. 20 Sep, 2018 5 commits
    • Sudarsana Reddy Kalluru's avatar
      qed: Do not add VLAN 0 tag to untagged frames in multi-function mode. · 0216da94
      Sudarsana Reddy Kalluru authored
      In certain multi-function switch dependent modes, firmware adds vlan tag 0
      to the untagged frames. This leads to double tagging for the traffic
      if the dcbx is enabled, which is not the desired behavior. To avoid this,
      driver needs to set "dcb_dont_add_vlan0" flag.
      
      Fixes: cac6f691 ("qed: Add support for Unified Fabric Port")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarTomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0216da94
    • Sudarsana Reddy Kalluru's avatar
      qed: Fix populating the invalid stag value in multi function mode. · 50fdf601
      Sudarsana Reddy Kalluru authored
      In multi-function mode, driver receives the stag value (outer vlan)
      for a PF from management FW (MFW). If the stag value is negotiated prior to
      the driver load, then the stag is not notified to the driver and hence
      driver will have the invalid stag value.
      The fix is to request the MFW for STAG value during the driver load time.
      
      Fixes: cac6f691 ("qed: Add support for Unified Fabric Port")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarTomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      50fdf601
    • Antoine Tenart's avatar
      net: mvneta: fix the Rx desc buffer DMA unmapping · cf5cca6e
      Antoine Tenart authored
      With CONFIG_DMA_API_DEBUG enabled we now get a warning when using the
      mvneta driver:
      
        mvneta d0030000.ethernet: DMA-API: device driver frees DMA memory with
        wrong function [device address=0x000000001165b000] [size=4096 bytes]
        [mapped as page] [unmapped as single]
      
      This is because when using the s/w buffer management, the Rx descriptor
      buffer is mapped with dma_map_page but unmapped with dma_unmap_single.
      This patch fixes this by using the right unmapping function.
      
      Fixes: 562e2f46 ("net: mvneta: Improve the buffer allocation method for SWBM")
      Signed-off-by: default avatarAntoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf5cca6e
    • Paolo Abeni's avatar
      ip6_tunnel: be careful when accessing the inner header · 76c0ddd8
      Paolo Abeni authored
      the ip6 tunnel xmit ndo assumes that the processed skb always
      contains an ip[v6] header, but syzbot has found a way to send
      frames that fall short of this assumption, leading to the following splat:
      
      BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307
      [inline]
      BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0
      net/ipv6/ip6_tunnel.c:1390
      CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x185/0x1d0 lib/dump_stack.c:53
        kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
        __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
        ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
        ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
        __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
        netdev_start_xmit include/linux/netdevice.h:4075 [inline]
        xmit_one net/core/dev.c:3026 [inline]
        dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
        __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
        dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
        packet_snd net/packet/af_packet.c:2944 [inline]
        packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
        __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
        SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
        SyS_sendmmsg+0x63/0x90 net/socket.c:2162
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x441819
      RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
      RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
      RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
      R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
        kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
        kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
        slab_post_alloc_hook mm/slab.h:445 [inline]
        slab_alloc_node mm/slub.c:2737 [inline]
        __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:984 [inline]
        alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
        sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
        packet_alloc_skb net/packet/af_packet.c:2803 [inline]
        packet_snd net/packet/af_packet.c:2894 [inline]
        packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
        __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
        SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
        SyS_sendmmsg+0x63/0x90 net/socket.c:2162
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      This change addresses the issue adding the needed check before
      accessing the inner header.
      
      The ipv4 side of the issue is apparently there since the ipv4 over ipv6
      initial support, and the ipv6 side predates git history.
      
      Fixes: c4d3efaf ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
      Tested-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      76c0ddd8
    • David S. Miller's avatar
      Merge tag 'batadv-net-for-davem-20180919' of git://git.open-mesh.org/linux-merge · aa86b03c
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      pull request for net: batman-adv 2018-09-19
      
      here are some bugfixes which we would like to see integrated into net.
      
      We forgot to bump the version number in the last round for net-next, so
      the belated patch to do that is included - we hope you can adopt it.
      This will most likely create a merge conflict later when merging into
      net-next with this rounds net-next patchset, but net-next should keep
      the 2018.4 version[1].
      
      [1] resolution:
      
      --- a/net/batman-adv/main.h
      +++ b/net/batman-adv/main.h
      @@ -25,11 +25,7 @@
       #define BATADV_DRIVER_DEVICE "batman-adv"
      
       #ifndef BATADV_SOURCE_VERSION
      -<<<<<<<
      -#define BATADV_SOURCE_VERSION "2018.3"
      -=======
       #define BATADV_SOURCE_VERSION "2018.4"
      ->>>>>>>
       #endif
      
       /* B.A.T.M.A.N. parameters */
      
      Please pull or let me know of any problem!
      
      Here are some batman-adv bugfixes:
      
       - Avoid ELP information leak, by Sven Eckelmann
      
       - Fix sysfs segfault issues, by Sven Eckelmann (2 patches)
      
       - Fix locking when adding entries in various lists,
         by Sven Eckelmann (5 patches)
      
       - Fix refcount if queue_work() fails, by Marek Lindner (2 patches)
      
       - Fixup forgotten version bump, by Sven Eckelmann
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aa86b03c
  2. 19 Sep, 2018 21 commits
    • David S. Miller's avatar
      Merge branch 'ipv6-fix-issues-on-accessing-fib6_metrics' · 69ba423d
      David S. Miller authored
      Wei Wang says:
      
      ====================
      ipv6: fix issues on accessing fib6_metrics
      
      The latest fix on the memory leak of fib6_metrics still causes
      use-after-free.
      This patch series first revert the previous fix and propose a new fix
      that is more inline with ipv4 logic and is tested to fix the
      use-after-free issue reported.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      69ba423d
    • Wei Wang's avatar
      ipv6: fix memory leak on dst->_metrics · ce7ea4af
      Wei Wang authored
      When dst->_metrics and f6i->fib6_metrics share the same memory, both
      take reference count on the dst_metrics structure. However, when dst is
      destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic()
      which does not take care of READONLY metrics and does not release refcnt.
      This causes memory leak.
      Similar to ipv4 logic, the fix is to properly release refcnt and free
      the memory space pointed by dst->_metrics if refcnt becomes 0.
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ce7ea4af
    • Wei Wang's avatar
      Revert "ipv6: fix double refcount of fib6_metrics" · 86758605
      Wei Wang authored
      This reverts commit e70a3aad.
      
      This change causes use-after-free on dst->_metrics.
      The crash trace looks like this:
      [   97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140
      [   97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801
      
      [   97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11
      [   97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018
      [   97.777957] Call Trace:
      [   97.777971]  [<ffffffff895709db>] dump_stack+0x4d/0x72
      [   97.777985]  [<ffffffff881651df>] print_address_description+0x6f/0x260
      [   97.777997]  [<ffffffff88165747>] kasan_report+0x257/0x370
      [   97.778001]  [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140
      [   97.778004]  [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20
      [   97.778008]  [<ffffffff894488e6>] ip6_mtu+0x116/0x140
      [   97.778013]  [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280
      [   97.778016]  [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0
      [   97.778022]  [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0
      [   97.778037]  [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0
      [   97.778040]  [<ffffffff881643b1>] ? save_stack+0xb1/0xd0
      [   97.778046]  [<ffffffff89264c82>] tcp_send_mss+0x22/0x220
      [   97.778059]  [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0
      [   97.778062]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778066]  [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60
      [   97.778070]  [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280
      [   97.778075]  [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430
      [   97.778078]  [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0
      [   97.778082]  [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140
      [   97.778085]  [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20
      [   97.778088]  [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50
      [   97.778092]  [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50
      [   97.778098]  [<ffffffff89352d43>] inet_sendmsg+0x103/0x480
      [   97.778102]  [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0
      [   97.778105]  [<ffffffff890294da>] sock_sendmsg+0xba/0xf0
      [   97.778108]  [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0
      [   97.778113]  [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0
      [   97.778116]  [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0
      [   97.778119]  [<ffffffff881646d1>] ? memset+0x31/0x40
      [   97.778123]  [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380
      [   97.778127]  [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250
      [   97.778130]  [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180
      [   97.778133]  [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200
      [   97.778137]  [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0
      [   97.778141]  [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190
      [   97.778144]  [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190
      [   97.778147]  [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20
      [   97.778152]  [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0
      [   97.778155]  [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190
      [   97.778158]  [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20
      [   97.778162]  [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430
      [   97.778166]  [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0
      [   97.778171]  [<ffffffff8960131f>] ? page_fault+0x2f/0x50
      [   97.778174]  [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [   97.778177] RIP: 0033:0x7f83fa36000d
      [   97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [   97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d
      [   97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036
      [   97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168
      [   97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40
      [   97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000
      
      [   97.779684] Allocated by task 5919:
      [   97.783185]  save_stack+0x46/0xd0
      [   97.783187]  kasan_kmalloc+0xad/0xe0
      [   97.783189]  kmem_cache_alloc_trace+0xdf/0x580
      [   97.783190]  ip6_convert_metrics.isra.79+0x7e/0x190
      [   97.783192]  ip6_route_info_create+0x60a/0x2480
      [   97.783193]  ip6_route_add+0x1d/0x80
      [   97.783195]  inet6_rtm_newroute+0xdd/0xf0
      [   97.783198]  rtnetlink_rcv_msg+0x641/0xb10
      [   97.783200]  netlink_rcv_skb+0x27b/0x3e0
      [   97.783202]  rtnetlink_rcv+0x15/0x20
      [   97.783203]  netlink_unicast+0x4be/0x720
      [   97.783204]  netlink_sendmsg+0x7bc/0xbf0
      [   97.783205]  sock_sendmsg+0xba/0xf0
      [   97.783207]  ___sys_sendmsg+0x6ca/0x8e0
      [   97.783208]  __sys_sendmsg+0xe6/0x190
      [   97.783209]  SyS_sendmsg+0x13/0x20
      [   97.783211]  do_syscall_64+0x2ac/0x430
      [   97.783213]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      [   97.784709] Freed by task 0:
      [   97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist
      [   97.794497]  save_stack+0x46/0xd0
      [   97.794499]  kasan_slab_free+0x71/0xc0
      [   97.794500]  kfree+0x7c/0xf0
      [   97.794501]  fib6_info_destroy_rcu+0x24f/0x310
      [   97.794504]  rcu_process_callbacks+0x38b/0x1730
      [   97.794506]  __do_softirq+0x1c8/0x5d0
      Reported-by: default avatarJohn Sperbeck <jsperbeck@google.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      86758605
    • Russell King's avatar
      sfp: fix oops with ethtool -m · 126d6848
      Russell King authored
      If a network interface is created prior to the SFP socket being
      available, ethtool can request module information.  This unfortunately
      leads to an oops:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000008
      pgd = (ptrval)
      [00000008] *pgd=7c400831, *pte=00000000, *ppte=00000000
      Internal error: Oops: 17 [#1] SMP ARM
      Modules linked in:
      CPU: 0 PID: 1480 Comm: ethtool Not tainted 4.19.0-rc3 #138
      Hardware name: Broadcom Northstar Plus SoC
      PC is at sfp_get_module_info+0x8/0x10
      LR is at dev_ethtool+0x218c/0x2afc
      
      Fix this by not filling in the network device's SFP bus pointer until
      SFP is fully bound, thereby avoiding the core calling into the SFP bus
      code.
      
      Fixes: ce0aa27f ("sfp: add sfp-bus to bridge between network devices and sfp cages")
      Reported-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      126d6848
    • Antoine Tenart's avatar
      net: mvpp2: fix a txq_done race condition · 774268f3
      Antoine Tenart authored
      When no Tx IRQ is available, the txq_done() routine (called from
      tx_done()) shouldn't be called from the polling function, as in such
      case it is already called in the Tx path thanks to an hrtimer. This
      mostly occurred when using PPv2.1, as the engine then do not have Tx
      IRQs.
      
      Fixes: edc660fa ("net: mvpp2: replace TX coalescing interrupts with hrtimer")
      Reported-by: default avatarStefan Chulski <stefanc@marvell.com>
      Signed-off-by: default avatarAntoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      774268f3
    • David S. Miller's avatar
      Merge branch 'net-smc-fixes' · 81d0b759
      David S. Miller authored
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2018-09-18
      
      here are some fixes in different areas of the smc code for the net
      tree.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      81d0b759
    • YueHaibing's avatar
      net/smc: fix sizeof to int comparison · 38189779
      YueHaibing authored
      Comparing an int to a size, which is unsigned, causes the int to become
      unsigned, giving the wrong result. kernel_sendmsg can return a negative
      error code.
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      38189779
    • Karsten Graul's avatar
      net/smc: no urgent data check for listen sockets · 71d117f5
      Karsten Graul authored
      Don't check a listen socket for pending urgent data in smc_poll().
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      71d117f5
    • Ursula Braun's avatar
      net/smc: enable fallback for connection abort in state INIT · dd65d87a
      Ursula Braun authored
      If a linkgroup is terminated abnormally already due to failing
      LLC CONFIRM LINK or LLC ADD LINK, fallback to TCP is still possible.
      In this case do not switch to state SMC_PEERABORTWAIT and do not set
      sk_err.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dd65d87a
    • Ursula Braun's avatar
      net/smc: remove duplicate mutex_unlock · 1ca52fcf
      Ursula Braun authored
      For a failing smc_listen_rdma_finish() smc_listen_decline() is
      called. If fallback is possible, the new socket is already enqueued
      to be accepted in smc_listen_decline(). Avoid enqueuing a second time
      afterwards in this case, otherwise the smc_create_lgr_pending lock
      is released twice:
      [  373.463976] WARNING: bad unlock balance detected!
      [  373.463978] 4.18.0-rc7+ #123 Tainted: G           O
      [  373.463979] -------------------------------------
      [  373.463980] kworker/1:1/30 is trying to release lock (smc_create_lgr_pending) at:
      [  373.463990] [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc]
      [  373.463991] but there are no more locks to release!
      [  373.463991]
      other info that might help us debug this:
      [  373.463993] 2 locks held by kworker/1:1/30:
      [  373.463994]  #0: 00000000772cbaed ((wq_completion)"events"){+.+.}, at: process_one_work+0x1ec/0x6b0
      [  373.464000]  #1: 000000003ad0894a ((work_completion)(&new_smc->smc_listen_work)){+.+.}, at: process_one_work+0x1ec/0x6b0
      [  373.464003]
      stack backtrace:
      [  373.464005] CPU: 1 PID: 30 Comm: kworker/1:1 Kdump: loaded Tainted: G           O      4.18.0-rc7uschi+ #123
      [  373.464007] Hardware name: IBM 2827 H43 738 (LPAR)
      [  373.464010] Workqueue: events smc_listen_work [smc]
      [  373.464011] Call Trace:
      [  373.464015] ([<0000000000114100>] show_stack+0x60/0xd8)
      [  373.464019]  [<0000000000a8c9bc>] dump_stack+0x9c/0xd8
      [  373.464021]  [<00000000001dcaf8>] print_unlock_imbalance_bug+0xf8/0x108
      [  373.464022]  [<00000000001e045c>] lock_release+0x114/0x4f8
      [  373.464025]  [<0000000000aa87fa>] __mutex_unlock_slowpath+0x4a/0x300
      [  373.464027]  [<000003ff801205fc>] smc_listen_work+0x22c/0x5d0 [smc]
      [  373.464029]  [<0000000000197a68>] process_one_work+0x2a8/0x6b0
      [  373.464030]  [<0000000000197ec2>] worker_thread+0x52/0x410
      [  373.464033]  [<000000000019fd0e>] kthread+0x15e/0x178
      [  373.464035]  [<0000000000aaf58a>] kernel_thread_starter+0x6/0xc
      [  373.464052]  [<0000000000aaf584>] kernel_thread_starter+0x0/0xc
      [  373.464054] INFO: lockdep is turned off.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1ca52fcf
    • Ursula Braun's avatar
      net/smc: fix non-blocking connect problem · 648a5a7a
      Ursula Braun authored
      In state SMC_INIT smc_poll() delegates polling to the internal
      CLC socket. This means, once the connect worker has finished
      its kernel_connect() step, the poll wake-up may occur. This is not
      intended. The wake-up should occur from the wake up call in
      smc_connect_work() after __smc_connect() has finished.
      Thus in state SMC_INIT this patch now calls sock_poll_wait() on the
      main SMC socket.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      648a5a7a
    • Kazuya Mizuguchi's avatar
      ravb: do not write 1 to reserved bits · 2fe397a3
      Kazuya Mizuguchi authored
      EtherAVB hardware requires 0 to be written to status register bits in
      order to clear them, however, care must be taken not to:
      
      1. Clear other bits, by writing zero to them
      2. Write one to reserved bits
      
      This patch corrects the ravb driver with respect to the second point above.
      This is done by defining reserved bit masks for the affected registers and,
      after auditing the code, ensure all sites that may write a one to a
      reserved bit use are suitably masked.
      Signed-off-by: default avatarKazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      Signed-off-by: default avatarSimon Horman <horms+renesas@verge.net.au>
      Reviewed-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fe397a3
    • zhong jiang's avatar
      net: bnxt: Fix a uninitialized variable warning. · 65fac4fe
      zhong jiang authored
      Fix the following compile warning:
      
      drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:49:5: warning: ‘nvm_param.dir_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        if (nvm_param.dir_type == BNXT_NVM_PORT_CFG)
      Signed-off-by: default avatarzhong jiang <zhongjiang@huawei.com>
      Acked-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      65fac4fe
    • David S. Miller's avatar
      Merge tag 'mlx5-fixes-2018-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 6344244c
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2018-09-17
      
      Sorry about the previous submission of this series which was mistakenly
      marked for net-next, here I am resending with 'net' mark.
      
      This series provides three fixes to mlx5 core and mlx5e netdevice
      driver.
      
      Please pull and let me know if there's any problem.
      
      For -stable v4.16:
      ('net/mlx5: Check for SQ and not RQ state when modifying hairpin SQ')
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6344244c
    • Christian Lamparter's avatar
      net: emac: fix fixed-link setup for the RTL8363SB switch · 08e39982
      Christian Lamparter authored
      On the Netgear WNDAP620, the emac ethernet isn't receiving nor
      xmitting any frames from/to the RTL8363SB (identifies itself
      as a RTL8367RB).
      
      This is caused by the emac hardware not knowing the forced link
      parameters for speed, duplex, pause, etc.
      
      This begs the question, how this was working on the original
      driver code, when it was necessary to set the phy_address and
      phy_map to 0xffffffff. But I guess without access to the old
      PPC405/440/460 hardware, it's not possible to know.
      Signed-off-by: default avatarChristian Lamparter <chunkeey@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      08e39982
    • Suren Baghdasaryan's avatar
      NFC: Fix the number of pipes · e285d5bf
      Suren Baghdasaryan authored
      According to ETSI TS 102 622 specification chapter 4.4 pipe identifier
      is 7 bits long which allows for 128 unique pipe IDs. Because
      NFC_HCI_MAX_PIPES is used as the number of pipes supported and not
      as the max pipe ID, its value should be 128 instead of 127.
      
      nfc_hci_recv_from_llc extracts pipe ID from packet header using
      NFC_HCI_FRAGMENT(0x7F) mask which allows for pipe ID value of 127.
      Same happens when NCI_HCP_MSG_GET_PIPE() is being used. With
      pipes array having only 127 elements and pipe ID of 127 the OOB memory
      access will result.
      
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Allen Pais <allen.pais@oracle.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Suggested-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e285d5bf
    • Suren Baghdasaryan's avatar
      NFC: Fix possible memory corruption when handling SHDLC I-Frame commands · 674d9de0
      Suren Baghdasaryan authored
      When handling SHDLC I-Frame commands "pipe" field used for indexing
      into an array should be checked before usage. If left unchecked it
      might access memory outside of the array of size NFC_HCI_MAX_PIPES(127).
      
      Malformed NFC HCI frames could be injected by a malicious NFC device
      communicating with the device being attacked (remote attack vector),
      or even by an attacker with physical access to the I2C bus such that
      they could influence the data transfers on that bus (local attack vector).
      skb->data is controlled by the attacker and has only been sanitized in
      the most trivial ways (CRC check), therefore we can consider the
      create_info struct and all of its members to tainted. 'create_info->pipe'
      with max value of 255 (uint8) is used to take an offset of the
      hdev->pipes array of 127 elements which can lead to OOB write.
      
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Allen Pais <allen.pais@oracle.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Suggested-by: default avatarKevin Deus <kdeus@google.com>
      Signed-off-by: default avatarSuren Baghdasaryan <surenb@google.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      674d9de0
    • Sabrina Dubroca's avatar
      selftests: pmtu: properly redirect stderr to /dev/null · 0a286afe
      Sabrina Dubroca authored
      The cleanup function uses "$CMD 2 > /dev/null", which doesn't actually
      send stderr to /dev/null, so when the netns doesn't exist, the error
      message is shown. Use "2> /dev/null" instead, so that those messages
      disappear, as was intended.
      
      Fixes: d1f1b9cb ("selftests: net: Introduce first PMTU test")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a286afe
    • David S. Miller's avatar
      Merge branch 'stmmac-Coalesce-and-tail-addr-fixes' · 87ebcffd
      David S. Miller authored
      Jose Abreu says:
      
      ====================
      net: stmmac: Coalesce and tail addr fixes
      
      The fix for coalesce timer and a fix in tail address setting that impacts
      XGMAC2 operation.
      
      The series is:
      Tested-by: default avatarJerome Brunet <jbrunet@baylibre.com>
      	on a113 s400 board (single queue)
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87ebcffd
    • Jose Abreu's avatar
      net: stmmac: Fixup the tail addr setting in xmit path · 0431100b
      Jose Abreu authored
      Currently we are always setting the tail address of descriptor list to
      the end of the pre-allocated list.
      
      According to databook this is not correct. Tail address should point to
      the last available descriptor + 1, which means we have to update the
      tail address everytime we call the xmit function.
      
      This should make no impact in older versions of MAC but in newer
      versions there are some DMA features which allows the IP to fetch
      descriptors in advance and in a non sequential order so its critical
      that we set the tail address correctly.
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Fixes: f748be53 ("stmmac: support new GMAC4")
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0431100b
    • Jose Abreu's avatar
      net: stmmac: Rework coalesce timer and fix multi-queue races · 8fce3331
      Jose Abreu authored
      This follows David Miller advice and tries to fix coalesce timer in
      multi-queue scenarios.
      
      We are now using per-queue coalesce values and per-queue TX timer.
      
      Coalesce timer default values was changed to 1ms and the coalesce frames
      to 25.
      
      Tested in B2B setup between XGMAC2 and GMAC5.
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Fixes: 	ce736788 ("net: stmmac: adding multiple buffers for TX")
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Neil Armstrong <narmstrong@baylibre.com>
      Cc: Jerome Brunet <jbrunet@baylibre.com>
      Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8fce3331
  3. 18 Sep, 2018 11 commits
    • Greg Kroah-Hartman's avatar
      Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net · 5211da9c
      Greg Kroah-Hartman authored
      Dave writes:
        "Various fixes, all over the place:
      
         1) OOB data generation fix in bluetooth, from Matias Karhumaa.
      
         2) BPF BTF boundary calculation fix, from Martin KaFai Lau.
      
         3) Don't bug on excessive frags, to be compatible in situations mixing
            older and newer kernels on each end.  From Juergen Gross.
      
         4) Scheduling in RCU fix in hv_netvsc, from Stephen Hemminger.
      
         5) Zero keying information in TLS layer before freeing copies
            of them, from Sabrina Dubroca.
      
         6) Fix NULL deref in act_sample, from Davide Caratti.
      
         7) Orphan SKB before GRO in veth to prevent crashes with XDP,
            from Toshiaki Makita.
      
         8) Fix use after free in ip6_xmit, from Eric Dumazet.
      
         9) Fix VF mac address regression in bnxt_en, from Micahel Chan.
      
         10) Fix MSG_PEEK behavior in TLS layer, from Daniel Borkmann.
      
         11) Programming adjustments to r8169 which fix not being to enter deep
             sleep states on some machines, from Kai-Heng Feng and Hans de
             Goede.
      
         12) Fix DST_NOCOUNT flag handling for ipv6 routes, from Peter
             Oskolkov."
      
      * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (45 commits)
        net/ipv6: do not copy dst flags on rt init
        qmi_wwan: set DTR for modems in forced USB2 mode
        clk: x86: Stop marking clocks as CLK_IS_CRITICAL
        r8169: Get and enable optional ether_clk clock
        clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail
        r8169: enable ASPM on RTL8106E
        r8169: Align ASPM/CLKREQ setting function with vendor driver
        Revert "kcm: remove any offset before parsing messages"
        kcm: remove any offset before parsing messages
        net: ethernet: Fix a unused function warning.
        net: dsa: mv88e6xxx: Fix ATU Miss Violation
        tls: fix currently broken MSG_PEEK behavior
        hv_netvsc: pair VF based on serial number
        PCI: hv: support reporting serial number as slot information
        bnxt_en: Fix VF mac address regression.
        ipv6: fix possible use-after-free in ip6_xmit()
        net: hp100: fix always-true check for link up state
        ARM: dts: at91: add new compatibility string for macb on sama5d3
        net: macb: disable scatter-gather for macb on sama5d3
        net: mvpp2: let phylink manage the carrier state
        ...
      5211da9c
    • Peter Oskolkov's avatar
      net/ipv6: do not copy dst flags on rt init · 30bfd930
      Peter Oskolkov authored
      DST_NOCOUNT in dst_entry::flags tracks whether the entry counts
      toward route cache size (net->ipv6.sysctl.ip6_rt_max_size).
      
      If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented
      in dist_init() and decremented in dst_destroy().
      
      This flag is tied to allocation/deallocation of dst_entry and
      should not be copied from another dst/route. Otherwise it can happen
      that dst_ops::pcpuc_entries counter grows until no new routes can
      be allocated because the counter reached ip6_rt_max_size due to
      DST_NOCOUNT not set and thus no counter decrements on gc-ed routes.
      
      Fixes: 3b6761d1 ("net/ipv6: Move dst flags to booleans in fib entries")
      Cc: David Ahern <dsahern@gmail.com>
      Acked-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarPeter Oskolkov <posk@google.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      30bfd930
    • Bjørn Mork's avatar
      qmi_wwan: set DTR for modems in forced USB2 mode · 922005c7
      Bjørn Mork authored
      Recent firmware revisions have added the ability to force
      these modems to USB2 mode, hiding their SuperSpeed
      capabilities from the host.  The driver has been using the
      SuperSpeed capability, as shown by the bcdUSB field of the
      device descriptor, to detect the need to enable the DTR
      quirk.  This method fails when the modems are forced to
      USB2 mode by the modem firmware.
      
      Fix by unconditionally enabling the DTR quirk for the
      affected device IDs.
      Reported-by: default avatarFred Veldini <fred.veldini@gmail.com>
      Reported-by: default avatarDeshu Wen <dwen@sierrawireless.com>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Reported-by: default avatarFred Veldini <fred.veldini@gmail.com>
      Reported-by: default avatarDeshu Wen <dwen@sierrawireless.com>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      922005c7
    • David S. Miller's avatar
      Merge branch 'r8169-clk-fixes' · 89bfd48d
      David S. Miller authored
      Hans de Goede says:
      
      ====================
      r8169 (x86) clk fixes to fix S0ix not being reached
      
      This series adds code to the r8169 ethernet driver to get and enable an
      external clock if present, avoiding the need for a hack in the
      clk-pmc-atom driver where that clock was left on continuesly causing x86
      some devices to not reach deep power saving states (S0ix) when suspended
      causing to them to quickly drain their battery while suspended.
      
      The 3 commits in this series need to be merged in order to avoid
      regressions while bisecting. The clk-pmc-atom driver does not see much
      changes (it was last touched over a year ago). So the clk maintainers
      have agreed with merging all 3 patches through the net tree.
      All 3 patches have Stephen Boyd's Acked-by for this purpose.
      
      This v2 of the series only had some minor tweaks done to the commit
      messages and is ready for merging through the net tree now.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89bfd48d
    • Hans de Goede's avatar
      clk: x86: Stop marking clocks as CLK_IS_CRITICAL · 648e9218
      Hans de Goede authored
      Commit d31fd43c ("clk: x86: Do not gate clocks enabled by the
      firmware"), which added the code to mark clocks as CLK_IS_CRITICAL, causes
      all unclaimed PMC clocks on Cherry Trail devices to be on all the time,
      resulting on the device not being able to reach S0i3 when suspended.
      
      The reason for this commit is that on some Bay Trail / Cherry Trail devices
      the r8169 ethernet controller uses pmc_plt_clk_4. Now that the clk-pmc-atom
      driver exports an "ether_clk" alias for pmc_plt_clk_4 and the r8169 driver
      has been modified to get and enable this clock (if present) the marking of
      the clocks as CLK_IS_CRITICAL is no longer necessary.
      
      This commit removes the CLK_IS_CRITICAL marking, fixing Cherry Trail
      devices not being able to reach S0i3 greatly decreasing their battery
      drain when suspended.
      
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
      Cc: Johannes Stezenbach <js@sig21.net>
      Cc: Carlo Caione <carlo@endlessm.com>
      Reported-by: default avatarJohannes Stezenbach <js@sig21.net>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: default avatarStephen Boyd <sboyd@kernel.org>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      648e9218
    • Hans de Goede's avatar
      r8169: Get and enable optional ether_clk clock · c2f6f3ee
      Hans de Goede authored
      On some boards a platform clock is used as clock for the r8169 chip,
      this commit adds support for getting and enabling this clock (assuming
      it has an "ether_clk" alias set on it).
      
      This is related to commit d31fd43c ("clk: x86: Do not gate clocks
      enabled by the firmware") which is a previous attempt to fix this for some
      x86 boards, but this causes all Cherry Trail SoC using boards to not reach
      there lowest power states when suspending.
      
      This commit (together with an atom-pmc-clk driver commit adding the alias)
      fixes things properly by making the r8169 get the clock and enable it when
      it needs it.
      
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
      Cc: Johannes Stezenbach <js@sig21.net>
      Cc: Carlo Caione <carlo@endlessm.com>
      Reported-by: default avatarJohannes Stezenbach <js@sig21.net>
      Acked-by: default avatarStephen Boyd <sboyd@kernel.org>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2f6f3ee
    • Hans de Goede's avatar
      clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail · b1e3454d
      Hans de Goede authored
      Commit d31fd43c ("clk: x86: Do not gate clocks enabled by the
      firmware") causes all unclaimed PMC clocks on Cherry Trail devices to be on
      all the time, resulting on the device not being able to reach S0i2 or S0i3
      when suspended.
      
      The reason for this commit is that on some Bay Trail / Cherry Trail devices
      the ethernet controller uses pmc_plt_clk_4. This commit adds an "ether_clk"
      alias, so that the relevant ethernet drivers can try to (optionally) use
      this, without needing X86 specific code / hacks, thus fixing ethernet on
      these devices without breaking S0i3 support.
      
      This commit uses clkdev_hw_create() to create the alias, mirroring the code
      for the already existing "mclk" alias for pmc_plt_clk_3.
      
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=193891#c102
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=196861
      Cc: Johannes Stezenbach <js@sig21.net>
      Cc: Carlo Caione <carlo@endlessm.com>
      Reported-by: default avatarJohannes Stezenbach <js@sig21.net>
      Acked-by: default avatarStephen Boyd <sboyd@kernel.org>
      Reviewed-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b1e3454d
    • Kai-Heng Feng's avatar
      r8169: enable ASPM on RTL8106E · 0866cd15
      Kai-Heng Feng authored
      The Intel SoC was prevented from entering lower idle state because
      of RTL8106E's ASPM was not enabled.
      
      So enable ASPM on RTL8106E (chip version 39).
      Now the Intel SoC can enter lower idle state, power consumption and
      temperature are much lower.
      Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0866cd15
    • Kai-Heng Feng's avatar
      r8169: Align ASPM/CLKREQ setting function with vendor driver · 94235460
      Kai-Heng Feng authored
      There's a small delay after setting ASPM in vendor drivers, r8101 and
      r8168.
      In addition, those drivers enable ASPM before ClkReq, also change that
      to align with vendor driver.
      
      I haven't seen anything bad becasue of this, but I think it's better to
      keep in sync with vendor driver.
      Signed-off-by: default avatarKai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      94235460
    • David S. Miller's avatar
      Revert "kcm: remove any offset before parsing messages" · 3275b4df
      David S. Miller authored
      This reverts commit 072222b4.
      
      I just read that this causes regressions.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3275b4df
    • Dominique Martinet's avatar
      kcm: remove any offset before parsing messages · 072222b4
      Dominique Martinet authored
      The current code assumes kcm users know they need to look for the
      strparser offset within their bpf program, which is not documented
      anywhere and examples laying around do not do.
      
      The actual recv function does handle the offset well, so we can create a
      temporary clone of the skb and pull that one up as required for parsing.
      
      The pull itself has a cost if we are pulling beyond the head data,
      measured to 2-3% latency in a noisy VM with a local client stressing
      that path. The clone's impact seemed too small to measure.
      
      This bug can be exhibited easily by implementing a "trivial" kcm parser
      taking the first bytes as size, and on the client sending at least two
      such packets in a single write().
      
      Note that bpf sockmap has the same problem, both for parse and for recv,
      so it would pulling twice or a real pull within the strparser logic if
      anyone cares about that.
      Signed-off-by: default avatarDominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      072222b4
  4. 17 Sep, 2018 3 commits