1. 19 Jun, 2018 15 commits
    • Eric Dumazet's avatar
      net/ipv6: respect rcu grace period before freeing fib6_info · 9b0a8da8
      Eric Dumazet authored
      syzbot reported use after free that is caused by fib6_info being
      freed without a proper RCU grace period.
      
      CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       __read_once_size include/linux/compiler.h:188 [inline]
       find_rr_leaf net/ipv6/route.c:705 [inline]
       rt6_select net/ipv6/route.c:761 [inline]
       fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823
       ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082
       fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110
       ip6_route_output include/net/ip6_route.h:82 [inline]
       icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline]
       icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535
       icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244
       dst_link_failure include/net/dst.h:427 [inline]
       ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695
       neigh_invalidate+0x246/0x550 net/core/neighbour.c:892
       neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978
       call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
       run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
       __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1d1/0x200 kernel/softirq.c:404
       exiting_irq arch/x86/include/asm/apic.h:527 [inline]
       smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
       </IRQ>
      RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482
      Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 <84> d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48
      RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0
      RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000
      R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0
      R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00
       strlen include/linux/string.h:267 [inline]
       getname_kernel+0x24/0x370 fs/namei.c:218
       open_exec+0x17/0x70 fs/exec.c:882
       load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780
       search_binary_handler+0x17d/0x570 fs/exec.c:1653
       exec_binprm fs/exec.c:1695 [inline]
       __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819
       do_execveat_common fs/exec.c:1866 [inline]
       do_execve fs/exec.c:1883 [inline]
       __do_sys_execve fs/exec.c:1964 [inline]
       __se_sys_execve fs/exec.c:1959 [inline]
       __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7f1576a46207
      Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02
      RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
      RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207
      RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670
      RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c
      R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10
      R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005
      
      Allocated by task 12188:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
       kmalloc include/linux/slab.h:513 [inline]
       kzalloc include/linux/slab.h:706 [inline]
       fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152
       ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013
       ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154
       ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660
       inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
       sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
       sock_ioctl+0x30d/0x680 net/socket.c:1097
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 1402:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kfree+0xd9/0x260 mm/slab.c:3813
       fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207
       fib6_info_release include/net/ip6_fib.h:286 [inline]
       __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline]
       ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316
       ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663
       inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
       sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
       sock_ioctl+0x30d/0x680 net/socket.c:1097
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:500 [inline]
       do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
       ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
       __do_sys_ioctl fs/ioctl.c:708 [inline]
       __se_sys_ioctl fs/ioctl.c:706 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
       do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801b5df2580
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 8 bytes inside of
       256-byte region [ffff8801b5df2580, ffff8801b5df2680)
      The buggy address belongs to the page:
      page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0
      raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      > ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
       ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      
      Fixes: a64efe14 ("net/ipv6: introduce fib6_info struct and helpers")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Tested-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b0a8da8
    • Liran Alon's avatar
      net: net_failover: fix typo in net_failover_slave_register() · e5223438
      Liran Alon authored
      Sync both unicast and multicast lists instead of unicast twice.
      
      Fixes: cfc80d9a ("net: Introduce net_failover driver")
      Reviewed-by: default avatarJoao Martins <joao.m.martins@oracle.com>
      Signed-off-by: default avatarLiran Alon <liran.alon@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5223438
    • Xin Long's avatar
      ipvlan: use ETH_MAX_MTU as max mtu · 548feb33
      Xin Long authored
      Similar to the fixes on team and bonding, this restores the ability
      to set an ipvlan device's mtu to anything higher than 1500.
      
      Fixes: 91572088 ("net: use core MTU range checking in core net infra")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      548feb33
    • Stefan Agner's avatar
      net: hamradio: use eth_broadcast_addr · 4e8439aa
      Stefan Agner authored
      The array bpq_eth_addr is only used to get the size of an
      address, whereas the bcast_addr is used to set the broadcast
      address. This leads to a warning when using clang:
      drivers/net/hamradio/bpqether.c:94:13: warning: variable 'bpq_eth_addr' is not
            needed and will not be emitted [-Wunneeded-internal-declaration]
      static char bpq_eth_addr[6];
                  ^
      
      Remove both variables and use the common eth_broadcast_addr
      to set the broadcast address.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e8439aa
    • Govindarajulu Varadarajan's avatar
      enic: initialize enic->rfs_h.lock in enic_probe · 3256d29f
      Govindarajulu Varadarajan authored
      lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
      initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
      can be called when interface is down.
      
      Move enic_rfs_flw_tbl_init to enic_probe.
      
      INFO: trying to register non-static key.
      the code is fine but needs lockdep annotation.
      turning off the locking correctness validator.
      CPU: 18 PID: 1189 Comm: ethtool Not tainted 4.17.0-rc7-devel+ #27
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
      Call Trace:
      dump_stack+0x85/0xc0
      register_lock_class+0x550/0x560
      ? __handle_mm_fault+0xa8b/0x1100
      __lock_acquire+0x81/0x670
      lock_acquire+0xb9/0x1e0
      ?  enic_get_rxnfc+0x139/0x2b0 [enic]
      _raw_spin_lock_bh+0x38/0x80
      ? enic_get_rxnfc+0x139/0x2b0 [enic]
      enic_get_rxnfc+0x139/0x2b0 [enic]
      ethtool_get_rxnfc+0x8d/0x1c0
      dev_ethtool+0x16c8/0x2400
      ? __mutex_lock+0x64d/0xa00
      ? dev_load+0x6a/0x150
      dev_ioctl+0x253/0x4b0
      sock_do_ioctl+0x9a/0x130
      sock_ioctl+0x1af/0x350
      do_vfs_ioctl+0x8e/0x670
      ? syscall_trace_enter+0x1e2/0x380
      ksys_ioctl+0x60/0x90
      __x64_sys_ioctl+0x16/0x20
      do_syscall_64+0x5a/0x170
      entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: default avatarGovindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3256d29f
    • David S. Miller's avatar
      Merge branch 'NCSI-silence-warning-messages' · d1a65e21
      David S. Miller authored
      Joel Stanley says:
      
      ====================
      Slience NCSI logging
      
      v2:
        Fix indent issue and commit message based on Joe's feedback
        Add Sam's acks
      
      Here are three changes to silence unnecessary warnings in the ncsi code.
      
      The final patch adds Sam as the maintainer for NCSI.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1a65e21
    • Joel Stanley's avatar
      MAINTAINERS: Add Sam as the maintainer for NCSI · 01a21986
      Joel Stanley authored
      Sam has been handing the maintenance of NCSI for a number release cycles
      now.
      Acked-by: default avatarSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01a21986
    • Joel Stanley's avatar
      net/ncsi: Use netdev_dbg for debug messages · 6e42a3f5
      Joel Stanley authored
      This moves all of the netdev_printk(KERN_DEBUG, ...) messages over to
      netdev_dbg.
      
      As Joe explains:
      
      > netdev_dbg is not included in object code unless
      > DEBUG is defined or CONFIG_DYNAMIC_DEBUG is set.
      > And then, it is not emitted into the log unless
      > DEBUG is set or this specific netdev_dbg is enabled
      > via the dynamic debug control file.
      
      Which is what we're after in this case.
      Acked-by: default avatarSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e42a3f5
    • Joel Stanley's avatar
      net/ncsi: Drop no more channels message · 5d3b1467
      Joel Stanley authored
      This does not provide useful information. As the ncsi maintainer said:
      
       > either we get a channel or broadcom has gone out to lunch
      Acked-by: default avatarSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5d3b1467
    • Joel Stanley's avatar
      net/ncsi: Silence debug messages · 87975a01
      Joel Stanley authored
      In normal operation we see this series of messages as the host drives
      the network device:
      
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
       ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
       ftgmac100 1e660000.ethernet eth0: NCSI interface down
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI interface up
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
       ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
       ftgmac100 1e660000.ethernet eth0: NCSI interface down
       ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
       ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
       ftgmac100 1e660000.ethernet eth0: NCSI interface up
      
      This makes all of these messages netdev_dbg. They are still useful to
      debug eg. misbehaving network device firmware, but we do not need them
      filling up the kernel logs in normal operation.
      Acked-by: default avatarSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: default avatarJoel Stanley <joel@jms.id.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87975a01
    • Daniel Borkmann's avatar
      bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize · c51818d5
      Daniel Borkmann authored
      Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
      is clearly wrong since I40E_SKB_PAD already points to the offset where
      the original xdp->data was sitting since xdp->data_hard_start is defined
      as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
      when build skb is used.
      
      However, also before cc5b114d ("bpf, i40e: add meta data support")
      this seems broken since bpf_xdp_adjust_head() helper could have been used
      to alter headroom and enlarge / shrink the frame and with that the assumption
      that the xdp->data remains unchanged does not hold and would push a bogus
      packet to upper stack.
      
      ixgbe got this right in 92470808 ("ixgbe: add XDP support for pass and
      drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
      skb_reserve() and truesize calculation.
      
      Fixes: cc5b114d ("bpf, i40e: add meta data support")
      Fixes: 0c8493d9 ("i40e: add XDP support for pass and drop actions")
      Reported-by: default avatarKeith Busch <keith.busch@linux.intel.com>
      Reported-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Tested-by: default avatarKeith Busch <keith.busch@linux.intel.com>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: default avatarJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c51818d5
    • David S. Miller's avatar
      Merge branch 'qed-fixes' · d563e7a2
      David S. Miller authored
      Sudarsana Reddy Kalluru says:
      
      ====================
      qed*: Fix series.
      
      The patch series fixes few issues in the qed/qede drivers.
      Please consider applying this series to "net".
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d563e7a2
    • Sudarsana Reddy Kalluru's avatar
      qed: Do not advertise DCBX_LLD_MANAGED capability. · ff54d5cd
      Sudarsana Reddy Kalluru authored
      Do not advertise DCBX_LLD_MANAGED capability i.e., do not allow
      external agent to manage the dcbx/lldp negotiation. MFW acts as lldp agent
      for qed* devices, and no other lldp agent is allowed to coexist with mfw.
      
      Also updated a debug print, to not to display the redundant info.
      
      Fixes: a1d8d8a5 ("qed: Add dcbnl support.")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ff54d5cd
    • Sudarsana Reddy Kalluru's avatar
      qed: Add sanity check for SIMD fastpath handler. · 3935a709
      Sudarsana Reddy Kalluru authored
      Avoid calling a SIMD fastpath handler if it is NULL. The check is needed
      to handle an unlikely scenario where unsolicited interrupt is destined to
      a PF in INTa mode.
      
      Fixes: fe56b9e6 ("qed: Add module with basic common support")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3935a709
    • Sudarsana Reddy Kalluru's avatar
      qed: Fix possible memory leak in Rx error path handling. · 4f9de4df
      Sudarsana Reddy Kalluru authored
      Memory for packet buffers need to be freed in the error paths as there is
      no consumer (e.g., upper layer) for such packets and that memory will never
      get freed.
      The issue was uncovered when port was attacked with flood of isatap
      packets, these are multicast packets hence were directed at all the PFs.
      For foce PF, this meant they were routed to the ll2 module which in turn
      drops such packets.
      
      Fixes: 0a7fb11c ("qed: Add Light L2 support")
      Signed-off-by: default avatarSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: default avatarAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: default avatarMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f9de4df
  2. 16 Jun, 2018 7 commits
    • Konstantin Khlebnikov's avatar
      net_sched: blackhole: tell upper qdisc about dropped packets · 7e85dc8c
      Konstantin Khlebnikov authored
      When blackhole is used on top of classful qdisc like hfsc it breaks
      qlen and backlog counters because packets are disappear without notice.
      
      In HFSC non-zero qlen while all classes are inactive triggers warning:
      WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
      and schedules watchdog work endlessly.
      
      This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
      this flag tells upper layer: this packet is gone and isn't queued.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7e85dc8c
    • David S. Miller's avatar
      bluetooth: hci_nokia: Don't include linux/unaligned/le_struct.h directly. · a9122886
      David S. Miller authored
      This breaks the build as this header is not meant to be used in this
      way.
      
      ./include/linux/unaligned/access_ok.h:8:28: error: redefinition of ‘get_unaligned_le16’
       static __always_inline u16 get_unaligned_le16(const void *p)
                                  ^~~~~~~~~~~~~~~~~~
      In file included from drivers/bluetooth/hci_nokia.c:32:
      ./include/linux/unaligned/le_struct.h:7:19: note: previous definition of ‘get_unaligned_le16’ was here
       static inline u16 get_unaligned_le16(const void *p)
      
      Use asm/unaligned.h instead.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9122886
    • David Woodhouse's avatar
      atm: Preserve value of skb->truesize when accounting to vcc · 9bbe60a6
      David Woodhouse authored
      ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
      which they are to be sent. But it doesn't take ownership of those
      packets from the sock (if any) which originally owned them. They should
      remain owned by their actual sender until they've left the box.
      
      There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
      for certain skbs, precisely to avoid messing up sk_wmem_alloc
      accounting. Ideally that hack would cover the ATM use case too, but it
      doesn't — skbs which aren't owned by any sock, for example PPP control
      frames, still get their truesize adjusted when the low-level ATM driver
      adds headroom.
      
      This has always been an issue, it seems. The truesize of a packet
      increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
      for normal traffic, only for control frames. So I think we just got away
      with it, and we probably needed to send 2GiB of LCP echo frames before
      the misaccounting would ever have caused a problem and caused
      atm_may_send() to start refusing packets.
      
      Commit 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to
      refcount_t") did exactly what it was intended to do, and turned this
      mostly-theoretical problem into a real one, causing PPPoATM to fail
      immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
      starts refusing to allow new packets.
      
      The least intrusive solution to this problem is to stash the value of
      skb->truesize that was accounted to the VCC, in a new member of the
      ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
      value instead of the then-current value of skb->truesize.
      
      Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
      Signed-off-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
      Tested-by: default avatarKevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9bbe60a6
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0841d986
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-06-16
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a panic in devmap handling in generic XDP where return type
         of __devmap_lookup_elem() got changed recently but generic XDP
         code missed the related update, from Toshiaki.
      
      2) Fix a freeze when BPF progs are loaded that include BPF to BPF
         calls when JIT is enabled where we would later bail out via error
         path w/o dropping kallsyms, and another one to silence syzkaller
         splats from locking prog read-only, from Daniel.
      
      3) Fix a bug in test_offloads.py BPF selftest which must not assume
         that the underlying system have no BPF progs loaded prior to test,
         and one in bpftool to fix accuracy of program load time, from Jakub.
      
      4) Fix a bug in bpftool's probe for availability of the bpf(2)
         BPF_TASK_FD_QUERY subcommand, from Yonghong.
      
      5) Fix a regression in AF_XDP's XDP_SKB receive path where queue
         id check got erroneously removed, from Björn.
      
      6) Fix missing state cleanup in BPF's xfrm tunnel test, from William.
      
      7) Check tunnel type more accurately in BPF's tunnel collect metadata
         kselftest, from Jian.
      
      8) Fix missing Kconfig fragments for BPF kselftests, from Anders.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0841d986
    • Linus Torvalds's avatar
      Merge branch 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 35773c93
      Linus Torvalds authored
      Pull AFS updates from Al Viro:
       "Assorted AFS stuff - ended up in vfs.git since most of that consists
        of David's AFS-related followups to Christoph's procfs series"
      
      * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        afs: Optimise callback breaking by not repeating volume lookup
        afs: Display manually added cells in dynamic root mount
        afs: Enable IPv6 DNS lookups
        afs: Show all of a server's addresses in /proc/fs/afs/servers
        afs: Handle CONFIG_PROC_FS=n
        proc: Make inline name size calculation automatic
        afs: Implement network namespacing
        afs: Mark afs_net::ws_cell as __rcu and set using rcu functions
        afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
        proc: Add a way to make network proc files writable
        afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
        afs: Rearrange fs/afs/proc.c to move the show routines up
        afs: Rearrange fs/afs/proc.c by moving fops and open functions down
        afs: Move /proc management functions to the end of the file
      35773c93
    • Linus Torvalds's avatar
      Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 29d6849d
      Linus Torvalds authored
      Pull compat updates from Al Viro:
       "Some biarch patches - getting rid of assorted (mis)uses of
        compat_alloc_user_space().
      
        Not much in that area this cycle..."
      
      * 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        orangefs: simplify compat ioctl handling
        signalfd: lift sigmask copyin and size checks to callers of do_signalfd4()
        vmsplice(): lift importing iovec into vmsplice(2) and compat counterpart
      29d6849d
    • Linus Torvalds's avatar
      Merge branch 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · a5b729ea
      Linus Torvalds authored
      Pull aio fixes from Al Viro:
       "Assorted AIO followups and fixes"
      
      * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        eventpoll: switch to ->poll_mask
        aio: only return events requested in poll_mask() for IOCB_CMD_POLL
        eventfd: only return events requested in poll_mask()
        aio: mark __aio_sigset::sigmask const
      a5b729ea
  3. 15 Jun, 2018 18 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 9215310c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Various netfilter fixlets from Pablo and the netfilter team.
      
       2) Fix regression in IPVS caused by lack of PMTU exceptions on local
          routes in ipv6, from Julian Anastasov.
      
       3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.
      
       4) Don't crash on poll in TLS, from Daniel Borkmann.
      
       5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
          including Avahi mDNS. From Bart Van Assche.
      
       6) Missing of_node_put in qcom/emac driver, from Yue Haibing.
      
       7) We lack checking of the TCP checking in one special case during SYN
          receive, from Frank van der Linden.
      
       8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.
      
       9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.
      
      10) Must grab HW caps before doing quirk checks in stmmac driver, from
          Jose Abreu.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
        net: stmmac: Run HWIF Quirks after getting HW caps
        neighbour: skip NTF_EXT_LEARNED entries during forced gc
        net: cxgb3: add error handling for sysfs_create_group
        tls: fix waitall behavior in tls_sw_recvmsg
        tls: fix use-after-free in tls_push_record
        l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
        l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
        mlxsw: spectrum_switchdev: Fix port_vlan refcounting
        mlxsw: spectrum_router: Align with new route replace logic
        mlxsw: spectrum_router: Allow appending to dev-only routes
        ipv6: Only emit append events for appended routes
        stmmac: added support for 802.1ad vlan stripping
        cfg80211: fix rcu in cfg80211_unregister_wdev
        mac80211: Move up init of TXQs
        mac80211_hwsim: fix module init error paths
        cfg80211: initialize sinfo in cfg80211_get_station
        nl80211: fix some kernel doc tag mistakes
        hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
        rds: avoid unenecessary cong_update in loop transport
        l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
        ...
      9215310c
    • Linus Torvalds's avatar
      Merge tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · de7f01c2
      Linus Torvalds authored
      Pull module updates from Jessica Yu:
       "Minor code cleanup and also allow sig_enforce param to be shown in
        sysfs with CONFIG_MODULE_SIG_FORCE"
      
      * tag 'modules-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        module: Allow to always show the status of modsign
        module: Do not access sig_enforce directly
      de7f01c2
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 8d1e5133
      Linus Torvalds authored
      Pull uml updates from Richard Weinberger:
       "Minor updates for UML:
      
         - fixes for our new vector network driver by Anton
      
         - initcall cleanup by Alexander
      
         - We have a new mailinglist, sourceforge.net sucks"
      
      * 'for-linus-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Fix raw interface options
        um: Fix initialization of vector queues
        um: remove uml initcalls
        um: Update mailing list address
      8d1e5133
    • Toshiaki Makita's avatar
      xdp: Fix handling of devmap in generic XDP · 6d5fc195
      Toshiaki Makita authored
      Commit 67f29e07 ("bpf: devmap introduce dev_map_enqueue") changed
      the return value type of __devmap_lookup_elem() from struct net_device *
      to struct bpf_dtab_netdev * but forgot to modify generic XDP code
      accordingly.
      
      Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
      net_device is expected, then skb->dev was set to invalid value.
      
      v2:
      - Fix compiler warning without CONFIG_BPF_SYSCALL.
      
      Fixes: 67f29e07 ("bpf: devmap introduce dev_map_enqueue")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      6d5fc195
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-4.18-merge_window' of... · 6a4d4b32
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V updates from Palmer Dabbelt:
       "This contains some small RISC-V updates I'd like to target for 4.18.
      
        They are all fairly small this time. Here's a short summary, there's
        more info in the commits/merges:
      
         - a fix to __clear_user to respect the passed arguments.
      
         - enough support for the perf subsystem to work with RISC-V's ISA
           defined performance counters.
      
         - support for sparse and cleanups suggested by it.
      
         - support for R_RISCV_32 (a relocation, not the 32-bit ISA).
      
         - some MAINTAINERS cleanups.
      
         - the addition of CONFIG_HVC_RISCV_SBI to our defconfig, as it's
           always present.
      
        I've given these a simple build+boot test"
      
      * tag 'riscv-for-linus-4.18-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        RISC-V: Add CONFIG_HVC_RISCV_SBI=y to defconfig
        RISC-V: Handle R_RISCV_32 in modules
        riscv/ftrace: Export _mcount when DYNAMIC_FTRACE isn't set
        riscv: add riscv-specific predefines to CHECKFLAGS
        riscv: split the declaration of __copy_user
        riscv: no __user for probe_kernel_address()
        riscv: use NULL instead of a plain 0
        perf: riscv: Add Document for Future Porting Guide
        perf: riscv: preliminary RISC-V support
        MAINTAINERS: Update Albert's email, he's back at Berkeley
        MAINTAINERS: Add myself as a maintainer for SiFive's drivers
        riscv: Fix the bug in memory access fixup code
      6a4d4b32
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 8949170c
      Linus Torvalds authored
      Pull more kvm updates from Paolo Bonzini:
       "Mostly the PPC part of the release, but also switching to Arnd's fix
        for the hyperv config issue and a typo fix.
      
        Main PPC changes:
      
         - reimplement the MMIO instruction emulation
      
         - transactional memory support for PR KVM
      
         - improve radix page table handling"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (63 commits)
        KVM: x86: VMX: redo fix for link error without CONFIG_HYPERV
        KVM: x86: fix typo at kvm_arch_hardware_setup comment
        KVM: PPC: Book3S PR: Fix failure status setting in tabort. emulation
        KVM: PPC: Book3S PR: Enable use on POWER9 bare-metal hosts in HPT mode
        KVM: PPC: Book3S PR: Don't let PAPR guest set MSR hypervisor bit
        KVM: PPC: Book3S PR: Fix failure status setting in treclaim. emulation
        KVM: PPC: Book3S PR: Fix MSR setting when delivering interrupts
        KVM: PPC: Book3S PR: Handle additional interrupt types
        KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
        KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
        KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
        KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
        KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
        KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
        KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
        KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
        KVM: PPC: Book3S PR: Add emulation for trechkpt.
        KVM: PPC: Book3S PR: Add emulation for treclaim.
        KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
        KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
        ...
      8949170c
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 2f3f0566
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "virtio, vhost: features, fixes
      
         - PCI virtual function support for virtio
      
         - DMA barriers for virtio strong barriers
      
         - bugfixes"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio: update the comments for transport features
        virtio_pci: support enabling VFs
        vhost: fix info leak due to uninitialized memory
        virtio_ring: switch to dma_XX barriers for rpmsg
      2f3f0566
    • Alexei Starovoitov's avatar
      Merge branch 'bpf-fixes' · b5518c70
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      First one is a panic I ran into while testing the second
      one where we got several syzkaller reports. Series here
      fixes both.
      
      Thanks!
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b5518c70
    • Daniel Borkmann's avatar
      bpf: reject any prog that failed read-only lock · 9facc336
      Daniel Borkmann authored
      We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
      as well as the BPF image as read-only through bpf_prog_lock_ro(). In
      the case any of these would fail we throw a WARN_ON_ONCE() in order to
      yell loudly to the log. Perhaps, to some extend, this may be comparable
      to an allocation where __GFP_NOWARN is explicitly not set.
      
      Added via 65869a47 ("bpf: improve read-only handling"), this behavior
      is slightly different compared to any of the other in-kernel set_memory_ro()
      users who do not check the return code of set_memory_ro() and friends /at
      all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
      in BPF this is mandatory hardening step, we want to know whether there
      are any issues that would leave both BPF data writable. So it happens
      that syzkaller enabled fault injection and it triggered memory allocation
      failure deep inside x86's change_page_attr_set_clr() which was triggered
      from set_memory_ro().
      
      Now, there are two options: i) leaving everything as is, and ii) reworking
      the image locking code in order to have a final checkpoint out of the
      central bpf_prog_select_runtime() which probes whether any of the calls
      during prog setup weren't successful, and then bailing out with an error.
      Option ii) is a better approach since this additional paranoia avoids
      altogether leaving any potential W+X pages from BPF side in the system.
      Therefore, lets be strict about it, and reject programs in such unlikely
      occasion. While testing I noticed also that one bpf_prog_lock_ro()
      call was missing on the outer dummy prog in case of calls, e.g. in the
      destructor we call bpf_prog_free_deferred() on the main prog where we
      try to bpf_prog_unlock_free() the program, and since we go via
      bpf_prog_select_runtime() do that as well.
      
      Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
      Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9facc336
    • Daniel Borkmann's avatar
      bpf: fix panic in prog load calls cleanup · 7d1982b4
      Daniel Borkmann authored
      While testing I found that when hitting error path in bpf_prog_load()
      where we jump to free_used_maps and prog contained BPF to BPF calls
      that were JITed earlier, then we never clean up the bpf_prog_kallsyms_add()
      done under jit_subprogs(). Add proper API to make BPF kallsyms deletion
      more clear and fix that.
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7d1982b4
    • Jose Abreu's avatar
      net: stmmac: Run HWIF Quirks after getting HW caps · 7cfde0af
      Jose Abreu authored
      Currently we were running HWIF quirks before getting HW capabilities.
      This is not right because some HWIF callbacks depend on HW caps.
      
      Lets save the quirks callback and use it in a later stage.
      
      This fixes Altera socfpga.
      Signed-off-by: default avatarJose Abreu <joabreu@synopsys.com>
      Fixes: 5f0456b4 ("net: stmmac: Implement logic to automatically select HW Interface")
      Reported-by: default avatarDinh Nguyen <dinh.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Vitor Soares <soares@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Dinh Nguyen <dinh.linux@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7cfde0af
    • Roopa Prabhu's avatar
      neighbour: skip NTF_EXT_LEARNED entries during forced gc · f6a6f203
      Roopa Prabhu authored
      Commit 9ce33e46 ("neighbour: support for NTF_EXT_LEARNED flag")
      added support for NTF_EXT_LEARNED for neighbour entries.
      NTF_EXT_LEARNED entries are neigh entries managed by control
      plane (eg: Ethernet VPN implementation in FRR routing suite).
      Periodic gc already excludes these entries. This patch extends
      it to forced gc which the earlier patch missed.
      
      Fixes: 9ce33e46 ("neighbour: support for NTF_EXT_LEARNED flag")
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f6a6f203
    • Zhouyang Jia's avatar
      net: cxgb3: add error handling for sysfs_create_group · 7c099773
      Zhouyang Jia authored
      When sysfs_create_group fails, the lack of error-handling code may
      cause unexpected results.
      
      This patch adds error-handling code after calling sysfs_create_group.
      Signed-off-by: default avatarZhouyang Jia <jiazhouyang09@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c099773
    • David S. Miller's avatar
      Merge branch 'tls-fixes' · c14a0246
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Two tls fixes
      
      First one is syzkaller trigered uaf and second one noticed
      while writing test code with tls ulp. For details please see
      individual patches.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c14a0246
    • Daniel Borkmann's avatar
      tls: fix waitall behavior in tls_sw_recvmsg · 06030dba
      Daniel Borkmann authored
      Current behavior in tls_sw_recvmsg() is to wait for incoming tls
      messages and copy up to exactly len bytes of data that the user
      provided. This is problematic in the sense that i) if no packet
      is currently queued in strparser we keep waiting until one has been
      processed and pushed into tls receive layer for tls_wait_data() to
      wake up and push the decrypted bits to user space. Given after
      tls decryption, we're back at streaming data, use sock_rcvlowat()
      hint from tcp socket instead. Retain current behavior with MSG_WAITALL
      flag and otherwise use the hint target for breaking the loop and
      returning to application. This is done if currently no ctx->recv_pkt
      is ready, otherwise continue to process it from our strparser
      backlog.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDave Watson <davejwatson@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      06030dba
    • Daniel Borkmann's avatar
      tls: fix use-after-free in tls_push_record · a447da7d
      Daniel Borkmann authored
      syzkaller managed to trigger a use-after-free in tls like the
      following:
      
        BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
        Write of size 1 at addr ffff88037aa08000 by task a.out/2317
      
        CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
        Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
        Call Trace:
         dump_stack+0x71/0xab
         print_address_description+0x6a/0x280
         kasan_report+0x258/0x380
         ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_push_record.constprop.15+0x6a2/0x810 [tls]
         tls_sw_push_pending_record+0x2e/0x40 [tls]
         tls_sk_proto_close+0x3fe/0x710 [tls]
         ? tcp_check_oom+0x4c0/0x4c0
         ? tls_write_space+0x260/0x260 [tls]
         ? kmem_cache_free+0x88/0x1f0
         inet_release+0xd6/0x1b0
         __sock_release+0xc0/0x240
         sock_close+0x11/0x20
         __fput+0x22d/0x660
         task_work_run+0x114/0x1a0
         do_exit+0x71a/0x2780
         ? mm_update_next_owner+0x650/0x650
         ? handle_mm_fault+0x2f5/0x5f0
         ? __do_page_fault+0x44f/0xa50
         ? mm_fault_error+0x2d0/0x2d0
         do_group_exit+0xde/0x300
         __x64_sys_exit_group+0x3a/0x50
         do_syscall_64+0x9a/0x300
         ? page_fault+0x8/0x30
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happened through fault injection where aead_req allocation in
      tls_do_encryption() eventually failed and we returned -ENOMEM from
      the function. Turns out that the use-after-free is triggered from
      tls_sw_sendmsg() in the second tls_push_record(). The error then
      triggers a jump to waiting for memory in sk_stream_wait_memory()
      resp. returning immediately in case of MSG_DONTWAIT. What follows is
      the trim_both_sgl(sk, orig_size), which drops elements from the sg
      list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
      when the socket is being closed, where tls_sk_proto_close() callback
      is invoked. The tls_complete_pending_work() will figure that there's
      a pending closed tls record to be flushed and thus calls into the
      tls_push_pending_closed_record() from there. ctx->push_pending_record()
      is called from the latter, which is the tls_sw_push_pending_record()
      from sw path. This again calls into tls_push_record(). And here the
      tls_fill_prepend() will panic since the buffer address has been freed
      earlier via trim_both_sgl(). One way to fix it is to move the aead
      request allocation out of tls_do_encryption() early into tls_push_record().
      This means we don't prep the tls header and advance state to the
      TLS_PENDING_CLOSED_RECORD before allocation which could potentially
      fail happened. That fixes the issue on my side.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
      Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarDave Watson <davejwatson@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a447da7d
    • David S. Miller's avatar
      Merge branch 'l2tp-l2tp_ppp-must-ignore-non-PPP-sessions' · 695ad876
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      l2tp: l2tp_ppp must ignore non-PPP sessions
      
      The original L2TP code was written for version 2 of the protocol, which
      could only carry PPP sessions. Then L2TPv3 generalised the protocol so that
      it could transport different kinds of pseudo-wires. But parts of the
      l2tp_ppp module still break in presence of non-PPP sessions.
      
      Assuming L2TPv2 tunnels can only transport PPP sessions is right, but
      l2tp_netlink failed to ensure that (fixed in patch 1).
      When retrieving a session from an arbitrary tunnel, l2tp_ppp needs to
      filter out non-PPP sessions (last occurrence fixed in patch 2).
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      695ad876
    • Guillaume Nault's avatar
      l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl() · ecd012e4
      Guillaume Nault authored
      pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
      'session' may be an Ethernet pseudo-wire.
      
      However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
      assumes l2tp_session_priv() points to a pppol2tp_session structure. For
      an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
      structure instead, making pppol2tp_session_ioctl() access invalid
      memory.
      
      Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
      Signed-off-by: default avatarGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ecd012e4