1. 14 Jan, 2021 1 commit
    • bpf: Reject too big ctx_size_in for raw_tp test run · 7ac6ad05
      Song Liu authored
      syzbot reported a WARNING for allocating too big memory:
      
      WARNING: CPU: 1 PID: 8484 at mm/page_alloc.c:4976 __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5011
      Modules linked in:
      CPU: 1 PID: 8484 Comm: syz-executor862 Not tainted 5.11.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:4976
      Code: 00 00 0c 00 0f 85 a7 00 00 00 8b 3c 24 4c 89 f2 44 89 e6 c6 44 24 70 00 48 89 6c 24 58 e8 d0 d7 ff ff 49 89 c5 e9 ea fc ff ff <0f> 0b e9 b5 fd ff ff 89 74 24 14 4c 89 4c 24 08 4c 89 74 24 18 e8
      RSP: 0018:ffffc900012efb10 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 1ffff9200025df66 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000140dc0
      RBP: 0000000000140dc0 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff81b1f7e1 R11: 0000000000000000 R12: 0000000000000014
      R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000
      FS:  000000000190c880(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f08b7f316c0 CR3: 0000000012073000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      alloc_pages_current+0x18c/0x2a0 mm/mempolicy.c:2267
      alloc_pages include/linux/gfp.h:547 [inline]
      kmalloc_order+0x2e/0xb0 mm/slab_common.c:837
      kmalloc_order_trace+0x14/0x120 mm/slab_common.c:853
      kmalloc include/linux/slab.h:557 [inline]
      kzalloc include/linux/slab.h:682 [inline]
      bpf_prog_test_run_raw_tp+0x4b5/0x670 net/bpf/test_run.c:282
      bpf_prog_test_run kernel/bpf/syscall.c:3120 [inline]
      __do_sys_bpf+0x1ea9/0x4f10 kernel/bpf/syscall.c:4398
      do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x440499
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffe1f3bfb18 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440499
      RDX: 0000000000000048 RSI: 0000000020000600 RDI: 000000000000000a
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ca0
      R13: 0000000000401d30 R14: 0000000000000000 R15: 0000000000000000
      
      This is because we didn't filter out too big ctx_size_in. Fix it by
      rejecting ctx_size_in that are bigger than MAX_BPF_FUNC_ARGS (12) u64
      numbers.
      
      Fixes: 1b4d60ec ("bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint")
      Reported-by: syzbot+4f98876664c7337a4ae6@syzkaller.appspotmail.com
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210112234254.1906829-1-songliubraving@fb.com
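
The bound described in the bpf commit above can be sketched in user-space C. This is a minimal model, not the kernel patch: `check_raw_tp_ctx_size()` is a hypothetical helper name and the `-EINVAL` return value is an assumption; only the limit of MAX_BPF_FUNC_ARGS (12) u64 values comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Value quoted in the commit message. */
#define MAX_BPF_FUNC_ARGS 12

/*
 * Hypothetical helper (not a kernel function): returns 0 when the
 * user-supplied context size fits in MAX_BPF_FUNC_ARGS u64 values,
 * and -EINVAL (assumed error code) otherwise, so that no oversized
 * allocation is ever attempted.
 */
static int check_raw_tp_ctx_size(uint32_t ctx_size_in)
{
    if (ctx_size_in > MAX_BPF_FUNC_ARGS * sizeof(uint64_t))
        return -EINVAL;
    return 0;
}
```

With these definitions the largest accepted size is 12 * 8 = 96 bytes; anything larger is rejected before reaching the allocator.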
  2. 12 Jan, 2021 6 commits
  3. 11 Jan, 2021 1 commit
  4. 10 Jan, 2021 4 commits
  5. 09 Jan, 2021 12 commits
    • tipc: fix NULL deref in tipc_link_xmit() · b7741344
      Hoang Le authored
      The buffer list can be empty on the following path:
      tipc_named_node_up()->tipc_node_xmit()->tipc_link_xmit(), so
      we need to check the list before casting its head to an &sk_buff.
      
      Fault report:
       [] tipc: Bulk publication failure
       [] general protection fault, probably for non-canonical [#1] PREEMPT [...]
       [] KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf]
       [] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.10.0-rc4+ #2
       [] Hardware name: Bochs ..., BIOS Bochs 01/01/2011
       [] RIP: 0010:tipc_link_xmit+0xc1/0x2180
       [] Code: 24 b8 00 00 00 00 4d 39 ec 4c 0f 44 e8 e8 d7 0a 10 f9 48 [...]
       [] RSP: 0018:ffffc90000006ea0 EFLAGS: 00010202
       [] RAX: dffffc0000000000 RBX: ffff8880224da000 RCX: 1ffff11003d3cc0d
       [] RDX: 0000000000000019 RSI: ffffffff886007b9 RDI: 00000000000000c8
       [] RBP: ffffc90000007018 R08: 0000000000000001 R09: fffff52000000ded
       [] R10: 0000000000000003 R11: fffff52000000dec R12: ffffc90000007148
       [] R13: 0000000000000000 R14: 0000000000000000 R15: ffffc90000007018
       [] FS:  0000000000000000(0000) GS:ffff888037400000(0000) knlGS:000[...]
       [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [] CR2: 00007fffd2db5000 CR3: 000000002b08f000 CR4: 00000000000006f0
      
      Fixes: af9b028e ("tipc: make media xmit call outside node spinlock context")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Link: https://lore.kernel.org/r/20210108071337.3598-1-hoang.h.le@dektech.com.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
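
The empty-list check from the tipc commit above can be illustrated with a small user-space model. `struct buf`, `struct buf_list`, `buf_list_peek()` and `link_xmit()` are simplified, hypothetical stand-ins for the kernel's sk_buff types and tipc_link_xmit(); returning 0 for an empty list is an assumption made for the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's sk_buff and sk_buff_head. */
struct buf {
    struct buf *next;
    int len;
};

struct buf_list {
    struct buf *head;
};

/* Returns the first buffer, or NULL when the list is empty. */
static struct buf *buf_list_peek(const struct buf_list *list)
{
    return list->head;
}

/*
 * Hypothetical transmit entry point: check for an empty list before
 * reading any field of the head buffer.
 */
static int link_xmit(const struct buf_list *list)
{
    const struct buf *first = buf_list_peek(list);

    if (!first)
        return 0;       /* nothing to send, nothing to dereference */
    if (first->len <= 0)
        return -EINVAL;
    return first->len;  /* the model reports the head length as "sent" */
}
```

Without the `!first` test, `first->len` would read through a NULL pointer, which is the class of fault shown in the report.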
    • selftests/tls: fix selftests after adding ChaCha20-Poly1305 · 3502bd9b
      Vadim Fedorenko authored
      TLS selftests were broken because the wrong variable types were used.
      Fix it by changing u16 -> uint16_t.
      
      Fixes: 4f336e88 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Link: https://lore.kernel.org/r/1610141865-7142-1-git-send-email-vfedorenko@novek.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipv6: Validate GSO SKB before finish IPv6 processing · b210de4f
      Aya Levin authored
      There are cases where GSO segment's length exceeds the egress MTU:
       - Forwarding of a TCP GRO skb, when DF flag is not set.
       - Forwarding of an skb that arrived on a virtualisation interface
         (virtio-net/vhost/tap) with TSO/GSO size set by other network
         stack.
       - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
         interface with a smaller MTU.
       - Arriving GRO skb (or GSO skb in a virtualised environment) that is
         bridged to a NETIF_F_TSO tunnel stacked over an interface with an
         insufficient MTU.
      
      If so:
       - Consume the SKB and its segments.
       - Issue an ICMP packet with 'Packet Too Big' message containing the
         MTU, allowing the source host to reduce its Path MTU appropriately.
      
      Note: These cases are handled in the same manner in IPv4 output finish.
      This patch aligns the behavior of IPv6 and the one of IPv4.
      
      Fixes: 9e508490 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
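
The decision described in the IPv6 GSO commit above can be modelled roughly as follows. This sketch only mirrors the wording of the commit message (a GSO segment longer than the egress MTU leads to a drop plus a 'Packet Too Big' message); `struct pkt`, `enum verdict` and `ip6_gso_check()` are hypothetical names, and the real output path involves more steps than this model shows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the check; not kernel identifiers. */
enum verdict {
    VERDICT_CONTINUE,           /* check does not apply or passes */
    VERDICT_DROP_AND_SEND_PTB   /* consume packet, report the MTU */
};

/* Minimal stand-in for the fields the check needs. */
struct pkt {
    bool is_gso;     /* packet carries GSO segments */
    size_t seg_len;  /* network-layer length of one segment */
};

/*
 * Model of the validation step: a GSO packet whose segments would not
 * fit in the egress MTU is dropped and a 'Packet Too Big' is issued.
 * Non-GSO packets are outside the scope of this check.
 */
static enum verdict ip6_gso_check(const struct pkt *p, size_t mtu)
{
    if (p->is_gso && p->seg_len > mtu)
        return VERDICT_DROP_AND_SEND_PTB;
    return VERDICT_CONTINUE;
}
```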
    • netxen_nic: fix MSI/MSI-x interrupts · a2bc221b
      Manish Chopra authored
      For all PCI functions on the netxen_nic adapter, interrupt
      mode (INTx or MSI) configuration is dependent on what has
      been configured by the PCI function zero in the shared
      interrupt register, as these adapters do not support mixed
      mode interrupts among the functions of a given adapter.
      
      The logic for setting MSI/MSI-x interrupt mode in the shared interrupt
      register, based on a check for PCI function id zero, is not appropriate
      for all families of netxen adapters. On some of them PCI function zero
      is not really meant to be probed/loaded in the host but rather just
      acts as a management function on the device, which caused all the
      other PCI functions on the adapter to always use legacy interrupt
      (INTx) mode instead of choosing MSI/MSI-x interrupt mode.
      
      This patch replaces that check with the port number so that the driver
      attempts MSI/MSI-x interrupt modes for all types of adapters.
      
      Fixes: b37eb210 ("netxen_nic: Avoid mixed mode interrupts")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
      Link: https://lore.kernel.org/r/20210107101520.6735-1-manishc@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'net-fix-issues-around-register_netdevice-failures' · c49243e8
      Jakub Kicinski authored
      Jakub Kicinski says:
      
      ====================
      net: fix issues around register_netdevice() failures
      
      This series attempts to clean up the life cycle of struct
      net_device. Dave has added dev->needs_free_netdev in the
      past to fix double frees, we can lean on that mechanism
      a little more to fix remaining issues with register_netdevice().
      
      This is the next chapter of the saga which already includes:
      commit 0e0eee24 ("net: correct error path in rtnl_newlink()")
      commit e51fb152 ("rtnetlink: fix a memory leak when ->newlink fails")
      commit cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      commit 93ee31f1 ("[NET]: Fix free_netdev on register_netdev failure.")
      commit 814152a8 ("net: fix memleak in register_netdevice()")
      commit 10cc514f ("net: Fix null de-reference of device refcount")
      
      The immediate problem which gets fixed here is that calling
      free_netdev() right after unregister_netdevice() is illegal
      because we need to release rtnl_lock first, to let the
      unregistration finish. Note that unregister_netdevice() is
      just a wrapper of unregister_netdevice_queue(), it only
      does half of the job.
      
      Where this limitation becomes most problematic is in failure
      modes of register_netdevice(). There is a notifier call right
      at the end of it, which lets other subsystems veto the entire
      thing. At which point we should really go through a full
      unregister_netdevice(), but we can't because callers may
      go straight to free_netdev() after the failure, and that's
      no bueno (see the previous paragraph).
      
      This set makes free_netdev() more lenient: when the device
      is still being unregistered, free_netdev() will simply set
      dev->needs_free_netdev and let the unregister process do
      the freeing.
      
      With the free_netdev() problem out of the way failures in
      register_netdevice() can make use of net_todo, again.
      Users are still expected to call free_netdev() right after
      failure but that will only set dev->needs_free_netdev.
      
      To prevent the pathological case of:
      
       dev->needs_free_netdev = true;
       if (register_netdevice(dev)) {
         rtnl_unlock();
         free_netdev(dev);
       }
      
      make register_netdevice()'s failure clear dev->needs_free_netdev.
      
      Problems described above are only present with register_netdevice() /
      unregister_netdevice(). We have two parallel APIs for registration
      of devices:
       - those called outside rtnl_lock (register_netdev(), and
         unregister_netdev());
       - and those to be used under rtnl_lock - register_netdevice()
         and unregister_netdevice().
      The former is trivial and has no problems. The alternative
      approach to fix the latter would be to also separate the
      freeing functions - i.e. add free_netdevice(). This has been
      implemented (incl. converting all relevant calls in the tree)
      but it feels a little unnecessary to put the burden of choosing
      the right free_netdev{,ice}() call on the programmer when we
      can "just do the right thing" by default.
      ====================
      
      Link: https://lore.kernel.org/r/20210106184007.1821480-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: make sure devices go through netdev_wait_all_refs · 766b0515
      Jakub Kicinski authored
      If register_netdevice() fails at the very last stage - the
      notifier call - some subsystems may have already seen it and
      grabbed a reference. struct net_device can't be freed right
      away without calling netdev_wait_all_refs().
      
      Now that we have a clean interface in form of dev->needs_free_netdev
      and lenient free_netdev() we can undo what commit 93ee31f1 ("[NET]:
      Fix free_netdev on register_netdev failure.") has done and complete
      the unregistration path by bringing the net_set_todo() call back.
      
      After registration fails the user is still expected to explicitly
      free the net_device, so make sure ->needs_free_netdev is cleared,
      otherwise rolling back the registration will cause the old double
      free for callers who release rtnl_lock before the free.
      
      This also solves the problem of priv_destructor not being called
      on notifier error.
      
      net_set_todo() will be moved back into unregister_netdevice_queue()
      in a follow up.
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Reported-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: make free_netdev() more lenient with unregistering devices · c269a24c
      Jakub Kicinski authored
      There are two flavors of handling netdev registration:
       - ones called without holding rtnl_lock: register_netdev() and
         unregister_netdev(); and
       - those called with rtnl_lock held: register_netdevice() and
         unregister_netdevice().
      
      While the semantics of the former are pretty clear, the same can't
      be said about the latter. The netdev_todo mechanism is utilized to
      perform some of the device unregistering tasks and it hooks into
      rtnl_unlock() so the locked variants can't actually finish the work.
      In general free_netdev() does not mix well with locked calls. Most
      drivers operating under rtnl_lock set dev->needs_free_netdev to true
      and expect core to make the free_netdev() call some time later.
      
      The part where this becomes most problematic is error paths. There is
      no way to unwind the state cleanly after a call to register_netdevice(),
      since unreg can't be performed fully without dropping locks.
      
      Make free_netdev() more lenient, and defer the freeing if device
      is being unregistered. This allows error paths to simply call
      free_netdev() both after register_netdevice() failed, and after
      a call to unregister_netdevice() but before dropping rtnl_lock.
      
      Simplify the error paths which are currently doing gymnastics
      around free_netdev() handling.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
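
The deferred free described in the free_netdev() commit above can be modelled in user space. This is a sketch under stated assumptions: `struct model_dev`, `enum reg_state`, `model_free_netdev()` and `model_run_todo()` are hypothetical stand-ins for struct net_device, its registration state, free_netdev() and the todo processing that runs when rtnl_lock is dropped. Only the `needs_free_netdev` flag name is taken from the commit messages.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical registration states for the model. */
enum reg_state {
    REG_UNINITIALIZED,
    REG_REGISTERED,
    REG_UNREGISTERING,
    REG_UNREGISTERED
};

struct model_dev {
    enum reg_state reg_state;
    bool needs_free_netdev;
};

/*
 * Lenient free: if the device is still being unregistered, only record
 * that a free was requested. Returns true when memory was released now.
 */
static bool model_free_netdev(struct model_dev *dev)
{
    if (dev->reg_state == REG_UNREGISTERING) {
        dev->needs_free_netdev = true;
        return false;
    }
    free(dev);
    return true;
}

/*
 * Models the work finished after the lock is dropped: complete the
 * unregistration, then perform the free that was deferred earlier.
 * Returns true when the device was released here.
 */
static bool model_run_todo(struct model_dev *dev)
{
    dev->reg_state = REG_UNREGISTERED;
    if (dev->needs_free_netdev)
        return model_free_netdev(dev);
    return false;
}
```

In this model an error path may call `model_free_netdev()` while the device is still unregistering; the memory is released exactly once, later, by `model_run_todo()`.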
    • docs: net: explain struct net_device lifetime · 2b446e65
      Jakub Kicinski authored
      Explain the two basic flows of struct net_device's operation.
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ppp: fix refcount underflow on channel unbridge · c1787ffd
      Tom Parkin authored
      When setting up a channel bridge, ppp_bridge_channels sets the
      pch->bridge field before taking the associated reference on the bridge
      file instance.
      
      This opens up a refcount underflow bug if ppp_bridge_channels called
      via ioctl runs concurrently with ppp_unbridge_channels executing via
      file release.
      
      The bug is triggered by ppp_bridge_channels taking the error path
      through the 'err_unset' label.  In this scenario, pch->bridge is set,
      but the reference on the bridged channel will not be taken because
      the function errors out.  If ppp_unbridge_channels observes pch->bridge
      before it is unset by the error path, it will erroneously drop the
      reference on the bridged channel and cause a refcount underflow.
      
      To avoid this, ensure that ppp_bridge_channels holds a reference on
      each channel in advance of setting the bridge pointers.
      
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Fixes: 4cf476ce ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/20210107181315.3128-1-tparkin@katalix.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
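
The ordering rule from the ppp commit above (hold the references before publishing the bridge pointers) can be shown with a single-threaded model. `struct chan`, `chan_hold()`, `chan_put()`, `bridge_channels()` and `unbridge_channels()` are hypothetical simplifications; locking and RCU are omitted, and the `-EALREADY` error code is an assumption of the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified channel: a plain counter instead of the kernel's refcount. */
struct chan {
    int refcnt;
    struct chan *bridge;
};

static void chan_hold(struct chan *c) { c->refcnt++; }
static void chan_put(struct chan *c)  { c->refcnt--; }

/*
 * Take both references first, then set the pointers. An observer that
 * sees a non-NULL ->bridge can therefore rely on the matching reference
 * already being held, and the error path only drops what it took.
 */
static int bridge_channels(struct chan *a, struct chan *b)
{
    chan_hold(a);
    chan_hold(b);

    if (a->bridge || b->bridge) {
        chan_put(a);
        chan_put(b);
        return -EALREADY;
    }

    a->bridge = b;
    b->bridge = a;
    return 0;
}

/* Clear both pointers and drop the references that backed them. */
static void unbridge_channels(struct chan *a)
{
    struct chan *b = a->bridge;

    if (!b)
        return;
    a->bridge = NULL;
    b->bridge = NULL;
    chan_put(a);
    chan_put(b);
}
```

A failed `bridge_channels()` call leaves every counter at its previous value, so a later unbridge cannot drop a reference that was never taken.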
    • udp: Prevent reuseport_select_sock from reading uninitialized socks · fd2ddef0
      Baptiste Lepers authored
      reuse->socks[] is modified concurrently by reuseport_add_sock. To
      prevent reading values that have not been fully initialized, only read
      the array up until the last known safe index instead of incorrectly
      re-reading the last index of the array.
      
      Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
      Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
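
The idea in the udp commit above (read the socket count once and never index past that snapshot) can be sketched with C11 atomics. `struct reuse_model`, `reuse_add()` and `reuse_select()` are hypothetical user-space stand-ins that assume a single writer; the model uses a plain modulo where the kernel scales the hash differently, and returning NULL when every socket is connected is an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_SOCKS 8

struct sock_model {
    int connected;  /* non-zero: skip during selection */
};

struct reuse_model {
    struct sock_model *socks[MAX_SOCKS];
    atomic_uint num_socks;
};

/* Single writer: fill the slot first, then publish the new count. */
static int reuse_add(struct reuse_model *r, struct sock_model *sk)
{
    unsigned int n = atomic_load_explicit(&r->num_socks,
                                          memory_order_relaxed);

    if (n >= MAX_SOCKS)
        return -1;
    r->socks[n] = sk;
    atomic_store_explicit(&r->num_socks, n + 1, memory_order_release);
    return 0;
}

/*
 * Reader: snapshot the count once and wrap around at that snapshot,
 * never at a freshly re-read count that may cover slots still being
 * initialised.
 */
static struct sock_model *reuse_select(struct reuse_model *r,
                                       unsigned int hash)
{
    unsigned int n = atomic_load_explicit(&r->num_socks,
                                          memory_order_acquire);
    unsigned int i, j;

    if (n == 0)
        return NULL;

    i = j = hash % n;
    do {
        if (!r->socks[i]->connected)
            return r->socks[i];
        if (++i >= n)   /* bound is the snapshot, not r->num_socks */
            i = 0;
    } while (i != j);

    return NULL;        /* every visible socket is connected */
}
```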
    • net: fix use-after-free when UDP GRO with shared fraglist · 53475c5d
      Dongseok Yi authored
      skbs in fraglist could be shared by a BPF filter loaded at TC. If TC
      writes, it will call skb_ensure_writable -> pskb_expand_head to create
      a private linear section for the head_skb. And then call
      skb_clone_fraglist -> skb_get on each skb in the fraglist.
      
      skb_segment_list overwrites part of the skb linear section of each
      fragment itself. Even after skb_clone, the frag_skbs share their
      linear section with their clone in PF_PACKET.
      
      Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have
      a link for the same frag_skbs chain. If a new skb (not frags) is
      queued to one of the sk_receive_queue, multiple ptypes can see and
      release this. It causes use-after-free.
      
      [ 4443.426215] ------------[ cut here ]------------
      [ 4443.426222] refcount_t: underflow; use-after-free.
      [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
      refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
      [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
      [ 4443.426808] Call trace:
      [ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
      [ 4443.426823]  skb_release_data+0x144/0x264
      [ 4443.426828]  kfree_skb+0x58/0xc4
      [ 4443.426832]  skb_queue_purge+0x64/0x9c
      [ 4443.426844]  packet_set_ring+0x5f0/0x820
      [ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
      [ 4443.426853]  __sys_setsockopt+0x188/0x278
      [ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
      [ 4443.426869]  el0_svc_common+0xf0/0x1d0
      [ 4443.426873]  el0_svc_handler+0x74/0x98
      [ 4443.426880]  el0_svc+0x8/0xc
      
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: modem: add missing SET_NETDEV_DEV() for proper sysfs links · afba9dc1
      Stephan Gerhold authored
      At the moment it is quite hard to identify the network interface
      provided by IPA in userspace components: The network interface is
      created as virtual device, without any link to the IPA device.
      The interface name ("rmnet_ipa%d") is the only indication that the
      network interface belongs to IPA, but this is not very reliable.
      
      Add SET_NETDEV_DEV() to associate the network interface with the
      IPA parent device. This allows userspace services like ModemManager
      to properly identify that this network interface is provided by IPA
      and belongs to the modem.
      
      Cc: Alex Elder <elder@kernel.org>
      Fixes: a646d6ec ("soc: qcom: ipa: modem and microcontroller")
      Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
      Link: https://lore.kernel.org/r/20210106100755.56800-1-stephan@gerhold.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 08 Jan, 2021 16 commits