1. 19 Oct, 2022 15 commits
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · d753a050
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Missing flowi uid field in nft_fib expression, from Guillaume Nault.
         This is broken since the creation of the fib expression.
      
      2) Relax sanity check to fix bogus EINVAL error when deleting elements
         belonging set intervals. Broken since 6.0-rc.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
        netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
      ====================
      
      Link: https://lore.kernel.org/r/20221019065225.1006344-1-pablo@netfilter.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d753a050
    • Jakub Kicinski's avatar
      genetlink: fix kdoc warnings · a1a824f4
      Jakub Kicinski authored
      Address a bunch of kdoc warnings:
      
      include/net/genetlink.h:81: warning: Function parameter or member 'module' not described in 'genl_family'
      include/net/genetlink.h:243: warning: expecting prototype for struct genl_info. Prototype was for struct genl_dumpit_info instead
      include/net/genetlink.h:419: warning: Function parameter or member 'net' not described in 'genlmsg_unicast'
      include/net/genetlink.h:438: warning: expecting prototype for gennlmsg_data(). Prototype was for genlmsg_data() instead
      include/net/genetlink.h:244: warning: Function parameter or member 'op' not described in 'genl_dumpit_info'
      
      Link: https://lore.kernel.org/r/20221018231310.1040482-1-kuba@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a1a824f4
    • David S. Miller's avatar
      Merge branch 'qdisc-ingress-success' · 6109ecbf
      David S. Miller authored
      Paul Blakey says:
      
      ====================
      net: Fix return value of qdisc ingress handling on success
      
      Fix patch + self-test with the currently broken scenario.
      
      v4->v3:
        Removed new line in self test and rebase (Paolo).
      
      v2->v3:
        Added DROP return to TC_ACT_SHOT case (Cong).
      
      v1->v2:
        Changed blamed commit
        Added self-test
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6109ecbf
    • Paul Blakey's avatar
      selftests: add selftest for chaining of tc ingress handling to egress · fd602f5c
      Paul Blakey authored
      This test runs a simple ingress tc setup between two veth pairs,
      then adds a egress->ingress rule to test the chaining of tc ingress
      pipeline to tc egress piepline.
      Signed-off-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd602f5c
    • Paul Blakey's avatar
      net: Fix return value of qdisc ingress handling on success · 672e97ef
      Paul Blakey authored
      Currently qdisc ingress handling (sch_handle_ingress()) doesn't
      set a return value and it is left to the old return value of
      the caller (__netif_receive_skb_core()) which is RX drop, so if
      the packet is consumed, caller will stop and return this value
      as if the packet was dropped.
      
      This causes a problem in the kernel tcp stack when having a
      egress tc rule forwarding to a ingress tc rule.
      The tcp stack sending packets on the device having the egress rule
      will see the packets as not successfully transmitted (although they
      actually were), will not advance it's internal state of sent data,
      and packets returning on such tcp stream will be dropped by the tcp
      stack with reason ack-of-unsent-data. See reproduction in [0] below.
      
      Fix that by setting the return value to RX success if
      the packet was handled successfully.
      
      [0] Reproduction steps:
       $ ip link add veth1 type veth peer name peer1
       $ ip link add veth2 type veth peer name peer2
       $ ifconfig peer1 5.5.5.6/24 up
       $ ip netns add ns0
       $ ip link set dev peer2 netns ns0
       $ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
       $ ifconfig veth2 0 up
       $ ifconfig veth1 0 up
      
       #ingress forwarding veth1 <-> veth2
       $ tc qdisc add dev veth2 ingress
       $ tc qdisc add dev veth1 ingress
       $ tc filter add dev veth2 ingress prio 1 proto all flower \
         action mirred egress redirect dev veth1
       $ tc filter add dev veth1 ingress prio 1 proto all flower \
         action mirred egress redirect dev veth2
      
       #steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
       $ tc qdisc add dev peer1 clsact
       $ tc filter add dev peer1 egress prio 20 proto ip flower \
         action mirred ingress redirect dev veth1
      
       #run iperf and see connection not running
       $ iperf3 -s&
       $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
      
       #delete egress rule, and run again, now should work
       $ tc filter del dev peer1 egress
       $ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
      
      Fixes: f697c3e8 ("[NET]: Avoid unnecessary cloning for ingress filtering")
      Signed-off-by: default avatarPaul Blakey <paulb@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      672e97ef
    • David S. Miller's avatar
      Merge branch 'qdisc-null-deref' · e38cf366
      David S. Miller authored
      Zhengchao Shao says:
      
      ====================
      net: fix null pointer access issue in qdisc
      
      These three patches fix the same type of problem. Set the default qdisc,
      and then construct an init failure scenario when the dev qdisc is
      configured on mqprio to trigger the reset process. NULL pointer access
      may occur during the reset process.
      
      ---
      v2: for fq_codel, revert the patch
      ---
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e38cf366
    • Zhengchao Shao's avatar
      net: sched: sfb: fix null pointer access issue when sfb_init() fails · 2a3fc782
      Zhengchao Shao authored
      When the default qdisc is sfb, if the qdisc of dev_queue fails to be
      inited during mqprio_init(), sfb_reset() is invoked to clear resources.
      In this case, the q->qdisc is NULL, and it will cause gpf issue.
      
      The process is as follows:
      qdisc_create_dflt()
      	sfb_init()
      		tcf_block_get()          --->failed, q->qdisc is NULL
      	...
      	qdisc_put()
      		...
      		sfb_reset()
      			qdisc_reset(q->qdisc)    --->q->qdisc is NULL
      				ops = qdisc->ops
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
      RIP: 0010:qdisc_reset+0x2b/0x6f0
      Call Trace:
      <TASK>
      sfb_reset+0x37/0xd0
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f2164122d04
      </TASK>
      
      Fixes: e13e02a3 ("net_sched: SFB flow scheduler")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2a3fc782
    • Zhengchao Shao's avatar
      Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()" · f5ffa3b1
      Zhengchao Shao authored
      This reverts commit 494f5063.
      
      When the default qdisc is fq_codel, if the qdisc of dev_queue fails to be
      inited during mqprio_init(), fq_codel_reset() is invoked to clear
      resources. In this case, the flow is NULL, and it will cause gpf issue.
      
      The process is as follows:
      qdisc_create_dflt()
      	fq_codel_init()
      		...
      		q->flows_cnt = 1024;
      		...
      		q->flows = kvcalloc(...)      --->failed, q->flows is NULL
      	...
      	qdisc_put()
      		...
      		fq_codel_reset()
      			...
      			flow = q->flows + i   --->q->flows is NULL
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      RIP: 0010:fq_codel_reset+0x14d/0x350
      Call Trace:
      <TASK>
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fd272b22d04
      </TASK>
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f5ffa3b1
    • Zhengchao Shao's avatar
      net: sched: cake: fix null pointer access issue when cake_init() fails · 51f9a892
      Zhengchao Shao authored
      When the default qdisc is cake, if the qdisc of dev_queue fails to be
      inited during mqprio_init(), cake_reset() is invoked to clear
      resources. In this case, the tins is NULL, and it will cause gpf issue.
      
      The process is as follows:
      qdisc_create_dflt()
      	cake_init()
      		q->tins = kvcalloc(...)        --->failed, q->tins is NULL
      	...
      	qdisc_put()
      		...
      		cake_reset()
      			...
      			cake_dequeue_one()
      				b = &q->tins[...]   --->q->tins is NULL
      
      The following is the Call Trace information:
      general protection fault, probably for non-canonical address
      0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      RIP: 0010:cake_dequeue_one+0xc9/0x3c0
      Call Trace:
      <TASK>
      cake_reset+0xb1/0x140
      qdisc_reset+0xed/0x6f0
      qdisc_destroy+0x82/0x4c0
      qdisc_put+0x9e/0xb0
      qdisc_create_dflt+0x2c3/0x4a0
      mqprio_init+0xa71/0x1760
      qdisc_create+0x3eb/0x1000
      tc_modify_qdisc+0x408/0x1720
      rtnetlink_rcv_msg+0x38e/0xac0
      netlink_rcv_skb+0x12d/0x3a0
      netlink_unicast+0x4a2/0x740
      netlink_sendmsg+0x826/0xcc0
      sock_sendmsg+0xc5/0x100
      ____sys_sendmsg+0x583/0x690
      ___sys_sendmsg+0xe8/0x160
      __sys_sendmsg+0xbf/0x160
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f89e5122d04
      </TASK>
      
      Fixes: 046f6fd5 ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Acked-by: default avatarToke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      51f9a892
    • Manank Patel's avatar
      ethernet: marvell: octeontx2 Fix resource not freed after malloc · 7b55c2ed
      Manank Patel authored
      fix rxsc and txsc not getting freed before going out of scope
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarManank Patel <pmanank200502@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7b55c2ed
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements · 96df8360
      Pablo Neira Ayuso authored
      Otherwise EINVAL is bogusly reported to userspace when deleting a set
      element. NFTA_SET_ELEM_KEY_END does not need to be set in case of:
      
      - insertion: if not present, start key is used as end key.
      - deletion: only start key needs to be specified, end key is ignored.
      
      Hence, relax the sanity check.
      
      Fixes: 88cccd90 ("netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      96df8360
    • Guillaume Nault's avatar
      netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces. · 1fcc064b
      Guillaume Nault authored
      Currently netfilter's rpfilter and fib modules implicitely initialise
      ->flowic_uid with 0. This is normally the root UID. However, this isn't
      the case in user namespaces, where user ID 0 is mapped to a different
      kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get
      the root UID of the user namespace, thus keeping the same behaviour
      whether or not we're running in a user namepspace.
      
      Note, this is similar to commit 8bcfd092 ("ipv4: add missing
      initialization for flowi4_uid"), which fixed the rp_filter sysctl.
      
      Fixes: 622ec2c9 ("net: core: add UID to flows, rules, and routes")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      1fcc064b
    • Brett Creeley's avatar
      ionic: catch NULL pointer issue on reconfig · aa1d7e12
      Brett Creeley authored
      It's possible that the driver will dereference a qcq that doesn't exist
      when calling ionic_reconfigure_queues(), which causes a page fault BUG.
      
      If a reduction in the number of queues is followed by a different
      reconfig such as changing the ring size, the driver can hit a NULL
      pointer when trying to clean up non-existent queues.
      
      Fix this by checking to make sure both the qcqs array and qcq entry
      exists bofore trying to use and free the entry.
      
      Fixes: 101b40a0 ("ionic: change queue count with no reset")
      Signed-off-by: default avatarBrett Creeley <brett@pensando.io>
      Signed-off-by: default avatarShannon Nelson <snelson@pensando.io>
      Link: https://lore.kernel.org/r/20221017233123.15869-1-snelson@pensando.ioSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      aa1d7e12
    • Eric Dumazet's avatar
      net: hsr: avoid possible NULL deref in skb_clone() · d8b57135
      Eric Dumazet authored
      syzbot got a crash [1] in skb_clone(), caused by a bug
      in hsr_get_untagged_frame().
      
      When/if create_stripped_skb_hsr() returns NULL, we must
      not attempt to call skb_clone().
      
      While we are at it, replace a WARN_ONCE() by netdev_warn_once().
      
      [1]
      general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f]
      CPU: 1 PID: 754 Comm: syz-executor.0 Not tainted 6.0.0-syzkaller-02734-g0326074f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      RIP: 0010:skb_clone+0x108/0x3c0 net/core/skbuff.c:1641
      Code: 93 02 00 00 49 83 7c 24 28 00 0f 85 e9 00 00 00 e8 5d 4a 29 fa 4c 8d 75 7e 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 04 02 4c 89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 9e 01 00 00
      RSP: 0018:ffffc90003ccf4e0 EFLAGS: 00010207
      
      RAX: dffffc0000000000 RBX: ffffc90003ccf5f8 RCX: ffffc9000c24b000
      RDX: 000000000000000f RSI: ffffffff8751cb13 RDI: 0000000000000000
      RBP: 0000000000000000 R08: 00000000000000f0 R09: 0000000000000140
      R10: fffffbfff181d972 R11: 0000000000000000 R12: ffff888161fc3640
      R13: 0000000000000a20 R14: 000000000000007e R15: ffffffff8dc5f620
      FS: 00007feb621e4700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007feb621e3ff8 CR3: 00000001643a9000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      hsr_get_untagged_frame+0x4e/0x610 net/hsr/hsr_forward.c:164
      hsr_forward_do net/hsr/hsr_forward.c:461 [inline]
      hsr_forward_skb+0xcca/0x1d50 net/hsr/hsr_forward.c:623
      hsr_handle_frame+0x588/0x7c0 net/hsr/hsr_slave.c:69
      __netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
      __netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
      __netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
      netif_receive_skb_internal net/core/dev.c:5685 [inline]
      netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
      tun_rx_batched+0x4ab/0x7a0 drivers/net/tun.c:1544
      tun_get_user+0x2686/0x3a00 drivers/net/tun.c:1995
      tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2025
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x9e9/0xdd0 fs/read_write.c:584
      ksys_write+0x127/0x250 fs/read_write.c:637
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: f266a683 ("net/hsr: Better frame dispatch")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221017165928.2150130-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d8b57135
    • Vikas Gupta's avatar
      bnxt_en: fix memory leak in bnxt_nvm_test() · ba077d68
      Vikas Gupta authored
      Free the kzalloc'ed buffer before returning in the success path.
      
      Fixes: 5b6ff128 ("bnxt_en: implement callbacks for devlink selftests")
      Signed-off-by: default avatarVikas Gupta <vikas.gupta@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/1666020742-25834-1-git-send-email-michael.chan@broadcom.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ba077d68
  2. 18 Oct, 2022 2 commits
    • Zhengchao Shao's avatar
      ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed · 1ca69520
      Zhengchao Shao authored
      If the initialization fails in calling addrconf_init_net(), devconf_all is
      the pointer that has been released. Then ip6mr_sk_done() is called to
      release the net, accessing devconf->mc_forwarding directly causes invalid
      pointer access.
      
      The process is as follows:
      setup_net()
      	ops_init()
      		addrconf_init_net()
      		all = kmemdup(...)           ---> alloc "all"
      		...
      		net->ipv6.devconf_all = all;
      		__addrconf_sysctl_register() ---> failed
      		...
      		kfree(all);                  ---> ipv6.devconf_all invalid
      		...
      	ops_exit_list()
      		...
      		ip6mr_sk_done()
      			devconf = net->ipv6.devconf_all;
      			//devconf is invalid pointer
      			if (!devconf || !atomic_read(&devconf->mc_forwarding))
      
      The following is the Call Trace information:
      BUG: KASAN: use-after-free in ip6mr_sk_done+0x112/0x3a0
      Read of size 4 at addr ffff888075508e88 by task ip/14554
      Call Trace:
      <TASK>
      dump_stack_lvl+0x8e/0xd1
      print_report+0x155/0x454
      kasan_report+0xba/0x1f0
      kasan_check_range+0x35/0x1b0
      ip6mr_sk_done+0x112/0x3a0
      rawv6_close+0x48/0x70
      inet_release+0x109/0x230
      inet6_release+0x4c/0x70
      sock_release+0x87/0x1b0
      igmp6_net_exit+0x6b/0x170
      ops_exit_list+0xb0/0x170
      setup_net+0x7ac/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f7963322547
      
      </TASK>
      Allocated by task 14554:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      __kasan_kmalloc+0xa1/0xb0
      __kmalloc_node_track_caller+0x4a/0xb0
      kmemdup+0x28/0x60
      addrconf_init_net+0x1be/0x840
      ops_init+0xa5/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Freed by task 14554:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      kasan_save_free_info+0x2a/0x40
      ____kasan_slab_free+0x155/0x1b0
      slab_free_freelist_hook+0x11b/0x220
      __kmem_cache_free+0xa4/0x360
      addrconf_init_net+0x623/0x840
      ops_init+0xa5/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: 7d9b1b57 ("ip6mr: fix use-after-free in ip6mr_sk_done()")
      Signed-off-by: default avatarZhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221017080331.16878-1-shaozhengchao@huawei.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      1ca69520
    • Kuniyuki Iwashima's avatar
      udp: Update reuse->has_conns under reuseport_lock. · 69421bf9
      Kuniyuki Iwashima authored
      When we call connect() for a UDP socket in a reuseport group, we have
      to update sk->sk_reuseport_cb->has_conns to 1.  Otherwise, the kernel
      could select a unconnected socket wrongly for packets sent to the
      connected socket.
      
      However, the current way to set has_conns is illegal and possible to
      trigger that problem.  reuseport_has_conns() changes has_conns under
      rcu_read_lock(), which upgrades the RCU reader to the updater.  Then,
      it must do the update under the updater's lock, reuseport_lock, but
      it doesn't for now.
      
      For this reason, there is a race below where we fail to set has_conns
      resulting in the wrong socket selection.  To avoid the race, let's split
      the reader and updater with proper locking.
      
       cpu1                               cpu2
      +----+                             +----+
      
      __ip[46]_datagram_connect()        reuseport_grow()
      .                                  .
      |- reuseport_has_conns(sk, true)   |- more_reuse = __reuseport_alloc(more_socks_size)
      |  .                               |
      |  |- rcu_read_lock()
      |  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
      |  |
      |  |                               |  /* reuse->has_conns == 0 here */
      |  |                               |- more_reuse->has_conns = reuse->has_conns
      |  |- reuse->has_conns = 1         |  /* more_reuse->has_conns SHOULD BE 1 HERE */
      |  |                               |
      |  |                               |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
      |  |                               |                     more_reuse)
      |  `- rcu_read_unlock()            `- kfree_rcu(reuse, rcu)
      |
      |- sk->sk_state = TCP_ESTABLISHED
      
      Note the likely(reuse) in reuseport_has_conns_set() is always true,
      but we put the test there for ease of review.  [0]
      
      For the record, usually, sk_reuseport_cb is changed under lock_sock().
      The only exception is reuseport_grow() & TCP reqsk migration case.
      
        1) shutdown() TCP listener, which is moved into the latter part of
           reuse->socks[] to migrate reqsk.
      
        2) New listen() overflows reuse->socks[] and call reuseport_grow().
      
        3) reuse->max_socks overflows u16 with the new listener.
      
        4) reuseport_grow() pops the old shutdown()ed listener from the array
           and update its sk->sk_reuseport_cb as NULL without lock_sock().
      
      shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(),
      but, reuseport_has_conns_set() is called only for UDP under lock_sock(),
      so likely(reuse) never be false in reuseport_has_conns_set().
      
      [0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/
      
      Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets")
      Signed-off-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      69421bf9
  3. 17 Oct, 2022 5 commits
  4. 16 Oct, 2022 1 commit
    • Eric Dumazet's avatar
      skmsg: pass gfp argument to alloc_sk_msg() · 2d1f274b
      Eric Dumazet authored
      syzbot found that alloc_sk_msg() could be called from a
      non sleepable context. sk_psock_verdict_recv() uses
      rcu_read_lock() protection.
      
      We need the callers to pass a gfp_t argument to avoid issues.
      
      syzbot report was:
      
      BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
      in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3613, name: syz-executor414
      preempt_count: 0, expected: 0
      RCU nest depth: 1, expected: 0
      INFO: lockdep is turned off.
      CPU: 0 PID: 3613 Comm: syz-executor414 Not tainted 6.0.0-syzkaller-09589-g55be6084 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
      __might_resched+0x538/0x6a0 kernel/sched/core.c:9877
      might_alloc include/linux/sched/mm.h:274 [inline]
      slab_pre_alloc_hook mm/slab.h:700 [inline]
      slab_alloc_node mm/slub.c:3162 [inline]
      slab_alloc mm/slub.c:3256 [inline]
      kmem_cache_alloc_trace+0x59/0x310 mm/slub.c:3287
      kmalloc include/linux/slab.h:600 [inline]
      kzalloc include/linux/slab.h:733 [inline]
      alloc_sk_msg net/core/skmsg.c:507 [inline]
      sk_psock_skb_ingress_self+0x5c/0x330 net/core/skmsg.c:600
      sk_psock_verdict_apply+0x395/0x440 net/core/skmsg.c:1014
      sk_psock_verdict_recv+0x34d/0x560 net/core/skmsg.c:1201
      tcp_read_skb+0x4a1/0x790 net/ipv4/tcp.c:1770
      tcp_rcv_established+0x129d/0x1a10 net/ipv4/tcp_input.c:5971
      tcp_v4_do_rcv+0x479/0xac0 net/ipv4/tcp_ipv4.c:1681
      sk_backlog_rcv include/net/sock.h:1109 [inline]
      __release_sock+0x1d8/0x4c0 net/core/sock.c:2906
      release_sock+0x5d/0x1c0 net/core/sock.c:3462
      tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1483
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg net/socket.c:734 [inline]
      __sys_sendto+0x46d/0x5f0 net/socket.c:2117
      __do_sys_sendto net/socket.c:2129 [inline]
      __se_sys_sendto net/socket.c:2125 [inline]
      __x64_sys_sendto+0xda/0xf0 net/socket.c:2125
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 43312915 ("skmsg: Get rid of unncessary memset()")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cong.wang@bytedance.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d1f274b
  5. 15 Oct, 2022 11 commits
  6. 14 Oct, 2022 6 commits