  1. 29 Mar, 2024 1 commit
    • udp: do not accept non-tunnel GSO skbs landing in a tunnel · 3d010c80
      Antoine Tenart authored
      When rx-udp-gro-forwarding is enabled, UDP packets might be GROed when
      being forwarded. If such packets might land in a tunnel, this can cause
      various issues, and udp_gro_receive makes sure this isn't the case by
      looking for a matching socket. This is performed in
      udp4/6_gro_lookup_skb but only in the current netns. This is an issue
      with tunneled packets when the endpoint is in another netns. In such
      cases the packets will be GROed at the UDP level, which leads to various
      issues later on. The same thing can happen with rx-gro-list.
      
      We saw this with geneve packets being GROed at the UDP level. In such
      a case gso_size is set; later the packet goes through the geneve rx path,
      the geneve header is pulled and the offsets are adjusted, but the frag_list
      skbs are not adjusted with regard to geneve. When those skbs hit
      skb_segment, it misbehaves. Different outcomes are possible
      depending on what the GROed skbs look like, from corrupted packets to
      kernel crashes.
      
      One example is a BUG_ON[1] triggered in skb_segment while processing the
      frag_list. Because gso_size is wrong (the geneve header was pulled),
      skb_segment thinks there is "geneve header size" of data in frag_list,
      although it's in fact the next packet. The BUG_ON itself has nothing to
      do with the issue. This is only one of the potential issues.
      
      Looking up a matching socket in udp_gro_receive is fragile: the
      lookup could be extended to all netns (setting aside performance),
      but nothing prevents those packets from being modified in between, and we
      could still fail to find a matching socket. It's OK to keep the current
      logic there, as it should cover most cases, but we also need to make sure
      we handle tunnel packets being GROed too early.
      
      This is done by extending the checks in udp_unexpected_gso: GSO packets
      lacking the SKB_GSO_UDP_TUNNEL/_CSUM bits and landing in a tunnel must
      be segmented.
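The extended check can be pictured with a small userspace model. This is a sketch under stated assumptions, not the kernel's udp_unexpected_gso(): the GSO_* flag values and the two structs below are hypothetical stand-ins for skb_shinfo(skb)->gso_type and for the socket's "accepts this GSO type" and "is a tunnel (has an encap handler)" state.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SKB_GSO_* bits; the values are illustrative. */
enum {
    GSO_UDP_L4          = 1u << 0,
    GSO_FRAGLIST        = 1u << 1,
    GSO_UDP_TUNNEL      = 1u << 2,
    GSO_UDP_TUNNEL_CSUM = 1u << 3,
};

struct gso_skb_model {
    unsigned int gso_size;  /* 0 means "not a GSO packet" */
    unsigned int gso_type;  /* bitmask of the GSO_* values above */
};

struct udp_sock_model {
    bool accept_udp_l4;     /* socket accepts plain UDP GSO packets */
    bool accept_fraglist;   /* socket accepts fraglist GSO packets */
    bool is_tunnel;         /* socket decapsulates a tunnel (e.g. geneve) */
};

/* Returns true when the packet must be segmented before delivery. */
static bool unexpected_gso_model(const struct udp_sock_model *sk,
                                 const struct gso_skb_model *skb)
{
    if (skb->gso_size == 0)
        return false;                       /* not GSO: nothing to do */
    if ((skb->gso_type & GSO_UDP_L4) && !sk->accept_udp_l4)
        return true;
    if ((skb->gso_type & GSO_FRAGLIST) && !sk->accept_fraglist)
        return true;
    /* The added rule: a GSO packet landing in a tunnel must carry the
     * tunnel GSO bits, otherwise it was GROed at the UDP level too early. */
    if (sk->is_tunnel &&
        !(skb->gso_type & (GSO_UDP_TUNNEL | GSO_UDP_TUNNEL_CSUM)))
        return true;
    return false;
}
```

With this model, a UDP-level GRO packet (GSO_UDP_L4 only) is flagged for segmentation when it reaches a tunnel socket, but still delivered as-is to a plain UDP socket that accepts it.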
      
      [1] kernel BUG at net/core/skbuff.c:4408!
          RIP: 0010:skb_segment+0xd2a/0xf70
          __udp_gso_segment+0xaa/0x560
      
      Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.")
      Fixes: 36707061 ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 11 Mar, 2024 2 commits
  3. 08 Mar, 2024 1 commit
  4. 22 Feb, 2024 1 commit
  5. 21 Feb, 2024 1 commit
    • net: implement lockless setsockopt(SO_PEEK_OFF) · 56667da7
      Eric Dumazet authored
      syzbot reported a lockdep violation [1] involving af_unix
      support of SO_PEEK_OFF.
      
      Since SO_PEEK_OFF is inherently not thread safe (it uses a per-socket
      sk_peek_off field), there is really no point in enforcing
      thread safety in the kernel.
      
      After this patch:
      
      - setsockopt(SO_PEEK_OFF) no longer acquires the socket lock.
      
      - skb_consume_udp() no longer has to acquire the socket lock.
      
      - af_unix no longer needs a special version of sk_set_peek_off(),
        because it does not lock u->iolock anymore.
      
      As a follow-up, we could replace prot->set_peek_off with a boolean
      and avoid an indirect call, since we always use sk_set_peek_off().
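To see why the peek offset is inherently shared state, here is a minimal userspace sketch (Linux only, assuming an AF_UNIX stream socketpair; the function name peek_off_demo is hypothetical). Each MSG_PEEK read starts at the socket's peek offset and advances it, while a normal read consumes from the head of the queue and pulls the offset back. Because the cursor lives in the socket rather than in the caller, two threads peeking on the same socket would interleave their offsets no matter what locking the kernel did around setsockopt().

```c
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_PEEK_OFF
#define SO_PEEK_OFF 42  /* asm-generic value; some architectures differ */
#endif

/* Fills `first` and `second` with two consecutive 3-byte MSG_PEEK reads and
 * `all` with the following normal read. Returns 0 on success, -1 on error. */
static int peek_off_demo(char first[4], char second[4], char all[7])
{
    int sv[2];
    int off = 0;
    int ret = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], "abcdef", 6) != 6)
        goto out;
    /* Enable peek offsets: MSG_PEEK reads now start at the cursor. */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off)) < 0)
        goto out;
    if (recv(sv[1], first, 3, MSG_PEEK) != 3)   /* cursor 0 -> 3 */
        goto out;
    if (recv(sv[1], second, 3, MSG_PEEK) != 3)  /* cursor 3 -> 6 */
        goto out;
    /* A normal read still consumes from the head; the cursor moves back. */
    if (recv(sv[1], all, 6, 0) != 6)
        goto out;
    first[3] = second[3] = all[6] = '\0';
    ret = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```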
      
      [1]
      
      WARNING: possible circular locking dependency detected
      6.8.0-rc4-syzkaller-00267-g0f1dd5e9 #0 Not tainted
      
      syz-executor.2/30025 is trying to acquire lock:
       ffff8880765e7d80 (&u->iolock){+.+.}-{3:3}, at: unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
      
      but task is already holding lock:
       ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
       ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
       ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (sk_lock-AF_UNIX){+.+.}-{0:0}:
              lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
              lock_sock_nested+0x48/0x100 net/core/sock.c:3524
              lock_sock include/net/sock.h:1691 [inline]
              __unix_dgram_recvmsg+0x1275/0x12c0 net/unix/af_unix.c:2415
              sock_recvmsg_nosec+0x18e/0x1d0 net/socket.c:1046
              ____sys_recvmsg+0x3c0/0x470 net/socket.c:2801
              ___sys_recvmsg net/socket.c:2845 [inline]
              do_recvmmsg+0x474/0xae0 net/socket.c:2939
              __sys_recvmmsg net/socket.c:3018 [inline]
              __do_sys_recvmmsg net/socket.c:3041 [inline]
              __se_sys_recvmmsg net/socket.c:3034 [inline]
              __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034
             do_syscall_64+0xf9/0x240
             entry_SYSCALL_64_after_hwframe+0x6f/0x77
      
      -> #0 (&u->iolock){+.+.}-{3:3}:
              check_prev_add kernel/locking/lockdep.c:3134 [inline]
              check_prevs_add kernel/locking/lockdep.c:3253 [inline]
              validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
              __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
              lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
              __mutex_lock_common kernel/locking/mutex.c:608 [inline]
              __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
              unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
             sk_setsockopt+0x207e/0x3360
              do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
              __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
              __do_sys_setsockopt net/socket.c:2343 [inline]
              __se_sys_setsockopt net/socket.c:2340 [inline]
              __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
             do_syscall_64+0xf9/0x240
             entry_SYSCALL_64_after_hwframe+0x6f/0x77
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(sk_lock-AF_UNIX);
                                     lock(&u->iolock);
                                     lock(sk_lock-AF_UNIX);
        lock(&u->iolock);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor.2/30025:
        #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1691 [inline]
        #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sockopt_lock_sock net/core/sock.c:1060 [inline]
        #0: ffff8880765e7930 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: sk_setsockopt+0xe52/0x3360 net/core/sock.c:1193
      
      stack backtrace:
      CPU: 0 PID: 30025 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-00267-g0f1dd5e9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
        check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x18ca/0x58e0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        __mutex_lock_common kernel/locking/mutex.c:608 [inline]
        __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
        unix_set_peek_off+0x26/0xa0 net/unix/af_unix.c:789
       sk_setsockopt+0x207e/0x3360
        do_sock_setsockopt+0x2fb/0x720 net/socket.c:2307
        __sys_setsockopt+0x1ad/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xf9/0x240
       entry_SYSCALL_64_after_hwframe+0x6f/0x77
      RIP: 0033:0x7f78a1c7dda9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f78a0fde0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007f78a1dac050 RCX: 00007f78a1c7dda9
      RDX: 000000000000002a RSI: 0000000000000001 RDI: 0000000000000006
      RBP: 00007f78a1cca47a R08: 0000000000000004 R09: 0000000000000000
      R10: 0000000020000180 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000006e R14: 00007f78a1dac050 R15: 00007ffe5cd81ae8
      
      Fixes: 859051dd ("bpf: Implement cgroup sockaddr hooks for unix sockets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Daan De Meyer <daan.j.demeyer@gmail.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Martin KaFai Lau <martin.lau@kernel.org>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 27 Jan, 2024 1 commit
    • ipmr: fix kernel panic when forwarding mcast packets · e622502c
      Nicolas Dichtel authored
      The stacktrace was:
      [   86.305548] BUG: kernel NULL pointer dereference, address: 0000000000000092
      [   86.306815] #PF: supervisor read access in kernel mode
      [   86.307717] #PF: error_code(0x0000) - not-present page
      [   86.308624] PGD 0 P4D 0
      [   86.309091] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [   86.309883] CPU: 2 PID: 3139 Comm: pimd Tainted: G     U             6.8.0-6wind-knet #1
      [   86.311027] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
      [   86.312728] RIP: 0010:ip_mr_forward (/build/work/knet/net/ipv4/ipmr.c:1985)
      [ 86.313399] Code: f9 1f 0f 87 85 03 00 00 48 8d 04 5b 48 8d 04 83 49 8d 44 c5 00 48 8b 40 70 48 39 c2 0f 84 d9 00 00 00 49 8b 46 58 48 83 e0 fe <80> b8 92 00 00 00 00 0f 84 55 ff ff ff 49 83 47 38 01 45 85 e4 0f
      [   86.316565] RSP: 0018:ffffad21c0583ae0 EFLAGS: 00010246
      [   86.317497] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [   86.318596] RDX: ffff9559cb46c000 RSI: 0000000000000000 RDI: 0000000000000000
      [   86.319627] RBP: ffffad21c0583b30 R08: 0000000000000000 R09: 0000000000000000
      [   86.320650] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
      [   86.321672] R13: ffff9559c093a000 R14: ffff9559cc00b800 R15: ffff9559c09c1d80
      [   86.322873] FS:  00007f85db661980(0000) GS:ffff955a79d00000(0000) knlGS:0000000000000000
      [   86.324291] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   86.325314] CR2: 0000000000000092 CR3: 000000002f13a000 CR4: 0000000000350ef0
      [   86.326589] Call Trace:
      [   86.327036]  <TASK>
      [   86.327434] ? show_regs (/build/work/knet/arch/x86/kernel/dumpstack.c:479)
      [   86.328049] ? __die (/build/work/knet/arch/x86/kernel/dumpstack.c:421 /build/work/knet/arch/x86/kernel/dumpstack.c:434)
      [   86.328508] ? page_fault_oops (/build/work/knet/arch/x86/mm/fault.c:707)
      [   86.329107] ? do_user_addr_fault (/build/work/knet/arch/x86/mm/fault.c:1264)
      [   86.329756] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
      [   86.330350] ? __irq_work_queue_local (/build/work/knet/kernel/irq_work.c:111 (discriminator 1))
      [   86.331013] ? exc_page_fault (/build/work/knet/./arch/x86/include/asm/paravirt.h:693 /build/work/knet/arch/x86/mm/fault.c:1515 /build/work/knet/arch/x86/mm/fault.c:1563)
      [   86.331702] ? asm_exc_page_fault (/build/work/knet/./arch/x86/include/asm/idtentry.h:570)
      [   86.332468] ? ip_mr_forward (/build/work/knet/net/ipv4/ipmr.c:1985)
      [   86.333183] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
      [   86.333920] ipmr_mfc_add (/build/work/knet/./include/linux/rcupdate.h:782 /build/work/knet/net/ipv4/ipmr.c:1009 /build/work/knet/net/ipv4/ipmr.c:1273)
      [   86.334583] ? __pfx_ipmr_hash_cmp (/build/work/knet/net/ipv4/ipmr.c:363)
      [   86.335357] ip_mroute_setsockopt (/build/work/knet/net/ipv4/ipmr.c:1470)
      [   86.336135] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
      [   86.336854] ? ip_mroute_setsockopt (/build/work/knet/net/ipv4/ipmr.c:1470)
      [   86.337679] do_ip_setsockopt (/build/work/knet/net/ipv4/ip_sockglue.c:944)
      [   86.338408] ? __pfx_unix_stream_read_actor (/build/work/knet/net/unix/af_unix.c:2862)
      [   86.339232] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
      [   86.339809] ? aa_sk_perm (/build/work/knet/security/apparmor/include/cred.h:153 /build/work/knet/security/apparmor/net.c:181)
      [   86.340342] ip_setsockopt (/build/work/knet/net/ipv4/ip_sockglue.c:1415)
      [   86.340859] raw_setsockopt (/build/work/knet/net/ipv4/raw.c:836)
      [   86.341408] ? security_socket_setsockopt (/build/work/knet/security/security.c:4561 (discriminator 13))
      [   86.342116] sock_common_setsockopt (/build/work/knet/net/core/sock.c:3716)
      [   86.342747] do_sock_setsockopt (/build/work/knet/net/socket.c:2313)
      [   86.343363] __sys_setsockopt (/build/work/knet/./include/linux/file.h:32 /build/work/knet/net/socket.c:2336)
      [   86.344020] __x64_sys_setsockopt (/build/work/knet/net/socket.c:2340)
      [   86.344766] do_syscall_64 (/build/work/knet/arch/x86/entry/common.c:52 /build/work/knet/arch/x86/entry/common.c:83)
      [   86.345433] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
      [   86.346161] ? syscall_exit_work (/build/work/knet/./include/linux/audit.h:357 /build/work/knet/kernel/entry/common.c:160)
      [   86.346938] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
      [   86.347657] ? syscall_exit_to_user_mode (/build/work/knet/kernel/entry/common.c:215)
      [   86.348538] ? srso_return_thunk (/build/work/knet/arch/x86/lib/retpoline.S:223)
      [   86.349262] ? do_syscall_64 (/build/work/knet/./arch/x86/include/asm/cpufeature.h:171 /build/work/knet/arch/x86/entry/common.c:98)
      [   86.349971] entry_SYSCALL_64_after_hwframe (/build/work/knet/arch/x86/entry/entry_64.S:129)
      
      The original packet in ipmr_cache_report() may be queued and then forwarded
      with ip_mr_forward(). The latter assumes that the skb dst is set.
      
      After the below commit, the skb dst is dropped by ipv4_pktinfo_prepare(),
      which causes the oops.
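The broken invariant can be shown with a deliberately tiny model. Everything here is hypothetical: struct pkt_model stands in for an skb, pktinfo_prepare_model() for ipv4_pktinfo_prepare(), and mr_forward_model() for ip_mr_forward(). The drop_dst switch is only an illustration of "preparing pktinfo must not drop the dst of a packet that will still be forwarded"; the text above does not say how the fix restores the dst.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet: only the routing entry matters here. */
struct pkt_model {
    const void *dst;   /* NULL once the dst has been dropped */
};

/* Stand-in for ipv4_pktinfo_prepare(): may release the dst. */
static void pktinfo_prepare_model(struct pkt_model *pkt, bool drop_dst)
{
    if (drop_dst)
        pkt->dst = NULL;
}

/* Stand-in for ip_mr_forward(), which dereferences the dst unconditionally.
 * The model only reports whether that dereference would be safe. */
static bool mr_forward_model(const struct pkt_model *pkt)
{
    return pkt->dst != NULL;
}
```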
      
      Fixes: bb740365 ("ipmr: support IP_PKTINFO on cache report IGMP msg")
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240125141847.1931933-1-nicolas.dichtel@6wind.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 13 Jan, 2024 3 commits
    • bpf: Avoid iter->offset making backward progress in bpf_iter_udp · 2242fd53
      Martin KaFai Lau authored
      There is a bug in the bpf_iter_udp_batch() function that prevents
      userspace from making forward progress.

      The bug is triggered when userspace passes in a very small read
      buffer. When the bpf prog does bpf_seq_printf, the userspace read
      buffer is not large enough to capture the whole bucket.
      
      When the read buffer is not large enough, the kernel will remember
      the offset of the bucket in iter->offset such that the next userspace
      read() can continue from where it left off.
      
      The kernel will skip the number (== "iter->offset") of sockets in
      the next read(). However, the code skips them by directly decrementing
      "iter->offset" ("--iter->offset"). This is incorrect because the next
      read() may not consume the whole bucket either, and then the next-next
      read() will start from offset 0. The net effect is that userspace will
      keep reading from the beginning of a bucket and the process will
      never finish. "iter->offset" must always go forward until the
      whole bucket is consumed.
      
      This patch fixes it by using the local variables "resume_offset"
      and "resume_bucket". "iter->offset" is always reset to 0 before
      it may be used. "iter->offset" will be advanced to the
      "resume_offset" when it continues from the "resume_bucket" (i.e.
      "state->bucket == resume_bucket"). This brings it closer to
      the bpf_iter_tcp's offset handling which does not suffer
      the same bug.
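The resume logic described above can be sketched as a userspace model. This is not the kernel's bpf_iter_udp_batch(): struct udp_iter_model, iter_read_model() and the buckets[] array (number of sockets per hash bucket) are hypothetical. What it shares with the description is the shape of the fix: snapshot (bucket, offset) into resume_bucket/resume_offset, reset the offset to 0 before use, start past 0 only in the resumed bucket, and only ever move the offset forward.

```c
/* Hypothetical model: buckets[i] is the number of sockets in bucket i. */
struct udp_iter_model {
    int bucket;   /* bucket to resume from (state->bucket) */
    int offset;   /* sockets of that bucket already shown (iter->offset) */
};

/* One userspace read(): show at most `cap` sockets, then remember where
 * to resume. Returns the number shown; 0 means the iteration is done. */
static int iter_read_model(struct udp_iter_model *it, const int *buckets,
                           int nbuckets, int cap)
{
    int resume_bucket = it->bucket;
    int resume_offset = it->offset;
    int shown = 0;

    for (; it->bucket < nbuckets; it->bucket++) {
        /* Reset before use; only the resumed bucket starts past 0. */
        it->offset = (it->bucket == resume_bucket) ? resume_offset : 0;
        while (it->offset < buckets[it->bucket]) {
            if (shown == cap)
                return shown;  /* buffer full: (bucket, offset) is saved */
            shown++;
            it->offset++;      /* the offset only ever moves forward */
        }
    }
    it->offset = 0;
    return shown;
}
```

Repeatedly calling iter_read_model() with a tiny cap therefore visits every socket exactly once and terminates, which is the forward-progress property the buggy decrementing version lacked.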
      
      Cc: Aditi Ghag <aditi.ghag@isovalent.com>
      Fixes: c96dac8d ("bpf: udp: Implement batching for sockets iterator")
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Reviewed-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20240112190530.3751661-3-martin.lau@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket · 19ca0823
      Martin KaFai Lau authored
      The current logic is to use a default batch size of 16 to batch the
      whole bucket. If that is too small, it retries with a larger batch size.

      The current code accidentally does a state->bucket-- before retrying.
      This goes back to retry with the previous bucket, which has already
      been done. This patch fixes it.
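A minimal model of the retry step, assuming a hypothetical struct batch_state and batch_next_bucket() (not the kernel's code). It grows the batch when a bucket has more sockets than the current batch size and then retries the same bucket, leaving the bucket index untouched; the 3/2 growth factor is an illustrative assumption.

```c
/* Hypothetical model of the resize-and-retry step; not kernel code. */
struct batch_state {
    int bucket;       /* state->bucket in the commit text */
    int batch_size;   /* starts at the default of 16 */
};

/* Find the next non-empty bucket at or after st->bucket and make sure the
 * batch can hold all of its sockets. Returns the bucket index, or -1. */
static int batch_next_bucket(struct batch_state *st, const int *buckets,
                             int nbuckets)
{
again:
    for (; st->bucket < nbuckets; st->bucket++) {
        int n = buckets[st->bucket];

        if (n == 0)
            continue;
        if (n > st->batch_size) {
            st->batch_size = n * 3 / 2;  /* growth factor is illustrative */
            /* Retry the SAME bucket. Decrementing st->bucket here would
             * revisit a bucket that has already been processed. */
            goto again;
        }
        return st->bucket;
    }
    return -1;
}
```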
      
      It is hard to create a selftest. I added a WARN_ON(state->bucket < 0),
      forced a particular port to be hashed to the first bucket,
      created >16 sockets, and observed the for-loop going back
      to the "-1" bucket.
      
      Cc: Aditi Ghag <aditi.ghag@isovalent.com>
      Fixes: c96dac8d ("bpf: udp: Implement batching for sockets iterator")
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Reviewed-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20240112190530.3751661-2-martin.lau@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • udp: annotate data-races around up->pending · 482521d8
      Eric Dumazet authored
      up->pending can be read without holding the socket lock,
      as pointed out by syzbot [1]
      
      Add READ_ONCE() in lockless contexts, and WRITE_ONCE()
      on write side.
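In userspace C, the closest portable analogue of READ_ONCE()/WRITE_ONCE() is a relaxed C11 atomic load/store: it guarantees a single, untorn access that the compiler will not split, merge or re-read, without adding ordering. The sketch below uses a hypothetical struct, not struct udp_sock. The KCSAN report's "0x00000000 -> 0x0000000a" is consistent with a corked IPv6 send, since AF_INET6 is 10 on Linux.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/socket.h>   /* AF_INET6 */

/* Hypothetical model of the annotated field; not struct udp_sock. */
struct udp_pending_model {
    _Atomic int pending;  /* 0, or the family of a pending corked send */
};

/* Writer side: the analogue of WRITE_ONCE(up->pending, family). */
static void set_pending(struct udp_pending_model *up, int family)
{
    atomic_store_explicit(&up->pending, family, memory_order_relaxed);
}

/* Lockless reader: the analogue of READ_ONCE(up->pending). */
static bool has_pending(struct udp_pending_model *up)
{
    return atomic_load_explicit(&up->pending, memory_order_relaxed) != 0;
}
```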
      
      [1]
      BUG: KCSAN: data-race in udpv6_sendmsg / udpv6_sendmsg
      
      write to 0xffff88814e5eadf0 of 4 bytes by task 15547 on cpu 1:
       udpv6_sendmsg+0x1405/0x1530 net/ipv6/udp.c:1596
       inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       __sys_sendto+0x257/0x310 net/socket.c:2192
       __do_sys_sendto net/socket.c:2204 [inline]
       __se_sys_sendto net/socket.c:2200 [inline]
       __x64_sys_sendto+0x78/0x90 net/socket.c:2200
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      read to 0xffff88814e5eadf0 of 4 bytes by task 15551 on cpu 0:
       udpv6_sendmsg+0x22c/0x1530 net/ipv6/udp.c:1373
       inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2586
       ___sys_sendmsg net/socket.c:2640 [inline]
       __sys_sendmmsg+0x269/0x500 net/socket.c:2726
       __do_sys_sendmmsg net/socket.c:2755 [inline]
       __se_sys_sendmmsg net/socket.c:2752 [inline]
       __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2752
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      value changed: 0x00000000 -> 0x0000000a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 15551 Comm: syz-executor.1 Tainted: G        W          6.7.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+8d482d0e407f665d9d10@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/netdev/0000000000009e46c3060ebcdffd@google.com/
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 11 Oct, 2023 1 commit
  9. 06 Oct, 2023 2 commits
    • xfrm: Support GRO for IPv6 ESP in UDP encapsulation · 221ddb72
      Steffen Klassert authored
      This patch enables the GRO codepath for IPv6 ESP in UDP encapsulated
      packets. Decapsulation happens at L2 and saves a full round through
      the stack for each packet. This is also needed to support HW offload
      for ESP in UDP encapsulation.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Co-developed-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
    • xfrm: Support GRO for IPv4 ESP in UDP encapsulation · 172bf009
      Steffen Klassert authored
      This patch enables the GRO codepath for IPv4 ESP in UDP encapsulated
      packets. Decapsulation happens at L2 and saves a full round through
      the stack for each packet. This is also needed to support HW offload
      for ESP in UDP encapsulation.
      
      Enabling this would improve performance for the ESP-in-UDP datapath,
      i.e. IPsec with NAT in between.

      By default, GRO for ESP-in-UDP is disabled for UDP sockets.
      To enable this feature for an ESP socket, the following two options
      need to be set:
      1. Enable ESP-in-UDP (this is already set by an IKE daemon):
         int type = UDP_ENCAP_ESPINUDP;
         setsockopt(fd, SOL_UDP, UDP_ENCAP, &type, sizeof(type));
      
      2. Enable GRO for the ESP-in-UDP socket:
         type = true;
         setsockopt(fd, SOL_UDP, UDP_GRO, &type, sizeof(type));
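The two steps above, wrapped in a self-contained helper as a sketch. The helper name enable_espinudp_gro() is hypothetical; the option names come from the text, and the #ifndef fallbacks use the Linux UAPI values for libc headers that predate them. Whether step 1 succeeds depends on the running kernel's configuration, so the helper simply reports -1 with errno set if the kernel rejects either option.

```c
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>

/* Fallbacks with the Linux UAPI values, for older libc headers. */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_ENCAP
#define UDP_ENCAP 100
#endif
#ifndef UDP_GRO
#define UDP_GRO 104
#endif
#ifndef UDP_ENCAP_ESPINUDP
#define UDP_ENCAP_ESPINUDP 2
#endif

/* Enable ESP-in-UDP decapsulation, then GRO, on an existing UDP socket.
 * Returns 0 on success, -1 (with errno set) on failure. */
static int enable_espinudp_gro(int fd)
{
    int type = UDP_ENCAP_ESPINUDP;
    int on = 1;

    /* Step 1: normally already done by the IKE daemon. */
    if (setsockopt(fd, SOL_UDP, UDP_ENCAP, &type, sizeof(type)) < 0)
        return -1;
    /* Step 2: opt in to GRO for this ESP-in-UDP socket. */
    if (setsockopt(fd, SOL_UDP, UDP_GRO, &on, sizeof(on)) < 0)
        return -1;
    return 0;
}
```

Call it on a socket created with socket(AF_INET, SOCK_DGRAM, 0) after binding it to the IKE NAT-T port.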
      
      Enabling ESP-in-UDP has the side effect of preventing the Linux stack from
      seeing ESP packets at L3 (when ESP offload is disabled), as packets are
      immediately decapsulated from UDP and decrypted.
      This change may affect nftables rules that match on ESP packets at L3.
      Also, tcpdump will not see the ESP packets of an ESP-in-UDP flow when
      this is enabled.

      Developers/admins are advised to review and adapt any nftables rules
      accordingly before enabling this feature to prevent potential rule breakage.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Co-developed-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Reviewed-by: Eyal Birger <eyal.birger@gmail.com>
  10. 01 Oct, 2023 3 commits
  11. 14 Sep, 2023 9 commits
  12. 01 Sep, 2023 1 commit
  13. 16 Aug, 2023 2 commits
  14. 29 Jul, 2023 1 commit
  15. 25 Jul, 2023 4 commits
  16. 20 Jul, 2023 1 commit
  17. 24 Jun, 2023 1 commit
  18. 16 Jun, 2023 1 commit
    • net: ioctl: Use kernel memory on protocol ioctl callbacks · e1d001fa
      Breno Leitao authored
      Most of the ioctls to net protocols operate directly on the userspace
      argument (arg), usually doing get_user()/put_user() directly in the
      ioctl callback. This is not flexible, because it is hard to reuse these
      functions without passing userspace buffers.

      Change the "struct proto" ioctls to avoid touching userspace memory and
      operate on kernel buffers, i.e., all protocols' ioctl callbacks are
      adapted to operate on kernel memory rather than on userspace memory (so
      no more {put,get}_user() and friends being called in the ioctl callback).
      
      This changes the "struct proto" ioctl format in the following way:
      
          int                     (*ioctl)(struct sock *sk, int cmd,
      -                                        unsigned long arg);
      +                                        int *karg);
      
      (Note that this patch does not touch the "struct proto_ops"
      protocols.)
      
      So, the "karg" argument, which is passed to the ioctl callback, is a
      pointer to kernel memory (allocated inside a function wrapper).
      This buffer (karg) may contain an input argument (copied from userspace in
      a prep function) and it might return a value/buffer, which is copied
      back to userspace if necessary. There is no one-size-fits-all format
      (that is why I am using 'may' above), but basically there are three types
      of ioctls:

      1) Do not read from userspace, return a result to userspace.
      2) Read an input parameter from userspace, and do not return anything
         to userspace.
      3) Read an input from userspace, and return a buffer to userspace.
      
      The default case (1) (where no input parameter is given, and an "int" is
      returned to userspace) encompasses more than 90% of the cases, but there
      are two other exceptions. Here is a list of exceptions:
      
      * Protocol RAW:
         * cmd = SIOCGETVIFCNT:
           * input and output = struct sioc_vif_req
         * cmd = SIOCGETSGCNT
           * input and output = struct sioc_sg_req
         * Explanation: for the SIOCGETVIFCNT case, userspace passes the input
           argument, which is struct sioc_vif_req. Then the callback populates
           the struct, which is copied back to userspace.
      
      * Protocol RAW6:
         * cmd = SIOCGETMIFCNT_IN6
           * input and output = struct sioc_mif_req6
         * cmd = SIOCGETSGCNT_IN6
           * input and output = struct sioc_sg_req6
      
      * Protocol PHONET:
        * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE
           * input int (4 bytes)
        * Nothing is copied back to userspace.
      
      For the exception cases, the function sock_sk_ioctl_inout() copies
      the userspace input into kernel space and, where needed, copies the
      result back to userspace.

      The wrapper that prepares the buffer and puts it back to userspace is
      sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the caller now
      calls sk_ioctl(), which handles all cases.
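The default case (1) can be modelled in userspace to show the division of labour: the protocol callback only ever writes to an int in kernel memory, and the wrapper performs the single copy out. All names here (struct proto_model, MODEL_SIOCINQ, model_udp_ioctl(), model_sk_ioctl()) are hypothetical, memcpy() stands in for put_user(), and -ENOTTY is used as a generic "unknown command" error.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical command number; real code would use SIOCINQ and friends. */
#define MODEL_SIOCINQ 1

/* Hypothetical model of "struct proto": the callback sees kernel memory only. */
struct proto_model {
    int (*ioctl)(int cmd, int *karg);
};

/* Case (1): no input, an int result written to *karg. */
static int model_udp_ioctl(int cmd, int *karg)
{
    if (cmd != MODEL_SIOCINQ)
        return -ENOTTY;      /* unknown command */
    *karg = 1200;            /* e.g. bytes waiting in the receive queue */
    return 0;
}

/* Stand-in for the sk_ioctl() wrapper: it owns the kernel-side buffer and
 * performs the only copy out to the caller. */
static int model_sk_ioctl(const struct proto_model *prot, int cmd, void *uarg)
{
    int karg = 0;
    int ret = prot->ioctl(cmd, &karg);

    if (ret)
        return ret;          /* on failure nothing is copied back */
    memcpy(uarg, &karg, sizeof(karg));
    return 0;
}
```

Because the callback never touches the user pointer, it can be reused by in-kernel callers that pass their own int, which is the flexibility the commit is after.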
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  19. 10 Jun, 2023 1 commit
  20. 09 Jun, 2023 1 commit
  21. 24 May, 2023 2 commits