1. 11 Aug, 2022 2 commits
    • net: refactor bpf_sk_reuseport_detach() · cf8c1e96
      Hawkins Jiawei authored
      Refactor the sk_user_data dereference to use the more generic function
      __rcu_dereference_sk_user_data_with_flags(), which improves its
      maintainability.
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fix refcount bug in sk_psock_get (2) · 2a013372
      Hawkins Jiawei authored
      Syzkaller reports a refcount bug as follows:
      ------------[ cut here ]------------
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
      Modules linked in:
      CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda #0
       <TASK>
       __refcount_add_not_zero include/linux/refcount.h:163 [inline]
       __refcount_inc_not_zero include/linux/refcount.h:227 [inline]
       refcount_inc_not_zero include/linux/refcount.h:245 [inline]
       sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
       tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
       tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
       tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
       tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
       tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
       sk_backlog_rcv include/net/sock.h:1061 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2849
       release_sock+0x54/0x1b0 net/core/sock.c:3404
       inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
       __sys_shutdown_sock net/socket.c:2331 [inline]
       __sys_shutdown_sock net/socket.c:2325 [inline]
       __sys_shutdown+0xf1/0x1b0 net/socket.c:2343
       __do_sys_shutdown net/socket.c:2351 [inline]
       __se_sys_shutdown net/socket.c:2349 [inline]
       __x64_sys_shutdown+0x50/0x70 net/socket.c:2349
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
       </TASK>
      
      During the SMC fallback process in the connect syscall, the kernel
      replaces TCP with SMC. In order to forward wakeups to the
      smc socket waitqueue after fallback, the kernel sets
      clcsk->sk_user_data to the original smc socket in
      smc_fback_replace_callbacks().
      
      Later, in the shutdown syscall, the kernel calls
      sk_psock_get(), which treats clcsk->sk_user_data
      as a psock, triggering the refcount warning.
      
      So the root cause is that smc and psock both use the
      sk_user_data field, and each can easily misinterpret a
      value stored there by the other.
      
      This patch solves it by using another bit (defined as
      SK_USER_DATA_PSOCK) in PTRMASK to mark whether
      sk_user_data points to a psock object or not.
      This patch depends on a PTRMASK introduced in commit f1ff5ce2
      ("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
      
      Since there may be more flags in the sk_user_data field in the future,
      this patch also refactors the sk_user_data flags code to be more generic,
      to improve its maintainability.
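
      The tagging idea can be sketched in userspace C as below. This is only an
      illustration, assuming objects are at least 8-byte aligned so that the low
      bits of a pointer are free. The DEMO_* names, the bit values and both
      helpers are hypothetical and are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits kept in the low bits of an aligned pointer. */
#define DEMO_FLAG_OTHER ((uintptr_t)1)
#define DEMO_FLAG_PSOCK ((uintptr_t)4)
#define DEMO_FLAGS_MASK ((uintptr_t)7)
#define DEMO_PTRMASK    (~DEMO_FLAGS_MASK)

/* Pack a pointer and its type flags into a single word. */
static uintptr_t demo_user_data_pack(void *ptr, uintptr_t flags)
{
	return (uintptr_t)ptr | (flags & DEMO_FLAGS_MASK);
}

/* Return the pointer only if every requested flag is set, else NULL.
 * A reader that expects a psock passes DEMO_FLAG_PSOCK and therefore
 * never sees a pointer stored by a different owner. */
static void *demo_user_data_get_with_flags(uintptr_t word, uintptr_t flags)
{
	if ((word & flags) != flags)
		return NULL;
	return (void *)(word & DEMO_PTRMASK);
}
```

      With this shape, a value stored without the psock flag is simply reported
      as "no psock" instead of being reinterpreted as one.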
      
      Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 10 Aug, 2022 18 commits
    • genetlink: correct uAPI defines · f329a0eb
      Jakub Kicinski authored
      Commit 50a896cf ("genetlink: properly support per-op policy dumping")
      seems to have copy'n'pasted things a little incorrectly.
      
      The #define CTRL_ATTR_MCAST_GRP_MAX should have stayed right
      after the previous enum. The new CTRL_ATTR_POLICY_* needs
      its own define for MAX and that max should not contain the
      superfluous _DUMP in the name.
      
      We probably can't do anything about the CTRL_ATTR_POLICY_DUMP_MAX
      any more, there's likely code which uses it. For consistency
      (*cough* codegen *cough*) let's add the correctly named define
      nonetheless.
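
      The naming convention in question can be sketched as follows. The two
      *_MAX names come from the description above; the enum members are an
      assumption about the header's contents, shown only to make the pattern
      concrete.

```c
/* Sketch of the uAPI "MAX" convention for an attribute enum.
 * The member list is assumed; only the *_MAX names are from the text. */
enum {
	CTRL_ATTR_POLICY_UNSPEC,
	CTRL_ATTR_POLICY_DO,
	CTRL_ATTR_POLICY_DUMP,

	__CTRL_ATTR_POLICY_DUMP_MAX,
	/* Misnamed, but kept because existing code may already use it. */
	CTRL_ATTR_POLICY_DUMP_MAX = __CTRL_ATTR_POLICY_DUMP_MAX - 1
};

/* Correctly named define, without the superfluous _DUMP. */
#define CTRL_ATTR_POLICY_MAX CTRL_ATTR_POLICY_DUMP_MAX

/* Typical use: size a table indexed by attribute type. */
static int demo_policy_attr_table_len(void)
{
	return CTRL_ATTR_POLICY_MAX + 1;
}
```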
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Fix use-after-free after a failed reload · 6b4db2e5
      Ido Schimmel authored
      After a failed devlink reload, devlink parameters are still registered,
      which means user space can set and get their values. In the case of the
      mlxsw "acl_region_rehash_interval" parameter, these operations will
      trigger a use-after-free [1].
      
      Fix this by rejecting set and get operations while in the failed state.
      Return the "-EOPNOTSUPP" error code which does not abort the parameters
      dump, but instead causes it to skip over the problematic parameter.
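
      A minimal userspace sketch of this behaviour, assuming a core-level guard
      in front of the driver callback. All demo_* types and functions are
      hypothetical stand-ins and do not reproduce the devlink code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device and one of its parameters. */
struct demo_dev {
	bool reload_failed;
};

struct demo_param {
	const char *name;
	int (*get)(const struct demo_dev *dev, int *value);
};

/* Example driver callback; in the failed state its backing data is gone. */
static int demo_interval_get(const struct demo_dev *dev, int *value)
{
	(void)dev;
	*value = 5000;
	return 0;
}

/* Guard in the core: never reach the driver callback after a failed reload. */
static int demo_param_get(const struct demo_dev *dev,
			  const struct demo_param *param, int *value)
{
	if (!param->get || dev->reload_failed)
		return -EOPNOTSUPP;
	return param->get(dev, value);
}

/* Dump loop: -EOPNOTSUPP skips the parameter, any other error aborts. */
static int demo_params_dump(const struct demo_dev *dev,
			    const struct demo_param *params, size_t n,
			    size_t *dumped)
{
	*dumped = 0;
	for (size_t i = 0; i < n; i++) {
		int value;
		int err = demo_param_get(dev, &params[i], &value);

		if (err == -EOPNOTSUPP)
			continue;
		if (err)
			return err;
		(*dumped)++;
	}
	return 0;
}
```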
      
      Another possible fix is to perform these checks in the mlxsw parameter
      callbacks, but other drivers might be affected by the same problem and I
      am not aware of scenarios where these stricter checks will cause a
      regression.
      
      [1]
      mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev
      mlxsw_spectrum3 0000:00:10.0: Failed to create ports
      
      ==================================================================
      BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
      Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777
      
      CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1
      Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: netns cleanup_net
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:313 [inline]
       print_report.cold+0x5e/0x5cf mm/kasan/report.c:429
       kasan_report+0xb9/0xf0 mm/kasan/report.c:491
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306
       mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904
       mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106
       mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854
       devlink_param_get net/core/devlink.c:4981 [inline]
       devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089
       devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168
       devlink_ns_change_notify net/core/devlink.c:4417 [inline]
       devlink_ns_change_notify net/core/devlink.c:4396 [inline]
       devlink_reload+0x15f/0x700 net/core/devlink.c:4507
       devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272
       ops_pre_exit_list net/core/net_namespace.c:152 [inline]
       cleanup_net+0x494/0xc00 net/core/net_namespace.c:582
       process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289
       worker_thread+0x675/0x10b0 kernel/workqueue.c:2436
       kthread+0x30c/0x3d0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
       </TASK>
      
      The buggy address belongs to the physical page:
      page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc
      flags: 0x100000000000000(node=0|zone=1)
      raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                          ^
       ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      
      Fixes: 98bbf70c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:bonding:support balance-alb interface with vlan to bridge · d5410ac7
      Sun Shouxin authored
      In my test, balance-alb bonding was set up with two slaves, eth0 and eth1,
      and then bond0.150 was created with a vlan id attached to bond0.
      After adding bond0.150 to a Linux bridge, I noted that bond0,
      bond0.150 and the bridge were assigned the same MAC as eth0.
      Once bond0.150 receives a packet whose dest IP is the bridge's
      and dest MAC is eth1's, the Linux bridge will not match
      eth1's MAC entry in the FDB, and will not handle it as expected.
      This patch fixes the issue. The setup is shown in the diagram below:
      
      eth1(mac:eth1_mac)--bond0(balance-alb,mac:eth0_mac)--eth0(mac:eth0_mac)
                            |
                         bond0.150(mac:eth0_mac)
                            |
                         bridge(ip:br_ip, mac:eth0_mac)--other port
      Suggested-by: Hu Yadi <huyd12@chinatelecom.cn>
      Signed-off-by: Sun Shouxin <sunshouxin@chinatelecom.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macsec: Fix traffic counters/statistics · 91ec9bd5
      Clayton Yager authored
      OutOctetsProtected, OutOctetsEncrypted, InOctetsValidated, and
      InOctetsDecrypted were incrementing by the total number of octets in frames
      instead of by the number of octets of User Data in frames.
      
      The Controlled Port statistics ifOutOctets and ifInOctets were incrementing
      by the total number of octets instead of the number of octets of the MSDUs
      plus octets of the destination and source MAC addresses.
      
      The Controlled Port statistics ifInDiscards and ifInErrors were not
      incrementing each time the counters they aggregate were.
      
      The Controlled Port statistic ifInErrors was not included in the output of
      macsec_get_stats64 so the value was not present in ip commands output.
      
      The ReceiveSA counters InPktsNotValid, InPktsNotUsingSA, and InPktsUnusedSA
      were not incrementing.
      Signed-off-by: Clayton Yager <Clayton_Yager@selinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: Set socket state back to SS_UNCONNECTED in vsock_connect_timeout() · a3e7b29e
      Peilin Ye authored
      Imagine two non-blocking vsock_connect() requests on the same socket.
      The first request schedules @connect_work, and after it times out,
      vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps
      *socket* state as SS_CONNECTING.
      
      Later, the second request returns -EALREADY, meaning the socket "already
      has a pending connection in progress", even though the first request has
      already timed out.
      
      As suggested by Stefano, fix it by setting *socket* state back to
      SS_UNCONNECTED, so that the second request will return -ETIMEDOUT.
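
      The effect of the fix can be modelled with a small userspace state
      machine. This is a simplified sketch, not the kernel's control flow: the
      demo_* names are hypothetical, and the model assumes a pending socket
      error is reported by the next connect attempt.

```c
#include <errno.h>

enum demo_sock_state { DEMO_TCP_CLOSE, DEMO_TCP_SYN_SENT };
enum demo_socket_state { DEMO_SS_UNCONNECTED, DEMO_SS_CONNECTING };

struct demo_vsock {
	enum demo_sock_state sk_state;       /* "sock" state */
	enum demo_socket_state socket_state; /* "socket" state */
	int sk_err;                          /* pending error, 0 if none */
};

/* Timeout handler: reset both layers, not only the sock state. */
static void demo_connect_timeout(struct demo_vsock *vs)
{
	if (vs->sk_state == DEMO_TCP_SYN_SENT) {
		vs->sk_state = DEMO_TCP_CLOSE;
		vs->socket_state = DEMO_SS_UNCONNECTED; /* the fix */
		vs->sk_err = ETIMEDOUT;
	}
}

/* Simplified non-blocking connect. */
static int demo_connect_nonblock(struct demo_vsock *vs)
{
	if (vs->socket_state == DEMO_SS_CONNECTING)
		return -EALREADY;
	if (vs->sk_err) {
		int err = vs->sk_err;

		vs->sk_err = 0;
		return -err;
	}
	vs->sk_state = DEMO_TCP_SYN_SENT;
	vs->socket_state = DEMO_SS_CONNECTING;
	return -EINPROGRESS;
}
```

      Without the marked line, the second call would keep hitting the
      DEMO_SS_CONNECTING branch and return -EALREADY after the timeout.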
      Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: Fix memory leak in vsock_connect() · 7e97cfed
      Peilin Ye authored
      An O_NONBLOCK vsock_connect() request may try to reschedule
      @connect_work.  Imagine the following sequence of vsock_connect()
      requests:
      
        1. The 1st, non-blocking request schedules @connect_work, which will
           expire after 200 jiffies.  Socket state is now SS_CONNECTING;
      
        2. Later, the 2nd, blocking request gets interrupted by a signal after
           a few jiffies while waiting for the connection to be established.
           Socket state is back to SS_UNCONNECTED, but @connect_work is still
           pending, and will expire after 100 jiffies.
      
        3. Now, the 3rd, non-blocking request tries to schedule @connect_work
           again.  Since @connect_work is already scheduled,
           schedule_delayed_work() silently returns.  sock_hold() is called
           twice, but sock_put() will only be called once in
           vsock_connect_timeout(), causing a memory leak reported by syzbot:
      
        BUG: memory leak
        unreferenced object 0xffff88810ea56a40 (size 1232):
          comm "syz-executor756", pid 3604, jiffies 4294947681 (age 12.350s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            28 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@............
          backtrace:
            [<ffffffff837c830e>] sk_prot_alloc+0x3e/0x1b0 net/core/sock.c:1930
            [<ffffffff837cbe22>] sk_alloc+0x32/0x2e0 net/core/sock.c:1989
            [<ffffffff842ccf68>] __vsock_create.constprop.0+0x38/0x320 net/vmw_vsock/af_vsock.c:734
            [<ffffffff842ce8f1>] vsock_create+0xc1/0x2d0 net/vmw_vsock/af_vsock.c:2203
            [<ffffffff837c0cbb>] __sock_create+0x1ab/0x2b0 net/socket.c:1468
            [<ffffffff837c3acf>] sock_create net/socket.c:1519 [inline]
            [<ffffffff837c3acf>] __sys_socket+0x6f/0x140 net/socket.c:1561
            [<ffffffff837c3bba>] __do_sys_socket net/socket.c:1570 [inline]
            [<ffffffff837c3bba>] __se_sys_socket net/socket.c:1568 [inline]
            [<ffffffff837c3bba>] __x64_sys_socket+0x1a/0x20 net/socket.c:1568
            [<ffffffff84512815>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
            [<ffffffff84512815>] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:80
            [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
        <...>
      
      Use mod_delayed_work() instead: if @connect_work is already scheduled,
      reschedule it, and undo sock_hold() to keep the reference count
      balanced.
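
      The reference-count bookkeeping can be sketched in userspace as below.
      The demo_* helpers are hypothetical. The model relies only on
      mod_delayed_work() reporting whether the work was already pending.

```c
#include <stdbool.h>

struct demo_sock {
	int refcnt;
	bool work_pending;
	long work_delay;
};

/* Models mod_delayed_work(): (re)arm the timer and report whether the
 * work was already pending. */
static bool demo_mod_delayed_work(struct demo_sock *sk, long delay)
{
	bool was_pending = sk->work_pending;

	sk->work_pending = true;
	sk->work_delay = delay;
	return was_pending;
}

/* Take a reference for the pending work, but drop it again if the work
 * was already scheduled and therefore already owns one. */
static void demo_schedule_connect_timeout(struct demo_sock *sk, long delay)
{
	sk->refcnt++;                       /* models sock_hold() */
	if (demo_mod_delayed_work(sk, delay))
		sk->refcnt--;               /* models sock_put() */
}

/* The timeout handler runs once and drops the single work reference. */
static void demo_connect_timeout_fired(struct demo_sock *sk)
{
	sk->work_pending = false;
	sk->refcnt--;
}
```

      Scheduling twice and firing once leaves the count where it started, which
      is the balance the leak report was missing.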
      
      Reported-and-tested-by: syzbot+b03f55bf128f9a38f064@syzkaller.appspotmail.com
      Fixes: d021c344 ("VSOCK: Introduce VM Sockets")
      Co-developed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" · 6fd2c17f
      Jose Alonso authored
      This reverts commit 36a15e1c.
      
      The use of FLAG_SEND_ZLP causes problems for other firmware/hardware
      versions that previously had no issues.
      
      FLAG_SEND_ZLP is not safe to use in this context.
      See:
      https://patchwork.ozlabs.org/project/netdev/patch/1270599787.8900.8.camel@Linuxdev4-laptop/#118378
      The original problem needs to be solved another way.
      
      Fixes: 36a15e1c ("net: usb: ax88179_178a needs FLAG_SEND_ZLP")
      Cc: stable@vger.kernel.org
      Reported-by: Ronald Wahl <ronald.wahl@raritan.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216327
      Link: https://bugs.archlinux.org/task/75491
      Signed-off-by: Jose Alonso <joalonsof@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlabel: fix typo in comment · 2cd0e8db
      Topi Miettinen authored
      'IPv4 and IPv4' should be 'IPv4 and IPv6'.
      Signed-off-by: Topi Miettinen <toiwoton@gmail.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-fixes-for-6.0-20220810' of... · e7f16495
      David S. Miller authored
      Merge tag 'linux-can-fixes-for-6.0-20220810' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      this is a pull request of 4 patches for net/master, with the
      whitespace issue fixed.
      
      Fedor Pchelkin contributes 2 fixes for the j1939 CAN protocol.
      
      A patch by me for the ems_usb driver fixes an unaligned access
      warning.
      
      Sebastian Würl's patch for the mcp251x driver fixes a race condition
      in the receive interrupt.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'do-not-use-rt_tos-for-ipv6-flowlabel' · 996237d9
      Jakub Kicinski authored
      Matthias May says:
      
      ====================
      Do not use RT_TOS for IPv6 flowlabel
      
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      ====================
      
      Link: https://lore.kernel.org/r/20220805191906.9323-1-matthias.may@westermo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6: do not use RT_TOS for IPv6 flowlabel · ab7e2e0d
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
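
      To make the problem concrete, the sketch below shows what an RT_TOS-style
      mask does to an IPv6 traffic class before it is packed next to the flow
      label. The DEMO_* names are local to this example, the mask value 0x1E is
      the RFC 1349 TOS mask, and demo_make_flowinfo() is a host-byte-order
      model written for illustration, not the kernel helper.

```c
#include <stdint.h>

/* RFC 1349 style TOS mask, as applied by an RT_TOS()-like macro. */
#define DEMO_IPTOS_TOS_MASK 0x1E
#define DEMO_RT_TOS(tos)    ((tos) & DEMO_IPTOS_TOS_MASK)

/* Host-byte-order model of the IPv6 header bits below the version
 * nibble: an 8-bit traffic class followed by a 20-bit flow label. */
static uint32_t demo_make_flowinfo(uint8_t tclass, uint32_t flowlabel)
{
	return ((uint32_t)tclass << 20) | (flowlabel & 0xFFFFFu);
}

/* Buggy shape: the traffic class is filtered through the IPv4 TOS mask. */
static uint32_t demo_flowinfo_masked(uint8_t tclass, uint32_t flowlabel)
{
	return demo_make_flowinfo((uint8_t)DEMO_RT_TOS(tclass), flowlabel);
}
```

      For a traffic class of 0xB8 (DSCP 46), the masked variant keeps only
      0x18, so the DSCP the sender asked for is silently altered.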
      
      Fixes: 571912c6 ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlx5: do not use RT_TOS for IPv6 flowlabel · bcb0da7f
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: ce99f6b9 ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • vxlan: do not use RT_TOS for IPv6 flowlabel · e488d4f5
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: 1400615d ("vxlan: allow setting ipv6 traffic class")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • geneve: do not use RT_TOS for IPv6 flowlabel · ca2bb695
      Matthias May authored
      According to Guillaume Nault RT_TOS should never be used for IPv6.
      
      Quote:
      RT_TOS() is an old macro used to interpret IPv4 TOS as described in
      the obsolete RFC 1349. It's conceptually wrong to use it even in IPv4
      code, although, given the current state of the code, most of the
      existing calls have no consequence.
      
      But using RT_TOS() in IPv6 code is always a bug: IPv6 never had a "TOS"
      field to be interpreted the RFC 1349 way. There's no historical
      compatibility to worry about.
      
      Fixes: 3a56f86f ("geneve: handle ipv6 priority like ipv4 tos")
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • geneve: fix TOS inheriting for ipv4 · b4ab94d6
      Matthias May authored
      The current code retrieves the TOS field after the lookup
      on the ipv4 routing table. The routing process currently
      only allows routing based on the original 3 TOS bits, and
      not on the full 6 DSCP bits.
      As a result the retrieved TOS is cut to the 3 bits.
      However for inheriting purposes the full 6 bits should be used.
      
      Extract the full 6 bits before the route lookup and use
      that instead of the cut off 3 TOS bits.
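
      A rough sketch of the idea: save the full value before the lookup
      truncates it. The mask values are assumptions chosen to match the "3
      bits" and "6 bits" described above, and the demo_* names are
      hypothetical.

```c
#include <stdint.h>

#define DEMO_ROUTE_TOS_MASK 0x1C /* assumed: the 3 TOS bits used for routing */
#define DEMO_DSCP_MASK      0xFC /* the 6 DSCP bits of the TOS byte */

struct demo_lookup {
	uint8_t route_tos; /* value handed to the route lookup */
	uint8_t full_tos;  /* value saved beforehand for inheriting */
};

/* Capture the full TOS first, then derive the truncated routing key. */
static struct demo_lookup demo_prepare_lookup(uint8_t inner_tos)
{
	struct demo_lookup l;

	l.full_tos = inner_tos;
	l.route_tos = inner_tos & DEMO_ROUTE_TOS_MASK;
	return l;
}

/* Inheriting uses the saved value, so all 6 DSCP bits survive. */
static uint8_t demo_inherited_dscp(const struct demo_lookup *l)
{
	return l->full_tos & DEMO_DSCP_MASK;
}
```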
      
      Fixes: e305ac6c ("geneve: Add support to collect tunnel metadata.")
      Signed-off-by: Matthias May <matthias.may@westermo.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Link: https://lore.kernel.org/r/20220805190006.8078-1-matthias.may@westermo.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: atlantic: fix aq_vec index out of range error · 2ba5e47f
      Chia-Lin Kao (AceLan) authored
      The final update statement of the for loop runs past the end of the
      array: self->aq_vec[i] is read without a bounds check, which leads to
      the index out of range error.
      Also fix the same coding pattern in the other for loops.
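
      The pattern can be illustrated with a small userspace sketch. The demo_*
      types are hypothetical, and the loop in the comment is a reconstruction
      of the general shape described above rather than a quote of the driver.

```c
#include <stddef.h>

#define DEMO_VEC_MAX 8

struct demo_vec {
	int stopped;
};

struct demo_nic {
	struct demo_vec *vec[DEMO_VEC_MAX];
	unsigned int vecs; /* number of valid entries in vec[] */
};

/* Problematic shape (shown only as a comment):
 *
 *   for (i = 0, v = nic->vec[0]; i < nic->vecs; ++i, v = nic->vec[i])
 *           v->stopped = 1;
 *
 * After the last iteration the update clause still evaluates
 * nic->vec[i] with i == nic->vecs, which is out of range when all
 * DEMO_VEC_MAX slots are in use.
 */

/* Safe shape: index the array only inside the bounds-checked body. */
static unsigned int demo_nic_stop(struct demo_nic *nic)
{
	unsigned int i, n = 0;

	for (i = 0; i < nic->vecs; i++) {
		nic->vec[i]->stopped = 1;
		n++;
	}
	return n;
}
```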
      
      [   97.937604] UBSAN: array-index-out-of-bounds in drivers/net/ethernet/aquantia/atlantic/aq_nic.c:1404:48
      [   97.937607] index 8 is out of range for type 'aq_vec_s *[8]'
      [   97.937608] CPU: 38 PID: 3767 Comm: kworker/u256:18 Not tainted 5.19.0+ #2
      [   97.937610] Hardware name: Dell Inc. Precision 7865 Tower/, BIOS 1.0.0 06/12/2022
      [   97.937611] Workqueue: events_unbound async_run_entry_fn
      [   97.937616] Call Trace:
      [   97.937617]  <TASK>
      [   97.937619]  dump_stack_lvl+0x49/0x63
      [   97.937624]  dump_stack+0x10/0x16
      [   97.937626]  ubsan_epilogue+0x9/0x3f
      [   97.937627]  __ubsan_handle_out_of_bounds.cold+0x44/0x49
      [   97.937629]  ? __scm_send+0x348/0x440
      [   97.937632]  ? aq_vec_stop+0x72/0x80 [atlantic]
      [   97.937639]  aq_nic_stop+0x1b6/0x1c0 [atlantic]
      [   97.937644]  aq_suspend_common+0x88/0x90 [atlantic]
      [   97.937648]  aq_pm_suspend_poweroff+0xe/0x20 [atlantic]
      [   97.937653]  pci_pm_suspend+0x7e/0x1a0
      [   97.937655]  ? pci_pm_suspend_noirq+0x2b0/0x2b0
      [   97.937657]  dpm_run_callback+0x54/0x190
      [   97.937660]  __device_suspend+0x14c/0x4d0
      [   97.937661]  async_suspend+0x23/0x70
      [   97.937663]  async_run_entry_fn+0x33/0x120
      [   97.937664]  process_one_work+0x21f/0x3f0
      [   97.937666]  worker_thread+0x4a/0x3c0
      [   97.937668]  ? process_one_work+0x3f0/0x3f0
      [   97.937669]  kthread+0xf0/0x120
      [   97.937671]  ? kthread_complete_and_exit+0x20/0x20
      [   97.937672]  ret_from_fork+0x22/0x30
      [   97.937676]  </TASK>
      
      v2. fixed "warning: variable 'aq_vec' set but not used"
      
      v3. simplified a for loop
      
      Fixes: 97bde5c4 ("net: ethernet: aquantia: Support for NIC-specific code")
      Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
      Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Link: https://lore.kernel.org/r/20220808081845.42005-1-acelan.kao@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Christophe JAILLET's avatar
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 690bf643
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Harden set element field checks to avoid out-of-bound memory access,
         this patch also fixes the type of issue described in 7e6bc1f6
         ("netfilter: nf_tables: stricter validation of element data") in a
         broader way.
      
      2) Patches to restrict the chain, set, and rule id lookup in the
         transaction to the corresponding top-level table, patches from
         Thadeu Lima de Souza Cascardo.
      
      3) Fix incorrect comment in ip6t_LOG.h
      
      4) nft_data_init() performs upfront validation of the expected data.
         struct nft_data_desc is used to describe the expected data to be
         received from userspace. The .size field represents the maximum size
         that can be stored, for bound checks. Then, .len is an input/output field
         which stores the expected length as input (this is optional, to restrict
         the checks), as output it stores the real length received from userspace
         (if it was not specified as input). This patch comes in response to
         7e6bc1f6 ("netfilter: nf_tables: stricter validation of element data")
         to address this type of issue in a more generic way by avoiding
         open-coded data validation. Next patch requires this as a dependency.
      
      5) Disallow jump to implicit chain from set element, this configuration
         is invalid. Only jumps to a chain via an immediate expression are
         supported at this stage.
      
      6) Fix possible null-pointer dereference in the error path of table updates,
         if memory allocation of the transaction fails. From Florian Westphal.
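
      The descriptor semantics from item 4 can be sketched as follows. This is
      an illustrative model only: struct demo_data_desc and demo_data_init()
      are hypothetical names, and the choice of -EINVAL as the error code is an
      assumption.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical descriptor: .size bounds the destination, .len is the
 * expected length on input (0 means "any") and the real length on output. */
struct demo_data_desc {
	unsigned int size;
	unsigned int len;
};

/* Validate the incoming length up front, then copy and report it. */
static int demo_data_init(void *dst, struct demo_data_desc *desc,
			  const void *src, unsigned int src_len)
{
	if (src_len > desc->size)
		return -EINVAL; /* would overflow the destination */
	if (desc->len && src_len != desc->len)
		return -EINVAL; /* caller demanded an exact length */

	memcpy(dst, src, src_len);
	desc->len = src_len;
	return 0;
}
```

      Centralising the checks this way means callers describe what they expect
      once, instead of each repeating its own length comparisons.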
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: fix null deref due to zeroed list head
        netfilter: nf_tables: disallow jump to implicit chain from set element
        netfilter: nf_tables: upfront validation of data via nft_data_init()
        netfilter: ip6t_LOG: Fix a typo in a comment
        netfilter: nf_tables: do not allow RULE_ID to refer to another chain
        netfilter: nf_tables: do not allow CHAIN_ID to refer to another table
        netfilter: nf_tables: do not allow SET_ID to refer to another table
        netfilter: nf_tables: validate variable length element extension
      ====================
      
      Link: https://lore.kernel.org/r/20220809220532.130240-1-pablo@netfilter.org/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 09 Aug, 2022 20 commits