  1. 05 Apr, 2023 8 commits
    • can: isotp: isotp_recvmsg(): use sock_recv_cmsgs() to get SOCK_RXQ_OVFL infos · 0145462f
      Oliver Hartkopp authored
      isotp.c was still using sock_recv_timestamp() which does not provide
      control messages to detect dropped PDUs in the receive path.
      
      Fixes: e057dd3f ("can: add ISO 15765-2:2016 transport protocol")
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230330170248.62342-1-socketcan@hartkopp.net
      Cc: stable@vger.kernel.org
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access · b45193cb
      Oleksij Rempel authored
      In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access
      could occur during the memcpy() operation if the size of skb->cb is
      larger than the size of struct j1939_sk_buff_cb. This is because the
      memcpy() operation uses the size of skb->cb, leading to a read beyond
      the struct j1939_sk_buff_cb.
      
      Updated the memcpy() operation to use the size of struct
      j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the
      memcpy() operation only reads the memory within the bounds of struct
      j1939_sk_buff_cb, preventing out-of-bounds memory access.
      
      Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb
      is greater than or equal to the size of struct j1939_sk_buff_cb. This
      ensures that the skb->cb buffer is large enough to hold the
      j1939_sk_buff_cb structure.
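
      The bounded copy and the compile-time size check described above can be
      sketched in plain C. This is a minimal illustration, not the driver code:
      struct demo_skb, struct demo_j1939_cb, demo_copy_cb() and the 48-byte
      buffer size are hypothetical stand-ins, and static_assert() takes the
      place of the kernel's BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; sizes are illustrative only. */
struct demo_skb {
	char cb[48];	/* generic per-packet control buffer */
};

struct demo_j1939_cb {
	unsigned int flags;
	unsigned char priority;
	unsigned char src_addr;
	unsigned char dst_addr;
};

/* Compile-time check: the control buffer must be able to hold the struct. */
static_assert(sizeof(((struct demo_skb *)0)->cb) >= sizeof(struct demo_j1939_cb),
	      "cb is too small for struct demo_j1939_cb");

/*
 * Copy the protocol-private data into the control buffer. The length is
 * the size of the source struct, not the size of the (larger) buffer, so
 * the read never goes past the end of *src.
 */
static void demo_copy_cb(struct demo_skb *dst, const struct demo_j1939_cb *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst->cb, src, sizeof(*src));
}
```

      Sizing the copy by the smaller object keeps the read in bounds, while the
      static assertion catches the opposite mistake (a buffer that is too
      small) at build time.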
      
      Fixes: 9d71dd0c ("can: add support of SAE J1939 protocol")
      Reported-by: Shuangpeng Bai <sjb7183@psu.edu>
      Tested-by: Shuangpeng Bai <sjb7183@psu.edu>
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ
      Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de
      Cc: stable@vger.kernel.org
      [mkl: rephrase commit message]
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • gve: Secure enough bytes in the first TX desc for all TCP pkts · 3ce93455
      Shailend Chand authored
      Non-GSO TCP packets whose SKBs' linear portion did not include the
      entire TCP header were not populating the first Tx descriptor with
      as many bytes as the vNIC expected. This change ensures that all
      TCP packets populate the first descriptor with the correct number of
      bytes.
      
      Fixes: 893ce44d ("gve: Add basic driver framework for Compute Engine Virtual NIC")
      Signed-off-by: Shailend Chand <shailend@google.com>
      Link: https://lore.kernel.org/r/20230403172809.2939306-1-shailend@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: annotate lockless accesses to nlk->max_recvmsg_len · a1865f2e
      Eric Dumazet authored
      syzbot reported a data-race in netlink_recvmsg() [1]
      
      Indeed, netlink_recvmsg() can be run concurrently,
      and netlink_dump() also needs protection.
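
      As a rough userspace analogue of annotating a lockless access, the sketch
      below uses C11 relaxed atomics where kernel code would typically use
      READ_ONCE()/WRITE_ONCE(). The struct demo_nlk type, demo_note_recv_len()
      and the DEMO_MAX_LEN cap are assumptions made for illustration; they are
      not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_MAX_LEN 32768u	/* hypothetical upper bound for the hint */

struct demo_nlk {
	_Atomic size_t max_recvmsg_len;	/* shared between concurrent readers */
};

/*
 * Remember the largest receive length seen so far, capped at DEMO_MAX_LEN.
 * Each access to the shared field is a single marked load or store, so
 * concurrent callers cannot observe a torn value. Like the kernel
 * annotations, this does not make the read-modify-write sequence atomic.
 */
static void demo_note_recv_len(struct demo_nlk *nlk, size_t len)
{
	size_t cur;

	assert(nlk != NULL);
	cur = atomic_load_explicit(&nlk->max_recvmsg_len, memory_order_relaxed);
	if (len > cur)
		cur = len;
	if (cur > DEMO_MAX_LEN)
		cur = DEMO_MAX_LEN;
	atomic_store_explicit(&nlk->max_recvmsg_len, cur, memory_order_relaxed);
}
```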
      
      [1]
      BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
      
      read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
      netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
      sock_recvmsg_nosec net/socket.c:1017 [inline]
      sock_recvmsg net/socket.c:1038 [inline]
      __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
      __do_sys_recvfrom net/socket.c:2212 [inline]
      __se_sys_recvfrom net/socket.c:2208 [inline]
      __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
      netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
      sock_recvmsg_nosec net/socket.c:1017 [inline]
      sock_recvmsg net/socket.c:1038 [inline]
      ____sys_recvmsg+0x156/0x310 net/socket.c:2720
      ___sys_recvmsg net/socket.c:2762 [inline]
      do_recvmmsg+0x2e5/0x710 net/socket.c:2856
      __sys_recvmmsg net/socket.c:2935 [inline]
      __do_sys_recvmmsg net/socket.c:2958 [inline]
      __se_sys_recvmmsg net/socket.c:2951 [inline]
      __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x0000000000000000 -> 0x0000000000001000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48f #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
      
      Fixes: 9063e21f ("netlink: autosize skb lengthes")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230403214643.768555-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ethtool: reset #lanes when lanes is omitted · e847c767
      Andy Roulin authored
      If the number of lanes was forced and then subsequently the user
      omits this parameter, the ksettings->lanes is reset. The driver
      should then reset the number of lanes to the device's default
      for the specified speed.
      
      However, although the ksettings->lanes is set to 0, the mod variable
      is not set to true to indicate the driver and userspace should be
      notified of the changes.
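
      The missing notification can be illustrated with a simplified model. In
      this sketch struct demo_ksettings and demo_update_lanes() are
      hypothetical, a NULL lanes_req stands for "the user omitted lanes", and
      the autoneg condition of the real code is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ksettings {
	unsigned int lanes;	/* 0 means "let the driver pick the default" */
};

/*
 * Apply an optional lanes request. *mod is set whenever the stored
 * settings change, including the reset to 0 when lanes is omitted.
 */
static void demo_update_lanes(struct demo_ksettings *ks,
			      const unsigned int *lanes_req, bool *mod)
{
	assert(ks != NULL && mod != NULL);

	if (lanes_req) {
		if (ks->lanes != *lanes_req) {
			ks->lanes = *lanes_req;
			*mod = true;
		}
	} else if (ks->lanes) {
		ks->lanes = 0;
		*mod = true;	/* without this, the reset goes unnoticed */
	}
}
```

      With the flag set on reset, the outcome no longer depends on whether
      lanes had been forced earlier.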
      
      The consequence is that the same ethtool operation will produce
      different results based on the initial state.
      
      If the initial state is:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: on
      
      then executing 'ethtool -s swp1 speed 50000 autoneg off' will yield:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: off
      
      While if the initial state is:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 1
              Duplex: Full
              Auto-negotiation: off
      
      executing the same 'ethtool -s swp1 speed 50000 autoneg off' results in:
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 1
              Duplex: Full
              Auto-negotiation: off
      
      This patch fixes this behavior. Omitting lanes will always result in
      the driver choosing the default lane width for the chosen speed. In this
      scenario, regardless of the initial state, the end state will be, e.g.,
      
      $ ethtool swp1 | grep -A 3 'Speed: '
              Speed: 500000Mb/s
              Lanes: 2
              Duplex: Full
              Auto-negotiation: off
      
      Fixes: 012ce4dd ("ethtool: Extend link modes settings uAPI with lanes")
      Signed-off-by: Andy Roulin <aroulin@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/ac238d6b-8726-8156-3810-6471291dbc7f@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'raw-ping-fix-locking-in-proc-net-raw-icmp' · 95fac540
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      raw/ping: Fix locking in /proc/net/{raw,icmp}.
      
      The first patch fixes a NULL deref for /proc/net/raw and the second one
      fixes the same issue for ping sockets.
      
      The first patch also converts hlist_nulls to hlist, but this is because
      the current code uses sk_nulls_for_each() for lockless readers, instead
      of sk_nulls_for_each_rcu() which adds a memory barrier, but raw sockets
      do not use the nulls marker nor SLAB_TYPESAFE_BY_RCU in the first place.
      
      OTOH, the ping sockets already use sk_nulls_for_each_rcu(), and such a
      conversion can be posted later for net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20230403194959.48928-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ping: Fix potentail NULL deref for /proc/net/icmp. · ab5fb73f
      Kuniyuki Iwashima authored
      After commit dbca1596 ("ping: convert to RCU lookups, get rid
      of rwlock"), we use RCU for ping sockets, but we should use spinlock
      for /proc/net/icmp to avoid a potential NULL deref mentioned in
      the previous patch.
      
      Let's go back to using spinlock there.
      
      Note we can convert ping sockets to use hlist instead of hlist_nulls
      because we do not use SLAB_TYPESAFE_BY_RCU for ping sockets.
      
      Fixes: dbca1596 ("ping: convert to RCU lookups, get rid of rwlock")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • raw: Fix NULL deref in raw_get_next(). · 0a78cf72
      Kuniyuki Iwashima authored
      Dae R. Jeong reported a NULL deref in raw_get_next() [0].
      
      It seems that the repro was running these sequences in parallel so
      that one thread was iterating on a socket that was being freed in
      another netns.
      
        unshare(0x40060200)
        r0 = syz_open_procfs(0x0, &(0x7f0000002080)='net/raw\x00')
        socket$inet_icmp_raw(0x2, 0x3, 0x1)
        pread64(r0, &(0x7f0000000000)=""/10, 0xa, 0x10000000007f)
      
      After commit 0daf07e5 ("raw: convert raw sockets to RCU"), we
      use RCU and hlist_nulls_for_each_entry() to iterate over SOCK_RAW
      sockets.  However, we should use spinlock for slow paths to avoid
      the NULL deref.
      
      Also, SOCK_RAW does not use SLAB_TYPESAFE_BY_RCU, and the slab object
      is not reused during iteration in the grace period.  In fact, the
      lockless readers do not check the nulls marker with get_nulls_value().
      So, SOCK_RAW should use hlist instead of hlist_nulls.
      
      Instead of adding an unnecessary barrier by sk_nulls_for_each_rcu(),
      let's convert hlist_nulls to hlist and use sk_for_each_rcu() for
      fast paths and sk_for_each() and spinlock for /proc/net/raw.
      
      [0]:
      general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
      CPU: 2 PID: 20952 Comm: syz-executor.0 Not tainted 6.2.0-g048ec869bafd-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055bb9614b35f CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       seq_read_iter+0x4c6/0x10f0 fs/seq_file.c:225
       seq_read+0x224/0x320 fs/seq_file.c:162
       pde_read fs/proc/inode.c:316 [inline]
       proc_reg_read+0x23f/0x330 fs/proc/inode.c:328
       vfs_read+0x31e/0xd30 fs/read_write.c:468
       ksys_pread64 fs/read_write.c:665 [inline]
       __do_sys_pread64 fs/read_write.c:675 [inline]
       __se_sys_pread64 fs/read_write.c:672 [inline]
       __x64_sys_pread64+0x1e9/0x280 fs/read_write.c:672
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x4e/0xa0 arch/x86/entry/common.c:82
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x478d29
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f843ae8dbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000011
      RAX: ffffffffffffffda RBX: 0000000000791408 RCX: 0000000000478d29
      RDX: 000000000000000a RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000f477909a R08: 0000000000000000 R09: 0000000000000000
      R10: 000010000000007f R11: 0000000000000246 R12: 0000000000791740
      R13: 0000000000791414 R14: 0000000000791408 R15: 00007ffc2eb48a50
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:read_pnet include/net/net_namespace.h:383 [inline]
      RIP: 0010:sock_net include/net/sock.h:649 [inline]
      RIP: 0010:raw_get_next net/ipv4/raw.c:974 [inline]
      RIP: 0010:raw_get_idx net/ipv4/raw.c:986 [inline]
      RIP: 0010:raw_seq_start+0x431/0x800 net/ipv4/raw.c:995
      Code: ef e8 33 3d 94 f7 49 8b 6d 00 4c 89 ef e8 b7 65 5f f7 49 89 ed 49 83 c5 98 0f 84 9a 00 00 00 48 83 c5 c8 48 89 e8 48 c1 e8 03 <42> 80 3c 30 00 74 08 48 89 ef e8 00 3d 94 f7 4c 8b 7d 00 48 89 ef
      RSP: 0018:ffffc9001154f9b0 EFLAGS: 00010206
      RAX: 0000000000000005 RBX: 1ffff1100302c8fd RCX: 0000000000000000
      RDX: 0000000000000028 RSI: ffffc9001154f988 RDI: ffffc9000f77a338
      RBP: 0000000000000029 R08: ffffffff8a50ffb4 R09: fffffbfff24b6bd9
      R10: fffffbfff24b6bd9 R11: 0000000000000000 R12: ffff88801db73b78
      R13: fffffffffffffff9 R14: dffffc0000000000 R15: 0000000000000030
      FS:  00007f843ae8e700(0000) GS:ffff888063700000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f92ff166000 CR3: 000000003c672000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 0daf07e5 ("raw: convert raw sockets to RCU")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reported-by: Dae R. Jeong <threeearcat@gmail.com>
      Link: https://lore.kernel.org/netdev/ZCA2mGV_cmq7lIfV@dragonet/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 04 Apr, 2023 2 commits
    • net: stmmac: fix up RX flow hash indirection table when setting channels · 218c5973
      Corinna Vinschen authored
      stmmac_reinit_queues() fails to fix up the RX hash.  Even if the number
      of channels gets restricted, the output of `ethtool -x' indicates that
      all RX queues are used:
      
        $ ethtool -l enp0s29f2
        Channel parameters for enp0s29f2:
        Pre-set maximums:
        RX:		8
        TX:		8
        Other:		n/a
        Combined:	n/a
        Current hardware settings:
        RX:		8
        TX:		8
        Other:		n/a
        Combined:	n/a
        $ ethtool -x enp0s29f2
        RX flow hash indirection table for enp0s29f2 with 8 RX ring(s):
            0:      0     1     2     3     4     5     6     7
            8:      0     1     2     3     4     5     6     7
        [...]
        $ ethtool -L enp0s29f2 rx 3
        $ ethtool -x enp0s29f2
        RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
            0:      0     1     2     3     4     5     6     7
            8:      0     1     2     3     4     5     6     7
        [...]
      
      Fix this by setting the indirection table according to the number
      of specified queues.  The result is now as expected:
      
        $ ethtool -L enp0s29f2 rx 3
        $ ethtool -x enp0s29f2
        RX flow hash indirection table for enp0s29f2 with 3 RX ring(s):
            0:      0     1     2     0     1     2     0     1
            8:      2     0     1     2     0     1     2     0
        [...]
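
      The expected table above follows a round-robin pattern: entry i points to
      queue i modulo the number of RX queues. A minimal sketch of that fill
      rule, with a hypothetical helper name and table length (neither comes
      from the driver):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RSS_TABLE_SIZE 16	/* hypothetical table length for the demo */

/* Spread the indirection table entries evenly over rx_cnt queues. */
static void demo_fill_indir_table(unsigned int *table, size_t len,
				  unsigned int rx_cnt)
{
	assert(table != NULL && rx_cnt > 0);

	for (size_t i = 0; i < len; i++)
		table[i] = (unsigned int)(i % rx_cnt);
}
```

      Filling DEMO_RSS_TABLE_SIZE entries with rx_cnt set to 3 reproduces the
      first two rows shown above.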
      
      Tested on Intel Elkhart Lake.
      
      Fixes: 0366f7e0 ("net: stmmac: add ethtool support for get/set channels")
      Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
      Link: https://lore.kernel.org/r/20230403121120.489138-1-vinschen@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe · c6b486fb
      Siddharth Vadapalli authored
      In the am65_cpsw_nuss_probe() function's cleanup path, the call to
      of_platform_device_destroy() for the common->mdio_dev device is invoked
      unconditionally. It is possible that either the MDIO node is not present
      in the device-tree, or the MDIO node is disabled in the device-tree. In
      both these cases, the MDIO device is not created, resulting in a NULL
      pointer dereference when the of_platform_device_destroy() function is
      invoked on the common->mdio_dev device on the cleanup path.
      
      Fix this by ensuring that the common->mdio_dev device exists, before
      attempting to invoke of_platform_device_destroy().
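
      The guard described above amounts to a NULL check on the cleanup path.
      The sketch below models it with hypothetical types and a counting stub
      in place of of_platform_device_destroy(); it is an illustration of the
      pattern, not the driver code.

```c
#include <assert.h>
#include <stddef.h>

struct demo_device {
	int id;
};

struct demo_common {
	struct demo_device *mdio_dev;	/* NULL if the MDIO node is absent or disabled */
};

static int demo_destroy_calls;

/* Stub for the real destroy routine; it must never see a NULL device. */
static void demo_device_destroy(struct demo_device *dev)
{
	assert(dev != NULL);
	demo_destroy_calls++;
}

/* Cleanup path: only tear down the MDIO device if it was created. */
static void demo_probe_cleanup(struct demo_common *common)
{
	assert(common != NULL);

	if (common->mdio_dev)
		demo_device_destroy(common->mdio_dev);
}
```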
      
      Fixes: a45cfcc6 ("net: ethernet: ti: am65-cpsw-nuss: use of_platform_device_create() for mdio")
      Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
      Reviewed-by: Roger Quadros <rogerq@kernel.org>
      Link: https://lore.kernel.org/r/20230403090321.835877-1-s-vadapalli@ti.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 03 Apr, 2023 3 commits
    • ipv6: Fix an uninit variable access bug in __ip6_make_skb() · ea30388b
      Ziyang Xuan authored
      Syzbot reported the following bug:
      
      =====================================================
      BUG: KMSAN: uninit-value in arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
      BUG: KMSAN: uninit-value in arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
      BUG: KMSAN: uninit-value in atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
      BUG: KMSAN: uninit-value in __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
       arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline]
       arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline]
       atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline]
       __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956
       ip6_finish_skb include/net/ipv6.h:1122 [inline]
       ip6_push_pending_frames+0x10e/0x550 net/ipv6/ip6_output.c:1987
       rawv6_push_pending_frames+0xb12/0xb90 net/ipv6/raw.c:579
       rawv6_sendmsg+0x297e/0x2e60 net/ipv6/raw.c:922
       inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
       __sys_sendmsg net/socket.c:2559 [inline]
       __do_sys_sendmsg net/socket.c:2568 [inline]
       __se_sys_sendmsg net/socket.c:2566 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Uninit was created at:
       slab_post_alloc_hook mm/slab.h:766 [inline]
       slab_alloc_node mm/slub.c:3452 [inline]
       __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491
       __do_kmalloc_node mm/slab_common.c:967 [inline]
       __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988
       kmalloc_reserve net/core/skbuff.c:492 [inline]
       __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565
       alloc_skb include/linux/skbuff.h:1270 [inline]
       __ip6_append_data+0x51c1/0x6bb0 net/ipv6/ip6_output.c:1684
       ip6_append_data+0x411/0x580 net/ipv6/ip6_output.c:1854
       rawv6_sendmsg+0x2882/0x2e60 net/ipv6/raw.c:915
       inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476
       ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530
       __sys_sendmsg net/socket.c:2559 [inline]
       __do_sys_sendmsg net/socket.c:2568 [inline]
       __se_sys_sendmsg net/socket.c:2566 [inline]
       __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      This happens because, for a SOCK_RAW socket, the icmp6hdr is not in the
      skb linear region. Accessing icmp6_hdr(skb)->icmp6_type directly
      triggers the uninit variable access bug.
      
      Use a local variable icmp6_type to carry the correct value in different
      scenarios.
      
      Fixes: 14878f75 ("[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2]")
      Reported-by: syzbot+8257f4dcef79de670baf@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?id=3d605ec1d0a7f2a269a1a6936ac7f2b85975ee9c
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT · 839349d1
      Sricharan Ramabadhran authored
      On the remote side, when QRTR socket is removed, af_qrtr will call
      qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours
      including local NS. NS upon receiving the DEL_CLIENT packet, will remove
      the lookups associated with the node:port and broadcasts the DEL_SERVER
      packet.
      
      But on the host side, due to the arrival of the DEL_CLIENT packet, the NS
      would've already deleted the server belonging to that port. So when the
      remote's NS again broadcasts the DEL_SERVER for that port, it throws the
      error message below on the host:
      
      "failed while handling packet from 2:-2"
      
      So fix this error by not broadcasting the DEL_SERVER packet when the
      DEL_CLIENT packet gets processed.
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
      Signed-off-by: Ram Kumar Dharuman <quic_ramd@quicinc.com>
      Signed-off-by: Sricharan Ramabadhran <quic_srichara@quicinc.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfp: add quirk enabling 2500Base-x for HG MXPD-483II · ad651d68
      Daniel Golle authored
      The HG MXPD-483II 1310nm SFP module is meant to operate with 2500Base-X,
      however, in their EEPROM they incorrectly specify:
          Transceiver type                          : Ethernet: 1000BASE-LX
          ...
          BR, Nominal                               : 2600MBd
      
      Use sfp_quirk_2500basex for this module to allow 2500Base-X mode anyway.
      
      https://forum.banana-pi.org/t/bpi-r3-sfp-module-compatibility/14573/60
      Reported-by: chowtom <chowtom@gmail.com>
      Tested-by: chowtom <chowtom@gmail.com>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 02 Apr, 2023 4 commits
    • sctp: check send stream number after wait_for_sndbuf · 2584024b
      Xin Long authored
      This patch fixes a corner case where the asoc out stream count may change
      after wait_for_sndbuf.
      
      When the main thread in the client starts a connection, if its out stream
      count is set to N while the in stream count in the server is set to N - 2,
      another thread in the client keeps sending the msgs with stream number
      N - 1, and waits for sndbuf before processing INIT_ACK.
      
      However, after processing INIT_ACK, the out stream count in the client is
      shrunk to N - 2, the same as the in stream count in the server. The crash
      occurs when the thread waiting for sndbuf wakes up and sends the msg in a
      non-existing stream (N - 1). The call trace is as below:
      
        KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f]
        Call Trace:
         <TASK>
         sctp_cmd_send_msg net/sctp/sm_sideeffect.c:1114 [inline]
         sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1777 [inline]
         sctp_side_effects net/sctp/sm_sideeffect.c:1199 [inline]
         sctp_do_sm+0x197d/0x5310 net/sctp/sm_sideeffect.c:1170
         sctp_primitive_SEND+0x9f/0xc0 net/sctp/primitive.c:163
         sctp_sendmsg_to_asoc+0x10eb/0x1a30 net/sctp/socket.c:1868
         sctp_sendmsg+0x8d4/0x1d90 net/sctp/socket.c:2026
         inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825
         sock_sendmsg_nosec net/socket.c:722 [inline]
         sock_sendmsg+0xde/0x190 net/socket.c:745
      
      The fix is to add an unlikely check for the send stream number after the
      thread wakes up from the wait_for_sndbuf.
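
      The re-validation pattern can be sketched as follows. All names here are
      hypothetical, the sleep is modelled by simply assigning a new stream
      count, and returning -EINVAL is an assumption about the error code
      rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_asoc {
	unsigned short outcnt;	/* number of outgoing streams */
};

/* A stream number is valid only if it is below the current stream count. */
static int demo_check_stream(const struct demo_asoc *asoc, unsigned short stream)
{
	assert(asoc != NULL);
	return stream >= asoc->outcnt ? -EINVAL : 0;
}

/*
 * Send on a stream, waiting for buffer space in between. The stream count
 * may shrink while the sender sleeps, so the check made before the wait
 * has to be repeated after it.
 */
static int demo_send(struct demo_asoc *asoc, unsigned short stream,
		     unsigned short outcnt_after_wait)
{
	if (demo_check_stream(asoc, stream))
		return -EINVAL;

	/* Stand-in for the wait: another thread renegotiates the count. */
	asoc->outcnt = outcnt_after_wait;

	if (demo_check_stream(asoc, stream))
		return -EINVAL;

	return 0;	/* safe to queue the message on this stream */
}
```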
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: syzbot+47c24ca20a2fa01f082e@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mtk_eth_soc: fix remaining throughput regression · e669ce46
      Felix Fietkau authored
      Based on further tests, it seems that the QDMA shaper is not able to
      perform shaping close to the MAC link rate without throughput loss.
      This cannot be compensated by increasing the shaping rate, so it seems
      to be an internal limit.
      
      Fix the remaining throughput regression by detecting that condition and
      limiting shaping to ports with lower link speed.
      
      This patch intentionally ignores link speed gain from TRGMII, because
      even on such links, shaping to 1000 Mbit/s incurs some throughput
      degradation.
      
      Fixes: f63959c7 ("net: ethernet: mtk_eth_soc: implement multi-queue support for per-port queues")
      Tested-By: Frank Wunderlich <frank-w@public-files.de>
      Reported-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: Reset mv88e6393x force WD event bit · 089b91a0
      Gustav Ekelund authored
      The force watchdog event bit is not cleared during SW reset in the
      mv88e6393x switch. This is a different behavior compared to mv88e6390 which
      clears the force WD event bit as advertised. This causes a force WD event
      to be handled over and over again as the SW reset following the event never
      clears the force WD event bit.
      
      Explicitly clear the watchdog event register to 0 in irq_action when
      handling an event to prevent the switch from sending continuous interrupts.
      Marvell aren't aware of any other stuck bits apart from the force WD
      bit.
      
      Fixes: de776d0d ("net: dsa: mv88e6xxx: add support for mv88e6393x family")
      Signed-off-by: Gustav Ekelund <gustaek@axis.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: don't let netpoll invoke NAPI if in xmit context · 275b471e
      Jakub Kicinski authored
      Commit 0db3dc73 ("[NETPOLL]: tx lock deadlock fix") narrowed
      down the region under netif_tx_trylock() inside netpoll_send_skb().
      (At that point in time netif_tx_trylock() would lock all queues of
      the device.) Taking the tx lock was problematic because driver's
      cleanup method may take the same lock. So the change made us hold
      the xmit lock only around xmit, and expected the driver to take
      care of locking within ->ndo_poll_controller().
      
      Unfortunately this only works if netpoll isn't itself called with
      the xmit lock already held. Netpoll code is careful and uses
      trylock(). The drivers, however, may be using plain lock().
      Printing while holding the xmit lock is going to result in rare
      deadlocks.
      
      Luckily we record the xmit lock owners, so we can scan all the queues,
      the same way we scan NAPI owners. If any of the xmit locks is held
      by the local CPU we better not attempt any polling.
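
      A minimal model of the owner scan described above, assuming a
      hypothetical struct demo_txq that records the owning CPU of each queue
      lock and uses -1 for "unlocked":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_txq {
	int xmit_lock_owner;	/* CPU holding the xmit lock, -1 if none */
};

/* Return true if any TX queue lock is held by this_cpu. */
static bool demo_local_xmit_active(const struct demo_txq *queues, size_t n,
				   int this_cpu)
{
	assert(queues != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (queues[i].xmit_lock_owner == this_cpu)
			return true;
	}
	return false;
}
```

      A caller in the position of netpoll would skip polling whenever this
      returns true for the CPU it is running on.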
      
      It would be nice if we could narrow down the check to only the NAPIs
      and the queue we're trying to use. I don't see a way to do that now.
      
      Reported-by: Roman Gushchin <roman.gushchin@linux.dev>
      Fixes: 0db3dc73 ("[NETPOLL]: tx lock deadlock fix")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 01 Apr, 2023 2 commits
    • icmp: guard against too small mtu · 7d63b671
      Eric Dumazet authored
      syzbot was able to trigger a panic [1] in icmp_glue_bits(), or
      more exactly in skb_copy_and_csum_bits()
      
      There is no repro yet, but I think the issue is that syzbot
      manages to lower device mtu to a small value, fooling __icmp_send()
      
      __icmp_send() must make sure there is enough room for the
      packet to include at least the headers.
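
      A guard of this kind might look like the sketch below. The header sizes
      are the usual IPv4 and ICMP minimums, while the 576-byte cap, the exact
      threshold and the function name are assumptions chosen for illustration
      and are not taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_IPHDR_LEN   20	/* minimal IPv4 header */
#define DEMO_ICMPHDR_LEN  8	/* ICMP header */
#define DEMO_MAX_REPLY  576	/* assumed cap on the ICMP error size */

/*
 * Decide whether an ICMP error can be built for the given path MTU.
 * The room left after our own IP header, IP options and ICMP header
 * must still be large enough to quote the offending packet's IP header.
 */
static bool demo_icmp_has_room(int mtu, int optlen)
{
	int room = mtu;

	assert(optlen >= 0);

	if (room > DEMO_MAX_REPLY)
		room = DEMO_MAX_REPLY;
	room -= DEMO_IPHDR_LEN + optlen;
	room -= DEMO_ICMPHDR_LEN;

	/* A tiny MTU leaves too little (or negative) room: send nothing. */
	return room > DEMO_IPHDR_LEN;
}
```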
      
      We might in the future refactor skb_copy_and_csum_bits() and its
      callers to no longer crash when something bad happens.
      
      [1]
      kernel BUG at net/core/skbuff.c:3343 !
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 15766 Comm: syz-executor.0 Not tainted 6.3.0-rc4-syzkaller-00039-gffe78bbd #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      RIP: 0010:skb_copy_and_csum_bits+0x798/0x860 net/core/skbuff.c:3343
      Code: f0 c1 c8 08 41 89 c6 e9 73 ff ff ff e8 61 48 d4 f9 e9 41 fd ff ff 48 8b 7c 24 48 e8 52 48 d4 f9 e9 c3 fc ff ff e8 c8 27 84 f9 <0f> 0b 48 89 44 24 28 e8 3c 48 d4 f9 48 8b 44 24 28 e9 9d fb ff ff
      RSP: 0018:ffffc90000007620 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 00000000000001e8 RCX: 0000000000000100
      RDX: ffff8880276f6280 RSI: ffffffff87fdd138 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
      R10: 00000000000001e8 R11: 0000000000000001 R12: 000000000000003c
      R13: 0000000000000000 R14: ffff888028244868 R15: 0000000000000b0e
      FS: 00007fbc81f1c700(0000) GS:ffff88802ca00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2df43000 CR3: 00000000744db000 CR4: 0000000000150ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <IRQ>
      icmp_glue_bits+0x7b/0x210 net/ipv4/icmp.c:353
      __ip_append_data+0x1d1b/0x39f0 net/ipv4/ip_output.c:1161
      ip_append_data net/ipv4/ip_output.c:1343 [inline]
      ip_append_data+0x115/0x1a0 net/ipv4/ip_output.c:1322
      icmp_push_reply+0xa8/0x440 net/ipv4/icmp.c:370
      __icmp_send+0xb80/0x1430 net/ipv4/icmp.c:765
      ipv4_send_dest_unreach net/ipv4/route.c:1239 [inline]
      ipv4_link_failure+0x5a9/0x9e0 net/ipv4/route.c:1246
      dst_link_failure include/net/dst.h:423 [inline]
      arp_error_report+0xcb/0x1c0 net/ipv4/arp.c:296
      neigh_invalidate+0x20d/0x560 net/core/neighbour.c:1079
      neigh_timer_handler+0xc77/0xff0 net/core/neighbour.c:1166
      call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
      expire_timers+0x29b/0x4b0 kernel/time/timer.c:1751
      __run_timers kernel/time/timer.c:2022 [inline]
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+d373d60fddbdc915e666@syzkaller.appspotmail.com
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230330174502.1915328-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7d63b671
    • Jakub Kicinski's avatar
      Revert "net: netcp: MAX_SKB_FRAGS is now 'int'" · adef41b0
      Jakub Kicinski authored
      This reverts commit c5b959ee.
      
      Reverted change is required after commit 3948b059 ("net: introduce
      a config option to tweak MAX_SKB_FRAGS") which does not exist
      in this tree, yet. It's only present in -next trees at the time
      of writing.
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://lore.kernel.org/all/20230331214444.GA1426512@dev-arch.thelio-3990X/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      adef41b0
  6. 31 Mar, 2023 11 commits
  7. 30 Mar, 2023 10 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · b2bc47e9
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from CAN and WPAN.
      
        Still quite a few bugs from this release. This pull is a bit smaller
        because major subtrees went into the previous one. Or maybe people
        took spring break off?
      
        Current release - regressions:
      
         - phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement
      
        Current release - new code bugs:
      
         - eth: wangxun: fix vector length of interrupt cause
      
         - vsock/loopback: consistently protect the packet queue with
           sk_buff_head.lock
      
         - virtio/vsock: fix header length on skb merging
      
         - wpan: ca8210: fix unsigned mac_len comparison with zero
      
        Previous releases - regressions:
      
         - eth: stmmac: don't reject VLANs when IFF_PROMISC is set
      
         - eth: smsc911x: avoid PHY being resumed when interface is not up
      
         - eth: mtk_eth_soc: fix tx throughput regression with direct 1G links
      
         - eth: bnx2x: use the right build_skb() helper after core rework
      
         - wwan: iosm: fix 7560 modem crash on use on unsupported channel
      
        Previous releases - always broken:
      
         - eth: sfc: don't overwrite offload features at NIC reset
      
         - eth: r8169: fix RTL8168H and RTL8107E rx crc error
      
         - can: j1939: prevent deadlock by moving j1939_sk_errqueue()
      
         - virt: vmxnet3: use GRO callback when UPT is enabled
      
         - virt: xen: don't do grant copy across page boundary
      
         - phy: dp83869: fix default value for tx-/rx-internal-delay
      
         - dsa: ksz8: fix multiple issues with ksz8_fdb_dump
      
         - eth: mvpp2: fix classification/RSS of VLAN and fragmented packets
      
         - eth: mtk_eth_soc: fix flow block refcounting logic
      
        Misc:
      
         - constify fwnode pointers in SFP handling"
      
      * tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits)
        net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow
        net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload
        net: ethernet: mtk_eth_soc: fix flow block refcounting logic
        net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit()
        net: dsa: sync unicast and multicast addresses for VLAN filters too
        net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only
        xen/netback: use same error messages for same errors
        test/vsock: new skbuff appending test
        virtio/vsock: WARN_ONCE() for invalid state of socket
        virtio/vsock: fix header length on skb merging
        bnxt_en: Add missing 200G link speed reporting
        bnxt_en: Fix typo in PCI id to device description string mapping
        bnxt_en: Fix reporting of test result in ethtool selftest
        i40e: fix registers dump after run ethtool adapter self test
        bnx2x: use the right build_skb() helper
        net: ipa: compute DMA pool size properly
        net: wwan: iosm: fixes 7560 modem crash
        net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links
        ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg()
        ice: add profile conflict check for AVF FDIR
        ...
      b2bc47e9
    • Linus Torvalds's avatar
      Merge tag 'for-6.3/dm-fixes-2' of... · b527ac44
      Linus Torvalds authored
      Merge tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - Fix two DM core bugs in the code that handles splitting "abnormal" IO
         (discards, write same and secure erase) and issuing that IO to the
         correct underlying devices (and offsets within those devices).
      
      * tag 'for-6.3/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm: fix __send_duplicate_bios() to always allow for splitting IO
        dm: fix improper splitting for abnormal bios
      b527ac44
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm · 0d3ff808
      Linus Torvalds authored
      Pull drm fixes from Daniel Vetter:
       "Two regression fixes in here, otherwise just the usual stuff:
      
         - i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and
           more
      
         - amdgpu: dp mst and hibernate regression fix
      
         - etnaviv: revert fdinfo support (incl drm/sched revert), leak fix
      
         - misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit
           fixes"
      
      * tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits)
        Revert "drm/scheduler: track GPU active time per entity"
        Revert "drm/etnaviv: export client GPU usage statistics via fdinfo"
        drm/etnaviv: fix reference leak when mmaping imported buffer
        drm/amdgpu: allow more APUs to do mode2 reset when go to S4
        drm/amd/display: Take FEC Overhead into Timeslot Calculation
        drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub
        drm: test: Fix 32-bit issue in drm_buddy_test
        drm: buddy_allocator: Fix buddy allocator init on 32-bit systems
        drm/nouveau/kms: Fix backlight registration
        drm/i915/perf: Drop wakeref on GuC RC error
        drm/i915/dpt: Treat the DPT BO as a framebuffer
        drm/i915/gem: Flush lmem contents after construction
        drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state
        drm/i915: Disable DC states for all commits
        drm/i915: Workaround ICL CSC_MODE sticky arming
        drm/i915: Add a .color_post_update() hook
        drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk
        drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm()
        drm/i915/pmu: Use functions common with sysfs to read actual freq
        accel/ivpu: Fix IPC buffer header status field value
        ...
      0d3ff808
    • Mike Snitzer's avatar
      dm: fix __send_duplicate_bios() to always allow for splitting IO · 666eed46
      Mike Snitzer authored
      Commit 7dd76d1f ("dm: improve bio splitting and associated IO
      accounting") only called setup_split_accounting() from
      __send_duplicate_bios() if a single bio were being issued. But the case
      where duplicate bios are issued must call it too.
      
      Otherwise the bio won't be split and resubmitted (via recursion through
      block core back to DM) to submit the later portions of a bio (which may
      map to an entirely different target).
      
      For example, when discarding an entire DM striped device with the
      following DM table:
       vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
       vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
      
      Before (broken, discards the first striped target's devices twice):
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2049 len=22528
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=22528
      
      After (works as expected):
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
       device-mapper: striped: target_stripe=0, bdev=7:2, start=2048 len=22528
       device-mapper: striped: target_stripe=1, bdev=7:3, start=2048 len=22528
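
The splitting that the "After" log reflects can be sketched as cutting one large request at target boundaries. This is a simplified model rather than DM core code: `struct target_model` and `split_by_target()` are hypothetical, and the targets are assumed to be sorted and contiguous, as in the table above.

```c
#include <stddef.h>

/* Hypothetical model of one DM table line: start sector and length. */
struct target_model {
    unsigned long long start;
    unsigned long long len;
};

/*
 * Split the range [sector, sector + nr) at target boundaries.
 * Writes one piece length per target touched into out[] (which must
 * have room for nt entries) and returns the number of pieces.
 */
static size_t split_by_target(const struct target_model *t, size_t nt,
                              unsigned long long sector,
                              unsigned long long nr,
                              unsigned long long *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nt && nr > 0; i++) {
        unsigned long long end = t[i].start + t[i].len;
        unsigned long long piece;

        if (sector < t[i].start || sector >= end)
            continue;               /* range does not start in this target */

        piece = end - sector;       /* sectors left in this target */
        if (piece > nr)
            piece = nr;
        out[n++] = piece;
        sector += piece;            /* remainder goes to the next target */
        nr -= piece;
    }
    return n;
}
```

For the table above, a discard of all 204800 sectors splits into pieces of 159744 and 45056 sectors. Halving each across two stripes gives the 79872 and 22528 seen in the log, assuming the range divides evenly between the stripes.
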
      
      Fixes: 7dd76d1f ("dm: improve bio splitting and associated IO accounting")
      Cc: stable@vger.kernel.org
      Reported-by: Orange Kao <orange@aiven.io>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      666eed46
    • Mike Snitzer's avatar
      dm: fix improper splitting for abnormal bios · f7b58a69
      Mike Snitzer authored
      "Abnormal" bios include discards, write zeroes and secure erase. By no
      longer passing the calculated 'len' pointer, commit 7dd06a25 ("dm:
      allow dm_accept_partial_bio() for dm_io without duplicate bios") took a
      senseless approach to disallowing dm_accept_partial_bio() from working
      for duplicate bios processed using __send_duplicate_bios().
      
      It inadvertently and incorrectly stopped the use of 'len' when
      initializing a target's io (in alloc_tio). As such the resulting tio
      could address more area of a device than it should.
      
      For example, when discarding an entire DM striped device with the
      following DM table:
       vg-lvol0: 0 159744 striped 2 128 7:0 2048 7:1 2048
       vg-lvol0: 159744 45056 striped 2 128 7:2 2048 7:3 2048
      
      Before this fix:
      
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=102400
       blkdiscard: attempt to access beyond end of device
       loop0: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
      
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=102400
       blkdiscard: attempt to access beyond end of device
       loop1: rw=2051, sector=2048, nr_sectors = 102400 limit=81920
      
      After this fix:
      
       device-mapper: striped: target_stripe=0, bdev=7:0, start=2048 len=79872
       device-mapper: striped: target_stripe=1, bdev=7:1, start=2048 len=79872
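
The numbers in the two logs can be checked with a little arithmetic. The sketch below assumes that a whole-target discard divides evenly between the stripes, which is consistent with the logged values; `per_stripe_len()` and `fits_on_device()` are hypothetical helpers, not DM functions.

```c
#include <stdbool.h>

/* Sectors each stripe receives when a range divides evenly (assumed). */
static unsigned long long per_stripe_len(unsigned long long sectors,
                                         unsigned int stripes)
{
    return sectors / stripes;
}

/* Does [start, start + len) fit on a device that has `limit` sectors? */
static bool fits_on_device(unsigned long long start, unsigned long long len,
                           unsigned long long limit)
{
    return start <= limit && len <= limit - start;
}
```

Without the 'len' clamp the tio covers both table lines, 159744 + 45056 = 204800 sectors, so each stripe is asked for 102400 sectors starting at 2048, past the 81920-sector limit. With the clamp, the first target's 159744 sectors give 79872 per stripe, and 2048 + 79872 is exactly 81920.
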
      
      Fixes: 7dd06a25 ("dm: allow dm_accept_partial_bio() for dm_io without duplicate bios")
      Cc: stable@vger.kernel.org
      Reported-by: Orange Kao <orange@aiven.io>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      f7b58a69
    • Felix Fietkau's avatar
      net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow · 92453132
      Felix Fietkau authored
      The cache needs to be flushed to ensure that the hardware stops offloading
      the flow immediately.
      
      Fixes: 33fc42de ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230330120840.52079-3-nbd@nbd.name
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      92453132
    • Felix Fietkau's avatar
      net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload · 5f36ca1b
      Felix Fietkau authored
      Check for skb metadata in order to detect the case where the DSA header
      is not present.
      
      Fixes: 2d7605a7 ("net: ethernet: mtk_eth_soc: enable hardware DSA untagging")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230330120840.52079-2-nbd@nbd.name
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5f36ca1b
    • Felix Fietkau's avatar
      net: ethernet: mtk_eth_soc: fix flow block refcounting logic · 8c1cb87c
      Felix Fietkau authored
      Since we call flow_block_cb_decref on FLOW_BLOCK_UNBIND, we also need to
      call flow_block_cb_incref for a newly allocated cb.
      Also fix the accidentally inverted refcount check on unbind.
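
The balanced refcounting can be illustrated with a small user-space model. It is a sketch only: `struct block_cb_model` and the `model_*` functions are hypothetical stand-ins for the flow block callback and the driver's bind/unbind handling, not the driver code itself.

```c
#include <stdbool.h>

/* Hypothetical model of a flow block callback with a reference count. */
struct block_cb_model {
    unsigned int refcnt;
    bool registered;
};

static void model_incref(struct block_cb_model *cb)
{
    cb->refcnt++;
}

/* Returns the count after the decrement, so 0 means "last user gone". */
static unsigned int model_decref(struct block_cb_model *cb)
{
    return --cb->refcnt;
}

/* Bind: register on first use, and take a reference in both cases. */
static void model_bind(struct block_cb_model *cb)
{
    if (!cb->registered)
        cb->registered = true;  /* models allocating a new callback */
    model_incref(cb);           /* a newly allocated cb is counted too */
}

/* Unbind: tear down only when the count drops to zero. */
static void model_unbind(struct block_cb_model *cb)
{
    if (!model_decref(cb))      /* an inverted test would tear down early */
        cb->registered = false;
}
```

In this model, two binds followed by two unbinds leave the callback registered after the first unbind and removed after the second.
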
      
      Fixes: 502e84e2 ("net: ethernet: mtk_eth_soc: add flow offloading support")
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20230330120840.52079-1-nbd@nbd.name
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8c1cb87c
    • Russell King (Oracle)'s avatar
      net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit() · 2960a2d3
      Russell King (Oracle) authored
      Reported on the Turris forum, mvneta provokes kernel warnings in the
      architecture DMA mapping code when mvneta_setup_txqs() fails to
      allocate memory. This happens because when mvneta_cleanup_txqs() is
      called in the mvneta_stop() path, we leave pointers in the structure
      that have been freed.
      
      Then on mvneta_open(), we call mvneta_setup_txqs(), which starts
      allocating memory. On memory allocation failure, mvneta_cleanup_txqs()
      will walk all the queues freeing any non-NULL pointers - which includes
      pointers that were previously freed in mvneta_stop().
      
      Fix this by setting these pointers to NULL to prevent double-freeing
      of the same memory.
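
The general pattern, clearing a pointer right after freeing it so a later cleanup pass skips it, can be sketched in user space as follows. `struct txq_bufs`, `release_buf()` and `txq_bufs_deinit()` are hypothetical names, and plain `malloc()`/`free()` stand in for the driver's allocations.

```c
#include <stdlib.h>

/* Hypothetical model of per-queue buffers released on cleanup. */
struct txq_bufs {
    void *hdr_buf;
    void *desc_buf;
    int frees;      /* counts real frees, to make the behaviour observable */
};

/* Free *p if it is set, then clear it so it cannot be freed again. */
static void release_buf(struct txq_bufs *q, void **p)
{
    if (*p) {
        free(*p);
        *p = NULL;
        q->frees++;
    }
}

/* Safe to call repeatedly: the second call finds only NULL pointers. */
static void txq_bufs_deinit(struct txq_bufs *q)
{
    release_buf(q, &q->hdr_buf);
    release_buf(q, &q->desc_buf);
}
```

Calling `txq_bufs_deinit()` twice in a row, as happens in the stop path followed by a failed open, frees each buffer only once.
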
      
      Fixes: 2adb719d ("net: mvneta: Implement software TSO")
      Link: https://forum.turris.cz/t/random-kernel-exceptions-on-hbl-tos-7-0/18865/8
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Link: https://lore.kernel.org/r/E1phUe5-00EieL-7q@rmk-PC.armlinux.org.uk
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2960a2d3
    • Vladimir Oltean's avatar
      net: dsa: sync unicast and multicast addresses for VLAN filters too · 64fdc5f3
      Vladimir Oltean authored
      If certain conditions are met, DSA can install all necessary MAC
      addresses on the CPU ports as FDB entries and disable flooding towards
      the CPU (we call this RX filtering).
      
      There is one corner case where this does not work.
      
      ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
      ip link set swp0 master br0 && ip link set swp0 up
      ip link add link swp0 name swp0.100 type vlan id 100
      ip link set swp0.100 up && ip addr add 192.168.100.1/24 dev swp0.100
      
      Traffic through swp0.100 is broken, because the bridge turns on VLAN
      filtering in the swp0 port (causing RX packets to be classified to the
      FDB database corresponding to the VID from their 802.1Q header), and
      although the 8021q module does call dev_uc_add() towards the real
      device, that API is VLAN-unaware, so it only contains the MAC address,
      not the VID; and DSA's current implementation of ndo_set_rx_mode() is
      only for VID 0 (corresponding to FDB entries which are installed in an
      FDB database which is only hit when the port is VLAN-unaware).
      
      It's interesting to understand why the bridge does not turn on
      IFF_PROMISC for its swp0 bridge port, and it may appear at first glance
      that this is a regression caused by the logic in commit 2796d0c6
      ("bridge: Automatically manage port promiscuous mode."). After all,
      a bridge port needs to have IFF_PROMISC by its very nature - it needs to
      receive and forward frames with a MAC DA different from the bridge
      ports' MAC addresses.
      
      While that may be true, when the bridge is VLAN-aware *and* it has a
      single port, there is no real reason to enable promiscuity even if that
      is an automatic port, with flooding and learning (there is nowhere for
      packets to go except to the BR_FDB_LOCAL entries), and this is how the
      corner case appears. Adding a second automatic interface to the bridge
      would make swp0 promisc as well, and would mask the corner case.
      
      Given the dev_uc_add() / ndo_set_rx_mode() API is what it is (it doesn't
      pass a VLAN ID), the only way to address that problem is to install host
      FDB entries for the cartesian product of RX filtering MAC addresses and
      VLAN RX filters.
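
The cartesian product can be sketched as a nested loop that emits one host FDB entry per (MAC, VID) pair. The types and `build_host_fdb()` below are hypothetical and only illustrate the enumeration; they are not DSA's data structures or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 6-byte MAC address wrapper. */
struct mac_addr {
    uint8_t b[6];
};

/* Hypothetical host FDB entry: one MAC address in one VLAN. */
struct host_fdb_entry {
    struct mac_addr mac;
    uint16_t vid;
};

/*
 * Fill out[] with one entry for every (MAC, VID) combination.
 * Returns the number of entries written, or 0 if out[] is too small.
 */
static size_t build_host_fdb(const struct mac_addr *macs, size_t n_macs,
                             const uint16_t *vids, size_t n_vids,
                             struct host_fdb_entry *out, size_t cap)
{
    size_t n = 0;

    if (n_macs * n_vids > cap)
        return 0;

    for (size_t m = 0; m < n_macs; m++) {
        for (size_t v = 0; v < n_vids; v++) {
            out[n].mac = macs[m];   /* same address ... */
            out[n].vid = vids[v];   /* ... repeated for each VLAN filter */
            n++;
        }
    }
    return n;
}
```

With two RX filtering addresses and VIDs 0 and 100, this yields four entries, so an address synced through the VLAN-unaware API is also present in the VID 100 database.
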
      
      Fixes: 7569459a ("net: dsa: manage flooding on the CPU ports")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230329151821.745752-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      64fdc5f3