  1. 13 Oct, 2023 5 commits
    • Merge branch 'selftests-fib_tests-fixes-for-multipath-list-receive-tests' · dda5e1ee
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      selftests: fib_tests: Fixes for multipath list receive tests
      
      Fix two issues in recently added FIB multipath list receive tests.
      ====================
      
      Link: https://lore.kernel.org/r/20231010132113.3014691-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: fib_tests: Count all trace point invocations · aa13e524
      Ido Schimmel authored
      The tests rely on the IPv{4,6} FIB trace points being triggered once for
      each forwarded packet. If receive processing is deferred to the
      ksoftirqd task, these invocations will not be counted and the tests will
      fail. Fix by specifying the '-a' flag to prevent perf from filtering on
      the mausezahn task.
      
      Before:
      
       # ./fib_tests.sh -t ipv4_mpath_list
      
       IPv4 multipath list receive tests
           TEST: Multipath route hit ratio (.68)                               [FAIL]
      
       # ./fib_tests.sh -t ipv6_mpath_list
      
       IPv6 multipath list receive tests
           TEST: Multipath route hit ratio (.27)                               [FAIL]
      
      After:
      
       # ./fib_tests.sh -t ipv4_mpath_list
      
       IPv4 multipath list receive tests
           TEST: Multipath route hit ratio (1.00)                              [ OK ]
      
       # ./fib_tests.sh -t ipv6_mpath_list
      
       IPv6 multipath list receive tests
           TEST: Multipath route hit ratio (.99)                               [ OK ]
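
The undercounting described in this commit can be illustrated with a small model. This is only a sketch: the event list, the 68/32 split, and the helper name `count_hits` are made up for illustration and do not come from the test script or from perf output.

```python
def count_hits(events, all_tasks):
    """Count trace point invocations.

    events: one task name per invocation (illustrative data only).
    all_tasks: True models system-wide counting; False models counting
    only the invocations attributed to the traffic generator task.
    """
    if all_tasks:
        return len(events)
    return sum(1 for task in events if task == "mausezahn")


# Hypothetical split: 68 lookups ran in the sender's context and
# 32 were deferred to a ksoftirqd thread.
events = ["mausezahn"] * 68 + ["ksoftirqd/0"] * 32
packets = 100

ratio_filtered = count_hits(events, all_tasks=False) / packets
ratio_all = count_hits(events, all_tasks=True) / packets
print(ratio_filtered, ratio_all)
```

In this model every packet did trigger the trace point, yet the per-task count reports a ratio below 1, which is the kind of false failure the commit describes.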
      
      Fixes: 8ae9efb8 ("selftests: fib_tests: Add multipath list receive tests")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
      Tested-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Link: https://lore.kernel.org/r/20231010132113.3014691-3-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: fib_tests: Disable RP filter in multipath list receive test · dbb13378
      Ido Schimmel authored
      The test relies on the fib:fib_table_lookup trace point being triggered
      once for each forwarded packet. If RP filter is not disabled, the trace
      point will be triggered twice for each packet (for source validation and
      forwarding), potentially masking actual bugs. Fix by explicitly
      disabling RP filter.
      
      Before:
      
       # ./fib_tests.sh -t ipv4_mpath_list
      
       IPv4 multipath list receive tests
           TEST: Multipath route hit ratio (1.99)                              [ OK ]
      
      After:
      
       # ./fib_tests.sh -t ipv4_mpath_list
      
       IPv4 multipath list receive tests
           TEST: Multipath route hit ratio (.99)                               [ OK ]
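
How the extra lookup can mask a bug is easier to see with numbers. The sketch below is a simplified model, not the test's actual logic: it assumes one source-validation lookup per packet when RP filter is on, and it invents a hypothetical bug in which half of the packets are forwarded without their own forwarding lookup. The function name `lookups` is illustrative.

```python
def lookups(packets, skipped_forwarding, rp_filter):
    """Model fib:fib_table_lookup hits for a burst of forwarded packets.

    skipped_forwarding: packets that (hypothetically, due to a bug) are
    forwarded without their own forwarding lookup.
    rp_filter: when True, each packet also triggers one lookup for
    source validation.
    """
    forwarding = packets - skipped_forwarding
    validation = packets if rp_filter else 0
    return forwarding + validation


packets = 100
healthy_rp = lookups(packets, 0, rp_filter=True) / packets      # two lookups per packet
buggy_rp = lookups(packets, 50, rp_filter=True) / packets       # bug hidden above 1
buggy_no_rp = lookups(packets, 50, rp_filter=False) / packets   # bug visible below 1
print(healthy_rp, buggy_rp, buggy_no_rp)
```

With RP filter enabled, the buggy case still yields a ratio above 1, so a check that only requires roughly one lookup per packet would pass. With RP filter disabled, the same bug drops the ratio well below 1.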
      
      Fixes: 8ae9efb8 ("selftests: fib_tests: Add multipath list receive tests")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
      Tested-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Link: https://lore.kernel.org/r/20231010132113.3014691-2-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: Fix listen() warning with v4-mapped-v6 address. · 8702cf12
      Kuniyuki Iwashima authored
      syzbot reported a warning [0] introduced by commit c48ef9c4 ("tcp: Fix
      bind() regression for v4-mapped-v6 non-wildcard address.").
      
      After the cited commit, a v4 socket's address matches the corresponding
      v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa.
      
      When bind() is called for X.X.X.X and then for ::ffff:X.X.X.X, the
      second bind() uses bhash and conflicts properly without checking bhash2,
      so in that case we need not check if a v4-mapped-v6 sk matches the
      corresponding v4 address tb2 in inet_bind2_bucket_match_addr().
      However, the repro shows that we need to check that in a no-conflict case.
      
      The repro bind()s two sockets to equivalent 2-tuples (a v4 address and
      its v4-mapped-v6 form, same port) using SO_REUSEPORT and calls listen()
      for the first socket:
      
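        from socket import *

        # v4 socket bound to 127.0.0.1 on an ephemeral port
        s1 = socket()
        s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
        s1.bind(('127.0.0.1', 0))

        # v6 socket bound to the v4-mapped form of the same address and port
        s2 = socket(AF_INET6)
        s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
        s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1]))

        # listen() on the first socket triggers the warning
        s1.listen()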
      
      The second socket should belong to the first socket's tb2, but the second
      bind() creates another tb2 bucket because inet_bind2_bucket_find() returns
      NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the
      corresponding v4 address tb2.
      
        bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X)
      
      Then, listen() for the first socket calls inet_csk_get_port(), where the
      v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered.
      
      To avoid that, we need to check if v4-mapped-v6 sk address matches with
      the corresponding v4 address tb2 in inet_bind2_bucket_match().
      
      The same checks are needed in inet_bind2_bucket_addr_match() too, so we
      can move all checks there and call it from inet_bind2_bucket_match().
      
      Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr
      and not of sockets in the bucket.  This could be refactored later by
      defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending
      ::ffff: when creating v4 tb2.
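
The asymmetry described above can be illustrated in user space. The sketch below is not the kernel code: it uses Python's `ipaddress` module, and the helper names `normalize`, `one_way_match` and `addr_match` are hypothetical. It only shows why a match that folds v4-mapped-v6 addresses on one side alone misses the reverse case.

```python
import ipaddress


def normalize(addr):
    """Fold a v4-mapped-v6 address (::ffff:a.b.c.d) into its IPv4 form."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip


def one_way_match(sk_addr, tb2_addr):
    """Models the one-directional behaviour: only the bucket address is folded."""
    return ipaddress.ip_address(sk_addr) == normalize(tb2_addr)


def addr_match(sk_addr, tb2_addr):
    """Models a symmetric check: both sides are folded before comparing."""
    return normalize(sk_addr) == normalize(tb2_addr)


print(one_way_match("127.0.0.1", "::ffff:127.0.0.1"))   # v4 sk vs v4-mapped bucket
print(one_way_match("::ffff:127.0.0.1", "127.0.0.1"))   # v4-mapped sk vs v4 bucket
print(addr_match("::ffff:127.0.0.1", "127.0.0.1"))
```

In this model the second call does not match, so the v4-mapped-v6 socket would end up in a separate bucket, which mirrors the two-bucket layout shown in the commit message.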
      
      [0]:
      WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
      Modules linked in:
      CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f715 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
      RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
      Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31
      RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000
      RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38
      RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200
      R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100
      FS:  00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256
       __inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217
       inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239
       __sys_listen+0x194/0x270 net/socket.c:1866
       __do_sys_listen net/socket.c:1875 [inline]
       __se_sys_listen net/socket.c:1873 [inline]
       __x64_sys_listen+0x53/0x80 net/socket.c:1873
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f3a5bce3af9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9
      RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006
      R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001
      R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
       </TASK>
      
      Fixes: c48ef9c4 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.")
      Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bonding: Return pointer to data after pull on skb · d93f3f99
      Jiri Wiesner authored
      Since 429e3d12 ("bonding: Fix extraction of ports from the packet
      headers"), header offsets used to compute a hash in bond_xmit_hash() are
      relative to skb->data and not skb->head. If the tail of the header buffer
      of an skb really needs to be advanced and the operation is successful, the
      pointer to the data must be returned (and not a pointer to the head of the
      buffer).
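
Why the base pointer matters can be shown with a toy buffer. This is a user-space sketch under stated assumptions: the 4-byte headroom, the port values, and the helper `read_port` are invented for illustration, and `head` and `data` merely stand in for skb->head and skb->data.

```python
import struct


def read_port(view, offset):
    """Read a big-endian 16-bit field at `offset` within `view`."""
    return struct.unpack_from("!H", view, offset)[0]


headroom = bytes(4)                     # unused space before the packet
packet = struct.pack("!HH", 1234, 80)   # hypothetical source and destination ports
buf = headroom + packet

head = memoryview(buf)                  # start of the buffer (stands in for skb->head)
data = head[len(headroom):]             # start of the packet (stands in for skb->data)

# Offset 0 is the source port only when measured from `data`.
print(read_port(data, 0), read_port(head, 0))
```

An offset computed relative to the packet start yields the intended field only when applied to the data pointer. Applied to the head pointer, it reads headroom bytes instead.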
      
      Fixes: 429e3d12 ("bonding: Fix extraction of ports from the packet headers")
      Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Oct, 2023 14 commits
  3. 11 Oct, 2023 16 commits
  4. 10 Oct, 2023 5 commits
    • s390/bpf: Fix unwinding past the trampoline · 5356ba1f
      Ilya Leoshkevich authored
      When functions called by the trampoline panic, the backtrace that is
      printed stops at the trampoline, because the trampoline does not store
      its caller's frame address (backchain) on the stack; it also stores the
      return address at the wrong location.
      
      Store both the same way as is already done for regular eBPF programs.
      
      Fixes: 528eb2cb ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20231010203512.385819-3-iii@linux.ibm.com
    • s390/bpf: Fix clobbering the caller's backchain in the trampoline · ce10fc06
      Ilya Leoshkevich authored
      One of the first things that s390x kernel functions do is store the
      caller's frame address (backchain) on the stack. This makes unwinding
      possible. The backchain is always stored at frame offset 152, which is
      inside the 160-byte stack area that the functions allocate for their
      callees. The callees must preserve the backchain; the remaining 152
      bytes they may use as they please.
      
      Currently the trampoline uses all 160 bytes, clobbering the backchain.
      This causes kernel panics when using __builtin_return_address() in
      functions called by the trampoline.
      
      Fix by reducing the usage of the caller-reserved stack area by 8 bytes
      in the trampoline.
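
The layout arithmetic from this message can be written down as a quick check. The constants come from the text above (160-byte area, backchain at offset 152, 8 bytes freed by the fix). The helper `clobbers_backchain` is a hypothetical name used only for this sketch.

```python
STACK_AREA_SIZE = 160    # area a caller allocates for its callees
BACKCHAIN_OFFSET = 152   # where the backchain is stored within that area
BACKCHAIN_SIZE = STACK_AREA_SIZE - BACKCHAIN_OFFSET  # 8 bytes


def clobbers_backchain(offset, size):
    """True if a store of `size` bytes at `offset` overlaps the backchain slot."""
    end = offset + size
    return offset < BACKCHAIN_OFFSET + BACKCHAIN_SIZE and end > BACKCHAIN_OFFSET


# Using the whole area overlaps the slot; using 8 bytes fewer does not.
print(clobbers_backchain(0, STACK_AREA_SIZE))
print(clobbers_backchain(0, STACK_AREA_SIZE - BACKCHAIN_SIZE))
```

Under these numbers, a callee that restricts itself to the first 152 bytes leaves the backchain intact, which matches the 8-byte reduction the fix describes.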
      
      Fixes: 528eb2cb ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
      Reported-by: Song Liu <song@kernel.org>
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20231010203512.385819-2-iii@linux.ibm.com
    • Merge tag 'xsa441-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 1c8b86a3
      Linus Torvalds authored
      Pull xen fix from Juergen Gross:
       "A fix for the xen events driver:
      
        Closing of an event channel in the Linux kernel can result in a
        deadlock. This happens when the close is being performed in parallel
        to an unrelated Xen console action and the handling of a Xen console
        interrupt in an unprivileged guest.
      
        The closing of an event channel is e.g. triggered by removal of a
        paravirtual device on the other side. As this action will cause
        console messages to be issued on the other side quite often, the
        chance of triggering the deadlock is not negligible"
      
      * tag 'xsa441-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/events: replace evtchn_rwlock with RCU
    • KEYS: trusted: Remove redundant static calls usage · 01bbafc6
      Sumit Garg authored
      Static call invocations aren't well supported from module __init and
      __exit functions. In particular, the static call from cleanup_trusted()
      led to a crash on an x86 kernel with CONFIG_DEBUG_VIRTUAL=y.
      
      However, the usage of static call invocations for trusted_key_init()
      and trusted_key_exit() doesn't add any value from either a performance or
      security perspective. Hence switch to indirect function calls instead.
      
      Note that although this fixes the currently reported crash, ultimately
      the static call infrastructure should be fixed to either support such
      usage from module __init and __exit functions or explicitly not support it.
      
      Reported-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Link: https://lore.kernel.org/lkml/ZRhKq6e5nF%2F4ZIV1@fedora/#t
      Fixes: 5d0682be ("KEYS: trusted: Add generic trusted keys framework")
      Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'irq-urgent-2023-10-10-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 87813e13
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of updates for interrupt chip drivers:
      
         - Fix the failure of the Qualcomm PDC driver on v3.2 hardware, which is
           caused by a control bit being moved to a different location
      
         - Update the SM8150 device tree PDC resource so the version register
           can be read
      
         - Make the Renesas RZG2L driver correct for interrupts which are
           outside of the LSB in the TSSR register by using the proper macro
           for calculating the mask
      
         - Document the Renesas RZG2L device tree binding correctly and update
           it for a few devices which fail to boot otherwise
      
         - Use the proper accessor in the RZG2L driver instead of blindly
           dereferencing an unchecked pointer
      
         - Make GICv3 handle the dma-non-coherent attribute correctly
      
         - Ensure that all interrupt controller nodes on RISCV are marked as
           initialized correctly
      
        Maintainer changes:
      
         - Add a new entry for GIC interrupt controllers and assign Marc
           Zyngier as the maintainer
      
         - Remove Marc Zyngier from the core and driver maintainer entries as
           he is buried in work and short of time to handle that.
      
        Thanks to Marc for all the great work he has done in the past couple
        of years!
      
        Also note that commit 5873d380 ("irqchip/qcom-pdc: Add support for
        v3.2 HW") has an incorrect SOB chain.
      
        The real author is Neil. His patch was posted by Dmitry once and Neil
        picked it up from the list and reposted it with the bogus SOB chain.
      
        Not a big deal, but worth mentioning. I wanted to fix that up, but
        then got distracted and Marc piled more changes on top. So I decided
        to leave it as is instead of rebasing the world"
      
      * tag 'irq-urgent-2023-10-10-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MAINTAINERS: Remove myself from the general IRQ subsystem maintenance
        MAINTAINERS: Add myself as the ARM GIC maintainer
        irqchip/renesas-rzg2l: Convert to irq_data_get_irq_chip_data()
        irqchip/stm32-exti: add missing DT IRQ flag translation
        irqchip/riscv-intc: Mark all INTC nodes as initialized
        irqchip/gic-v3: Enable non-coherent redistributors/ITSes DT probing
        irqchip/gic-v3-its: Split allocation from initialisation of its_node
        dt-bindings: interrupt-controller: arm,gic-v3: Add dma-noncoherent property
        dt-bindings: interrupt-controller: renesas,irqc: Add r8a779f0 support
        dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/G2UL SoC
        irqchip: renesas-rzg2l: Fix logic to clear TINT interrupt source
        dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Update description for '#interrupt-cells' property
        arm64: dts: qcom: sm8150: extend the size of the PDC resource
        irqchip/qcom-pdc: Add support for v3.2 HW