1. 24 Nov, 2023 2 commits
    • bpf: Validate global subprogs lazily · 2afae08c
      Andrii Nakryiko authored
      Slightly change BPF verifier logic around eagerness and order of global
      subprog validation. Instead of going over every global subprog eagerly
      and validating it before main (entry) BPF program is verified, turn it
      around. Validate main program first, mark subprogs that were called from
      main program for later verification, but otherwise assume they are valid.
      Afterwards, go over marked global subprogs and validate those,
      potentially marking some more global functions as being called. Continue
      this process until all (transitively) callable global subprogs are
      validated. It's a BFS traversal at its heart and will always converge.
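
The traversal described above can be sketched as a plain worklist loop. This is a minimal userspace model, not the verifier's actual code: `struct prog_model`, `MAX_SUBPROGS`, and `validate_lazily()` are hypothetical names, and "validation" is reduced to marking an entry as visited.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUBPROGS 8 /* hypothetical limit for this sketch */

/* calls[i][j] is true when program i contains a call to global subprog j.
 * Index 0 stands for the main (entry) program. */
struct prog_model {
	size_t nr_subprogs;
	bool calls[MAX_SUBPROGS][MAX_SUBPROGS];
};

/* Validate main first, then every global subprog reachable from it.
 * Fills validated[] and returns the number of validated programs. */
static size_t validate_lazily(const struct prog_model *p,
			      bool validated[MAX_SUBPROGS])
{
	bool called[MAX_SUBPROGS] = { false };
	size_t queue[MAX_SUBPROGS];
	size_t head = 0, tail = 0, count = 0;

	for (size_t i = 0; i < MAX_SUBPROGS; i++)
		validated[i] = false;
	if (p->nr_subprogs == 0 || p->nr_subprogs > MAX_SUBPROGS)
		return 0;

	called[0] = true; /* main is always validated */
	queue[tail++] = 0;

	while (head < tail) {
		size_t cur = queue[head++];

		validated[cur] = true; /* stand-in for real validation */
		count++;

		/* Mark newly discovered callees for later validation. */
		for (size_t j = 0; j < p->nr_subprogs; j++) {
			if (p->calls[cur][j] && !called[j]) {
				called[j] = true;
				queue[tail++] = j;
			}
		}
	}
	return count;
}
```

Each index is enqueued at most once, so the loop terminates even when subprogs call each other in a cycle, and a subprog that is never called is never validated.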
      
      This is an important change because it allows feature-gating
      subprograms that might not be verifiable on some older kernels, depending
      on the supported set of features.
      
      E.g., at some point, global functions were allowed to accept a pointer
      to memory whose size is identified by a user-provided type.
      Unfortunately, older kernels don't support this feature. With BPF CO-RE
      approach, the natural way would be to still compile BPF object file once
      and guard calls to this global subprog with some CO-RE check or using
      .rodata variables. That's what people do to guard usage of new helpers
      or kfuncs, and any other new BPF-side feature that might be missing on
      old kernels.
      
      That's currently impossible to do with global subprogs, unfortunately,
      because they are eagerly and unconditionally validated. This patch set
      aims to change this, so that in the future when global funcs gain new
      features, those can be guarded using BPF CO-RE techniques in the same
      fashion as any other new kernel feature.
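
A guard of the kind described above might look roughly like the following. This is a sketch written as plain C so that it compiles in userspace; a real BPF object would additionally need the libbpf headers and section annotations. The names `kernel_has_mem_arg_support`, `struct sample`, `sum_sample()`, `sum_fallback()`, and `handle_event()` are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature flag. In a BPF object, a `const volatile` global is
 * placed in .rodata, and the loader can set it before the program is loaded. */
const volatile bool kernel_has_mem_arg_support = false;

struct sample {
	int a;
	int b;
};

/* Hypothetical global subprog taking a pointer to fixed-size memory,
 * i.e. the kind of argument that older kernels cannot verify. */
__attribute__((noinline)) int sum_sample(const struct sample *s)
{
	return s ? s->a + s->b : 0;
}

/* Fallback path that avoids the newer feature entirely. */
static int sum_fallback(int a, int b)
{
	return a + b;
}

/* Stand-in for the main (entry) program. */
int handle_event(int a, int b)
{
	struct sample s = { .a = a, .b = b };

	/* Guarded call: with lazy validation, a subprog that is only called
	 * from a branch the verifier never explores is not validated. */
	if (kernel_has_mem_arg_support)
		return sum_sample(&s);
	return sum_fallback(a, b);
}
```

The design assumption here is that the flag is known to the verifier as a constant at load time, so on an older kernel the guarded branch is never explored and the subprog is never marked as called.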
      
      Two selftests had to be adjusted in sync with these changes.
      
      test_global_func12 relied on eager global subprog validation failing
      before main program failure is detected (unknown return value). Fix by
      making sure that main program is always valid.
      
      verifier_subprog_precision's parent_stack_slot_precise subtest relied on
      verifier checkpointing heuristic to do a checkpoint at instruction #5,
      but that's no longer true because we don't have enough jumps validated
      before reaching insn #5 due to global subprogs being validated later.
      
      Other than that, no changes, as one would expect.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20231124035937.403208-3-andrii@kernel.org
    • bpf: Emit global subprog name in verifier logs · 491dd8ed
      Andrii Nakryiko authored
      Since we have the name, augment verifier log messages with the actual
      function name instead of emitting just func#N to identify a global
      subprog. This makes the log more user-friendly.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Eduard Zingerman <eddyz87@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20231124035937.403208-2-andrii@kernel.org
  2. 23 Nov, 2023 28 commits
  3. 22 Nov, 2023 10 commits
    • Merge tag 'loongarch-fixes-6.7-1' of... · 9b6de136
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix several build errors, a potential kernel panic, a cpu hotplug
        issue and update links in documentations"
      
      * tag 'loongarch-fixes-6.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        Docs/zh_CN/LoongArch: Update links in LoongArch introduction.rst
        Docs/LoongArch: Update links in LoongArch introduction.rst
        LoongArch: Implement constant timer shutdown interface
        LoongArch: Mark {dmw,tlb}_virt_to_page() exports as non-GPL
        LoongArch: Silence the boot warning about 'nokaslr'
        LoongArch: Add __percpu annotation for __percpu_read()/__percpu_write()
        LoongArch: Record pc instead of offset in la_abs relocation
        LoongArch: Explicitly set -fdirect-access-external-data for vmlinux
        LoongArch: Add dependency between vmlinuz.efi and vmlinux.efi
    • Merge tag 'hyperv-fixes-signed-20231121' of... · 05c8c94e
      Linus Torvalds authored
      Merge tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv fixes from Wei Liu:
      
       - One fix for the KVP daemon (Ani Sinha)
      
       - Fix for the detection of E820_TYPE_PRAM in a Gen2 VM (Saurabh Sengar)
      
       - Micro-optimization for hv_nmi_unknown() (Uros Bizjak)
      
      * tag 'hyperv-fixes-signed-20231121' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown()
        x86/hyperv: Fix the detection of E820_TYPE_PRAM in a Gen2 VM
        hv/hv_kvp_daemon: Some small fixes for handling NM keyfiles
    • asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation · 125b0bb9
      Linus Torvalds authored
      We really don't want to do atomic_read() or anything like that, since we
      already have the value, not the lock.  The whole point of this is that
      we've loaded the lock from memory, and we want to check whether the
      value we loaded was a locked one or not.
      
      The main use of this is the lockref code, which loads both the lock and
      the reference count in one atomic operation, and then works on that
      combined value.  With the atomic_read(), the compiler would pointlessly
      spill the value to the stack, in order to then be able to read it back
      "atomically".
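
The idea can be sketched in userspace C as follows. The types below are hypothetical stand-ins for the kernel's `atomic_t`, `struct qspinlock`, and `struct lockref`, not the kernel definitions. The point is only that the check operates on a copy that was already loaded, so no further atomic access is needed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types. */
struct model_atomic {
	int counter;
};

struct model_qspinlock {
	struct model_atomic val;
};

/* Lock and reference count that are loaded together as one snapshot. */
struct model_lockref {
	struct model_qspinlock lock;
	int count;
};

/* The lock is passed by value: it is a snapshot that has already been read
 * from memory, so a plain field read is enough to inspect it. */
static inline bool model_value_unlocked(struct model_qspinlock lock)
{
	return lock.val.counter == 0;
}

/* Lockref-style use: decide on the fast path from the combined snapshot. */
static inline bool model_can_take_fast_path(struct model_lockref snapshot)
{
	return model_value_unlocked(snapshot.lock);
}
```

Because the argument is a by-value snapshot, the compiler is free to keep it in a register rather than spilling it to the stack just to read it back.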
      
      This is the qspinlock version of commit c6f4a900 ("asm-generic:
      ticket-lock: Optimize arch_spin_value_unlocked()") which fixed this same
      bug for ticket locks.
      
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Link: https://lore.kernel.org/all/CAHk-=whNRv0v6kQiV5QO6DJhjH4KEL36vWQ6Re8Csrnh4zbRkQ@mail.gmail.com/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E" · 6a263102
      Heiner Kallweit authored
      This reverts commit efa5f131.
      
      I couldn't reproduce the reported issue. What I did, based on a pcap
      packet log provided by the reporter:
      - Used same chip version (RTL8168h)
      - Set MAC address to the one used on the reporter's system
      - Replayed the EAPOL unicast packet that, according to the reporter,
        was filtered out by the mc filter.
      The packet was properly received.
      
      Therefore the root cause of the reported issue seems to be somewhere
      else. Disabling mc filtering completely for the most common chip
      version is quite a big hammer. Therefore revert the change and wait
      for further analysis results from the reporter.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: avoid data corruption caused by decline · e6d71b43
      D. Wythe authored
      We found a data corruption issue during testing of SMC-R on Redis
      applications.
      
      The benchmark has a low probability of reporting a strange error as
      shown below.
      
      "Error: Protocol error, got "\xe2" as reply type byte"
      
      Finally, we found that the retrieved error data was as follows:
      
      0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C
      0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00
      0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2
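
The first bytes of this dump can be checked mechanically. The sketch below assumes that the four leading bytes are the SMC-R eyecatcher ("SMCR" encoded in EBCDIC) and that the fifth byte is the CLC message type, with 0x04 meaning DECLINE. The names `smc_r_eyecatcher`, `MODEL_CLC_DECLINE`, and `looks_like_smc_decline()` are hypothetical and not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "SMCR" in EBCDIC: S=0xE2, M=0xD4, C=0xC3, R=0xD9. */
static const uint8_t smc_r_eyecatcher[4] = { 0xE2, 0xD4, 0xC3, 0xD9 };

/* Assumed CLC message type value for a decline. */
#define MODEL_CLC_DECLINE 0x04

/* Returns true when buf starts with the eyecatcher followed by the
 * decline type byte. Only the first five bytes are inspected. */
static bool looks_like_smc_decline(const uint8_t *buf, size_t len)
{
	if (len < 5)
		return false;
	return memcmp(buf, smc_r_eyecatcher, sizeof(smc_r_eyecatcher)) == 0 &&
	       buf[4] == MODEL_CLC_DECLINE;
}
```

A Redis client expecting a reply type byte such as `+` or `$` would instead see 0xE2, which matches the reported protocol error.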
      
      It is quite obvious that this is an SMC DECLINE message, which means that
      the application received an SMC protocol message.
      We found that this was caused by the following situation:
      
      client                  server
              ¦  clc proposal
              ------------->
              ¦  clc accept
              <-------------
              ¦  clc confirm
              ------------->
      wait llc confirm
      			send llc confirm
              ¦failed llc confirm
              ¦   x------
      (after 2s)timeout
                              wait llc confirm rsp
      
      wait decline
      
      (after 1s) timeout
                              (after 2s) timeout
              ¦   decline
              -------------->
              ¦   decline
              <--------------
      
      As a result, a decline message was sent in the implementation, and this
      message was read from TCP by the already-fallback connection.
      
      This patch doubles the client timeout to 2x the server value.
      With this simple change, the Decline messages should never cross or
      collide (during Confirm link timeout).
      
      This issue requires an immediate solution, since the protocol updates
      involve a more long-term solution.
      
      Fixes: 0fb0b02b ("net/smc: adapt SMC client code to use the LLC flow")
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Reviewed-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: virtual_ncidev: Add variable to check if ndev is running · 84d2db91
      Nguyen Dinh Phi authored
      syzbot reported a memory leak that happens when an skb is added to
      send_buff after the virtual NCI device is closed.
      This patch adds a variable to track whether the ndev is running before
      handling a new skb in the send function.
      Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
      Reported-by: syzbot+6eb09d75211863f15e3e@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/lkml/00000000000075472b06007df4fb@google.com
      Reviewed-by: Bongsu Jeon
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Add support for HW-accelerated VLAN stripping · 750011e2
      Gan, Yi Fang authored
      The current implementation supports driver-level VLAN tag stripping only.
      The feature is always on if CONFIG_VLAN_8021Q is enabled in the kernel
      config and is not user configurable.
      
      This patch adds support for MAC-level VLAN tag stripping, which can be
      configured through ethtool. If rx-vlan-offload is off, the VLAN tag
      is stripped by the driver. If rx-vlan-offload is on, the VLAN tag
      is stripped by the MAC.
      
      Command: ethtool -K <interface> rx-vlan-offload off | on
      Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com>
      Signed-off-by: Gan, Yi Fang <yi.fang.gan@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hsr: Add support for MC filtering at the slave device · 36b20fcd
      Murali Karicheri authored
      When the MC (multicast) list is updated by the networking layer due to a
      user command, or when the allmulti flag is set, the change needs to be
      passed to the enslaved Ethernet devices. This patch makes this happen
      by implementing the ndo_change_rx_flags() and ndo_set_rx_mode()
      API calls, which in turn pass it to the slave devices using
      existing API calls.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: Ravi Gunasekaran <r-gunasekaran@ti.com>
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • x86/hyperv: Use atomic_try_cmpxchg() to micro-optimize hv_nmi_unknown() · 18286883
      Uros Bizjak authored
      Use atomic_try_cmpxchg() instead of atomic_cmpxchg(*ptr, old, new) == old
      in hv_nmi_unknown(). On x86 the CMPXCHG instruction returns success in
      the ZF flag, so this change saves a compare after CMPXCHG. The generated
      asm code improves from:
      
        3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
        45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
        4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
        51:	00
        52:	83 f8 ff             	cmp    $0xffffffff,%eax
        55:	0f 95 c0             	setne  %al
      
      to:
      
        3e:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
        45:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
        4a:	f0 0f b1 15 00 00 00 	lock cmpxchg %edx,0x0(%rip)
        51:	00
        52:	0f 95 c0             	setne  %al
      
      No functional change intended.
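
The two call shapes can be compared in a userspace sketch using C11 atomics, where `atomic_compare_exchange_strong()` plays the role of the "try" variant because it reports success as a boolean. This is a model only: `model_nmi_cpu`, `model_cmpxchg()`, `claim_old_style()`, and `claim_try_style()` are hypothetical names, not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means that no CPU has claimed the slot yet. */
static atomic_int model_nmi_cpu = -1;

/* Old shape: cmpxchg hands back the previous value and the caller has to
 * compare it against the expected one. */
static int model_cmpxchg(atomic_int *v, int old, int new_val)
{
	/* On failure, `old` is overwritten with the value actually found;
	 * on success it already equals the previous value. */
	atomic_compare_exchange_strong(v, &old, new_val);
	return old;
}

static bool claim_old_style(int cpu)
{
	return model_cmpxchg(&model_nmi_cpu, -1, cpu) == -1;
}

/* New shape: the operation itself reports success, so no separate
 * comparison of the returned value is needed. */
static bool claim_try_style(int cpu)
{
	int expected = -1;

	return atomic_compare_exchange_strong(&model_nmi_cpu, &expected, cpu);
}
```

Both functions succeed only for the first caller. The difference is where the success test happens, which is what lets the compiler drop the extra compare in the listing above.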
      
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Reviewed-by: Michael Kelley <mhklinux@outlook.com>
      Link: https://lore.kernel.org/r/20231114170038.381634-1-ubizjak@gmail.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
      Message-ID: <20231114170038.381634-1-ubizjak@gmail.com>
    • Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 53475287
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2023-11-21
      
      We've added 85 non-merge commits during the last 12 day(s) which contain
      a total of 63 files changed, 4464 insertions(+), 1484 deletions(-).
      
      The main changes are:
      
      1) Huge batch of verifier changes to improve BPF register bounds logic
         and range support along with a large test suite, and verifier log
         improvements, all from Andrii Nakryiko.
      
      2) Add a new kfunc which acquires the associated cgroup of a task within
         a specific cgroup v1 hierarchy where the latter is identified by its id,
         from Yafang Shao.
      
      3) Extend verifier to allow bpf_refcount_acquire() of a map value field
         obtained via direct load which is a use-case needed in sched_ext,
         from Dave Marchevsky.
      
      4) Fix bpf_get_task_stack() helper to add the correct crosstask check
         for the get_perf_callchain(), from Jordan Rome.
      
      5) Fix BPF task_iter internals where lockless usage of next_thread()
         was wrong. The rework also simplifies the code, from Oleg Nesterov.
      
      6) Fix uninitialized tail padding via LIBBPF_OPTS_RESET, and another
         fix for certain BPF UAPI structs to fix verifier failures seen
         in bpf_dynptr usage, from Yonghong Song.
      
      7) Add BPF selftest fixes for map_percpu_stats flakes due to per-CPU BPF
         memory allocator not being able to allocate per-CPU pointer successfully,
         from Hou Tao.
      
      8) Add prep work around dynptr and string handling for kfuncs which
         is later going to be used by file verification via BPF LSM and fsverity,
         from Song Liu.
      
      9) Improve BPF selftests to update multiple prog_tests to use ASSERT_*
         macros, from Yuran Pereira.
      
      10) Optimize LPM trie lookup to check prefixlen before walking the trie,
          from Florian Lehner.
      
      11) Consolidate virtio/9p configs from BPF selftests in config.vm file
          given they are needed consistently across archs, from Manu Bretelle.
      
      12) Small BPF verifier refactor to remove register_is_const(),
          from Shung-Hsi Yu.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits)
        selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinux
        selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
        selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bind_perm
        selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
        selftests/bpf: reduce verboseness of reg_bounds selftest logs
        bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)
        bpf: bpf_iter_task_next: use __next_thread() rather than next_thread()
        bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()
        bpf: emit frameno for PTR_TO_STACK regs if it differs from current one
        bpf: smarter verifier log number printing logic
        bpf: omit default off=0 and imm=0 in register state log
        bpf: emit map name in register state if applicable and available
        bpf: print spilled register state in stack slot
        bpf: extract register state printing
        bpf: move verifier state printing code to kernel/bpf/log.c
        bpf: move verbose_linfo() into kernel/bpf/log.c
        bpf: rename BPF_F_TEST_SANITY_STRICT to BPF_F_TEST_REG_INVARIANTS
        bpf: Remove test for MOVSX32 with offset=32
        selftests/bpf: add iter test requiring range x range logic
        veristat: add ability to set BPF_F_TEST_SANITY_STRICT flag with -r flag
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20231122000500.28126-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>