1. 30 Nov, 2023 2 commits
  2. 24 Oct, 2023 1 commit
  3. 23 Oct, 2023 1 commit
    • x86/percpu: Introduce const-qualified const_pcpu_hot to micro-optimize code generation · ed2f752e
      Uros Bizjak authored
      Some variables in pcpu_hot, currently current_task and top_of_stack,
      are actually per-thread variables implemented as per-CPU variables
      and are thus stable for the duration of the respective task.  There is
      already an attempt to eliminate redundant reads from these variables
      using the this_cpu_read_stable() asm macro, which hides the dependency
      on the read memory address. However, the compiler has limited ability
      to eliminate asm common subexpressions, so this approach results in
      only limited success.
      
      The solution is to allow more aggressive elimination by aliasing
      pcpu_hot into a const-qualified const_pcpu_hot, and to read stable
      per-CPU variables from this constant copy.
      
      The current per-CPU infrastructure does not support reads from
      const-qualified variables. However, when the compiler supports segment
      qualifiers, it is possible to declare the const-aliased variable in
      the relevant named address space. The compiler considers access to the
      variable, declared in this way, as a read from a constant location,
      and will optimize reads from the variable accordingly.
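
      As a rough, compile-only sketch of the idea (hypothetical names and a
      GCC 12+ x86-64 target assumed; not the kernel's actual declarations),
      a writable variable in the __seg_gs named address space can be paired
      with a const-qualified view of the same storage, and reads through
      that view may be merged by the compiler:

          struct hot_data {
                  void *current_task;
                  unsigned long top_of_stack;
          };

          /* Written on context switch. */
          extern __seg_gs struct hot_data hot_data;

          /* Const-qualified view of the same storage (aliased elsewhere). */
          extern const __seg_gs struct hot_data const_hot_data;

          extern void external_call(void);

          unsigned long read_twice(void)
          {
                  unsigned long a = const_hot_data.top_of_stack;

                  external_call();        /* may clobber ordinary memory */

                  unsigned long b = const_hot_data.top_of_stack;

                  /* Via the const view the compiler may assume a == b and
                   * emit a single %gs-relative load. */
                  return a + b;
          }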
      
      By implementing the const-qualified const_pcpu_hot, the compiler can
      eliminate redundant reads from the constant variables, reducing the
      number of loads from current_task from 3766 to 3217 on a test build,
      a -14.6% reduction.
      
      The reduction of loads translates to the following code savings:
      
              text           data     bss      dec            hex filename
        25,477,353        4389456  808452 30675261        1d4113d vmlinux-old.o
        25,476,074        4389440  808452 30673966        1d40c2e vmlinux-new.o
      
      representing a code size reduction of -1279 bytes.
      
      [ mingo: Updated the changelog, EXPORT(const_pcpu_hot). ]
      Co-developed-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20231020162004.135244-1-ubizjak@gmail.com
      ed2f752e
  4. 20 Oct, 2023 4 commits
    • x86/percpu: Introduce %rip-relative addressing to PER_CPU_VAR() · 59bec00a
      Uros Bizjak authored
      Introduce x86_64 %rip-relative addressing to the PER_CPU_VAR() macro.
      Instructions using a %rip-relative address operand are one byte shorter
      than their absolute-address counterparts and are also compatible with
      position-independent executable (-fpie) builds. The patch reduces the
      code size of a test kernel build by 150 bytes.
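
      As a compile-only illustration (hypothetical symbol name, GNU inline
      asm; not the kernel's macro), the same %gs-based load can be written
      with an absolute or a %rip-relative address; the %rip-relative
      encoding is one byte shorter and works in -fpie builds:

          unsigned long load_absolute(void)
          {
                  unsigned long v;

                  /* Absolute disp32 address: %gs:example_pcpu_var */
                  asm ("movq %%gs:example_pcpu_var, %0" : "=r" (v));
                  return v;
          }

          unsigned long load_rip_relative(void)
          {
                  unsigned long v;

                  /* %rip-relative: %gs:example_pcpu_var(%rip) */
                  asm ("movq %%gs:example_pcpu_var(%%rip), %0" : "=r" (v));
                  return v;
          }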
      
      The PER_CPU_VAR() macro is intended to be applied to a symbol and should
      not be used with register operands. Introduce the new __percpu macro and
      use it in cmpxchg{8,16}b_emu.S instead.
      
      Also add a missing function comment to this_cpu_cmpxchg8b_emu().
      
      No functional changes intended.
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      59bec00a
    • x86/percpu, xen: Correct PER_CPU_VAR() usage to include symbol and its addend · aa47f90c
      Uros Bizjak authored
      The PER_CPU_VAR() macro should be applied to a symbol and its addend.
      Inconsistent usage is currently harmless, but needs to be corrected
      before %rip-relative addressing is introduced to the PER_CPU_VAR() macro.
      
      No functional changes intended.
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      aa47f90c
    • x86/percpu: Correct PER_CPU_VAR() usage to include symbol and its addend · 39d64ee5
      Uros Bizjak authored
      The PER_CPU_VAR() macro should be applied to a symbol and its addend.
      Inconsistent usage is currently harmless, but needs to be corrected
      before %rip-relative addressing is introduced to the PER_CPU_VAR() macro.
      
      No functional changes intended.
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sean Christopherson <seanjc@google.com>
      39d64ee5
    • x86/fpu: Clean up FPU switching in the middle of task switching · 24b8a236
      Linus Torvalds authored
      It happens to work, but it's very very wrong, because our 'current'
      macro is magic that is supposedly loading a stable value.
      
      It just happens to be not quite stable enough and the compilers
      re-load the value enough for this code to work.  But it's wrong.
      
      The whole
      
              struct fpu *prev_fpu = &prev->fpu;
      
      thing in __switch_to() is pretty ugly. There's no reason why we
      should look at that 'prev_fpu' pointer there, or pass it down.
      
      And it only generates worse code, in how it loads 'current' when
      __switch_to() has the right task pointers.
      
      The attached patch not only cleans this up, it actually
      generates better code too:
      
       (a) it removes one push/pop pair at entry/exit because there's one
           less register used (no 'current')
      
       (b) it removes that pointless load of 'current' because it just uses
           the right argument:
      
      	-       movq    %gs:pcpu_hot(%rip), %r12
      	-       testq   $16384, (%r12)
      	+       testq   $16384, (%rdi)
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20231018184227.446318-1-ubizjak@gmail.com
      24b8a236
  5. 18 Oct, 2023 1 commit
    • x86/percpu: Use the correct asm operand modifier in percpu_stable_op() · e39828d2
      Uros Bizjak authored
      The "P" asm operand modifier is an x86 target-specific modifier.
      
      When used for a constant, it drops all syntax-specific prefixes and
      issues the bare constant. This modifier is not correct for address
      handling; in this case, the generic "a" operand modifier should be used.
      
      The "a" asm operand modifier substitutes a memory reference, with the
      actual operand treated as an address.  For x86_64, when a symbol is
      provided, the "a" modifier emits "sym(%rip)" instead of "sym",
      enabling shorter %rip-relative addressing.
      
      Clang allows only "i" and "r" operand constraints with an "a" modifier,
      so the patch normalizes the modifier/constraint pair to "a"/"i",
      which is consistent between both compilers.
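
      As a compile-only sketch of the resulting pattern (hypothetical symbol
      and wrapper, non-PIC compilation assumed; not the kernel's
      percpu_stable_op() macro), the address is passed through an "i"
      constraint and printed with the "a" modifier, so the assembler sees
      "example_var(%rip)" on x86-64:

          extern unsigned long example_var;

          unsigned long read_stable(void)
          {
                  unsigned long v;

                  /* %a1 prints operand 1 as an address; combined with the
                   * "i" constraint and a symbol, this expands to
                   * "example_var(%rip)" on x86-64. */
                  asm ("movq %%gs:%a1, %0"
                       : "=r" (v)
                       : "i" (&example_var));
                  return v;
          }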
      
      The patch reduces code size of a test build by 4072 bytes:
      
         text            data     bss    dec             hex     filename
         25523268        4388300  808452 30720020        1d4c014 vmlinux-old.o
         25519196        4388300  808452 30715948        1d4b02c vmlinux-new.o
      
      [ mingo: Changelog clarity. ]
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/20231016200755.287403-1-ubizjak@gmail.com
      e39828d2
  6. 16 Oct, 2023 2 commits
    • x86/percpu: Use C for arch_raw_cpu_ptr(), to improve code generation · 1d10f3ae
      Uros Bizjak authored
      Implement arch_raw_cpu_ptr() in C to allow the compiler to perform
      better optimizations, such as setting an appropriate base to compute
      the address. The compiler is free to choose either a MOV or an ADD from
      the this_cpu_off address to construct the optimal final address.
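
      A minimal sketch of the approach (hypothetical names, GCC 12+ with
      named address spaces assumed; not the kernel's actual macro): reading
      the per-CPU offset in plain C lets the compiler see the load and
      combine it with the pointer however it likes:

          extern __seg_gs unsigned long example_this_cpu_off;

          static inline void *example_raw_cpu_ptr(void *ptr)
          {
                  /* Ordinary C arithmetic: the compiler may emit MOV + ADD,
                   * or fold the sum into a later addressing mode. */
                  return (char *)ptr + example_this_cpu_off;
          }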
      
      There are some other issues when memory access to the percpu area is
      implemented with inline asm. Compilers cannot eliminate asm common
      subexpressions across basic block boundaries, but are extremely good
      at optimizing memory access. By implementing arch_raw_cpu_ptr() in C,
      the compiler can eliminate additional redundant loads from this_cpu_off,
      further reducing the number of percpu offset reads from 1646 to 1631
      on a test build, a -0.9% reduction.
      Co-developed-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/20231015202523.189168-2-ubizjak@gmail.com
      1d10f3ae
    • x86/percpu: Rewrite arch_raw_cpu_ptr() to be easier for compilers to optimize · a048d3ab
      Uros Bizjak authored
      Implement arch_raw_cpu_ptr() as a load from this_cpu_off and then
      add the ptr value to the base. This way, the compiler can propagate the
      addend to the following instruction and simplify address calculation.
      
      E.g. the address calculation in amd_pmu_enable_virt() improves from:
      
          48 c7 c0 00 00 00 00	mov    $0x0,%rax
      	87b7: R_X86_64_32S	cpu_hw_events
      
          65 48 03 05 00 00 00	add    %gs:0x0(%rip),%rax
          00
      	87bf: R_X86_64_PC32	this_cpu_off-0x4
      
          48 c7 80 28 13 00 00	movq   $0x0,0x1328(%rax)
          00 00 00 00
      
      to:
      
          65 48 8b 05 00 00 00	mov    %gs:0x0(%rip),%rax
          00
      	8798: R_X86_64_PC32	this_cpu_off-0x4
          48 c7 80 00 00 00 00	movq   $0x0,0x0(%rax)
          00 00 00 00
      	87a6: R_X86_64_32S	cpu_hw_events+0x1328
      
      The compiler also eliminates additional redundant loads from
      this_cpu_off, reducing the number of percpu offset reads
      from 1668 to 1646 on a test build, a -1.3% reduction.
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Link: https://lore.kernel.org/r/20231015202523.189168-1-ubizjak@gmail.com
      a048d3ab
  7. 10 Oct, 2023 1 commit
  8. 05 Oct, 2023 3 commits
    • x86/percpu: Use C for percpu read/write accessors · ca425634
      Uros Bizjak authored
      The percpu code mostly uses inline assembly. Using segment qualifiers
      makes it possible to use C code instead, which enables the compiler to
      perform various optimizations (e.g. propagation of memory arguments).
      Convert the percpu read and write accessors to C code, so the memory
      argument can be propagated to the instruction that uses this argument.
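
      A minimal sketch of such a read accessor (hypothetical macro name; the
      kernel's real accessors are more elaborate): the per-CPU address is
      cast, via an integer, to a pointer in the __seg_gs named address
      space, so the access becomes plain C that the compiler can propagate,
      as in the examples below:

          #define example_percpu_read(var)                                \
          ({                                                              \
                  *(__seg_gs __typeof__(var) *)(unsigned long)&(var);     \
          })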
      
      Some examples of propagations:
      
      a) into sign/zero extensions:
      
      the code improves from:
      
          65 8a 05 00 00 00 00    mov    %gs:0x0(%rip),%al
          0f b6 c0                movzbl %al,%eax
      
      to:
      
          65 0f b6 05 00 00 00    movzbl %gs:0x0(%rip),%eax
          00
      
      and in a similar way for:
      
          movzbl %gs:0x0(%rip),%edx
          movzwl %gs:0x0(%rip),%esi
          movzbl %gs:0x78(%rbx),%eax
      
          movslq %gs:0x0(%rip),%rdx
          movslq %gs:(%rdi),%rbx
      
      b) into compares:
      
      the code improves from:
      
          65 8b 05 00 00 00 00    mov    %gs:0x0(%rip),%eax
          a9 00 00 0f 00          test   $0xf0000,%eax
      
      to:
      
          65 f7 05 00 00 00 00    testl  $0xf0000,%gs:0x0(%rip)
          00 00 0f 00
      
      and in a similar way for:
      
          testl  $0xf0000,%gs:0x0(%rip)
          testb  $0x1,%gs:0x0(%rip)
          testl  $0xff00,%gs:0x0(%rip)
      
          cmpb   $0x0,%gs:0x0(%rip)
          cmp    %gs:0x0(%rip),%r14d
          cmpw   $0x8,%gs:0x0(%rip)
          cmpb   $0x0,%gs:(%rax)
      
      c) into other insns:
      
      the code improves from:
      
         1a355:	83 fa ff             	cmp    $0xffffffff,%edx
         1a358:	75 07                	jne    1a361 <...>
         1a35a:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx
         1a361:
      
      to:
      
         1a35a:	83 fa ff             	cmp    $0xffffffff,%edx
         1a35d:	65 0f 44 15 00 00 00 	cmove  %gs:0x0(%rip),%edx
         1a364:	00
      
      The above propagations result in the following code size
      improvements for the current mainline kernel (with the default config),
      compiled with:
      
         # gcc (GCC) 12.3.1 20230508 (Red Hat 12.3.1-1)
      
         text            data     bss    dec             filename
         25508862        4386540  808388 30703790        vmlinux-vanilla.o
         25500922        4386532  808388 30695842        vmlinux-new.o
      Co-developed-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lore.kernel.org/r/20231004192404.31733-1-ubizjak@gmail.com
      ca425634
    • x86/percpu: Use compiler segment prefix qualifier · 9a462b9e
      Nadav Amit authored
      Using a segment prefix qualifier is cleaner than using a segment prefix
      in the inline assembly, and provides the compiler with more information,
      telling it that __seg_gs:[addr] is different from [addr] when it
      analyzes data dependencies. It also enables various optimizations that
      will be implemented in the next patches.
      
      Use segment prefix qualifiers when they are supported. Unfortunately,
      gcc does not provide a way to remove segment qualifiers, which is needed
      to use typeof() to create local instances of the per-CPU variable. For
      this reason, do not use the segment qualifier for per-CPU variables, and
      instead apply the segment qualifier with a cast at the access site.
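
      A small illustration of that limitation (hypothetical names): the
      per-CPU-style variable itself stays unqualified, the qualifier is
      applied only at the access via a cast, and typeof() of the variable
      then yields a plain type usable for ordinary locals:

          unsigned long example_pcpu_var;    /* declared without __seg_gs */

          static inline unsigned long example_read(void)
          {
                  /* Segment qualifier applied only here, at the access. */
                  return *(__seg_gs unsigned long *)
                          (unsigned long)&example_pcpu_var;
          }

          static inline unsigned long example_local_copy(void)
          {
                  /* typeof() of the unqualified variable gives a plain
                   * type, fine for a stack local. */
                  typeof(example_pcpu_var) tmp = example_read();
                  return tmp;
          }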
      
      [ Uros: Improve compiler support detection and update the patch
        to the current mainline. ]
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lore.kernel.org/r/20231004145137.86537-4-ubizjak@gmail.com
      9a462b9e
    • x86/percpu: Enable named address spaces with known compiler version · 1ca3683c
      Uros Bizjak authored
      Enable named address spaces with known compiler versions
      (GCC 12.1 and later), in order to avoid possible issues with named
      address spaces in older compilers. Set CC_HAS_NAMED_AS when the
      compiler satisfies the version requirement, and set USE_X86_SEG_SUPPORT
      to signal when segment qualifiers can be used.
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Link: https://lore.kernel.org/r/20231004145137.86537-3-ubizjak@gmail.com
      1ca3683c
  9. 03 Oct, 2023 1 commit
    • x86/lib: Address kernel-doc warnings · 8ae292c6
      Zhu Wang authored
      Fix all kernel-doc warnings in csum-wrappers_64.c:
      
        arch/x86/lib/csum-wrappers_64.c:25: warning: Excess function parameter 'isum' description in 'csum_and_copy_from_user'
        arch/x86/lib/csum-wrappers_64.c:25: warning: Excess function parameter 'errp' description in 'csum_and_copy_from_user'
        arch/x86/lib/csum-wrappers_64.c:49: warning: Excess function parameter 'isum' description in 'csum_and_copy_to_user'
        arch/x86/lib/csum-wrappers_64.c:49: warning: Excess function parameter 'errp' description in 'csum_and_copy_to_user'
        arch/x86/lib/csum-wrappers_64.c:71: warning: Excess function parameter 'sum' description in 'csum_partial_copy_nocheck'
      Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      8ae292c6
  10. 27 Sep, 2023 2 commits
  11. 22 Sep, 2023 1 commit
    • x86/bitops: Remove unused __sw_hweight64() assembly implementation on x86-32 · ad424743
      Ingo Molnar authored
      Header cleanups in the fast-headers tree highlighted that we have an
      unused assembly implementation for __sw_hweight64():
      
          WARNING: modpost: EXPORT symbol "__sw_hweight64" [vmlinux] version ...
      
      __arch_hweight64() on x86-32 is defined in the
      arch/x86/include/asm/arch_hweight.h header as an inline, using
      __arch_hweight32():
      
        #ifdef CONFIG_X86_32
        static inline unsigned long __arch_hweight64(__u64 w)
        {
                return  __arch_hweight32((u32)w) +
                        __arch_hweight32((u32)(w >> 32));
        }
      
      *But* there's also a __sw_hweight64() assembly implementation:
      
        arch/x86/lib/hweight.S
      
        SYM_FUNC_START(__sw_hweight64)
        #ifdef CONFIG_X86_64
        ...
        #else /* CONFIG_X86_32 */
              /* We're getting an u64 arg in (%eax,%edx): unsigned long hweight64(__u64 w) */
              pushl   %ecx
      
              call    __sw_hweight32
              movl    %eax, %ecx                      # stash away result
              movl    %edx, %eax                      # second part of input
              call    __sw_hweight32
              addl    %ecx, %eax                      # result
      
              popl    %ecx
              ret
        #endif
      
      But this __sw_hweight64 assembly implementation is unused - and it's
      essentially doing the same thing that the inline wrapper does.
      
      Remove the assembly version and add a comment about it.
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      ad424743
  12. 21 Sep, 2023 1 commit
    • x86/percpu: Do not clobber %rsi in percpu_{try_,}cmpxchg{64,128}_op · 7c097ca5
      Uros Bizjak authored
      The fallback alternative uses the %rsi register to manually load the
      pointer to the percpu variable before the call to the emulation
      function. This is suboptimal, because the load is hidden from the
      compiler.
      
      Move the load of %rsi outside the inline asm, so the compiler can
      reuse the value. The code in slub.o improves from:
      
          55ac:	49 8b 3c 24          	mov    (%r12),%rdi
          55b0:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
          55b4:	49 8b 1c 07          	mov    (%r15,%rax,1),%rbx
          55b8:	4c 89 f8             	mov    %r15,%rax
          55bb:	48 8d 37             	lea    (%rdi),%rsi
          55be:	e8 00 00 00 00       	callq  55c3 <...>
      			55bf: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
          55c3:	75 a3                	jne    5568 <...>
          55c5:	...
      
       0000000000000000 <.altinstr_replacement>:
         5:	65 48 0f c7 0f       	cmpxchg16b %gs:(%rdi)
      
      to:
      
          55ac:	49 8b 34 24          	mov    (%r12),%rsi
          55b0:	48 8d 4a 40          	lea    0x40(%rdx),%rcx
          55b4:	49 8b 1c 07          	mov    (%r15,%rax,1),%rbx
          55b8:	4c 89 f8             	mov    %r15,%rax
          55bb:	e8 00 00 00 00       	callq  55c0 <...>
      			55bc: R_X86_64_PLT32	this_cpu_cmpxchg16b_emu-0x4
          55c0:	75 a6                	jne    5568 <...>
          55c2:	...
      
      Where the alternative replacement instruction now uses %rsi:
      
       0000000000000000 <.altinstr_replacement>:
         5:	65 48 0f c7 0e       	cmpxchg16b %gs:(%rsi)
      
      The instruction (effectively a reg-reg move) at 55bb: in the original
      assembly is removed. Also, both the CALL and replacement CMPXCHG16B
      are 5 bytes long, removing the need for NOPs in the asm code.
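
      A hedged sketch of the pattern (hypothetical emulation helper that
      takes its pointer argument in %rsi and preserves other registers; not
      the kernel's actual macro): the pointer is supplied through the "S"
      input constraint instead of being formed inside the asm, so the
      compiler performs, schedules and can reuse the load:

          /* Hypothetical out-of-line helper, pointer expected in %rsi. */
          extern void example_cmpxchg_emu(void);

          static inline void example_call_emu(void *pcp)
          {
                  asm volatile ("call example_cmpxchg_emu"
                                : /* no outputs in this sketch */
                                : "S" (pcp)  /* %rsi loaded by the compiler */
                                : "memory", "cc");
          }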
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20230918151452.62344-1-ubizjak@gmail.com
      7c097ca5
  13. 15 Sep, 2023 3 commits
  14. 06 Sep, 2023 1 commit
  15. 04 Sep, 2023 6 commits
    • Merge tag 'timers-core-2023-09-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4accdb98
      Linus Torvalds authored
      Pull clocksource/clockevent driver updates from Thomas Gleixner:
      
       - Remove the OXNAS driver instead of adding a new one!
      
       - A set of boring fixes, cleanups and improvements
      
      * tag 'timers-core-2023-09-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource: Explicitly include correct DT includes
        clocksource/drivers/sun5i: Convert to platform device driver
        clocksource/drivers/sun5i: Remove pointless struct
        clocksource/drivers/sun5i: Remove duplication of code and data
        clocksource/drivers/loongson1: Set variable ls1x_timer_lock storage-class-specifier to static
        clocksource/drivers/arm_arch_timer: Disable timer before programming CVAL
        dt-bindings: timer: oxsemi,rps-timer: remove obsolete bindings
        clocksource/drivers/timer-oxnas-rps: Remove obsolete timer driver
      4accdb98
    • Merge tag 'm68knommu-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 7a1415ee
      Linus Torvalds authored
      Pull m68knommu updates from Greg Ungerer:
       "Two changes, one a trivial white space clean up, the other removes the
        unnecessary local pcibios_setup() code"
      
      * tag 'm68knommu-for-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: coldfire: dma_timer: ERROR: "foo __init bar" should be "foo __init bar"
        m68k/pci: Drop useless pcibios_setup()
      7a1415ee
    • Merge tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · 68d76d4e
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Drop 32-bit checksum implementation and re-use it from arch/x86
      
       - String function cleanup
      
       - Fixes for -Wmissing-variable-declarations and -Wmissing-prototypes
         builds
      
      * tag 'uml-for-linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux:
        um: virt-pci: fix missing declaration warning
        um: Refactor deprecated strncpy to memcpy
        um: fix 3 instances of -Wmissing-prototypes
        um: port_kern: fix -Wmissing-variable-declarations
        uml: audio: fix -Wmissing-variable-declarations
        um: vector: refactor deprecated strncpy
        um: use obj-y to descend into arch/um/*/
        um: Hard-code the result of 'uname -s'
        um: Use the x86 checksum implementation on 32-bit
        asm-generic: current: Don't include thread-info.h if building asm
        um: Remove unsued extern declaration ldt_host_info()
        um: Fix hostaudio build errors
        um: Remove strlcpy usage
      68d76d4e
    • Merge tag 'hyperv-next-signed-20230902' of... · 0b90c563
      Linus Torvalds authored
      Merge tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv updates from Wei Liu:
      
       - Support for SEV-SNP guests on Hyper-V (Tianyu Lan)
      
       - Support for TDX guests on Hyper-V (Dexuan Cui)
      
       - Use SBRM API in Hyper-V balloon driver (Mitchell Levy)
      
       - Avoid dereferencing ACPI root object handle in VMBus driver (Maciej
         Szmigiero)
      
       - A few miscellaneous fixes (Jiapeng Chong, Nathan Chancellor, Saurabh
         Sengar)
      
      * tag 'hyperv-next-signed-20230902' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits)
        x86/hyperv: Remove duplicate include
        x86/hyperv: Move the code in ivm.c around to avoid unnecessary ifdef's
        x86/hyperv: Remove hv_isolation_type_en_snp
        x86/hyperv: Use TDX GHCI to access some MSRs in a TDX VM with the paravisor
        Drivers: hv: vmbus: Bring the post_msg_page back for TDX VMs with the paravisor
        x86/hyperv: Introduce a global variable hyperv_paravisor_present
        Drivers: hv: vmbus: Support >64 VPs for a fully enlightened TDX/SNP VM
        x86/hyperv: Fix serial console interrupts for fully enlightened TDX guests
        Drivers: hv: vmbus: Support fully enlightened TDX guests
        x86/hyperv: Support hypercalls for fully enlightened TDX guests
        x86/hyperv: Add hv_isolation_type_tdx() to detect TDX guests
        x86/hyperv: Fix undefined reference to isolation_type_en_snp without CONFIG_HYPERV
        x86/hyperv: Add missing 'inline' to hv_snp_boot_ap() stub
        hv: hyperv.h: Replace one-element array with flexible-array member
        Drivers: hv: vmbus: Don't dereference ACPI root object handle
        x86/hyperv: Add hyperv-specific handling for VMMCALL under SEV-ES
        x86/hyperv: Add smp support for SEV-SNP guest
        clocksource: hyper-v: Mark hyperv tsc page unencrypted in sev-snp enlightened guest
        x86/hyperv: Use vmmcall to implement Hyper-V hypercall in sev-snp enlightened guest
        drivers: hv: Mark percpu hvcall input arg page unencrypted in SEV-SNP enlightened guest
        ...
      0b90c563
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · e4f1b820
      Linus Torvalds authored
      Pull virtio updates from Michael Tsirkin:
       "A small pull request this time around, mostly because the vduse
        network got postponed to the next release so we can be sure we got
        the security story right"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_ring: fix avail_wrap_counter in virtqueue_add_packed
        virtio_vdpa: build affinity masks conditionally
        virtio_net: merge dma operations when filling mergeable buffers
        virtio_ring: introduce dma sync api for virtqueue
        virtio_ring: introduce dma map api for virtqueue
        virtio_ring: introduce virtqueue_reset()
        virtio_ring: separate the logic of reset/enable from virtqueue_resize
        virtio_ring: correct the expression of the description of virtqueue_resize()
        virtio_ring: skip unmap for premapped
        virtio_ring: introduce virtqueue_dma_dev()
        virtio_ring: support add premapped buf
        virtio_ring: introduce virtqueue_set_dma_premapped()
        virtio_ring: put mapping error check in vring_map_one_sg
        virtio_ring: check use_dma_api before unmap desc for indirect
        vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
        vdpa: add get_backend_features vdpa operation
        vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature
        vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag
        vdpa/mlx5: Remove unused function declarations
      e4f1b820
    • Merge tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1 · 5c5e0e81
      Linus Torvalds authored
      Pull tomoyo updates from Tetsuo Handa:
       "Three cleanup patches, no behavior changes"
      
      * tag 'tomoyo-pr-20230903' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1:
        tomoyo: remove unused function declaration
        tomoyo: refactor deprecated strncpy
        tomoyo: add format attributes to functions
      5c5e0e81
  16. 03 Sep, 2023 10 commits
    • virtio_ring: fix avail_wrap_counter in virtqueue_add_packed · 1acfe2c1
      Yuan Yao authored
      In the current packed virtqueue implementation, the avail_wrap_counter
      won't flip in the case when the driver supplies a descriptor chain with
      a length equal to the queue size (total_sg == vq->packed.vring.num).
      
      Let’s assume the following situation:
      vq->packed.vring.num=4
      vq->packed.next_avail_idx: 1
      vq->packed.avail_wrap_counter: 0
      
      Then the driver adds a descriptor chain containing 4 descriptors.
      
      We expect the following result with avail_wrap_counter flipped:
      vq->packed.next_avail_idx: 1
      vq->packed.avail_wrap_counter: 1
      
      But, the current implementation gives the following result:
      vq->packed.next_avail_idx: 1
      vq->packed.avail_wrap_counter: 0
      
      To reproduce the bug, you can set the packed queue size as small as
      possible, so that the driver is more likely to provide a descriptor
      chain with a length equal to the packed queue size. For example, in
      qemu run the following command:
      sudo qemu-system-x86_64 \
      -enable-kvm \
      -nographic \
      -kernel "path/to/kernel_image" \
      -m 1G \
      -drive file="path/to/rootfs",if=none,id=disk \
      -device virtio-blk,drive=disk \
      -drive file="path/to/disk_image",if=none,id=rwdisk \
      -device virtio-blk,drive=rwdisk,packed=on,queue-size=4,\
      indirect_desc=off \
      -append "console=ttyS0 root=/dev/vda rw init=/bin/bash"
      
      Inside the VM, create a directory and mount the rwdisk device on it. The
      rwdisk will hang and the mount operation will not complete.
      
      This commit fixes the wrap counter error by flipping
      packed.avail_wrap_counter when the start of the descriptor chain equals
      the end of the descriptor chain (head == i).
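
      A self-contained toy illustration of the arithmetic (not the driver's
      actual code), using the numbers from the example above: with ring
      size 4, head index 1 and a 4-descriptor chain, the end index lands
      back on the head and the wrap counter must still flip:

          #include <stdio.h>

          int main(void)
          {
                  unsigned int num = 4, head = 1, chain_len = 4, wrap = 0;
                  unsigned int i = head + chain_len;

                  if (i >= num)
                          i -= num;       /* the chain wrapped around */

                  /* Old behaviour flipped only for i < head, missing the
                   * full-ring case i == head; the fix uses i <= head. */
                  if (i <= head)
                          wrap ^= 1;

                  printf("next_avail_idx=%u avail_wrap_counter=%u\n", i, wrap);
                  return 0;
          }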
      
      Fixes: 1ce9e605 ("virtio_ring: introduce packed ring support")
      Signed-off-by: Yuan Yao <yuanyaogoog@chromium.org>
      Message-Id: <20230808051110.3492693-1-yuanyaogoog@chromium.org>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      1acfe2c1
    • virtio_vdpa: build affinity masks conditionally · ae15acea
      Jason Wang authored
      We try to build the affinity masks via create_affinity_masks()
      unconditionally, which may lead to several issues:
      
      - the affinity mask is not used for parents without affinity support
        (only VDUSE supports affinity now)
      - the logic of create_affinity_masks() might not work for devices
        other than block. For example, it is not rare for networking devices
        to have more queues than CPUs. Such a case breaks the current affinity
        logic, which is based on group_cpus_evenly() and assumes the number of
        CPUs is not less than the number of groups. This can trigger a
        warning[1]:
      
      	if (ret >= 0)
      		WARN_ON(nr_present + nr_others < numgrps);
      
      Fix this by building the affinity masks only when:
      
      - the driver passes an affinity descriptor; a driver like virtio-blk can
        make sure to limit the number of queues when it exceeds the number of
        CPUs
      - the parent supports the affinity setting config ops
      
      This helps to avoid the warning. More optimizations could be done on
      top.
      
      [1]
      [  682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
      [  682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
      [  682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
      [  682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
      [  682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
      [  682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293
      [  682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000
      [  682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030
      [  682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0
      [  682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800
      [  682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041
      [  682.146692] FS:  00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000
      [  682.146695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0
      [  682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  682.146701] Call Trace:
      [  682.146703]  <TASK>
      [  682.146705]  ? __warn+0x7b/0x130
      [  682.146709]  ? group_cpus_evenly+0x1aa/0x1c0
      [  682.146712]  ? report_bug+0x1c8/0x1e0
      [  682.146717]  ? handle_bug+0x3c/0x70
      [  682.146721]  ? exc_invalid_op+0x14/0x70
      [  682.146723]  ? asm_exc_invalid_op+0x16/0x20
      [  682.146727]  ? group_cpus_evenly+0x1aa/0x1c0
      [  682.146729]  ? group_cpus_evenly+0x15c/0x1c0
      [  682.146731]  create_affinity_masks+0xaf/0x1a0
      [  682.146735]  virtio_vdpa_find_vqs+0x83/0x1d0
      [  682.146738]  ? __pfx_default_calc_sets+0x10/0x10
      [  682.146742]  virtnet_find_vqs+0x1f0/0x370
      [  682.146747]  virtnet_probe+0x501/0xcd0
      [  682.146749]  ? vp_modern_get_status+0x12/0x20
      [  682.146751]  ? get_cap_addr.isra.0+0x10/0xc0
      [  682.146754]  virtio_dev_probe+0x1af/0x260
      [  682.146759]  really_probe+0x1a5/0x410
      
      Fixes: 3dad5682 ("virtio-vdpa: Support interrupt affinity spreading mechanism")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230811091539.1359865-1-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      ae15acea
    • virtio_net: merge dma operations when filling mergeable buffers · 295525e2
      Xuan Zhuo authored
      Currently, the virtio core will perform a DMA operation for each
      buffer, although the same page may be operated on multiple times.
      
      With this patch, the driver does the DMA operation and manages the DMA
      address based on the premapped feature of the virtio core.
      
      This way, we can perform only one DMA operation for the pages of the
      alloc frag. This is beneficial when an IOMMU is in use.
      
      kernel command line: intel_iommu=on iommu.passthrough=0
      
             |  strict=0  | strict=1
      Before |  775496pps | 428614pps
      After  | 1109316pps | 742853pps
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-13-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      295525e2
    • virtio_ring: introduce dma sync api for virtqueue · 8bd2f710
      Xuan Zhuo authored
      These APIs have been introduced:
      
      * virtqueue_dma_need_sync
      * virtqueue_dma_sync_single_range_for_cpu
      * virtqueue_dma_sync_single_range_for_device
      
      These APIs can be used together with the premapped mechanism to sync the
      DMA address.
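
      A hedged usage sketch for a driver that premapped a receive buffer
      ('vq', 'addr', 'offset' and 'len' come from the driver's own
      bookkeeping; the signatures are assumed to mirror
      dma_sync_single_range_for_cpu()):

          if (virtqueue_dma_need_sync(vq, addr))
                  virtqueue_dma_sync_single_range_for_cpu(vq, addr, offset,
                                                          len, DMA_FROM_DEVICE);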
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-12-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      8bd2f710
    • virtio_ring: introduce dma map api for virtqueue · b6253b4e
      Xuan Zhuo authored
      Added virtqueue_dma_map_api* to map DMA addresses for virtual memory in
      advance. The purpose is to keep memory mapped across multiple add/get
      buf operations.
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-11-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      b6253b4e
    • virtio_ring: introduce virtqueue_reset() · ba3e0c47
      Xuan Zhuo authored
      Introduce virtqueue_reset() to release all buffers inside the vq.
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-10-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      ba3e0c47
    • virtio_ring: separate the logic of reset/enable from virtqueue_resize · ad48d53b
      Xuan Zhuo authored
      The subsequent reset function will reuse this logic.
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-9-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      ad48d53b
    • virtio_ring: correct the expression of the description of virtqueue_resize() · 4d09f240
      Xuan Zhuo authored
      Change the word "useless" to the more accurate "unused".
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-8-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      4d09f240
    • virtio_ring: skip unmap for premapped · b319940f
      Xuan Zhuo authored
      Now we add a case where we skip the DMA unmap: when vq->premapped is true.
      
      We can't just rely on use_dma_api to determine whether to skip the DMA
      operation. For convenience, introduce "do_unmap". By default, it
      is the same as use_dma_api. If the driver is configured with premapped,
      then do_unmap is false.
      
      So as long as do_unmap is false, we should skip the DMA unmap operation
      for the descriptor addresses.
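
      A minimal sketch of the relation the message describes (hypothetical
      struct, not the driver's actual code):

          #include <stdbool.h>

          struct vring_flags_sketch {
                  bool use_dma_api;   /* the core created the DMA mapping  */
                  bool premapped;     /* the driver passed premapped addrs */
                  bool do_unmap;      /* the core should unmap desc addrs  */
          };

          static inline void update_do_unmap(struct vring_flags_sketch *f)
          {
                  /* Skip the unmap whenever the core did not map the buffer. */
                  f->do_unmap = f->use_dma_api && !f->premapped;
          }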
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Message-Id: <20230810123057.43407-7-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      b319940f
    • virtio_ring: introduce virtqueue_dma_dev() · 2df64759
      Xuan Zhuo authored
      Added virtqueue_dma_dev() to get the DMA device for virtio, so that the
      caller can do DMA operations in advance. The purpose is to keep memory
      mapped across multiple add/get buf operations.
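
      A hedged usage sketch ('vq', 'buf' and 'len' come from the driver;
      virtqueue_dma_dev() is assumed to return the device to hand to the
      DMA API, or NULL when the core does not use the DMA API):

          struct device *dma_dev = virtqueue_dma_dev(vq);

          if (dma_dev)
                  addr = dma_map_single(dma_dev, buf, len, DMA_TO_DEVICE);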
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20230810123057.43407-6-xuanzhuo@linux.alibaba.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      2df64759