  1. 06 May, 2024 3 commits
  2. 05 Mar, 2024 1 commit
  3. 16 Feb, 2024 1 commit
    • workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK · fd0a68a2
      Tejun Heo authored
      2f34d733 ("workqueue: Fix queue_work_on() with BH workqueues") added
      irq_work usage to workqueue; however, it turns out irq_work is actually
      optional and the change breaks the build on configurations which don't
      have CONFIG_IRQ_WORK enabled.
      
      Fix build by making workqueue use irq_work only when CONFIG_SMP and enabling
      CONFIG_IRQ_WORK when CONFIG_SMP is set. It's reasonable to argue that it may
      be better to just always enable it. However, this still saves a small bit of
      memory for tiny UP configs and also the least amount of change, so, for now,
      let's keep it conditional.
      
      Verified to do the right thing for x86_64 allnoconfig and defconfig, and
      aarch64 allnoconfig, allnoconfig + printk disable (SMP but nothing selects
      IRQ_WORK) and a modified aarch64 Kconfig where !SMP and nothing selects
      IRQ_WORK.
      
      v2: `depends on SMP` leads to Kconfig warnings when CONFIG_IRQ_WORK is
          selected by something else when !CONFIG_SMP. Use `def_bool y if SMP`
          instead.
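      
      The v2 form can be sketched as a minimal Kconfig entry. This is an
      illustration only: the symbol name and the `def_bool y if SMP` line come
      from the message above, while the rest of the surrounding entry is
      assumed.

      ```kconfig
      # Sketch: IRQ_WORK defaults to y on SMP builds. With !SMP the default
      # does not apply, so the symbol stays off unless something selects it.
      # Because there is no 'depends on SMP', such a select on !SMP does not
      # trigger an unmet-dependency warning.
      config IRQ_WORK
          def_bool y if SMP
      ```
      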
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Tested-by: Anders Roxell <anders.roxell@linaro.org>
      Fixes: 2f34d733 ("workqueue: Fix queue_work_on() with BH workqueues")
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      fd0a68a2
  4. 15 Feb, 2024 1 commit
    • update workarounds for gcc "asm goto" issue · 68fb3ca0
      Linus Torvalds authored
      In commit 4356e9f8 ("work around gcc bugs with 'asm goto' with
      outputs") I did the gcc workaround unconditionally, because the cause of
      the bad code generation wasn't entirely clear.
      
      In the meantime, Jakub Jelinek debugged the issue, and has come up with
      a fix in gcc [2], which also got backported to the still maintained
      branches of gcc-11, gcc-12 and gcc-13.
      
      Note that while the fix technically wasn't in the original gcc-14
      branch, Jakub says:
      
       "while it is true that no GCC 14 snapshots until today (or whenever the
        fix will be committed) have the fix, for GCC trunk it is up to the
        distros to use the latest snapshot if they use it at all and would
        allow better testing of the kernel code without the workaround, so
        that if there are other issues they won't be discovered years later.
        Most userland code doesn't actually use asm goto with outputs..."
      
      so we will consider gcc-14 to be fixed - if somebody is using gcc
      snapshots of the gcc-14 before the fix, they should upgrade.
      
      Note that while the bug goes back to gcc-11, in practice other gcc
      changes seem to have effectively hidden it since gcc-12.1 as per a
      bisect by Jakub.  So even a gcc-14 snapshot without the fix likely
      doesn't show actual problems.
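      
      One way such a version-dependent workaround could be expressed in
      Kconfig is sketched below. The symbol name
      GCC_ASM_GOTO_OUTPUT_WORKAROUND and the point-release cut-offs (11.5,
      12.4, 13.3) are assumptions for illustration; the message only states
      that the fix was backported to the gcc-11, gcc-12 and gcc-13 branches
      and that gcc-14 is treated as fixed.

      ```kconfig
      # Sketch under stated assumptions: enable the workaround only for GCC
      # releases that predate the fix on their respective branch.
      config GCC_ASM_GOTO_OUTPUT_WORKAROUND
          bool
          depends on CC_IS_GCC
          # Assumed first fixed point releases: 11.5, 12.4, 13.3.
          default y if GCC_VERSION < 110500
          default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400
          default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300
      ```

      With a layout like this, gcc-14 and later never match any of the
      defaults, which mirrors the decision to consider gcc-14 fixed.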
      
      Also, make the default 'asm_goto_output()' macro mark the asm as
      volatile by hand, because of an unrelated gcc issue [1] where it doesn't
      match the documented behavior ("asm goto is always volatile").
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979 [1]
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921 [2]
      Link: https://lore.kernel.org/all/20240208220604.140859-1-seanjc@google.com/
      Requested-by: Jakub Jelinek <jakub@redhat.com>
      Cc: Uros Bizjak <ubizjak@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Andrew Pinski <quic_apinski@quicinc.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      68fb3ca0
  5. 08 Feb, 2024 1 commit
  6. 01 Feb, 2024 1 commit
    • Kconfig: Disable -Wstringop-overflow for GCC globally · 02153319
      Linus Torvalds authored
      It turns out it was never just gcc-11 that was broken.  Apparently it
      just happens to work on x86-64 with other gcc versions.
      
      On arm64, I see warnings with gcc version 13.2.1, and the kernel test
      robot reports the same problem on s390 with gcc 13.2.0.
      
      Admittedly it seems to be just the new Xe drm driver, but this is
      keeping me from doing my normal arm64 build testing.  So it gets
      reverted until somebody figures out what causes the problem (and why it
      doesn't show on x86-64, which is what makes me suspect it was never just
      about gcc-11, and more about just random happenstance).
      
      This also changes the Kconfig naming a bit - just make the "disable this
      for GCC" conditional be one simple Kconfig entry, and we can put the gcc
      version dependencies in that entry once we figure out what the correct
      rules are.
      
      The version dependency _may_ still end up being "gcc version larger than
      11" if the issue is purely in the Xe driver, but even if that ends up
      the case, let's make that all part of the "GCC_NO_STRINGOP_OVERFLOW"
      logic.
      
      For now, we just disable it for all gcc versions while the exact cause
      is unknown.
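      
      The "one simple Kconfig entry" described above might look roughly like
      the following sketch. Only GCC_NO_STRINGOP_OVERFLOW is named in the
      message; CC_NO_STRINGOP_OVERFLOW and CC_STRINGOP_OVERFLOW are
      hypothetical helper symbols used here to show how the rest of the build
      could consume the decision.

      ```kconfig
      # The single place for the "disable this for GCC" rule. For now it
      # applies to every GCC version; version dependencies would be added
      # here once the correct rules are known.
      config GCC_NO_STRINGOP_OVERFLOW
          def_bool y

      # Hypothetical helper: the warning is disabled for GCC builds.
      config CC_NO_STRINGOP_OVERFLOW
          bool
          default y if CC_IS_GCC && GCC_NO_STRINGOP_OVERFLOW

      # Hypothetical helper: the warning is enabled for GCC builds otherwise.
      config CC_STRINGOP_OVERFLOW
          bool
          default y if CC_IS_GCC && !CC_NO_STRINGOP_OVERFLOW
      ```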
      
      Link: https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02153319
  7. 21 Jan, 2024 1 commit
  8. 20 Dec, 2023 1 commit
  9. 11 Sep, 2023 1 commit
    • arch: Remove Itanium (IA-64) architecture · cf8e8658
      Ard Biesheuvel authored
      The Itanium architecture is obsolete, and an informal survey [0] reveals
      that any residual use of Itanium hardware in production is mostly HP-UX
      or OpenVMS based. The use of Linux on Itanium appears to be limited to
      enthusiasts that occasionally boot a fresh Linux kernel to see whether
      things are still working as intended, and perhaps to churn out some
      distro packages that are rarely used in practice.
      
      None of the original companies behind Itanium still produce or support
      any hardware or software for the architecture, and it is listed as
      'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers
      that contributed on behalf of those companies (nor anyone else, for that
      matter) have been willing to support or maintain the architecture
      upstream or even be responsible for applying the odd fix. The Intel
      firmware team removed all IA-64 support from the Tianocore/EDK2
      reference implementation of EFI in 2018. (Itanium is the original
      architecture for which EFI was developed, and the way Linux supports it
      deviates significantly from other architectures.) Some distros, such as
      Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have
      dropped support years ago.
      
      While the argument is being made [1] that there is a 'for the common
      good' angle to being able to build and run existing projects such as the
      Grid Community Toolkit [2] on Itanium for interoperability testing, the
      fact remains that none of those projects are known to be deployed on
      Linux/ia64, and very few people actually have access to such a system in
      the first place. Even if there were ways imaginable in which Linux/ia64
      could be put to good use today, what matters is whether anyone is
      actually doing that, and this does not appear to be the case.
      
      There are no emulators widely available, and so boot testing Itanium is
      generally infeasible for ordinary contributors. GCC still supports IA-64
      but its compile farm [3] no longer has any IA-64 machines. GLIBC would
      like to get rid of IA-64 [4] too because it would permit some overdue
      code cleanups. In summary, the benefits to the ecosystem of having IA-64
      be part of it are mostly theoretical, whereas the maintenance overhead
      of keeping it supported is real.
      
      So let's rip off the band aid, and remove the IA-64 arch code entirely.
      This follows the timeline proposed by the Debian/ia64 maintainer [5],
      which removes support in a controlled manner, leaving IA-64 in a known
      good state in the most recent LTS release. Other projects will follow
      once the kernel support is removed.
      
      [0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/
      [1] https://lore.kernel.org/all/0075883c-7c51-00f5-2c2d-5119c1820410@web.de/
      [2] https://gridcf.org/gct-docs/latest/index.html
      [3] https://cfarm.tetaneutral.net/machines/list/
      [4] https://lore.kernel.org/all/87bkiilpc4.fsf@mid.deneb.enyo.de/
      [5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/
      
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      cf8e8658
  10. 21 Aug, 2023 1 commit
  11. 18 Aug, 2023 1 commit
    • kexec: consolidate kexec and crash options into kernel/Kconfig.kexec · 89cde455
      Eric DeVolder authored
      Patch series "refactor Kconfig to consolidate KEXEC and CRASH options", v6.
      
      The Kconfig is refactored to consolidate KEXEC and CRASH options from
      various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec.
      
      The Kconfig.kexec is now a submenu titled "Kexec and crash features"
      located under "General Setup".
      
      The following options are impacted:
      
       - KEXEC
       - KEXEC_FILE
       - KEXEC_SIG
       - KEXEC_SIG_FORCE
       - KEXEC_IMAGE_VERIFY_SIG
       - KEXEC_BZIMAGE_VERIFY_SIG
       - KEXEC_JUMP
       - CRASH_DUMP
      
      Over time, these options have been copied between Kconfig files and
      are very similar to one another, but with slight differences.
      
      The following architectures are impacted by the refactor (because of
      use of one or more KEXEC/CRASH options):
      
       - arm
       - arm64
       - ia64
       - loongarch
       - m68k
       - mips
       - parisc
       - powerpc
       - riscv
       - s390
       - sh
       - x86 
      
      More information:
      
      In the patch series "crash: Kernel handling of CPU and memory hot
      un/plug"
      
       https://lore.kernel.org/lkml/20230503224145.7405-1-eric.devolder@oracle.com/
      
      the new kernel feature introduces the config option CRASH_HOTPLUG.
      
      In reviewing, Thomas Gleixner requested that the new config option
      not be placed in x86 Kconfig. Rather the option needs a generic/common
      home. To Thomas' point, the KEXEC and CRASH options have largely been
      duplicated in the various arch/<arch>/Kconfig files, with minor
      differences. This kind of proliferation is to be avoided/stopped.
      
       https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/
      
      To that end, I have refactored the arch Kconfigs so as to consolidate
      the various KEXEC and CRASH options. Generally speaking, this work has
      the following themes:
      
      - KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec
        - These items from arch/Kconfig:
            CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC
        - These items from arch/x86/Kconfig form the common options:
            KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE
            KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP
        - These items from arch/arm64/Kconfig form the common options:
            KEXEC_IMAGE_VERIFY_SIG
        - The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec
      - The Kconfig.kexec is now a submenu titled "Kexec and crash features"
        and is now listed in "General Setup" submenu from init/Kconfig.
      - To control the common options, each has a new ARCH_SUPPORTS_<option>
        option. These gateway options determine whether the common options
        are valid for the architecture.
      - To account for the slight differences in the original architecture
        coding of the common options, each now has a corresponding
        ARCH_SELECTS_<option> which is used to elicit the same side effects
        as the original arch/<arch>/Kconfig files for KEXEC and CRASH options.
      
      An example of 'make menuconfig' illustrating the submenu:
      
        > General setup > Kexec and crash features
        [*] Enable kexec system call
        [*] Enable kexec file based system call
        [*]   Verify kernel signature during kexec_file_load() syscall
        [ ]     Require a valid signature in kexec_file_load() syscall
        [ ]     Enable bzImage signature verification support
        [*] kexec jump
        [*] kernel crash dumps
        [*]   Update the crash elfcorehdr on system configuration changes
      
      In the process of consolidating the common options, I encountered
      slight differences in the coding of these options in several of the
      architectures. As a result, I settled on the following solution:
      
      - Each of the common options has a 'depends on ARCH_SUPPORTS_<option>'
        statement. For example, the KEXEC_FILE option has a 'depends on
        ARCH_SUPPORTS_KEXEC_FILE' statement.
      
        This approach is needed on all common options so as to prevent
        options from appearing for architectures which previously did
        not allow/enable them. For example, arm supports KEXEC but not
        KEXEC_FILE. The arch/arm/Kconfig does not provide
        ARCH_SUPPORTS_KEXEC_FILE and so KEXEC_FILE and related options
        are not available to arm.
      
      - The boolean ARCH_SUPPORTS_<option> in effect allows the arch to
        determine when the feature is allowed.  Archs which don't have the
        feature simply do not provide the corresponding ARCH_SUPPORTS_<option>.
        For each arch, where there previously were KEXEC and/or CRASH
        options, these have been replaced with the corresponding boolean
        ARCH_SUPPORTS_<option>, and an appropriate def_bool statement.
      
        For example, if the arch supports KEXEC_FILE, then the
        ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits
        the KEXEC_FILE option to be available.
      
        If the arch has a 'depends on' statement in its original coding
        of the option, then that expression becomes part of the def_bool
        expression. For example, arm64 had:
      
        config KEXEC
          depends on PM_SLEEP_SMP
      
        and in this solution, this converts to:
      
        config ARCH_SUPPORTS_KEXEC
          def_bool PM_SLEEP_SMP
      
      
      - In order to account for the architecture differences in the
        coding for the common options, the ARCH_SELECTS_<option> in the
        arch/<arch>/Kconfig is used. This option has a 'depends on
        <option>' statement to couple it to the main option, and from
        there can add whatever differs between the common option and the
        arch's original coding of that option.
      
        For example, a few archs enable CRYPTO and CRYPTO_SHA256 for
        KEXEC_FILE. These require an ARCH_SELECTS_KEXEC_FILE and
        'select CRYPTO' and 'select CRYPTO_SHA256' statements.
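      
      Putting the three points above together for KEXEC_FILE gives roughly
      the following. This is a sketch, not the series' exact text: the prompt
      string is taken from the menuconfig listing above, the placement of
      'select KEXEC_CORE' on the common option is an assumption, and the
      CRYPTO selects stand for the "few archs" case just described.

      ```kconfig
      # kernel/Kconfig.kexec: the common option, gated by the arch gateway.
      config KEXEC_FILE
          bool "Enable kexec file based system call"
          depends on ARCH_SUPPORTS_KEXEC_FILE
          select KEXEC_CORE

      # arch/<arch>/Kconfig: the arch states when the feature is allowed.
      # An arch without the feature simply omits this symbol.
      config ARCH_SUPPORTS_KEXEC_FILE
          def_bool y

      # arch/<arch>/Kconfig: arch-specific side effects, active only while
      # KEXEC_FILE itself is enabled.
      config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select CRYPTO
          select CRYPTO_SHA256
      ```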
      
      Illustrating the option relationships:
      
      For each of the common KEXEC and CRASH options:
       ARCH_SUPPORTS_<option> <- <option> <- ARCH_SELECTS_<option>
      
       <option>                   # in Kconfig.kexec
       ARCH_SUPPORTS_<option>     # in arch/<arch>/Kconfig, as needed
       ARCH_SELECTS_<option>      # in arch/<arch>/Kconfig, as needed
      
      
      For example, KEXEC:
       ARCH_SUPPORTS_KEXEC <- KEXEC <- ARCH_SELECTS_KEXEC
      
       KEXEC                      # in Kconfig.kexec
       ARCH_SUPPORTS_KEXEC        # in arch/<arch>/Kconfig, as needed
       ARCH_SELECTS_KEXEC         # in arch/<arch>/Kconfig, as needed
      
      
      To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
      enabled, and the ARCH_SELECTS_<option> handles side effects (i.e.
      select statements).
      
      Examples:
      A few examples to show the new strategy in action:
      
      ===== x86 (minus the help section) =====
      Original:
       config KEXEC
          bool "kexec system call"
          select KEXEC_CORE
      
       config KEXEC_FILE
          bool "kexec file based system call"
          select KEXEC_CORE
          select HAVE_IMA_KEXEC if IMA
          depends on X86_64
          depends on CRYPTO=y
          depends on CRYPTO_SHA256=y
      
       config ARCH_HAS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
       config KEXEC_SIG
          bool "Verify kernel signature during kexec_file_load() syscall"
          depends on KEXEC_FILE
      
       config KEXEC_SIG_FORCE
          bool "Require a valid signature in kexec_file_load() syscall"
          depends on KEXEC_SIG
      
       config KEXEC_BZIMAGE_VERIFY_SIG
          bool "Enable bzImage signature verification support"
          depends on KEXEC_SIG
          depends on SIGNED_PE_FILE_VERIFICATION
          select SYSTEM_TRUSTED_KEYRING
      
       config CRASH_DUMP
          bool "kernel crash dumps"
          depends on X86_64 || (X86_32 && HIGHMEM)
      
       config KEXEC_JUMP
          bool "kexec jump"
          depends on KEXEC && HIBERNATION
      
      becomes...
      New:
      config ARCH_SUPPORTS_KEXEC
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_FILE
          def_bool X86_64 && CRYPTO && CRYPTO_SHA256
      
      config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select HAVE_IMA_KEXEC if IMA
      
      config ARCH_SUPPORTS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
      config ARCH_SUPPORTS_KEXEC_SIG
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_SIG_FORCE
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG
          def_bool y
      
      config ARCH_SUPPORTS_KEXEC_JUMP
          def_bool y
      
      config ARCH_SUPPORTS_CRASH_DUMP
          def_bool X86_64 || (X86_32 && HIGHMEM)
      
      
      ===== powerpc (minus the help section) =====
      Original:
       config KEXEC
          bool "kexec system call"
          depends on PPC_BOOK3S || PPC_E500 || (44x && !SMP)
          select KEXEC_CORE
      
       config KEXEC_FILE
          bool "kexec file based system call"
          select KEXEC_CORE
          select HAVE_IMA_KEXEC if IMA
          select KEXEC_ELF
          depends on PPC64
          depends on CRYPTO=y
          depends on CRYPTO_SHA256=y
      
       config ARCH_HAS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
       config CRASH_DUMP
          bool "Build a dump capture kernel"
          depends on PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
          select RELOCATABLE if PPC64 || 44x || PPC_85xx
      
      becomes...
      New:
      config ARCH_SUPPORTS_KEXEC
          def_bool PPC_BOOK3S || PPC_E500 || (44x && !SMP)
      
      config ARCH_SUPPORTS_KEXEC_FILE
          def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y
      
      config ARCH_SUPPORTS_KEXEC_PURGATORY
          def_bool KEXEC_FILE
      
      config ARCH_SELECTS_KEXEC_FILE
          def_bool y
          depends on KEXEC_FILE
          select KEXEC_ELF
          select HAVE_IMA_KEXEC if IMA
      
      config ARCH_SUPPORTS_CRASH_DUMP
          def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP)
      
      config ARCH_SELECTS_CRASH_DUMP
          def_bool y
          depends on CRASH_DUMP
          select RELOCATABLE if PPC64 || 44x || PPC_85xx
      
      
      Testing Approach and Results
      
      There are 388 config files in the arch/<arch>/configs directories.
      For each of these config files, a .config is generated both before and
      after this Kconfig series, and checked for equivalence. This approach
      allows for a rather rapid check of all architectures and a wide
      variety of configs wrt/ KEXEC and CRASH, and avoids requiring
      compiling for all architectures and running kernels and run-time
      testing.
      
      For each config file, the olddefconfig, allnoconfig and allyesconfig
      targets are utilized. In testing the randconfig has revealed problems
      as well, but is not used in the before and after equivalence check
      since one can not generate the "same" .config for before and after,
      even if using the same KCONFIG_SEED since the option list is
      different.
      
      As such, the following script steps compare the before and after
      of 'make olddefconfig'. The new symbols introduced by this series
      are filtered out, but otherwise the config files are PASS only if
      they were equivalent, and FAIL otherwise.
      
      The script performs the test by doing the following:
      
       # Obtain the "golden" .config output for given config file
       # Reset test sandbox
       git checkout master
       git branch -D test_Kconfig
       git checkout -B test_Kconfig master
       make distclean
       # Write out updated config
       cp -f <config file> .config
       make ARCH=<arch> olddefconfig
       # Track each item in .config, LHSB is "golden"
       scoreboard .config 
      
       # Obtain the "changed" .config output for given config file
       # Reset test sandbox
       make distclean
       # Apply this Kconfig series
       git am <this Kconfig series>
       # Write out updated config
       cp -f <config file> .config
       make ARCH=<arch> olddefconfig
       # Track each item in .config, RHSB is "changed"
       scoreboard .config 
      
       # Determine test result
       # Filter-out new symbols introduced by this series
       # Filter-out symbol=n which are not in either scoreboard
       # Compare LHSB "golden" and RHSB "changed" scoreboards and issue PASS/FAIL
      
      The script was instrumental during the refactoring of Kconfig as it
      continually revealed problems. The end result is that the solution
      presented in this series passes all configs as checked by the script,
      with the following exceptions:
      
      - arch/ia64/configs/zx1_config with olddefconfig
        This config file has:
        # CONFIG_KEXEC is not set
        CONFIG_CRASH_DUMP=y
        and this refactor now couples KEXEC to CRASH_DUMP, so it is not
        possible to enable CRASH_DUMP without KEXEC.
      
      - arch/sh/configs/* with allyesconfig
        The arch/sh/Kconfig codes CRASH_DUMP as dependent upon BROKEN_ON_MMU
        (which clearly is not meant to be set). This symbol is not provided
        but with the allyesconfig it is set to yes which enables CRASH_DUMP.
        But KEXEC is coded as dependent upon MMU, and is set to no in
        arch/sh/mm/Kconfig, so KEXEC is not enabled.
        This refactor now couples KEXEC to CRASH_DUMP, so it is not
        possible to enable CRASH_DUMP without KEXEC.
      
      While the above exceptions are not equivalent to their original,
      the config file produced is valid (and in fact better wrt/ CRASH_DUMP
      handling).
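      
      The coupling behind both exceptions can be sketched as below. The
      message does not say which Kconfig mechanism ties CRASH_DUMP to KEXEC,
      so the 'select KEXEC' line is an assumption; a 'depends on KEXEC' would
      be the other obvious way to get the same "no CRASH_DUMP without KEXEC"
      property.

      ```kconfig
      # Sketch under stated assumptions: enabling CRASH_DUMP pulls in KEXEC,
      # so a .config with CRASH_DUMP=y and KEXEC unset can no longer result.
      config CRASH_DUMP
          bool "kernel crash dumps"
          depends on ARCH_SUPPORTS_CRASH_DUMP
          # Assumed mechanism, not confirmed by the message above.
          select KEXEC
      ```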
      
      
      This patch (of 14)
      
      The config options for kexec and crash features are consolidated
      into new file kernel/Kconfig.kexec. Under the "General Setup" submenu
      is a new submenu "Kexec and crash features". All the kexec and
      crash options that were once in the arch-dependent submenu "Processor
      type and features" are now consolidated in the new submenu.
      
      The following options are impacted:
      
       - KEXEC
       - KEXEC_FILE
       - KEXEC_SIG
       - KEXEC_SIG_FORCE
       - KEXEC_BZIMAGE_VERIFY_SIG
       - KEXEC_JUMP
       - CRASH_DUMP
      
      The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP.
      
      Architectures specify support of certain KEXEC and CRASH features with
      similarly named new ARCH_SUPPORTS_<option> config options.
      
      Architectures can utilize the new ARCH_SELECTS_<option> config
      options to specify additional components when <option> is enabled.
      
      To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be
      enabled, and the ARCH_SELECTS_<option> handles side effects (i.e.
      select statements).
      
      Link: https://lkml.kernel.org/r/20230712161545.87870-1-eric.devolder@oracle.com
      Link: https://lkml.kernel.org/r/20230712161545.87870-2-eric.devolder@oracle.com
      Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com> # for x86
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Huacai Chen <chenhuacai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Juerg Haefliger <juerg.haefliger@canonical.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Marc Aurèle La France <tsi@tuyoix.net>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Mike Rapoport (IBM) <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul E. McKenney <paulmck@kernel.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Sebastian Reichel <sebastian.reichel@collabora.com>
      Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: WANG Xuerui <kernel@xen0n.name>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xin Li <xin3.li@intel.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      89cde455
  12. 02 Aug, 2023 1 commit
  13. 09 Jun, 2023 1 commit
    • cachestat: implement cachestat syscall · cf264e13
      Nhat Pham authored
      There is currently no good way to query the page cache state of large file
      sets and directory trees.  There is mincore(), but it scales poorly: the
      kernel writes out a lot of bitmap data that userspace has to aggregate,
      when the user really does not care about per-page information in that
      case.  The user also needs to mmap and unmap each file as it goes along,
      which can be quite slow as well.
      
      Some use cases where this information could come in handy:
        * Allowing a database to decide whether to perform an index scan or
          direct table queries based on the in-memory cache state of the
          index.
        * Visibility into the writeback algorithm, for diagnosing performance
          issues.
        * Workload-aware writeback pacing: estimating IO fulfilled by page
          cache (and IO to be done) within a range of a file, allowing for
          more frequent syncing when and where there is IO capacity, and
          batching when there is not.
        * Computing memory usage of large files/directory trees, analogous to
          the du tool for disk usage.
      
      More information about these use cases could be found in the following
      thread:
      
      https://lore.kernel.org/lkml/20230315170934.GA97793@cmpxchg.org/
      
      This patch implements a new syscall that queries cache state of a file and
      summarizes the number of cached pages, number of dirty pages, number of
      pages marked for writeback, number of (recently) evicted pages, etc.  in a
      given range.  Currently, the syscall is only wired in for x86
      architecture.
      
      NAME
          cachestat - query the page cache statistics of a file.
      
      SYNOPSIS
          #include <sys/mman.h>
      
          struct cachestat_range {
              __u64 off;
              __u64 len;
          };
      
          struct cachestat {
              __u64 nr_cache;
              __u64 nr_dirty;
              __u64 nr_writeback;
              __u64 nr_evicted;
              __u64 nr_recently_evicted;
          };
      
          int cachestat(unsigned int fd, struct cachestat_range *cstat_range,
              struct cachestat *cstat, unsigned int flags);
      
      DESCRIPTION
          cachestat() queries the number of cached pages, number of dirty
          pages, number of pages marked for writeback, number of evicted
          pages, number of recently evicted pages, in the bytes range given by
          `off` and `len`.
      
          An evicted page is a page that was previously in the page cache but
          has been evicted since. A page is recently evicted if its last
          eviction was recent enough that its reentry to the cache would
          indicate that it is actively being used by the system, and that
          there is memory pressure on the system.
      
          These values are returned in a cachestat struct, whose address is
          given by the `cstat` argument.
      
          The `off` and `len` arguments must be non-negative integers. If
          `len` > 0, the queried range is [`off`, `off` + `len`]. If `len` ==
          0, we will query in the range from `off` to the end of the file.
      
          The `flags` argument is unused for now, but is included for future
          extensibility. Users should pass 0 (i.e., no flags specified).
      
          Currently, hugetlbfs is not supported.
      
          Because the status of a page can change after cachestat() checks it
          but before it returns to the application, the returned values may
          contain stale information.
      
      RETURN VALUE
          On success, cachestat returns 0. On error, -1 is returned, and errno
          is set to indicate the error.
      
      ERRORS
          EFAULT cstat or cstat_range points to an invalid address.
      
          EINVAL invalid flags.
      
          EBADF  invalid file descriptor.
      
          EOPNOTSUPP file descriptor is of a hugetlbfs file
      
      [nphamcs@gmail.com: replace rounddown logic with the existing helper]
        Link: https://lkml.kernel.org/r/20230504022044.3675469-1-nphamcs@gmail.com
      Link: https://lkml.kernel.org/r/20230503013608.2431726-3-nphamcs@gmail.com
      Signed-off-by: Nhat Pham <nphamcs@gmail.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      cf264e13
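
As an illustration of the cachestat() interface documented in the commit above, the following C sketch issues the system call directly through syscall(2). The struct names `cs_range` and `cs_result` and the helper `query_cachestat()` are hypothetical local names that mirror the layouts shown in the commit message. The fallback syscall number 451 is an assumption that holds on most architectures; the value from the system headers is used whenever they define `__NR_cachestat`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: 451 is the cachestat number on most architectures. */
#ifndef __NR_cachestat
#define __NR_cachestat 451
#endif

/* Hypothetical local mirror of struct cachestat_range. */
struct cs_range {
    uint64_t off;
    uint64_t len;   /* 0 means "from off to the end of the file" */
};

/* Hypothetical local mirror of struct cachestat. */
struct cs_result {
    uint64_t nr_cache;
    uint64_t nr_dirty;
    uint64_t nr_writeback;
    uint64_t nr_evicted;
    uint64_t nr_recently_evicted;
};

/*
 * Query page cache statistics for [off, off + len) of fd.
 * Returns 0 on success or a negative errno value on failure
 * (for example -EBADF, or -ENOSYS on kernels without cachestat).
 */
static int query_cachestat(int fd, uint64_t off, uint64_t len,
                           struct cs_result *out)
{
    struct cs_range range = { .off = off, .len = len };

    assert(out);
    /* The flags argument is currently unused and must be 0. */
    if (syscall(__NR_cachestat, fd, &range, out, 0) == 0)
        return 0;
    return -errno;
}
```

Calling `query_cachestat(fd, 0, 0, &res)` on an open regular file would query from offset 0 to the end of the file. As the commit message notes, the counters may already be stale by the time the call returns, so they are best treated as a snapshot.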
  14. 23 Apr, 2023 1 commit
    • gcc: disable '-Warray-bounds' for gcc-13 too · 0da6e5fd
      Linus Torvalds authored
      We started disabling '-Warray-bounds' for gcc-12 originally on s390,
      because it resulted in some warnings that weren't realistically fixable
      (commit 8b202ee2: "s390: disable -Warray-bounds").
      
      That s390-specific issue was then found to be less common elsewhere, but
      generic (see f0be87c4: "gcc-12: disable '-Warray-bounds' universally
      for now"), and then later the version check was expanded to
      gcc-11 (5a41237a: "gcc: disable -Warray-bounds for gcc-11 too").
      
      And it turns out that I was much too optimistic in thinking that it's
      all going to go away, and here we are with gcc-13 showing all the same
      issues.  So instead of expanding this one version at a time, let's just
      disable it for gcc-11+, and put an end limit to it only when we actually
      find a solution.
      
      Yes, I'm sure some of this is because the kernel just does odd things
      (like our "container_of()" use, but also knowingly playing games with
      things like linker tables and array layouts).
      
      And yes, some of the warnings are likely signs of real bugs, but when
      there are hundreds of false positives, that doesn't really help.
      
      Oh well.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0da6e5fd
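
The container_of() pattern mentioned in the commit above recovers a pointer to an enclosing structure from a pointer to one of its members, and the commit message names it as one likely source of the warnings. Below is a simplified user-space sketch of the idea. The kernel's real macro adds type checking, and `struct list_node`, `struct item` and `item_from_node()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct list_node {
    struct list_node *next;
};

struct item {
    int id;
    struct list_node node;   /* embedded member handed out to list code */
};

/* Recover the enclosing struct item from a pointer to its node member. */
static struct item *item_from_node(struct list_node *n)
{
    assert(n != NULL);
    return container_of(n, struct item, node);
}
```

The subtraction moves the pointer outside the bounds of the member it was derived from, even though the result is still inside the enclosing object.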
  15. 29 Mar, 2023 1 commit
  16. 27 Mar, 2023 1 commit
  17. 06 Mar, 2023 1 commit
    • driver core: remove CONFIG_SYSFS_DEPRECATED and CONFIG_SYSFS_DEPRECATED_V2 · 721da5ce
      Greg Kroah-Hartman authored
      CONFIG_SYSFS_DEPRECATED was added in commit 88a22c98
      ("CONFIG_SYSFS_DEPRECATED") in 2006 to allow systems with older versions
      of some tools (i.e. Fedora 3's version of udev) to boot properly.  Four
      years later, in 2010, an attempt was made to remove the option, as most of
      userspace should have been fixed up properly by then, but some kernel
      developers clung to those old systems and refused to update, so we added
      CONFIG_SYSFS_DEPRECATED_V2 in commit e52eec13 ("SYSFS: Allow boot
      time switching between deprecated and modern sysfs layout") to allow
      them to continue to boot properly, and we allowed a boot time parameter
      to be used to switch back to the old format if needed.
      
      Over time, the logic that was covered under these config options was
      slowly and successfully removed from individual driver subsystems, and
      the only things now left in the kernel are some changes in the block
      layer's representation in sysfs, where real directories are used
      instead of the usual symlinks.
      
      Because the original changes were done to userspace tools in 2006, and
      all distros that use those tools are long end-of-life, and older
      non-udev-based systems do not care about the block layer's sysfs
      representation, it is time to finally remove this old logic and the
      config entries from the kernel.
      
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: linux-block@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Link: https://lore.kernel.org/r/20230223073326.2073220-1-gregkh@linuxfoundation.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      721da5ce
  18. 21 Feb, 2023 2 commits
  19. 03 Feb, 2023 1 commit
    • init: Remove "select SRCU" · bc636dcb
      Paul E. McKenney authored
      Now that the SRCU Kconfig option is unconditionally selected, there is
      no longer any point in selecting it.  Therefore, remove the "select SRCU"
      Kconfig statements.
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Miguel Ojeda <ojeda@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: John Ogness <john.ogness@linutronix.de>
      bc636dcb
  20. 26 Jan, 2023 1 commit
  21. 17 Jan, 2023 1 commit
  22. 12 Jan, 2023 1 commit
  23. 11 Jan, 2023 1 commit
  24. 09 Jan, 2023 1 commit
  25. 27 Dec, 2022 1 commit
    • sched: Introduce per-memory-map concurrency ID · af7f588d
      Mathieu Desnoyers authored
      This feature allows the scheduler to expose a per-memory map concurrency
      ID to user-space. This concurrency ID is within the possible cpus range,
      and is temporarily (and uniquely) assigned while threads are actively
      running within a memory map. If a memory map has fewer threads than
      cores, or is limited to run on few cores concurrently through sched
      affinity or cgroup cpusets, the concurrency IDs will be values close
      to 0, thus allowing efficient use of user-space memory for per-cpu
      data structures.
      
      This feature is meant to be exposed by a new rseq thread area field.
      
      The primary purpose of this feature is to do the heavy-lifting needed
      by memory allocators to allow them to use per-cpu data structures
      efficiently in the following situations:
      
      - Single-threaded applications,
      - Multi-threaded applications on large systems (many cores) with limited
        cpu affinity mask,
      - Multi-threaded applications on large systems (many cores) with
        restricted cgroup cpuset per container.
      
      One of the key concerns of scheduler maintainers is the overhead
      associated with additional spin locks or atomic operations in the
      scheduler fast-path. This is why the following optimization is
      implemented.
      
      On context switch between threads belonging to the same memory map,
      transfer the mm_cid from prev to next without any atomic ops. This
      takes care of use-cases involving frequent context switch between
      threads belonging to the same memory map.
      
      Additional optimizations can be done if the spin locks added when
      context switching between threads belonging to different memory maps end
      up being a performance bottleneck. Those are left out of this patch
      though. A performance impact would have to be clearly demonstrated to
      justify the added complexity.
      
      The credit goes to Paul Turner (Google) for the original virtual cpu id
      idea. This feature is implemented based on the discussions with Paul
      Turner and Peter Oskolkov (Google), but I took the liberty to implement
      scheduler fast-path optimizations and my own NUMA-awareness scheme.
      Rumor has it that Google has been running an rseq vcpu_id extension
      internally in production for a year. The tcmalloc source code indeed has
      comments hinting at a vcpu_id prototype extension to the rseq system
      call [1].
      
      The following benchmarks do not show any significant overhead added to
      the scheduler context switch by this feature:
      
      * perf bench sched messaging (process)
      
      Baseline:                    86.5±0.3 ms
      With mm_cid:                 86.7±2.6 ms
      
      * perf bench sched messaging (threaded)
      
      Baseline:                    84.3±3.0 ms
      With mm_cid:                 84.7±2.6 ms
      
      * hackbench (process)
      
      Baseline:                    82.9±2.7 ms
      With mm_cid:                 82.9±2.9 ms
      
      * hackbench (threaded)
      
      Baseline:                    85.2±2.6 ms
      With mm_cid:                 84.4±2.9 ms
      
      [1] https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/linux_syscall_support.h#L26

      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20221122203932.231377-8-mathieu.desnoyers@efficios.com
      af7f588d
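
To make the idea in the commit above concrete, here is a single-threaded toy model in C of compact concurrency IDs. It is only a sketch under stated assumptions, not the kernel's implementation: `struct toy_mm`, `struct toy_task`, `toy_cid_get()`, `toy_cid_put()` and `toy_switch()` are hypothetical names, the ID space is capped at 64, and no locking is modeled. It shows two properties described in the commit message: handing out the lowest free ID keeps values close to 0, and a switch between two threads of the same memory map can pass the ID from prev to next without touching the shared mask.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CIDS 64

/* One bit per concurrency ID; bit i set means ID i is in use. */
struct toy_mm {
    uint64_t cid_mask;
};

/* A thread running in some memory map; cid == -1 means "no ID held". */
struct toy_task {
    struct toy_mm *mm;
    int cid;
};

/* Allocate the lowest free ID, so IDs stay as close to 0 as possible. */
static int toy_cid_get(struct toy_mm *mm)
{
    for (int i = 0; i < MAX_CIDS; i++) {
        uint64_t bit = UINT64_C(1) << i;

        if (!(mm->cid_mask & bit)) {
            mm->cid_mask |= bit;
            return i;
        }
    }
    return -1;   /* all IDs in use */
}

/* Release an ID; a negative cid means there is nothing to release. */
static void toy_cid_put(struct toy_mm *mm, int cid)
{
    assert(cid < MAX_CIDS);
    if (cid >= 0)
        mm->cid_mask &= ~(UINT64_C(1) << cid);
}

/* Model of a context switch from prev to next on one CPU. */
static void toy_switch(struct toy_task *prev, struct toy_task *next)
{
    if (prev->mm == next->mm) {
        /* Same memory map: hand the ID over, mask stays untouched. */
        next->cid = prev->cid;
        prev->cid = -1;
        return;
    }
    /* Different memory maps: release in one, allocate in the other. */
    toy_cid_put(prev->mm, prev->cid);
    prev->cid = -1;
    next->cid = toy_cid_get(next->mm);
}
```

A user-space allocator could then index an array of per-ID caches by the current ID and size that array by the number of concurrently running threads rather than by the number of possible CPUs.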
  26. 22 Nov, 2022 1 commit
  27. 15 Nov, 2022 1 commit
    • kallsyms: Add self-test facility · 30f3bb09
      Zhen Lei authored
      Added test cases for basic functions and performance of functions
      kallsyms_lookup_name(), kallsyms_on_each_symbol() and
      kallsyms_on_each_match_symbol(). It also calculates the compression rate
      of the kallsyms compression algorithm for the current symbol set.
      
      The basic functions test begins by testing a set of symbols whose address
      values are known. Then, traverse all symbol addresses and find the
      corresponding symbol name based on the address. It's impossible to
      determine whether these addresses are correct, but we can use the above
      three functions along with the addresses to test each other. Because the
      traversal operation of kallsyms_on_each_symbol() is too slow (only 60
      symbols can be tested in one second), it is tested on average only once
      every 128 symbols. The other two functions validate all symbols.
      
      If the basic functions test is passed, print only performance test
      results. If the test fails, print error information, but do not perform
      subsequent performance tests.
      
      Start self-test automatically after system startup if
      CONFIG_KALLSYMS_SELFTEST=y.
      
      Example of output content: (prefix 'kallsyms_selftest:' is omitted)
       start
        ---------------------------------------------------------
       | nr_symbols | compressed size | original size | ratio(%) |
       |---------------------------------------------------------|
       |     107543 |       1357912   |      2407433  |  56.40   |
        ---------------------------------------------------------
       kallsyms_lookup_name() looked up 107543 symbols
       The time spent on each symbol is (ns): min=630, max=35295, avg=7353
       kallsyms_on_each_symbol() traverse all: 11782628 ns
       kallsyms_on_each_match_symbol() traverse all: 9261 ns
       finish
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      30f3bb09
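
The ratio(%) column in the sample output above is consistent with the compressed size divided by the original size: 1357912 / 2407433 is about 56.40%. The following C sketch reproduces that arithmetic with integers only. The function name `ratio_hundredths()` is hypothetical and this is not claimed to be the self-test's actual code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return compressed/original as hundredths of a percent, rounded down
 * (5640 stands for 56.40%). Returns 0 when original is 0.
 */
static uint32_t ratio_hundredths(uint64_t compressed, uint64_t original)
{
    if (original == 0)
        return 0;
    /* Guard the multiplication below against 64-bit overflow. */
    assert(compressed <= UINT64_MAX / 10000);
    return (uint32_t)(compressed * 10000 / original);
}
```

Printing the result as `value / 100` followed by a two-digit `value % 100` gives the two-decimal percentage format seen in the table.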
  28. 01 Nov, 2022 1 commit
  29. 21 Oct, 2022 1 commit
  30. 12 Oct, 2022 1 commit
  31. 03 Oct, 2022 1 commit
  32. 28 Sep, 2022 1 commit
  33. 05 Sep, 2022 1 commit
  34. 21 Aug, 2022 1 commit
  35. 27 Jul, 2022 2 commits
  36. 23 Jul, 2022 1 commit
    • cgroup: Make !percpu threadgroup_rwsem operations optional · 6a010a49
      Tejun Heo authored
      3942a9bd ("locking, rcu, cgroup: Avoid synchronize_sched() in
      __cgroup_procs_write()") disabled percpu operations on threadgroup_rwsem
      because the implied synchronize_rcu() on write locking was pushing up the
      latencies too much for android which constantly moves processes between
      cgroups.
      
      This makes the hotter paths - fork and exit - slower as they're always
      forced into the slow path. There is no reason to force this on everyone
      especially given that the more common static usage pattern can now completely
      avoid write-locking the rwsem. Write-locking is elided when turning on and
      off controllers on empty sub-trees and CLONE_INTO_CGROUP enables seeding a
      cgroup without grabbing the rwsem.
      
      Restore the default percpu operations and introduce the mount option
      "favordynmods" and config option CGROUP_FAVOR_DYNMODS for users who need
      lower latencies for the dynamic operations.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      6a010a49