1. 01 May, 2024 1 commit
  2. 30 Apr, 2024 2 commits
  3. 29 Apr, 2024 14 commits
    • selftests/powerpc: Install tests in sub-directories · dda32e37
      Michael Ellerman authored
      The sources for the powerpc selftests are arranged into sub-directories.
      However when the tests are built and installed, the sub-directories are
      squashed, losing the structure.
      
      For example, with the current code the result of installing the selftests is:
      
        $ tree tools/testing/selftests/kselftest_install
        tools/testing/selftests/kselftest_install
        ├── kselftest
        │   ├── ktap_helpers.sh
        │   ├── module.sh
        │   ├── prefix.pl
        │   └── runner.sh
        ├── kselftest-list.txt
        ├── powerpc
        │   ├── alignment_handler
        │   ├── attr_test
        │   ├── back_to_back_ebbs_test
        │   ├── bad_accesses
        │   ├── bhrb_filter_map_test
        │   ├── bhrb_no_crash_wo_pmu_test
        │   ├── blacklisted_events_test
        │   ├── cache_shape
        │   ├── close_clears_pmcc_test
        │   ├── context_switch
        │   ├── copy_first_unaligned
        ...
        │   ├── settings
        ...
        │   └── wild_bctr
        └── run_kselftest.sh
      
      All the powerpc tests are squashed into the single powerpc directory. In
      particular, note that there is a single `settings` file, even though
      there are multiple settings files in the powerpc selftest sources. One
      of the settings files ends up installed, depending on install order,
      even if they have different contents.
      
      Similarly if there were two tests with the same name in different
      sub-directories they would clobber each other.
      
      Fix it by replicating the directory structure of the source tree into
      the install directory. The result being for example:
      
        $ tree tools/testing/selftests/kselftest_install
        tools/testing/selftests/kselftest_install
        ├── kselftest
        │   ├── ktap_helpers.sh
        │   ├── module.sh
        │   ├── prefix.pl
        │   └── runner.sh
        ├── kselftest-list.txt
        ├── powerpc
        │   ├── alignment
        │   │   ├── alignment_handler
        │   │   └── copy_first_unaligned
        │   ├── benchmarks
        │   │   ├── context_switch
        │   │   ├── exec_target
        │   │   ├── fork
        │   │   ├── futex_bench
        │   │   ├── gettimeofday
        │   │   ├── mmap_bench
        │   │   ├── null_syscall
        │   │   └── settings
        ...
        │   ├── eeh
        │   │   ├── eeh-basic.sh
        │   │   ├── eeh-functions.sh
        │   │   └── settings
        ...
        │   └── vphn
        │       └── test-vphn
        └── run_kselftest.sh
      
      Note multiple settings files in different sub-directories.
      
      This change also has the effect of changing the names of the tests from
      the point of view of the kselftest runner. Before the change, the tests
      are named, e.g.:
      
        powerpc:copy_first_unaligned
        powerpc:cache_shape
        powerpc:reg_access_test
      
      After, the test collection names include the sub-directory:
      
        powerpc/alignment:copy_first_unaligned
        powerpc/cache_shape:cache_shape
        powerpc/pmu/ebb:reg_access_test
      
      That means whereas previously all powerpc tests could be run with:
      
        $ ./run_kselftest.sh -c powerpc
      
      After the change, it is necessary to pass a regex that matches all
      powerpc entries, e.g.:
      
        $ ./run_kselftest.sh -c "powerpc.*"
      
      The latter form also works before and after the change.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240422133453.1793988-2-mpe@ellerman.id.au
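The effect of the naming change on `-c` can be illustrated with a small C sketch. It assumes the runner keeps an entry when its `collection:test` line matches the `-c` argument anchored at the start and followed directly by the colon. That matching rule and the helper name `demo_collection_matches()` are assumptions made for illustration, not something taken from the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns true when "entry" (a "collection:test"
 * line) matches "^<pattern>:". The anchoring rule is an assumption about
 * how the runner filters collections.
 */
bool demo_collection_matches(const char *pattern, const char *entry)
{
    char anchored[256];
    regex_t re;
    bool hit;
    int n;

    assert(pattern != NULL && entry != NULL);

    /* Build the anchored pattern, refusing anything that would truncate. */
    n = snprintf(anchored, sizeof(anchored), "^%s:", pattern);
    if (n < 0 || (size_t)n >= sizeof(anchored))
        return false;

    /* Basic POSIX regex, no sub-match reporting needed. */
    if (regcomp(&re, anchored, REG_NOSUB) != 0)
        return false;

    hit = regexec(&re, entry, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}
```

Under that assumption, `powerpc` selects only entries named like `powerpc:cache_shape`, while `powerpc.*` selects both the old names and the new `powerpc/alignment:copy_first_unaligned` style.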
    • selftests/powerpc: Convert pmu Makefile to for loop style · 822a0495
      Michael Ellerman authored
      The pmu Makefile has grown more sub directories over the years. Rather
      than open coding the rules for each subdir, use for loops.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240422133453.1793988-1-mpe@ellerman.id.au
    • selftests/powerpc: make sub-folders buildable on their own · 108e5e68
      Madhavan Srinivasan authored
      The build breaks when running make with the run_tests target in
      sub-folders under powerpc. This is because the CFLAGS and GIT_VERSION
      macros are defined in the Makefile of the top-level powerpc folder.
      
        make: Entering directory '/home/maddy/linux/tools/testing/selftests/powerpc/mm'
        gcc     hugetlb_vs_thp_test.c ../harness.c ../utils.c  -o /home/maddy/selftest_output//hugetlb_vs_thp_test
        hugetlb_vs_thp_test.c:6:10: fatal error: utils.h: No such file or directory
            6 | #include "utils.h"
              |          ^~~~~~~~~
        compilation terminated.
      
      Fix this by including flags.mk in each sub-folder Makefile. Also remove
      the CFLAGS and GIT_VERSION macros from the powerpc/ folder Makefile, since
      they are defined in flags.mk.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240229093711.581230-3-maddy@linux.ibm.com
    • selftests/powerpc: Add flags.mk to support pmu buildable · 5553a793
      Madhavan Srinivasan authored
      When running `make -C powerpc/pmu run_tests` from the top-level selftests
      directory, this error is currently reported:
      
        make: Entering directory '/home/maddy/linux/tools/testing/selftests/powerpc/pmu'
        Makefile:40: warning: overriding recipe for target 'emit_tests'
        ../../lib.mk:111: warning: ignoring old recipe for target 'emit_tests'
        gcc -m64    count_instructions.c ../harness.c event.c lib.c ../utils.c loop.S  -o /home/maddy/selftest_output//count_instructions
        In file included from count_instructions.c:13:
        event.h:12:10: fatal error: utils.h: No such file or directory
        12 | #include "utils.h"
          |          ^~~~~~~~~
        compilation terminated.
      
      This is due to a missing include path in CFLAGS. That is, the CFLAGS and
      GIT_VERSION macros are defined in the powerpc/ folder Makefile, which
      in this case is not involved.
      
      To address the failure when a specific sub-folder test is executed
      directly, this patch adds a new rule file called "flags.mk" under the
      selftests/powerpc/ folder and includes it from all the Makefiles of the
      powerpc/pmu sub-folders.
      Reported-by: Sachin Sant <sachinp@linux.ibm.com>
      Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
      Tested-by: Sachin Sant <sachinp@linux.ibm.com>
      [mpe: Fixup ifeq, make GIT_VERSION simply expanded to avoid re-executing git describe]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240229093711.581230-2-maddy@linux.ibm.com
    • selftests/powerpc: Re-order *FLAGS to follow lib.mk · 37496845
      Madhavan Srinivasan authored
      In some powerpc/ sub-folder Makefiles, CFLAGS are defined before the
      lib.mk include. Clean this up by re-ordering the flags to follow the
      include. This is needed to make the sub-folders in powerpc/ buildable on
      their own.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240229093711.581230-1-maddy@linux.ibm.com
    • powerpc/pseries/vio: Don't return ENODEV if node or compatible missing · 29247de4
      Lidong Zhong authored
      We noticed the following nuisance messages during the boot process:
      
        vio vio: uevent: failed to send synthetic uevent
        vio 4000: uevent: failed to send synthetic uevent
        vio 4001: uevent: failed to send synthetic uevent
        vio 4002: uevent: failed to send synthetic uevent
        vio 4004: uevent: failed to send synthetic uevent
      
      This is caused either by vio_register_device_node() failing to set
      dev->of_node or by the node missing a "compatible" property. To match
      the definition of modalias in modalias_show(), remove the return of
      ENODEV in such cases. The failure messages are also suppressed with this
      change.
      Signed-off-by: Lidong Zhong <lidong.zhong@suse.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240411020450.12725-1-lidong.zhong@suse.com
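The behaviour change can be modelled in userspace with a short C sketch. This is not the kernel code: `demo_vio_uevent()` is a hypothetical stand-in for the uevent callback, and the `MODALIAS=vio:T%sS%s` string is an assumed format used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of a uevent callback.
 * Returns 0 on success and -1 if the output buffer is too small.
 */
int demo_vio_uevent(const char *type, const char *compat,
                    char *env, size_t len)
{
    int n;

    assert(env != NULL);
    if (len == 0)
        return -1;
    env[0] = '\0';

    /*
     * Missing node information is no longer treated as an error:
     * succeed without emitting a modalias, so no failure is logged.
     */
    if (type == NULL || compat == NULL)
        return 0;

    /* Assumed modalias format, for illustration only. */
    n = snprintf(env, len, "MODALIAS=vio:T%sS%s", type, compat);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

In this model a device without a "compatible" string still produces a successful, empty uevent, which corresponds to the nuisance messages going away.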
    • powerpc/pseries: Enforce hcall result buffer validity and size · ff2e185c
      Nathan Lynch authored
      plpar_hcall(), plpar_hcall9(), and related functions expect callers to
      provide valid result buffers of certain minimum size. Currently this
      is communicated only through comments in the code and the compiler has
      no idea.
      
      For example, if I write a bug like this:
      
        long retbuf[PLPAR_HCALL_BUFSIZE]; // should be PLPAR_HCALL9_BUFSIZE
        plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, ...);
      
      This compiles with no diagnostics emitted, but likely results in stack
      corruption at runtime when plpar_hcall9() stores results past the end
      of the array. (To be clear this is a contrived example and I have not
      found a real instance yet.)
      
      To make this class of error less likely, we can use explicitly-sized
      array parameters instead of pointers in the declarations for the hcall
      APIs. When compiled with -Warray-bounds[1], the code above now
      provokes a diagnostic like this:
      
      error: array argument is too small;
      is of size 32, callee requires at least 72 [-Werror,-Warray-bounds]
         60 |                 plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf,
            |                 ^                                   ~~~~~~
      
      [1] Enabled for LLVM builds but not GCC for now. See commit
          0da6e5fd ("gcc: disable '-Warray-bounds' for gcc-13 too") and
          related changes.
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240408-pseries-hvcall-retbuf-v1-1-ebc73d7253cf@linux.ibm.com
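A self-contained sketch of the technique, using hypothetical names (`demo_hcall9()`, `DEMO_HCALL_BUFSIZE`, `DEMO_HCALL9_BUFSIZE`) rather than the real pseries declarations. The sizes 4 and 9 are chosen to agree with the 32 and 72 bytes in the diagnostic quoted in the commit message, assuming 8-byte longs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real buffer size constants. */
#define DEMO_HCALL_BUFSIZE  4
#define DEMO_HCALL9_BUFSIZE 9

/*
 * The sized array parameter is still passed as a pointer, but the
 * declared size tells the compiler how many elements the callee needs,
 * so a too-small caller buffer can be diagnosed at compile time.
 */
long demo_hcall9(unsigned long opcode,
                 unsigned long retbuf[DEMO_HCALL9_BUFSIZE])
{
    assert(retbuf != NULL);

    /* Model the hypervisor filling in all nine result slots. */
    for (size_t i = 0; i < DEMO_HCALL9_BUFSIZE; i++)
        retbuf[i] = opcode + i;
    return 0;
}

/* A caller that sizes its buffer correctly. */
long demo_caller(void)
{
    unsigned long retbuf[DEMO_HCALL9_BUFSIZE];
    long rc = demo_hcall9(0x100, retbuf);

    return rc == 0 ? (long)retbuf[DEMO_HCALL9_BUFSIZE - 1] : -1;
}
```

Declaring `retbuf` in `demo_caller()` with `DEMO_HCALL_BUFSIZE` instead reproduces the class of bug described in the commit message. Whether a diagnostic is emitted depends on the compiler and on `-Warray-bounds` being enabled.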
    • powerpc/dart: Drop unnecessary call to kmemleak_no_scan() · 4ccae236
      Michael Ellerman authored
      Erhard reported that kmemleak was showing a warning at boot:
      
        kmemleak: Not scanning unknown object at 0xc00000007f000000
        CPU: 0 PID: 0 Comm: swapper Not tainted 5.19.0-rc3-PMacG5+ #2
        Call Trace:
         .dump_stack_lvl+0x7c/0xc4 (unreliable)
         .kmemleak_no_scan+0xe0/0x100
         .iommu_init_early_dart+0x2f0/0x924
         .pmac_probe+0x1b0/0x20c
         .setup_arch+0x1b8/0x674
         .start_kernel+0xdc/0xb74
         start_here_common+0x1c/0x44
        DART table allocated at: (____ptrval____)
      
      Which he bisected to a change in kmemleak, commit
      23c2d497 ("mm: kmemleak: take a full lowmem check in kmemleak_*_phys()").
      
      Because pmac_probe() is called before mem_topology_setup(), the min/max
      PFN variables are still zero. That causes kmemleak_alloc_phys() to
      ignore the allocation, because the checks against the PFN fail. Then
      kmemleak_no_scan() can't find the allocation and prints a warning.
      
      Given that kmemleak_alloc_phys() is ignoring the allocation to begin
      with, there's no need to call kmemleak_no_scan() at all, which avoids
      the warning.
      Reported-by: Erhard Furtner <erhard_f@mailbox.org>
      Closes: https://lore.kernel.org/all/bug-216156-206035@https.bugzilla.kernel.org%2F/
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240419115913.3317575-1-mpe@ellerman.id.au
    • powerpc/eeh: Permanently disable the removed device · d1679b4f
      Ganesh Goudar authored
      When a device is hot removed on powernv, the hotplug driver clears
      the device's state. However, on pseries, if a device is removed by
      phyp after reaching the error threshold, the kernel remains unaware,
      leading to the device not being torn down. This prevents necessary
      remediation actions like failover.
      
      Permanently disable the device if the presence check fails.
      
      Also, in eeh_dev_check_failure() we may treat the error as a false
      positive if the device has been hot-unplugged, because the get_state call
      returns EEH_STATE_NOT_SUPPORT, and we may end up not clearing the device
      state. So log the event if the state is not moved to the permanent
      failure state.
      Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240422075737.1405551-1-ganeshgr@linux.ibm.com
    • Documentation/powerpc: update fadump implementation details · 57e67001
      Sourabh Jain authored
      The patch titled ("powerpc: make fadump resilient with memory add/remove
      events") made significant changes to the implementation of fadump,
      particularly to elfcorehdr creation and the fadump crash info header
      structure. Therefore, update the fadump implementation documentation
      to reflect those changes.
      
      The following updates are made to the firmware-assisted dump documentation:
      
      1. The elfcorehdr is no longer stored after fadump HDR in the reserved
         dump area. Instead, the second kernel dynamically allocates memory
         for the elfcorehdr within the address range from 0 to the boot memory
         size. Therefore, update figures 1 and 2 of Memory Reservation during
         the first and second kernels to reflect this change.
      
      2. A version field has been added to the fadump header to manage
         future changes to the fadump crash info header structure without
         changing the fadump header magic number. Therefore, remove the
         corresponding TODO from the document.
      Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240422195932.1583833-4-sourabhjain@linux.ibm.com
    • powerpc/fadump: add hotplug_ready sysfs interface · bc446c5a
      Sourabh Jain authored
      The elfcorehdr describes the CPUs and memory of the crashed kernel to
      the kernel that captures the dump, known as the second or fadump kernel.
      The elfcorehdr needs to be updated if the system's memory changes due to
      memory hotplug or online/offline events.
      
      Currently, memory hotplug events are monitored in userspace by udev
      rules, and fadump is re-registered, which recreates the elfcorehdr with
      the latest available memory in the system.
      
      However, the previous patch ("powerpc: make fadump resilient with memory
      add/remove events") moved the creation of elfcorehdr to the second or
      fadump kernel. This eliminates the need to regenerate the elfcorehdr
      during memory hotplug or online/offline events.
      
      Create a sysfs entry at /sys/kernel/fadump/hotplug_ready to let
      userspace know that fadump re-registration is not required for memory
      add/remove events.
      Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240422195932.1583833-3-sourabhjain@linux.ibm.com
    • powerpc: make fadump resilient with memory add/remove events · c6c5b14d
      Sourabh Jain authored
      Due to changes in memory resources caused by either memory hotplug or
      online/offline events, the elfcorehdr, which describes the CPUs and
      memory of the crashed kernel to the kernel that collects the dump (known
      as second/fadump kernel), becomes outdated. Consequently, attempting
      dump collection with an outdated elfcorehdr can lead to failed or
      inaccurate dump collection.
      
      Memory hotplug or online/offline events are referred to as memory
      add/remove events in the rest of the commit message.
      
      The current solution to address the aforementioned issue is as follows:
      Monitor memory add/remove events in userspace using udev rules, and
      re-register fadump whenever there are changes in memory resources. This
      leads to the creation of a new elfcorehdr with updated system memory
      information.
      
      There are several notable issues associated with re-registering fadump
      for every memory add/remove event.
      
      1. Bulk memory add/remove events with udev-based fadump re-registration
         can lead to race conditions and, more importantly, it creates a wide
         window during which fadump is inactive until all memory add/remove
         events are settled.
      2. Re-registering fadump for every memory add/remove event is
         inefficient.
      3. The memory for elfcorehdr is allocated based on the memblock regions
         available during early boot and remains fixed thereafter. However, if
         elfcorehdr is later recreated with additional memblock regions, its
         size will increase, potentially leading to memory corruption.
      
      Address the aforementioned challenges by shifting the creation of
      elfcorehdr from the first kernel (also referred to as the crashed kernel),
      where it was created and frequently recreated for every memory
      add/remove event, to the fadump kernel. As a result, the elfcorehdr only
      needs to be created once, thus eliminating the necessity to re-register
      fadump during memory add/remove events.
      
      At present, the first kernel prepares fadump header and stores it in the
      fadump reserved area. The fadump header includes the start address of
      the elfcorehdr, crashing CPU details, and other relevant information. In
      the event of a crash in the first kernel, the second/fadump kernel boots and
      accesses the fadump header prepared by the first kernel. It then
      performs the following steps in a platform-specific function
      [rtas|opal]_fadump_process:
      
      1. Sanity check for fadump header
      2. Update CPU notes in elfcorehdr
      
      Along with the above, update the setup_fadump()/fadump.c to create
      elfcorehdr and set its address to the global variable elfcorehdr_addr
      for the vmcore module to process it in the second/fadump kernel.
      
      The section below outlines the information required to create the
      elfcorehdr and the changes made to make it available to the fadump kernel
      where it is not already.
      
      To create elfcorehdr, the following crashed kernel information is
      required: CPU notes, vmcoreinfo, and memory ranges.
      
      At present, the CPU notes are already prepared in the fadump kernel, so
      no changes are needed in that regard. The fadump kernel has access to
      all crashed kernel memory regions, including boot memory regions that
      are relocated by firmware to fadump reserved areas, so no changes for
      that either. However, it is necessary to add new members to the fadump
      header, i.e., the 'fadump_crash_info_header' structure, in order to pass
      the crashed kernel's vmcoreinfo address and its size to fadump kernel.
      
      In addition to the vmcoreinfo address and size, there are a few other
      attributes also added to the fadump_crash_info_header structure.
      
      1. version:
         It stores the fadump header version, which is currently set to 1.
         This provides flexibility to update the fadump crash info header in
         the future without changing the magic number. For each change in the
         fadump header, the version will be increased. This will help the
         updated kernel determine how to handle kernel dumps from older
         kernels. The magic number remains relevant for checking fadump header
         corruption.
      
      2. pt_regs_sz/cpu_mask_sz:
         Store size of pt_regs and cpu_mask structure of first kernel. These
         attributes are used to prevent dump processing if the sizes of
         pt_regs or cpu_mask structure differ between the first and fadump
         kernels.
      
      Note: if either the first/crashed kernel or the second/fadump kernel does
      not have the changes introduced here, the kernel fails to collect the dump
      and prints a relevant error message on the console.
      Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240422195932.1583833-2-sourabhjain@linux.ibm.com
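The header checks described in this commit message can be sketched in plain C. The layout below is a simplified, hypothetical model: `struct demo_crash_info_header`, the field order, the magic value and `demo_header_usable()` are assumptions for illustration and do not reproduce the kernel's `fadump_crash_info_header`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FADUMP_MAGIC   0x46414455ULL /* hypothetical magic value */
#define DEMO_FADUMP_VERSION 1             /* the commit sets version to 1 */

/* Simplified model of the header written by the first kernel. */
struct demo_crash_info_header {
    uint64_t magic_number;    /* detects header corruption */
    uint32_t version;         /* bumped on every layout change */
    uint32_t pt_regs_sz;      /* sizeof(pt_regs) in the first kernel */
    uint32_t cpu_mask_sz;     /* sizeof(cpu_mask) in the first kernel */
    uint64_t vmcoreinfo_addr; /* crashed kernel's vmcoreinfo address */
    uint64_t vmcoreinfo_size; /* and its size */
};

/*
 * Decide whether the fadump kernel may process the dump: the magic must
 * match, the version must be one this kernel understands, and the
 * structure sizes must agree between the two kernels.
 */
bool demo_header_usable(const struct demo_crash_info_header *hdr,
                        uint32_t local_pt_regs_sz,
                        uint32_t local_cpu_mask_sz)
{
    assert(hdr != NULL);

    if (hdr->magic_number != DEMO_FADUMP_MAGIC)
        return false;
    if (hdr->version != DEMO_FADUMP_VERSION)
        return false;
    if (hdr->pt_regs_sz != local_pt_regs_sz)
        return false;
    if (hdr->cpu_mask_sz != local_cpu_mask_sz)
        return false;
    return true;
}
```

This sketch accepts only the single version it knows about. A later kernel could instead branch on `version` to handle dumps from older kernels, which is the flexibility the version field is meant to provide.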
    • powerpc/pseries: Add failure related checks for h_get_mpp and h_get_ppp · 6d434163
      Shrikanth Hegde authored
      A couple of minor fixes:
      
      - hcall return values are long. Fix that for h_get_mpp, h_get_ppp and
        parse_ppp_data.
      
      - If the hcall fails, the values set should at least be zero rather than
        uninitialized. Fix that for h_get_mpp and h_get_ppp.
      Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240412092047.455483-3-sshegde@linux.ibm.com
    • powerpc/pseries: Add pool idle time at LPAR boot · 9c74ecfd
      Shrikanth Hegde authored
      When no options are specified for lparstat, it is expected to
      report statistics since LPAR (Logical Partition) boot.
      
      APP (Available Processor Pool) is an indicator of how many cores in the
      shared pool are free to use in a Shared Processor LPAR (SPLPAR). APP is
      derived using pool_idle_time, which is obtained using the H_PIC call.
      
      The interval-based reports show the correct APP value, while the
      since-boot report shows very high APP values. This happens because in
      that case APP is obtained by dividing pool idle time by LPAR uptime. Since
      pool idle time is reported by the PowerVM hypervisor since its own boot,
      it need not align with LPAR boot.
      
      To fix that, export the boot pool idle time in lparcfg; powerpc-utils will
      use this info to derive APP as below for since-boot reports.
      
      APP = (pool idle time - boot pool idle time) / (uptime * timebase)
      
      Results: observe the APP values.
      ====================== Shared LPAR ================================
      lparstat
      System Configuration
      type=Shared mode=Uncapped smt=8 lcpu=12 mem=15573440 kB cpus=37 ent=12.00
      
      reboot
      stress-ng --cpu=$(nproc) -t 600
      sleep 600
      So in this case APP is expected to be close to 37-6=31.
      
      ====== 6.9-rc1 and lparstat 1.3.10  =============
      %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
      ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
      47.48  0.01  0.00    52.51     0.00  0.00 47.49 69099.72 541547    21
      
      === With this patch and powerpc-utils patch to do the above equation ===
      %user  %sys %wait    %idle    physc %entc lbusy   app  vcsw phint
      ----- ----- -----    -----    ----- ----- ----- ----- ----- -----
      47.48  0.01  0.00    52.51     5.73 47.75 47.49 31.21 541753    21
      =====================================================================
      
      Note: the inaccuracy of physc and purr/idle purr is being handled in a
      separate patch in the powerpc-utils tree.
      Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://msgid.link/20240412092047.455483-2-sshegde@linux.ibm.com
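The APP equation from this commit message can be written as a small C sketch. The function names are hypothetical, and the sketch assumes that both idle-time values are counted in timebase ticks, that uptime is in seconds and that timebase is in ticks per second. The commit message does not state these units explicitly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * APP = (pool idle time - boot pool idle time) / (uptime * timebase)
 * Subtracting the value sampled at LPAR boot removes the idle time the
 * hypervisor accumulated before this LPAR started.
 */
double demo_app_since_boot(uint64_t pool_idle_time,
                           uint64_t boot_pool_idle_time,
                           uint64_t uptime_sec,
                           uint64_t timebase_hz)
{
    assert(uptime_sec > 0 && timebase_hz > 0);
    assert(pool_idle_time >= boot_pool_idle_time);

    return (double)(pool_idle_time - boot_pool_idle_time) /
           ((double)uptime_sec * (double)timebase_hz);
}

/* The earlier calculation, which ignores idle time before LPAR boot. */
double demo_app_without_boot_offset(uint64_t pool_idle_time,
                                    uint64_t uptime_sec,
                                    uint64_t timebase_hz)
{
    assert(uptime_sec > 0 && timebase_hz > 0);

    return (double)pool_idle_time /
           ((double)uptime_sec * (double)timebase_hz);
}
```

With made-up inputs where the hypervisor has been up far longer than the LPAR, the second function returns a value well above the number of cores in the pool, while the first stays within it.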
  4. 19 Apr, 2024 3 commits
  5. 18 Apr, 2024 1 commit
  6. 15 Apr, 2024 2 commits
  7. 08 Apr, 2024 2 commits
  8. 03 Apr, 2024 4 commits
  9. 31 Mar, 2024 11 commits