Commits · 20ce0c247b2500cb7060cb115274ba71abda2626 · Kirill Smelkov / linux

04 Jul, 2024 8 commits

powerpc/pci: Hotplug driver bridge support · 20ce0c24

Krishna Kumar authored Jul 01, 2024

There is an issue with the hotplug operation when it's done on the
bridge/switch slot. The bridge-port and devices behind the bridge, which
become offline by hot-unplug operation, don't get hot-plugged/enabled
by doing hot-plug operation on that slot. Only the first port of the
bridge gets enabled and the remaining port/devices remain unplugged. The
hot plug/unplug operation is done by the hotplug driver (drivers/pci/
hotplug/pnv_php.c).

This behavior is due to missing code for the switch/bridge. The existing
driver depends on pci_hp_add_devices() function for device enablement.
This function calls pci_scan_slot() on only one device-node/port of the
bridge, not on all the siblings' device-node/port.

The missing code needs to be added which will find all the sibling
device-nodes/bridge-ports and will run explicit pci_scan_slot()
on those. A new function has been added for this purpose
which is invoked from pci_hp_add_devices(). This new function
traverse_siblings_and_scan_slot() gets all the sibling bridge-ports by
traversal and explicitly invokes pci_scan_slot() on them.
Tested-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Signed-off-by: Krishna Kumar <krishnak@linux.ibm.com>
[mpe: Move the code into pci-hotplug.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240701074513.94873-3-krishnak@linux.ibm.com

20ce0c24

pci/hotplug/pnv_php: Fix hotplug driver crash on Powernv · 335e35b7

Krishna Kumar authored Jul 01, 2024

The hotplug driver for powerpc (pci/hotplug/pnv_php.c) causes a kernel
crash when we try to hot-unplug/disable the PCIe switch/bridge from
the PHB.

The crash occurs because although the MSI data structure has been
released during disable/hot-unplug path and it has been assigned
with NULL, still during unregistration the code was again trying to
explicitly disable the MSI which causes the NULL pointer dereference and
kernel crash.

The patch fixes the check during unregistration path to prevent invoking
pci_disable_msi/msix() since its data structure is already freed.
Reported-by: Timothy Pearson <tpearson@raptorengineering.com>
Closes: https://lore.kernel.org/all/1981605666.2142272.1703742465927.JavaMail.zimbra@raptorengineeringinc.com/Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Signed-off-by: Krishna Kumar <krishnak@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240701074513.94873-2-krishnak@linux.ibm.com

335e35b7

powerpc/configs: Update defconfig with now user-visible CONFIG_FSL_IFC · 45547a0a

Esben Haabendal authored May 30, 2024

With CONFIG_FSL_IFC now being user-visible, and thus changed from a select
to depends in CONFIG_MTD_NAND_FSL_IFC, the dependencies needs to be
selected in defconfigs.

Depends-on: 9ba0cae3 ("memory: fsl_ifc: Make FSL_IFC config visible and selectable")
Signed-off-by: Esben Haabendal <esben@geanix.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240530-fsl-ifc-config-v3-2-1fd2c3d233dd@geanix.com

45547a0a

powerpc: add missing MODULE_DESCRIPTION() macros · 9c5f6473

Jeff Johnson authored Jun 15, 2024

With ARCH=powerpc, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/kernel/rtas_flash.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/sysdev/rtc_cmos_setup.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/pseries/papr_scm.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/spufs/spufs.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_thermal.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cpufreq_spudemand.o
WARNING: modpost: missing MODULE_DESCRIPTION() in arch/powerpc/platforms/cell/cbe_powerbutton.o

Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().

This includes 85xx/t1042rdb_diu.c and chrp/nvram.c which, although
they did not produce a warning with the powerpc allmodconfig
configuration, may cause this warning with other configurations.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240615-md-powerpc-arch-powerpc-v1-1-ba4956bea47a@quicinc.com

9c5f6473

macintosh/mac_hid: add MODULE_DESCRIPTION() · 50b5fed9

Jeff Johnson authored May 09, 2024

Fix the make W=1 warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/macintosh/mac_hid.o
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
[mpe: Change the description to just "Mouse button 2+3 emulation"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240509-mac_hid-md-v1-1-4091f1e4e4e0@quicinc.com

50b5fed9

powerpc/kexec: Use of_property_read_reg() · cf08b628

Rob Herring (Arm) authored Jul 02, 2024

Replace open-coded parsing of "reg" property with
of_property_read_reg().
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
[mpe: Rename end to size, add comment to the bail out case]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240702210344.722364-1-robh@kernel.org

cf08b628

powerpc/64s/radix/kfence: map __kfence_pool at page granularity · 353d7a84

Hari Bathini authored Jul 01, 2024

When KFENCE is enabled, total system memory is mapped at page level
granularity. But in radix MMU mode, ~3GB additional memory is needed
to map 100GB of system memory at page level granularity when compared
to using 2MB direct mapping.This is not desired considering KFENCE is
designed to be enabled in production kernels [1].

Mapping only the memory allocated for KFENCE pool at page granularity is
sufficient to enable KFENCE support. So, allocate __kfence_pool during
bootup and map it at page granularity instead of mapping all system
memory at page granularity.

Without patch:
  # cat /proc/meminfo
  MemTotal:       101201920 kB

With patch:
  # cat /proc/meminfo
  MemTotal:       104483904 kB

Note that enabling KFENCE at runtime is disabled for radix MMU for now,
as it depends on the ability to split page table mappings and such APIs
are not currently implemented for radix MMU.

All kfence_test.c testcases passed with this patch.

[1] https://lore.kernel.org/all/20201103175841.3495947-2-elver@google.com/Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240701130021.578240-1-hbathini@linux.ibm.com

353d7a84

powerpc/pseries/iommu: Define spapr_tce_table_group_ops only with CONFIG_IOMMU_API · af199e6c

Shivaprasad G Bhat authored Jul 04, 2024

The patch fixes the below warning:
  arch/powerpc/platforms/pseries/iommu.c:1824:37: warning:
  'spapr_tce_table_group_ops' defined but not used
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407020357.Hz8kQkKf-lkp@intel.com/
Fixes: b09c031d ("powerpc/iommu: Move pSeries specific functions to pseries/iommu.c")
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/172008854222.784.13666247605789409729.stgit@linux.ibm.com

af199e6c

01 Jul, 2024 1 commit

selftests/sigaltstack: Fix ppc64 GCC build · 17c743b9

Michael Ellerman authored May 20, 2024

Building the sigaltstack test with GCC on 64-bit powerpc errors with:

  gcc -Wall     sas.c  -o /home/michael/linux/.build/kselftest/sigaltstack/sas
  In file included from sas.c:23:
  current_stack_pointer.h:22:2: error: #error "implement current_stack_pointer equivalent"
     22 | #error "implement current_stack_pointer equivalent"
        |  ^~~~~
  sas.c: In function ‘my_usr1’:
  sas.c:50:13: error: ‘sp’ undeclared (first use in this function); did you mean ‘p’?
     50 |         if (sp < (unsigned long)sstack ||
        |             ^~

This happens because GCC doesn't define __ppc__ for 64-bit builds, only
32-bit builds. Instead use __powerpc__ to detect powerpc builds, which
is defined by clang and GCC for 64-bit and 32-bit builds.

Fixes: 05107edc ("selftests: sigaltstack: fix -Wuninitialized")
Cc: stable@vger.kernel.org # v6.3+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240520062647.688667-1-mpe@ellerman.id.au

17c743b9

28 Jun, 2024 16 commits

powerpc/prom: Add CPU info to hardware description string later · 7bdd1c6c

Nathan Lynch authored Jun 03, 2024

cur_cpu_spec->cpu_name is appended to ppc_hw_desc before cur_cpu_spec
has taken on its final value. This is illustrated on pseries by
comparing the CPU name as reported at boot ("POWER8E (raw)") to the
contents of /proc/cpuinfo ("POWER8 (architected)"):

  $ dmesg | grep Hardware
  Hardware name: IBM,8408-E8E POWER8E (raw) 0x4b0201 0xf000004 \
    of:IBM,FW860.50 (SV860_146) hv:phyp pSeries

  $ grep -m 1 ^cpu /proc/cpuinfo
  cpu             : POWER8 (architected), altivec supported

Some 44x models would appear to be affected as well; see
identical_pvr_fixup().

This results in incorrect CPU information in stack dumps --
ppc_hw_desc is an input to dump_stack_set_arch_desc().

Delay gathering the CPU name until after all potential calls to
identify_cpu().
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: bd649d40 ("powerpc: Add PVR & CPU name to hardware description")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240603-fix-cpu-hwdesc-v1-1-945f2850fcaa@linux.ibm.com

7bdd1c6c

powerpc/rtas: Prevent Spectre v1 gadget construction in sys_rtas() · 0974d03e

Nathan Lynch authored May 30, 2024

Smatch warns:

  arch/powerpc/kernel/rtas.c:1932 __do_sys_rtas() warn: potential
  spectre issue 'args.args' [r] (local cap)

The 'nargs' and 'nret' locals come directly from a user-supplied
buffer and are used as indexes into a small stack-based array and as
inputs to copy_to_user() after they are subject to bounds checks.

Use array_index_nospec() after the bounds checks to clamp these values
for speculative execution.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240530-sys_rtas-nargs-nret-v1-1-129acddd4d89@linux.ibm.com

0974d03e

powerpc/platforms: Move files from 4xx to 44x · d5d1a1a5

Christophe Leroy authored Jun 28, 2024

Only 44x uses 4xx now, so only keep one directory.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-7-mpe@ellerman.id.au

d5d1a1a5

powerpc: Replace CONFIG_4xx with CONFIG_44x · 7bf5f056

Michael Ellerman authored Jun 28, 2024

Replace 4xx usage with 44x, and replace 4xx_SOC with 44x.

Also, as pointed out by Christophe, if 44x || BOOKE can be simplified to
just test BOOKE, because 44x always selects BOOKE.

Retain the CONFIG_4xx symbol, as there are drivers that use it to mean
4xx || 44x, those will need updating before CONFIG_4xx can be removed.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-6-mpe@ellerman.id.au

7bf5f056

powerpc/4xx: Remove CONFIG_BOOKE_OR_40x · 002b27a5

Michael Ellerman authored Jun 28, 2024

Now that 40x is gone, replace CONFIG_BOOKE_OR_40x by CONFIG_BOOKE.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-5-mpe@ellerman.id.au

002b27a5

powerpc: Remove core support for 40x · 732b32da

Christophe Leroy authored Jun 28, 2024

Now that 40x platforms have gone, remove support
for 40x in the core of powerpc arch.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-4-mpe@ellerman.id.au

732b32da

powerpc: Remove 40x from Kconfig and defconfig · e939da89

Michael Ellerman authored Jun 28, 2024

Remove 40x from Kconfig, making the code unreachable.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-3-mpe@ellerman.id.au

e939da89

powerpc/boot: Remove all 40x platforms from boot · 839ff58e

Christophe Leroy authored Jun 28, 2024

Remove 40x platforms from the boot directory.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-2-mpe@ellerman.id.au

839ff58e

powerpc/40x: Remove 40x platforms. · 47d13a26

Christophe Leroy authored Jun 28, 2024

40x platforms have been orphaned for many years.

Remove them.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240628121201.130802-1-mpe@ellerman.id.au

47d13a26

macintosh: Drop explicit initialization of struct i2c_device_id::driver_data to 0 · 38767dde

Uwe Kleine-König authored Jun 24, 2024

These drivers don't use the driver_data member of struct i2c_device_id,
so don't explicitly initialize this member.

This prepares putting driver_data in an anonymous union which requires
either no initialization or named designators. But it's also a nice
cleanup on its own.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240624132433.1244750-2-u.kleine-koenig@baylibre.com

38767dde

powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries · f431a8cd

Shivaprasad G Bhat authored Jun 24, 2024

PPC64 IOMMU API defines iommu_table_group_ops which handles DMA
windows for PEs, their ownership transfer, create/set/unset the TCE
tables for the Dynamic DMA wundows(DDW). VFIOS uses these APIs for
support on POWER.

The commit 9d67c943 ("powerpc/iommu: Add "borrowing"
iommu_table_group_ops") implemented partial support for this API with
"borrow" mechanism wherein the DMA windows if created already by the
host driver, they would be available for VFIO to use. Also, it didn't
have the support to control/modify the window size or the IO page
size.

The current patch implements all the necessary iommu_table_group_ops
APIs there by avoiding the "borrrowing". So, just the way it is on the
PowerNV platform, with this patch the iommu table group ownership is
transferred to the VFIO PPC subdriver, the iommu table, DMA windows
creation/deletion all driven through the APIs.

The pSeries uses the query-pe-dma-window, create-pe-dma-window and
reset-pe-dma-window RTAS calls for DMA window creation, deletion and
reset to defaul. The RTAs calls do show some minor differences to the
way things are to be handled on the pSeries which are listed below.

* On pSeries, the default DMA window size is "fixed" cannot be custom
sized as requested by the user. For non-SRIOV VFs, It is fixed at 2GB
and for SRIOV VFs, its variable sized based on the capacity assigned
to it during the VF assignment to the LPAR. So, for the  default DMA
window alone the size if requested less than tce32_size, the smaller
size is enforced using the iommu table->it_size.

* The DMA start address for 32-bit window is 0, and for the 64-bit
window in case of PowerNV is hardcoded to TVE select (bit 59) at 512PiB
offset. This address is returned at the time of create_table() API call
(even before the window is created), the subsequent set_window() call
actually opens the DMA window. On pSeries, the DMA start address for
32-bit window is known from the 'ibm,dma-window' DT property. However,
the 64-bit window start address is not known until the create-pe-dma
RTAS call is made. So, the create_table() which returns the DMA window
start address actually opens the DMA window and returns the DMA start
address as returned by the Hypervisor for the create-pe-dma RTAS call.

* The reset-pe-dma RTAS call resets the DMA windows and restores the
default DMA window, however it does not clear the TCE table entries
if there are any. In case of ownership transfer from platform domain
which used direct mapping, the patch chooses remove-pe-dma instead of
reset-pe for the 64-bit window intentionally so that the
clear_dma_window() is called.

Other than the DMA window management changes mentioned above, the
patch also brings back the userspace view for the single level TCE
as it existed before commit 090bad39 ("powerpc/powernv: Add
indirect levels to it_userspace") along with the relavent
refactoring.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923275958.1397.907964437142542242.stgit@linux.ibm.com

f431a8cd

powerpc/iommu: Move dev_has_iommu_table() to iommu.c · 35146ead

Shivaprasad G Bhat authored Jun 24, 2024

Move function dev_has_iommu_table() to powerpc/kernel/iommu.c
as it is going to be used by machine specific iommu code as
well in subsequent patches.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923274748.1397.6274953248403106679.stgit@linux.ibm.com

35146ead

vfio/spapr: Always clear TCEs before unsetting the window · 4ba2fdff

Shivaprasad G Bhat authored Jun 24, 2024

The PAPR expects the TCE table to have no entries at the time of
unset window(i.e. remove-pe). The TCE clear right now is done
before freeing the iommu table. On pSeries, the unset window
makes those entries inaccessible to the OS and the H_PUT/GET calls
fail on them with H_CONSTRAINED.

On PowerNV, this has no side effect as the TCE clear can be done
before the DMA window removal as well.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923273535.1397.1236742071894414895.stgit@linux.ibm.com

4ba2fdff

powerpc/pseries/iommu: Use the iommu table[0] for IOV VF's DDW · aed6e494

Shivaprasad G Bhat authored Jun 24, 2024

This patch basically brings consistency with PowerNV approach
to use the first freely available iommu table when the default
window is removed.

The pSeries iommu code convention has been that the table[0] is
for the default 32 bit DMA window and the table[1] is for the
64 bit DDW.

With VFs having only 1 DMA window, the default has to be removed
for creating the larger DMA window. The existing code uses the
table[1] for that, while marking the table[0] as NULL. This is
fine as long as the host driver itself uses the device.

For the VFIO user, on pSeries there is no way to skip table[0]
as the VFIO subdriver uses the first freely available table.
The window 0, when created as 64-bit DDW in that context would
still be on table[0], as the maximum number of windows is 1.

This won't have any impact for the host driver as the table is
fetched from the device's iommu_table_base.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
[mpe: Rebase and resolve conflicts]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923272328.1397.1817843961216868850.stgit@linux.ibm.com

aed6e494

powerpc/pseries/iommu: Fix the VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl output · 6af67f2d

Shivaprasad G Bhat authored Jun 24, 2024

The ioctl VFIO_IOMMU_SPAPR_TCE_GET_INFO is not reporting the
actuals on the platform as not all the details are correctly
collected during the platform probe/scan into the iommu_table_group.

Collect the information during the device setup time as the DMA
window property is already looked up on parent nodes anyway.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923271138.1397.7908302630061814623.stgit@linux.ibm.com

6af67f2d

powerpc/iommu: Move pSeries specific functions to pseries/iommu.c · b09c031d

Shivaprasad G Bhat authored Jun 24, 2024

The PowerNV specific table_group_ops are defined in powernv/pci-ioda.c.
The pSeries specific table_group_ops are sitting in the generic powerpc
file. Move it to where it actually belong(pseries/iommu.c).

The functions are currently defined even for CONFIG_PPC_POWERNV
which are unused on PowerNV.

Only code movement, no functional changes intended.
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/171923269701.1397.15758640002786937132.stgit@linux.ibm.com

b09c031d

17 Jun, 2024 3 commits

powerpc/kexec_file: fix cpus node update to FDT · 932bed41

Sourabh Jain authored May 10, 2024

While updating the cpus node, commit 40c75399 ("powerpc/kexec_file:
Use current CPU info while setting up FDT") first deletes all subnodes
under the /cpus node. However, while adding sub-nodes back, it missed
adding cpus subnodes whose device_type != "cpu", such as l2-cache*,
l3-cache*, ibm,powerpc-cpu-features.

Fix this by only deleting cpus sub-nodes of device_type == "cpus" and
then adding all available nodes with device_type == "cpu".

Fixes: 40c75399 ("powerpc/kexec_file: Use current CPU info while setting up FDT")
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240510102235.2269496-3-sourabhjain@linux.ibm.com

932bed41

powerpc/kexec_file: fix extra size calculation for kexec FDT · 0d3ff067

Sourabh Jain authored May 10, 2024

While setting up the FDT for kexec, CPU nodes that are added after the
system boots and reserved memory ranges are incorporated into the
initial_boot_params (base FDT).

However, they are not taken into account when determining the additional
size needed for the kexec FDT. As a result, kexec fails to load,
generating the following error:

[1116.774451] Error updating memory reserve map: FDT_ERR_NOSPACE
kexec_file_load failed: No such process

Therefore, consider the extra size for CPU nodes added post-system boot
and reserved memory ranges while preparing the kexec FDT.

While adding a new parameter to the setup_new_fdt_ppc64 function, it was
noticed that there were a couple of unused parameters, so they were
removed.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240510102235.2269496-2-sourabhjain@linux.ibm.com

0d3ff067

powerpc/perf: Set cpumode flags using sample address · 0300a92e

Anjali K authored May 28, 2024

Currently in some cases, when the sampled instruction address register
latches to a specific address during sampling, the privilege bits
captured in the sampled event register are incorrect.

For example, a snippet from the perf report on a power10 system is:

Overhead Address Command Shared Object Symbol
........ .................. ............ ................. .......................
2.41% 0x7fff9f94a02c null_syscall [unknown] [k] 0x00007fff9f94a02c
2.20% 0x7fff9f94a02c null_syscall libc.so.6 [.] syscall

perf_get_misc_flags() function looks at the privilege bits to return
the corresponding flags to be used for the address symbol and these
privilege bit details are read from the sampled event register. In the
above snippet, address "0x00007fff9f94a02c" is shown as "k" (kernel) due
to the incorrect privilege bits captured in the sampled event register.

To address this case check whether the sampled address is in the kernel
area. Since this is specific to the latest platform, a new pmu flag
is added called "PPMU_P10" and is used to contain the proposed fix.
PPMU_P10_DD1 marked events are also included under PPMU_P10, hence
remove the code specific to PPMU_P10_DD1 marked events.
Signed-off-by: Anjali K <anjalik@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com <mailto:atrajeev@linux.vnet.ibm.com>>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240528040356.2722275-1-anjalik@linux.ibm.com

0300a92e

11 Jun, 2024 1 commit

powerpc/configs: drop RT_GROUP_SCHED=y from ppc6xx_defconfig · 2bac6cae

Celeste Liu authored May 30, 2024

Commit 6e5f1537 ("powerpc: Add a 6xx defconfig") said it was copied
from fedore's ppc32 defconfig, but at least since 2015-06-10, Fedora has
dropped this option.[1]

For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it
needs an RT budget assigned, otherwise the processes in it will not be able to
get RT at all. The problem with RT group scheduling is that it requires the
budget assigned but there's no way we could assign a default budget, since the
values to assign are both upper and lower time limits, are absolute, and need to
be sum up to < 1 for each individal cgroup. That means we cannot really come up
with values that would work by default in the general case.[1]

For cgroup v2, it's almost unusable as well. If it turned on, the cpu controller
can only be enabled when all RT processes are in the root cgroup. But it will
lose the benefits of cgroup v2 if all RT process were placed in the same cgroup.

systemd also doesn't support it.[2]

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700
[2]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240530111947.549474-12-CoelacanthusHex@gmail.com

2bac6cae

04 Jun, 2024 4 commits

powerpc/mm/drmem: Silence drmem_init() early return · 11e6e6d8

Nathan Lynch authored Jun 03, 2024

It's not an error or noteworthy condition if the
"ibm,dynamic-reconfiguration-memory" node isn't present.

Drop the needless message.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240603-silence-drmem_init-v1-1-e9d71646bc3d@linux.ibm.com

11e6e6d8

powerpc/pseries/vas: Use usleep_range() to support HCALL delay · 43ac9f5c

Haren Myneni authored Jan 15, 2024

VAS allocate, modify and deallocate HCALLs returns
H_LONG_BUSY_ORDER_1_MSEC or H_LONG_BUSY_ORDER_10_MSEC for busy
delay and expects OS to reissue HCALL after that delay. But using
msleep() will often sleep at least 20 msecs even though the
hypervisor suggests OS reissue these HCALLs after 1 or 10msecs.

The open and close VAS window functions hold mutex and then issue
these HCALLs. So these operations can take longer than the
necessary when multiple threads issue open or close window APIs
simultaneously, especially might affect the performance in the
case of repeat open/close APIs for each compression request.

Multiple tasks can open / close VAS windows at the same time
which depends on the available VAS credits. For example, 240
cores system provides 4800 VAS credits. It means 4800 tasks can
execute open VAS windows HCALLs with the mutex. Since each
msleep() will often sleep more than 20 msecs, some tasks are
waiting more than 120 secs to acquire mutex. It can cause hung
traces for these tasks in dmesg due to mutex contention around
open/close HCALLs.

Instead of msleep(), use usleep_range() to ensure sleep with
the expected value before issuing HCALL again. So since each
task sleep 10 msecs maximum, this patch allow more tasks can
issue open/close VAS calls without any hung traces in the
dmesg.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240116055910.421605-1-haren@linux.ibm.com

43ac9f5c

powerpc/numa: Online a node if PHB is attached. · 11981816

Nilay Shroff authored May 17, 2024

In the current design, a numa-node is made online only if that node is
attached to cpu/memory. With this design, if any PCI/IO device is found
to be attached to a numa-node which is not online then the numa-node
id of the corresponding PCI/IO device is set to NUMA_NO_NODE(-1). This
design may negatively impact the performance of PCIe device if the
numa-node assigned to PCIe device is -1 because in such case we may not
be able to accurately calculate the distance between two nodes.

The multi-controller NVMe PCIe disk has an issue with calculating the
node distance if the PCIe NVMe controller is attached to a PCI host
bridge which has numa-node id value set to NUMA_NO_NODE. This patch
helps fix this ensuring that a cpu/memory less numa node is made online
if it's attached to PCI host bridge.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240517142531.3273464-3-nilay@linux.ibm.com

11981816

powerpc/pseries/iommu: Split Dynamic DMA Window to be used in Hybrid mode · ff5163bb

Gaurav Batra authored May 13, 2024

Dynamic DMA Window (DDW) supports TCEs that are backed by 2MB page
size. In most configurations, DDW is big enough to pre-map all of LPAR
memory for IO. Pre-mapping of memory for DMA results in improvements in
IO performance.

Persistent memory, vPMEM, can be assigned to an LPAR as well. vPMEM is
not contiguous with LPAR memory and usually is assigned at high memory
addresses.  This makes is not possible to pre-map both vPMEM and LPAR
memory in the same DDW.

For a dedicated adapter this limitation is not an issue. Dedicated
adapters can have both Default DMA window, which is backed by 4K page
size and a DDW backed by 2MB page size TCEs. In this scenario, LPAR
memory is pre-mapped in the DDW.  Any DMA going to the vPMEM is routed
via dynamically allocated TCEs in the default window.

The issue arises with SR-IOV adapters. There is only one DMA window -
either Default or DDW. If an LPAR has vPMEM assigned, memory is not
pre-mapped in the DDW since TCEs needs to be allocated for vPMEM as well.
In this case, DDW is created and TCEs are dynamically allocated for both
vPMEM and LPAR memory.

Today, DDW is only used in single mode - direct mapped TCEs or
dynamically mapped TCEs. This enhancement breaks a single DDW in 2
regions -

	1. First region to pre-map LPAR memory
	2. Second region to dynamically allocate TCEs for IO to vPMEM

The DDW is split only if it is big enough to pre-map complete LPAR
memory and still have some space left to dynamically map vPMEM. Maximum
size possible DDW is created as permitted by the Hypervisor.
Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240514014608.35537-1-gbatra@linux.ibm.com

ff5163bb

03 Jun, 2024 1 commit

powerpc/pseries: Remove unused cede related functions · 214f33fc

Gautam Menghani authored May 14, 2024

Remove extended_cede_processor() and its helpers as
extended_cede_processor() has no callers since
commit 48f6e7f6("powerpc/pseries: remove cede offline state for CPUs")
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Acked-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240514132457.292865-1-gautam@linux.ibm.com

214f33fc

02 Jun, 2024 6 commits

Linux 6.10-rc2 · c3f38fa6
Linus Torvalds authored Jun 02, 2024

c3f38fa6

Merge tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · 58d89ee8

Linus Torvalds authored Jun 02, 2024

Pull ata fixes from Niklas Cassel:

 - Add a quirk for three different devices that have shown issues with
   LPM (link power management). These devices appear to not implement
   LPM properly, since we see command timeouts when enabling LPM. The
   quirk disables LPM for these problematic devices. (Me)

 - Do not apply the Intel PCS quirk on Alder Lake. The quirk is not
   needed and was originally added by mistake when LPM support was
   enabled for this AHCI controller. Enabling the quirk when not needed
   causes the the controller to not be able to detect the connected
   devices on some platforms.

* tag 'ata-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340
  ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD
  ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1
  ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake

58d89ee8

Merge tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a693b9c9

Linus Torvalds authored Jun 02, 2024

Pull x86 fixes from Ingo Molnar:
 "Miscellaneous topology parsing fixes:

   - Fix topology parsing regression on older CPUs in the new AMD/Hygon
     parser

   - Fix boot crash on odd Intel Quark and similar CPUs that do not fill
     out cpuinfo_x86::x86_clflush_size and zero out
     cpuinfo_x86::x86_cache_alignment as a result.

     Provide 32 bytes as a general fallback value.

   - Fix topology enumeration on certain rare CPUs where the BIOS locks
     certain CPUID leaves and the kernel unlocked them late, which broke
     with the new topology parsing code. Factor out this unlocking logic
     and move it earlier in the parsing sequence"

* tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/topology/intel: Unlock CPUID before evaluating anything
  x86/cpu: Provide default cache line size if not enumerated
  x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater

a693b9c9

Merge tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3fca58ff

Linus Torvalds authored Jun 02, 2024

Pull scheduler fix from Ingo Molnar:
 "Export a symbol to make life easier for instrumentation/debugging"

* tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/x86: Export 'percpu arch_freq_scale'

3fca58ff

Merge tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · efa8f11a

Linus Torvalds authored Jun 02, 2024

Pull perf events fix from Ingo Molnar:
 "Add missing MODULE_DESCRIPTION() lines"

* tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add missing MODULE_DESCRIPTION() lines
  perf/x86/rapl: Add missing MODULE_DESCRIPTION() line

efa8f11a

Merge tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 00a8c352

Linus Torvalds authored Jun 02, 2024

Pull hardening fixes from Kees Cook:

 - scsi: mpt3sas: Avoid possible run-time warning with long manufacturer
   strings

 - mailmap: update entry for Kees Cook

 - kunit/fortify: Remove __kmalloc_node() test

* tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kunit/fortify: Remove __kmalloc_node() test
  mailmap: update entry for Kees Cook
  scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings

00a8c352