1. 10 Aug, 2021 25 commits
  2. 04 Aug, 2021 8 commits
    • powerpc/64s/perf: Always use SIAR for kernel interrupts · cf9c615c
      Nicholas Piggin authored
      If an interrupt is taken in kernel mode, always use SIAR for it rather than
      looking at regs_sipr. This prevents samples piling up around interrupt
      enable (hard enable or interrupt replay via soft enable) in PMUs / modes
      where the PR sample indication is not in synch with SIAR.
      
      This results in better sampling of interrupt entry and exit in particular.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
    • powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings · e9ef81e1
      Parth Shah authored
      On POWER10 systems, the "ibm,thread-groups" property "2" indicates that
      the CPUs in a thread group share both the L2 and L3 caches. Hence, use
      cache_property = 2 itself to find both the L2 and L3 cache siblings:
      create a new thread_group_l3_cache_map to hold the list of L3 siblings,
      but fill the mask using the same property "2" array.
      Signed-off-by: Parth Shah <parth@linux.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-4-parth@linux.ibm.com
    • powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() · 69aa8e07
      Gautham R. Shenoy authored
      The helper function get_shared_cpu_map() was added in
      
      'commit 500fe5f5 ("powerpc/cacheinfo: Report the correct
      shared_cpu_map on big-cores")'
      
      and subsequently expanded upon in
      
      'commit 0be47634 ("powerpc/cacheinfo: Print correct cache-sibling
      map/list for L2 cache")'
      
      in order to help report the correct groups of threads sharing these caches
      on big-core systems where groups of threads within a core can share
      different sets of caches.
      
      Now that powerpc/cacheinfo is aware of the "ibm,thread-groups" property,
      cache->shared_cpu_map contains the correct set of thread-siblings
      sharing the cache. Hence we no longer need the function
      get_shared_cpu_map(). This patch removes it, along with the helper
      function index_dir_to_cpu(), which was only called by
      get_shared_cpu_map().
      
      With these functions removed, we can still see the correct
      cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
      caches distributed among groups of threads in a core.
      
      With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
      are split between the two groups of threads in a core, for CPUs 8,9,
      the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
      follows:
      
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15
      
      $ ppc64_cpu --smt=4
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11
      
      $ ppc64_cpu --smt=2
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9
      
      $ ppc64_cpu --smt=1
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
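The listings above use the sysfs CPU-list notation (comma-separated CPU numbers and inclusive ranges). As a minimal sketch for checking such output, the helper below expands a list string into a set of CPU numbers. The function name parse_cpu_list is hypothetical, and the sketch assumes only plain numbers and "first-last" ranges appear, as in the output shown here.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as "8,10,12,14" or "8-15" into a set of ints."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            # Inclusive range, e.g. "8-15" covers CPUs 8 through 15.
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


# Values copied from the SMT8 listing above for CPU 8.
l2_siblings = parse_cpu_list("8,10,12,14")
l3_siblings = parse_cpu_list("8-15")

# Check that every thread sharing the L2 with CPU 8 also shares its L3.
print(l2_siblings <= l3_siblings)
```

Comparing the parsed sets makes it easy to confirm that the L1 and L2 sibling lists are subsets of the L3 list at each SMT mode.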
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-3-parth@linux.ibm.com
    • powerpc/cacheinfo: Lookup cache by dt node and thread-group id · a4bec516
      Gautham R. Shenoy authored
      Currently the cacheinfo code on powerpc indexes the "cache" objects
      (modelling the L1/L2/L3 caches) where the key is device-tree node
      corresponding to that cache. On some of the POWER server platforms
      thread-groups within the core share different sets of caches (Eg: On
      SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
      threads 1,3,5,7 of the same core share another L1 cache). On such
      platforms, there is a single device-tree node corresponding to that
      cache and the cache-configuration within the threads of the core is
      indicated via "ibm,thread-groups" device-tree property.
      
      Since the current code is not aware of the "ibm,thread-groups"
      property, on the aforementioned systems, cacheinfo code still treats
      all the threads in the core as sharing the cache because of the
      single device-tree node (in the earlier example, the cacheinfo code
      would say CPUs 0-7 share the L1 cache).
      
      In this patch, we make the powerpc cacheinfo code aware of the
      "ibm,thread-groups" property. We index the "cache" objects by the
      key-pair (device-tree node, thread-group id). For any CPUX, for a
      given level of cache, the thread-group id is defined to be the first
      CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
      of cache which are not represented in "ibm,thread-groups" property,
      the thread-group id is -1.
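The key-pair lookup described above can be modelled roughly as follows. This is an illustrative sketch, not the kernel implementation: thread_group_id, cache_key and the "l1-cache"/"l3-cache" node names are hypothetical, each group is assumed to list its CPUs in order so that the first element is the first CPU, and returning -1 for a CPU that appears in no group is a choice made for this sketch.

```python
def thread_group_id(cpu, groups):
    """Return the first CPU of the cache group containing `cpu`, else -1."""
    if groups is None:
        # Cache level not represented in "ibm,thread-groups".
        return -1
    for group in groups:
        if cpu in group:
            return group[0]
    return -1  # sketch's choice for a CPU that is in no listed group


def cache_key(dt_node, cpu, groups):
    """Build the (device-tree node, thread-group id) key for a cache object."""
    return (dt_node, thread_group_id(cpu, groups))


# SMT8 example from the text: threads 0,2,4,6 share one L1 cache and
# threads 1,3,5,7 share another, under a single device-tree node.
l1_groups = [[0, 2, 4, 6], [1, 3, 5, 7]]

print(cache_key("l1-cache", 4, l1_groups))
print(cache_key("l1-cache", 5, l1_groups))
print(cache_key("l3-cache", 5, None))
```

With the node alone as the key, CPUs 4 and 5 would resolve to the same cache object. With the pair, they resolve to different objects even though the device-tree node is shared.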
      
      [parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
      and "thread_group_l2_cache_map" to get rid of the compile error.]
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Parth Shah <parth@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-2-parth@linux.ibm.com
    • powerpc: move the install rule to arch/powerpc/Makefile · 86ff0bce
      Masahiro Yamada authored
      Currently, the install target in arch/powerpc/Makefile descends into
      arch/powerpc/boot/Makefile to invoke the shell script, but there is no
      good reason to do so.
      
      arch/powerpc/Makefile can run the shell script directly.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-3-masahiroy@kernel.org
    • powerpc: make the install target not depend on any build artifact · 9bef456b
      Masahiro Yamada authored
      The install target should not depend on any build artifact.
      
      The reason is explained in commit 19514fc6 ("arm, kbuild: make
      "make install" not depend on vmlinux").
      
      Change the PowerPC installation code in a similar way.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-2-masahiroy@kernel.org
    • powerpc: remove unused zInstall target from arch/powerpc/boot/Makefile · 156ca4e6
      Masahiro Yamada authored
      Commit c913e5f9 ("powerpc/boot: Don't install zImage.* from make
      install") added the zInstall target to arch/powerpc/boot/Makefile,
      but you cannot use it since the corresponding hook is missing in
      arch/powerpc/Makefile.
      
      It has never worked since its addition. Nobody has complained about
      it for 7 years, which means this code was unneeded.
      
      With this removal, the install.sh will be passed in with 4 parameters.
      Simplify the shell script.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-1-masahiroy@kernel.org
    • cpuidle: pseries: Mark pseries_idle_proble() as __init · d04691d3
      Nathan Chancellor authored
      After commit 7cbd631d4dec ("cpuidle: pseries: Fixup CEDE0 latency only
      for POWER10 onwards"), pseries_idle_probe() is no longer inlined when
      compiling with clang, which causes a modpost warning:
      
      WARNING: modpost: vmlinux.o(.text+0xc86a54): Section mismatch in
      reference from the function pseries_idle_probe() to the function
      .init.text:fixup_cede0_latency()
      The function pseries_idle_probe() references
      the function __init fixup_cede0_latency().
      This is often because pseries_idle_probe lacks a __init
      annotation or the annotation of fixup_cede0_latency is wrong.
      
      pseries_idle_probe() is a non-init function, which calls
      fixup_cede0_latency(), which is an init function, explaining the
      mismatch. pseries_idle_probe() is only called from
      pseries_processor_idle_init(), which is an init function, so mark
      pseries_idle_probe() as __init so there is no more warning.
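The reasoning above can be illustrated with a toy model. The real modpost tool inspects references between ELF sections; the sketch below only models the rule that non-init code must not reference init code, using the three functions named in this message. The helper find_section_mismatches and the dictionaries are hypothetical.

```python
def find_section_mismatches(is_init, calls):
    """Return (caller, callee) pairs where non-init code references init code."""
    return [
        (caller, callee)
        for caller, callees in calls.items()
        for callee in callees
        if not is_init[caller] and is_init[callee]
    ]


# Call relationships as described in the commit message.
calls = {
    "pseries_processor_idle_init": ["pseries_idle_probe"],
    "pseries_idle_probe": ["fixup_cede0_latency"],
    "fixup_cede0_latency": [],
}

# Before the patch pseries_idle_probe() is not marked __init.
before = {
    "pseries_processor_idle_init": True,
    "pseries_idle_probe": False,
    "fixup_cede0_latency": True,
}
# After the patch it is.
after = dict(before, pseries_idle_probe=True)

print(find_section_mismatches(before, calls))
print(find_section_mismatches(after, calls))
```

An init function calling a non-init function is not flagged, which is why pseries_processor_idle_init() never appears in the result.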
      
      Fixes: 054e44ba ("cpuidle: pseries: Add function to parse extended CEDE records")
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210803211547.1093820-1-nathan@kernel.org
  3. 03 Aug, 2021 3 commits
    • powerpc/stacktrace: Include linux/delay.h · a6cae77f
      Michal Suchanek authored
      Commit 7c6986ad ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
      introduced a udelay() call without including the linux/delay.h header.
      This may happen to work on master, but the header that declares the
      function should be included nonetheless.
      
      Fixes: 7c6986ad ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729180103.15578-1-msuchanek@suse.de
    • cpuidle: pseries: Do not cap the CEDE0 latency in fixup_cede0_latency() · 71737a6c
      Gautham R. Shenoy authored
      Currently, fixup_cede0_latency() fixes up the CEDE(0) exit latency
      value only if the minimum advertised extended CEDE latency is less
      than 10us. This was done so as to not break the expected behaviour on
      POWER8 platforms where the advertised latency was higher than the
      default 10us, which would delay the SMT folding on the core.
      
      However, after the earlier patch "cpuidle/pseries: Fixup CEDE0 latency
      only for POWER10 onwards", we can be sure that the fixup of CEDE0
      latency is going to happen only from POWER10 onwards. Hence
      unconditionally use the minimum exit latency provided by the platform.
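The behaviour change can be sketched as two small functions. This models only what the message states (a fixup applied only below 10us before, applied unconditionally after). The function names are hypothetical, the 12us input is a made-up example value, and the kernel code may adjust the value further.

```python
DEFAULT_CEDE0_LATENCY_US = 10  # default value quoted in the commit message


def cede0_latency_capped(min_advertised_us):
    """Earlier behaviour: apply the fixup only if the minimum is below 10us."""
    if min_advertised_us < DEFAULT_CEDE0_LATENCY_US:
        return min_advertised_us
    return DEFAULT_CEDE0_LATENCY_US


def cede0_latency_uncapped(min_advertised_us):
    """New behaviour: always use the minimum latency given by the platform."""
    return min_advertised_us


# Hypothetical platform advertising a 12us minimum extended CEDE latency.
print(cede0_latency_capped(12))
print(cede0_latency_uncapped(12))
```

The two functions differ only when the advertised minimum is 10us or more, which is the case the cap used to guard against on POWER8.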
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1626676399-15975-3-git-send-email-ego@linux.vnet.ibm.com
    • cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards · 50741b70
      Gautham R. Shenoy authored
      Commit d947fb4c ("cpuidle: pseries: Fixup exit latency for
      CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
      of the Extended CEDE states advertised by the platform.
      
      On POWER9 LPARs, the firmwares advertise a very low value of 2us for
      CEDE1 exit latency on a Dedicated LPAR. The latency advertized by the
      PHYP hypervisor corresponds to the latency required to wakeup from the
      underlying hardware idle state. However the wakeup latency from the
      LPAR perspective should include
      
      1. The time taken to transition the CPU from the Hypervisor into the
         LPAR post wakeup from platform idle state
      
      2. Time taken to send the IPI from the source CPU (waker) to the idle
         target CPU (wakee).
      
      Item 1 can be measured via a timer idle test, where we queue a timer, say
      for 1ms, and enter the CEDE state. When the timer fires, in the timer
      handler we compute how much extra time over the expected 1ms we have
      consumed. On a POWER9 LPAR the numbers are
      
      CEDE latency measured using a timer (numbers in ns)
      N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
      400     2601     5677     5668.74    5917    6413     9299   455.01
      
      Items 1 and 2 combined can be determined by an IPI latency test where we
      send an IPI to an idle CPU and in the handler compute the time
      difference between when the IPI was sent and when the handler ran. We
      see the following numbers on a POWER9 LPAR.
      
      CEDE latency measured using an IPI (numbers in ns)
      N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
      400     711      7564     7369.43   8559    9514      9698   1200.01
      
      If we take the 99th percentile latency value measured using the IPI
      as the wakeup latency, the value is 9.5us. This is in the ballpark of
      the default value of 10us.
      
      Hence, use the exit latency of CEDE(0) based on the latency values
      advertized by platform only from POWER10 onwards. The values
      advertized on POWER10 platforms are more realistic and informed by the
      latency measurements. For earlier platforms stick to the default value
      of 10us. The fix was suggested by Michael Ellerman.
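As a rough illustration of the argument, the sketch below compares the POWER9 figures quoted in this message and models the resulting policy. The helper cede0_exit_latency_us and its parameters are hypothetical. How the platform-derived value is computed is deliberately left outside the sketch.

```python
DEFAULT_CEDE0_LATENCY_US = 10  # default value quoted in the commit message


def cede0_exit_latency_us(power10_or_later, platform_latency_us):
    """Pick the CEDE(0) exit latency (hypothetical helper, not kernel code)."""
    if power10_or_later:
        return platform_latency_us
    return DEFAULT_CEDE0_LATENCY_US


# Figures quoted above for a POWER9 dedicated LPAR.
advertised_cede1_us = 2
ipi_p99_us = 9514 / 1000  # 99th percentile of the IPI test, ns converted to us

# Is the measurement nearer to the 10us default than to the advertised 2us?
closer_to_default = (
    abs(ipi_p99_us - DEFAULT_CEDE0_LATENCY_US)
    < abs(ipi_p99_us - advertised_cede1_us)
)
print(closer_to_default)

# On POWER9 the advertised value is ignored and the default is kept.
print(cede0_exit_latency_us(False, advertised_cede1_us))
```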
      
      Fixes: d947fb4c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
      Reported-by: Enrico Joedecke <joedecke@de.ibm.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1626676399-15975-2-git-send-email-ego@linux.vnet.ibm.com
  4. 26 Jul, 2021 2 commits
  5. 23 Jul, 2021 2 commits
    • KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state · d9c57d3e
      Nicholas Piggin authored
      The H_ENTER_NESTED hypercall is handled by the L0, and it is a request
      by the L1 to switch the context of the vCPU over to that of its L2
      guest, and return with an interrupt indication. The L1 is responsible
      for switching some registers to guest context, and the L0 switches
      others (including all the hypervisor privileged state).
      
      If the L2 MSR has TM active, then the L1 is responsible for
      recheckpointing the L2 TM state. Then the L1 exits to L0 via the
      H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit,
      and then it recheckpoints the TM state as part of the nested entry and
      finally HRFIDs into the L2 with TM active MSR. Not efficient, but about
      the simplest approach for something that's horrendously complicated.
      
      Problems arise if the L1 exits to the L0 with a TM state which does not
      match the L2 TM state being requested. For example if the L1 is
      transactional but the L2 MSR is non-transactional, or vice versa. The
      L0's HRFID can take a TM Bad Thing interrupt and crash.
      
      Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then
      ensuring that if the L1 is suspended then the L2 must have TM active,
      and if the L1 is not suspended then the L2 must not have TM active.
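The rule in the last paragraph can be written down as a small predicate. This is a sketch of the stated rule only, not the kernel code: the TM enum and h_enter_nested_tm_ok are hypothetical names, and "TM active" is assumed here to mean either the suspended or the transactional state.

```python
from enum import Enum


class TM(Enum):
    NON_TRANSACTIONAL = 0
    SUSPENDED = 1
    TRANSACTIONAL = 2


def h_enter_nested_tm_ok(l1_state, l2_state):
    """Return True if the L1/L2 TM state combination is acceptable."""
    # H_ENTER_NESTED is disallowed entirely in the transactional (TM[T]) state.
    if l1_state is TM.TRANSACTIONAL:
        return False
    l1_suspended = l1_state is TM.SUSPENDED
    # Assumption: "TM active" covers both suspended and transactional.
    l2_tm_active = l2_state is not TM.NON_TRANSACTIONAL
    # L1 suspended requires L2 TM active, and vice versa.
    return l1_suspended == l2_tm_active


print(h_enter_nested_tm_ok(TM.SUSPENDED, TM.TRANSACTIONAL))
print(h_enter_nested_tm_ok(TM.NON_TRANSACTIONAL, TM.SUSPENDED))
```

Rejecting the mismatched combinations up front means the L0 never attempts the HRFID that could take the TM Bad Thing interrupt.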
      
      Fixes: 360cae31 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
      Cc: stable@vger.kernel.org # v4.20+
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow · f62f3c20
      Nicholas Piggin authored
      The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on
      the rtas_args.nargs that was provided by the guest. That guest nargs
      value is not range checked, so the guest can cause the host rets pointer
      to be pointed outside the args array. The individual rtas function
      handlers check the nargs and nrets values to ensure they are correct,
      but if they are not, the handlers store a -3 (0xfffffffd) failure
      indication in rets[0] which corrupts host memory.
      
      Fix this by testing up front whether the guest supplied nargs and nret
      would exceed the array size, and fail the hcall directly without storing
      a failure indication to rets[0].
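The up-front test can be sketched as a bounds check on guest-supplied counts. This follows the description above rather than the actual kernel code: rets_index is a hypothetical helper, and the array size of 16 slots is an assumption made for the example.

```python
RTAS_ARGS_SLOTS = 16  # assumed size of the args array, for illustration only


def rets_index(nargs, nret, slots=RTAS_ARGS_SLOTS):
    """Return the index where rets would start, or None to fail the call.

    rets points into the args array at offset nargs, so both the
    arguments and the return slots must fit inside the array.
    """
    if nargs < 0 or nret < 0 or nargs + nret > slots:
        return None
    return nargs


print(rets_index(3, 1))     # fits: rets starts right after the arguments
print(rets_index(1000, 1))  # guest-controlled nargs far beyond the array
```

Returning None before any index is used mirrors the point of the fix: on a bad request nothing is written through rets[0], because that pointer cannot be trusted.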
      
      Also expand on a comment about why we kill the guest and try not to
      return errors directly if we have a valid rets[0] pointer.
      
      Fixes: 8e591cb7 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")
      Cc: stable@vger.kernel.org # v3.10+
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>