1. 10 Aug, 2021 8 commits
    • powerpc: Always inline radix_enabled() to fix build failure · 27fd1111
      Jordan Niethe authored
      This is the same as commit acdad8fb ("powerpc: Force inlining of
      mmu_has_feature to fix build failure") but for radix_enabled().  The
      config in the linked bugzilla causes the following build failure:
      
        LD      .tmp_vmlinux.kallsyms1
        powerpc64-linux-ld: arch/powerpc/mm/pgtable.o: in function `.__ptep_set_access_flags':
        pgtable.c:(.text+0x17c): undefined reference to `.radix__ptep_set_access_flags'
        powerpc64-linux-ld: arch/powerpc/mm/pageattr.o: in function `.change_page_attr':
        pageattr.c:(.text+0xc0): undefined reference to `.radix__flush_tlb_kernel_range'
        etc.
      
      This is due to radix_enabled() not being inlined. See extract from
      building with -Winline:
      
        In file included from arch/powerpc/include/asm/lppaca.h:46,
                         from arch/powerpc/include/asm/paca.h:17,
                         from arch/powerpc/include/asm/current.h:13,
                         from include/linux/thread_info.h:23,
                         from include/asm-generic/preempt.h:5,
                         from ./arch/powerpc/include/generated/asm/preempt.h:1,
                         from include/linux/preempt.h:78,
                         from include/linux/spinlock.h:51,
                         from include/linux/mmzone.h:8,
                         from include/linux/gfp.h:6,
                         from arch/powerpc/mm/pgtable.c:21:
        arch/powerpc/include/asm/book3s/64/pgtable.h: In function '__ptep_set_access_flags':
        arch/powerpc/include/asm/mmu.h:327:20: error: inlining failed in call to 'radix_enabled': call is unlikely and code size would grow [-Werror=inline]
      
      The code relies on constant folding of MMU_FTRS_POSSIBLE at build time
      and on the compiler eliminating code for features that are not possible.
      For this to work radix_enabled() must be inlined, so make it
      __always_inline.
      Reported-by: Erhard F. <erhard_f@mailbox.org>
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
      [mpe: Trimmed error messages in change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=213803
      Link: https://lore.kernel.org/r/20210804013724.514468-1-jniethe5@gmail.com
    • powerpc: Replace deprecated CPU-hotplug functions. · 5ae36401
      Sebastian Andrzej Siewior authored
      The functions get_online_cpus() and put_online_cpus() have been
      deprecated during the CPU hotplug rework. They map directly to
      cpus_read_lock() and cpus_read_unlock().
      
      Replace deprecated CPU-hotplug functions with the official version.
      The behavior remains unchanged.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210803141621.780504-4-bigeasy@linutronix.de
    • powerpc/kexec: fix for_each_child.cocci warning · c00103ab
      kernel test robot authored
      for_each_node_by_type should have of_node_put() before return.
      
      Generated by: scripts/coccinelle/iterators/for_each_child.cocci
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2108031654080.17639@hadrien
    • powerpc/pseries: Prevent free CPU ids being reused on another node · bd1dd4c5
      Laurent Dufour authored
      When a CPU is hot added, the CPU ids are taken from the available mask
      from the lower possible set. If that set of values was previously used
      for a CPU attached to a different node, it appears to an application as
      if these CPUs have migrated from one node to another node which is not
      expected.
      
      To prevent this, record the CPU ids used for each node and do not reuse
      them on another node. However, to keep CPU hot plug from failing when a
      node is starved of CPU ids, the capability to reuse other nodes’ free
      CPU ids is kept. A warning is displayed in such a case to warn the user.
      
      A new CPU bit mask (node_recorded_ids_map) is introduced for each
      possible node. It is populated with the CPUs onlined at boot time, and
      then when a CPU is hot plugged to a node. The bits in that mask remain
      set when the CPU is hot unplugged, to record that these CPU ids have
      been used for this node.
      
      If no id set was found, a retry is made without removing the ids used on
      the other nodes to try reusing them. This is the way ids have been
      allocated prior to this patch.
      
      The effect of this patch can be seen by removing and adding CPUs using
      the Qemu monitor. In the following case, the first CPU of node 2 is
      removed, then the first CPU of node 1 is removed too. Later, the first
      CPU of node 2 is added back. Without this patch, the kernel numbers
      these CPUs using the first CPU ids available, which are the ones freed
      when removing the first CPU of node 1. This causes the CPU ids 16-23 to
      move from node 1 to node 2. With the patch applied, the CPU ids 32-39
      are used since they are the lowest free ones which have not been used
      on another node.
      
      At boot time:
        [root@vm40 ~]# numactl -H | grep cpus
        node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
        node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
      
      Vanilla kernel, after the CPU hot unplug/plug operations:
        [root@vm40 ~]# numactl -H | grep cpus
        node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        node 1 cpus: 24 25 26 27 28 29 30 31
        node 2 cpus: 16 17 18 19 20 21 22 23 40 41 42 43 44 45 46 47
      
      Patched kernel, after the CPU hot unplug/plug operations:
        [root@vm40 ~]# numactl -H | grep cpus
        node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        node 1 cpus: 24 25 26 27 28 29 30 31
        node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
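
The allocation policy described in this commit message can be modeled in a few lines. This is a minimal Python sketch, not the kernel code: `find_cpu_ids`, `recorded`, the set-based masks and `nr_possible=64` are hypothetical names and values chosen for illustration, and picking the lowest run of consecutive free ids is an assumption about how a range is selected. It replays the hot unplug/plug scenario shown in the output above.

```python
def find_cpu_ids(nthreads, node, present, recorded, nr_possible):
    """Pick `nthreads` consecutive CPU ids for a CPU hot-added to `node`.

    present  -- set of ids currently in use
    recorded -- dict: node -> set of ids ever used on that node
                (a stand-in for node_recorded_ids_map)
    """
    def lowest_run(blocked):
        # Lowest run of `nthreads` consecutive ids that are not blocked.
        run = 0
        for cpu in range(nr_possible):
            run = run + 1 if cpu not in blocked else 0
            if run == nthreads:
                return list(range(cpu - nthreads + 1, cpu + 1))
        return None

    # First pass: also avoid ids that were ever used on another node.
    others = set()
    for n, ids in recorded.items():
        if n != node:
            others |= ids
    ids = lowest_run(present | others)
    if ids is None:
        # Fallback: the node is starved, reuse other nodes' free ids.
        print(f"warning: reusing free CPU ids from other nodes for node {node}")
        ids = lowest_run(present)
    if ids is not None:
        recorded.setdefault(node, set()).update(ids)
    return ids


# Boot state from the example: 3 nodes, 16 CPU ids each.
recorded = {n: set(range(16 * n, 16 * n + 16)) for n in range(3)}
present = set(range(48))
present -= set(range(32, 40))   # unplug the first CPU of node 2
present -= set(range(16, 24))   # unplug the first CPU of node 1

patched = find_cpu_ids(8, 2, present, recorded, nr_possible=64)
vanilla = find_cpu_ids(8, 2, present, {}, nr_possible=64)  # no per-node record
print(patched)
print(vanilla)
```

With the per-node record the model picks ids 32-39 for node 2, and without it the model picks 16-23, matching the two `numactl` listings.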
      Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210429174908.16613-1-ldufour@linux.ibm.com
    • pseries/drmem: update LMBs after LPM · d144f4d5
      Laurent Dufour authored
      After an LPM, the device tree node ibm,dynamic-reconfiguration-memory may be
      updated by the hypervisor if the NUMA topology of the LPAR's memory has
      changed.
      
      This is handled by the kernel, but the memory's node is not updated because
      there is no way to move a memory block between nodes from the Linux kernel
      point of view.
      
      If a memory block is later added or removed, drmem_update_dt() is called
      and overwrites the DT node ibm,dynamic-reconfiguration-memory to match
      the added or removed LMB. But the LMB's associativity node has not been
      updated after the DT node update, and thus the node is overwritten with
      Linux's topology instead of the hypervisor's.
      
      Introduce a hook called when the ibm,dynamic-reconfiguration-memory node is
      updated to force an update of the LMB's associativity. However, ignore the
      call to that hook when the update has been triggered by drmem_update_dt(),
      because in that case the LMB tree has been used to set the DT property
      and thus it doesn't need to be updated back. Since drmem_update_dt() is
      called under the protection of the device_hotplug_lock and the hook is
      called in the same context, use a simple boolean variable to detect that
      call.
      Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210517090606.56930-1-ldufour@linux.ibm.com
    • powerpc/numa: Consider the max NUMA node for migratable LPAR · 9c7248bb
      Laurent Dufour authored
      When a LPAR is migratable, we should consider the maximum possible NUMA
      node instead of the number of NUMA nodes from the actual system.
      
      The DT property 'ibm,current-associativity-domains' defines the maximum
      number of nodes the LPAR can see when running on that box. But if the
      LPAR is being migrated on another box, it may see up to the nodes
      defined by 'ibm,max-associativity-domains'. So if a LPAR is migratable,
      that value should be used.
      
      Unfortunately, there is no easy way to know if an LPAR is migratable or
      not. The hypervisor exports the property 'ibm,migratable-partition' when
      it is set up to migrate partitions, but that does not mean that the
      current partition is migratable.
      
      Without this patch, when an LPAR is started on a 2 node box and then
      migrated to a 3 node box, the hypervisor may spread the LPAR's CPUs on
      the 3rd node. In that case if a CPU from that 3rd node is added to the
      LPAR, it will be assigned to the wrong node because the kernel has
      been set to use up to 2 nodes (the configuration of the departure node).
      With this patch applied, the CPU is correctly added to the 3rd node.
      
      Fixes: f9f130ff ("powerpc/numa: Detect support for coregroup")
      Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210511073136.17795-1-ldufour@linux.ibm.com
    • powerpc/non-smp: Unconditionaly call smp_mb() on switch_mm · c8a6d910
      Christophe Leroy authored
      Commit 3ccfebed ("powerpc, membarrier: Skip memory barrier in
      switch_mm()") added some logic to skip the smp_mb() in
      switch_mm_irqs_off() before the call to switch_mmu_context().
      
      However, on non-SMP, smp_mb() is just a compiler barrier, and doing
      it unconditionally is simpler than the logic used to check whether the
      barrier is needed or not.
      
      After the patch:
      
      00000000 <switch_mm_irqs_off>:
      ...
         c:	7c 04 18 40 	cmplw   r4,r3
        10:	81 24 00 24 	lwz     r9,36(r4)
        14:	91 25 04 c8 	stw     r9,1224(r5)
        18:	4d 82 00 20 	beqlr
        1c:	48 00 00 00 	b       1c <switch_mm_irqs_off+0x1c>
      			1c: R_PPC_REL24	switch_mmu_context
      
      Before the patch:
      
      00000000 <switch_mm_irqs_off>:
      ...
         c:	7c 04 18 40 	cmplw   r4,r3
        10:	81 24 00 24 	lwz     r9,36(r4)
        14:	91 25 04 c8 	stw     r9,1224(r5)
        18:	4d 82 00 20 	beqlr
        1c:	81 24 00 28 	lwz     r9,40(r4)
        20:	71 29 00 0a 	andi.   r9,r9,10
        24:	40 82 00 34 	bne     58 <switch_mm_irqs_off+0x58>
        28:	48 00 00 00 	b       28 <switch_mm_irqs_off+0x28>
      			28: R_PPC_REL24	switch_mmu_context
      ...
        58:	2c 03 00 00 	cmpwi   r3,0
        5c:	41 82 ff cc 	beq     28 <switch_mm_irqs_off+0x28>
        60:	48 00 00 00 	b       60 <switch_mm_irqs_off+0x60>
      			60: R_PPC_REL24	switch_mmu_context
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/e9d501da0c59f60ca767b1b3ea4603fce6d02b9e.1625486440.git.christophe.leroy@csgroup.eu
    • powerpc: Remove in_kernel_text() · 09ca4975
      Christophe Leroy authored
      The last user of in_kernel_text() stopped using it with
      commit 549e8152 ("powerpc: Make the 64-bit kernel as a
      position-independent executable").
      
      The generic function is_kernel_text() does the same.
      
      So remove it.
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/2a3a5b6f8cc0ef4e854d7b764f66aa8d2ee270d2.1624813698.git.christophe.leroy@csgroup.eu
  2. 04 Aug, 2021 8 commits
    • powerpc/64s/perf: Always use SIAR for kernel interrupts · cf9c615c
      Nicholas Piggin authored
      If an interrupt is taken in kernel mode, always use SIAR for it rather than
      looking at regs_sipr. This prevents samples piling up around interrupt
      enable (hard enable or interrupt replay via soft enable) in PMUs / modes
      where the PR sample indication is not in synch with SIAR.
      
      This results in better sampling of interrupt entry and exit in particular.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210720141504.420110-1-npiggin@gmail.com
    • powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings · e9ef81e1
      Parth Shah authored
      On POWER10 systems, the "ibm,thread-groups" property "2" indicates that
      the cpus in a thread-group share both L2 and L3 caches. Hence, use
      cache_property = 2 itself to find both the L2 and L3 cache siblings:
      create a new thread_group_l3_cache_map to keep the list of L3 siblings,
      but fill the mask using the same property "2" array.
      Signed-off-by: Parth Shah <parth@linux.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-4-parth@linux.ibm.com
    • powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() · 69aa8e07
      Gautham R. Shenoy authored
      The helper function get_shared_cpu_map() was added in
      
      'commit 500fe5f5 ("powerpc/cacheinfo: Report the correct
      shared_cpu_map on big-cores")'
      
      and subsequently expanded upon in
      
      'commit 0be47634 ("powerpc/cacheinfo: Print correct cache-sibling
      map/list for L2 cache")'
      
      in order to help report the correct groups of threads sharing these caches
      on big-core systems where groups of threads within a core can share
      different sets of caches.
      
      Now that powerpc/cacheinfo is aware of the "ibm,thread-groups" property,
      cache->shared_cpu_map contains the correct set of thread-siblings
      sharing the cache. Hence we no longer need the function
      get_shared_cpu_map(). This patch removes this function. We also remove
      the helper function index_dir_to_cpu() which was only called by
      get_shared_cpu_map().
      
      With these functions removed, we can still see the correct
      cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
      caches distributed among groups of threads in a core.
      
      With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
      are split between the two groups of threads in a core, for CPUs 8,9,
      the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
      follows:
      
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15
      
      $ ppc64_cpu --smt=4
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11
      
      $ ppc64_cpu --smt=2
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
      /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
      /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9
      
      $ ppc64_cpu --smt=1
      $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
      /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
      /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-3-parth@linux.ibm.com
    • powerpc/cacheinfo: Lookup cache by dt node and thread-group id · a4bec516
      Gautham R. Shenoy authored
      Currently the cacheinfo code on powerpc indexes the "cache" objects
      (modelling the L1/L2/L3 caches) where the key is device-tree node
      corresponding to that cache. On some of the POWER server platforms
      thread-groups within the core share different sets of caches (Eg: On
      SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
      threads 1,3,5,7 of the same core share another L1 cache). On such
      platforms, there is a single device-tree node corresponding to that
      cache and the cache-configuration within the threads of the core is
      indicated via "ibm,thread-groups" device-tree property.
      
      Since the current code is not aware of the "ibm,thread-groups"
      property, on the aforementioned systems, cacheinfo code still treats
      all the threads in the core as sharing the cache because of the
      single device-tree node (in the earlier example, the cacheinfo code
      would say CPUs 0-7 share L1 cache).
      
      In this patch, we make the powerpc cacheinfo code aware of the
      "ibm,thread-groups" property. We index the "cache" objects by the
      key-pair (device-tree node, thread-group id). For any CPUX, for a
      given level of cache, the thread-group id is defined to be the first
      CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
      of cache which are not represented in "ibm,thread-groups" property,
      the thread-group id is -1.
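
The key-pair lookup can be illustrated with a small model. This is a hedged Python sketch rather than the kernel implementation: `thread_group_id`, `cache_key`, the string node names and the group layout for CPUs 8-15 are all hypothetical, and "first CPU" is taken here to mean the lowest-numbered CPU of the group.

```python
def thread_group_id(cpu, cache_groups):
    """Return the thread-group id of `cpu` for one cache level.

    cache_groups -- list of CPU groups sharing a cache at this level, as
                    would be derived from "ibm,thread-groups"; None if the
                    level is not represented in that property.
    """
    if cache_groups is None:
        return -1
    for group in cache_groups:
        if cpu in group:
            return min(group)   # first CPU of the group containing `cpu`
    return -1


def cache_key(dt_node, cpu, cache_groups):
    # "cache" objects are indexed by (device-tree node, thread-group id).
    return (dt_node, thread_group_id(cpu, cache_groups))


# Assumed L1 groups for one SMT8 core made of CPUs 8-15.
l1_groups = [[8, 10, 12, 14], [9, 11, 13, 15]]
print(cache_key("l1-cache", 12, l1_groups))
print(cache_key("l1-cache", 9, l1_groups))
print(cache_key("l3-cache", 9, None))
```

In this model CPUs 12 and 9 share one device-tree node but get different keys, so they resolve to different "cache" objects, while a level absent from the property falls back to the id -1.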
      
      [parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
      and "thread_group_l2_cache_map" to get rid of the compile error.]
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Parth Shah <parth@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210728175607.591679-2-parth@linux.ibm.com
    • powerpc: move the install rule to arch/powerpc/Makefile · 86ff0bce
      Masahiro Yamada authored
      Currently, the install target in arch/powerpc/Makefile descends into
      arch/powerpc/boot/Makefile to invoke the shell script, but there is no
      good reason to do so.
      
      arch/powerpc/Makefile can run the shell script directly.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-3-masahiroy@kernel.org
    • powerpc: make the install target not depend on any build artifact · 9bef456b
      Masahiro Yamada authored
      The install target should not depend on any build artifact.
      
      The reason is explained in commit 19514fc6 ("arm, kbuild: make
      "make install" not depend on vmlinux").
      
      Change the PowerPC installation code in a similar way.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-2-masahiroy@kernel.org
    • powerpc: remove unused zInstall target from arch/powerpc/boot/Makefile · 156ca4e6
      Masahiro Yamada authored
      Commit c913e5f9 ("powerpc/boot: Don't install zImage.* from make
      install") added the zInstall target to arch/powerpc/boot/Makefile,
      but you cannot use it since the corresponding hook is missing in
      arch/powerpc/Makefile.
      
      It has never worked since its addition. Nobody has complained about
      it for 7 years, which means this code was unneeded.
      
      With this removal, the install.sh will be passed in with 4 parameters.
      Simplify the shell script.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729141937.445051-1-masahiroy@kernel.org
    • cpuidle: pseries: Mark pseries_idle_proble() as __init · d04691d3
      Nathan Chancellor authored
      After commit 7cbd631d4dec ("cpuidle: pseries: Fixup CEDE0 latency only
      for POWER10 onwards"), pseries_idle_probe() is no longer inlined when
      compiling with clang, which causes a modpost warning:
      
      WARNING: modpost: vmlinux.o(.text+0xc86a54): Section mismatch in
      reference from the function pseries_idle_probe() to the function
      .init.text:fixup_cede0_latency()
      The function pseries_idle_probe() references
      the function __init fixup_cede0_latency().
      This is often because pseries_idle_probe lacks a __init
      annotation or the annotation of fixup_cede0_latency is wrong.
      
      pseries_idle_probe() is a non-init function, which calls
      fixup_cede0_latency(), which is an init function, explaining the
      mismatch. pseries_idle_probe() is only called from
      pseries_processor_idle_init(), which is an init function, so mark
      pseries_idle_probe() as __init so there is no more warning.
      
      Fixes: 054e44ba ("cpuidle: pseries: Add function to parse extended CEDE records")
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210803211547.1093820-1-nathan@kernel.org
  3. 03 Aug, 2021 3 commits
    • powerpc/stacktrace: Include linux/delay.h · a6cae77f
      Michal Suchanek authored
      Commit 7c6986ad ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
      introduces a udelay() call without including the linux/delay.h header.
      This may happen to work on master but the header that declares the
      function should be included nonetheless.
      
      Fixes: 7c6986ad ("powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi()")
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210729180103.15578-1-msuchanek@suse.de
    • cpuidle: pseries: Do not cap the CEDE0 latency in fixup_cede0_latency() · 71737a6c
      Gautham R. Shenoy authored
      Currently in the fixup_cede0_latency() code, we fix up the CEDE(0) exit
      latency value only if the minimum advertised extended CEDE latency
      value is less than 10us. This was done so as to not break
      the expected behaviour on POWER8 platforms where the advertised
      latency was higher than the default 10us, which would delay the SMT
      folding on the core.
      
      However, after the earlier patch "cpuidle/pseries: Fixup CEDE0 latency
      only for POWER10 onwards", we can be sure that the fixup of CEDE0
      latency is going to happen only from POWER10 onwards. Hence
      unconditionally use the minimum exit latency provided by the platform.
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1626676399-15975-3-git-send-email-ego@linux.vnet.ibm.com
    • cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards · 50741b70
      Gautham R. Shenoy authored
      Commit d947fb4c ("cpuidle: pseries: Fixup exit latency for
      CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
      of the Extended CEDE states advertised by the platform.
      
      On POWER9 LPARs, the firmwares advertise a very low value of 2us for
      CEDE1 exit latency on a Dedicated LPAR. The latency advertized by the
      PHYP hypervisor corresponds to the latency required to wakeup from the
      underlying hardware idle state. However the wakeup latency from the
      LPAR perspective should include
      
      1. The time taken to transition the CPU from the Hypervisor into the
         LPAR post wakeup from platform idle state
      
      2. Time taken to send the IPI from the source CPU (waker) to the idle
         target CPU (wakee).
      
      Item 1 can be measured via a timer idle test, where we queue a timer, say
      for 1ms, and enter the CEDE state. When the timer fires, in the timer
      handler we compute how much extra time over the expected 1ms we have
      consumed. On a POWER9 LPAR the numbers are
      
      CEDE latency measured using a timer (numbers in ns)
      N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
      400     2601     5677     5668.74    5917    6413     9299   455.01
      
      Items 1 and 2 combined can be determined by an IPI latency test where we
      send an IPI to an idle CPU and in the handler compute the time
      difference between when the IPI was sent and when the handler ran. We
      see the following numbers on POWER9 LPAR.
      
      CEDE latency measured using an IPI (numbers in ns)
      N       Min      Median   Avg       90%ile  99%ile    Max    Stddev
      400     711      7564     7369.43   8559    9514      9698   1200.01
      
      If we consider the 99th percentile latency value measured using
      the IPI to be the wakeup latency, the value would be 9.5us. This is in
      the ballpark of the default value of 10us.
      
      Hence, use the exit latency of CEDE(0) based on the latency values
      advertised by the platform only from POWER10 onwards. The values
      advertised on POWER10 platforms are more realistic and informed by the
      latency measurements. For earlier platforms stick to the default value
      of 10us. The fix was suggested by Michael Ellerman.
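
The resulting policy can be sketched as a small decision function. This is only a Python model under stated assumptions, not the driver code: `cede0_exit_latency_us`, its `power10_or_later` flag and the list of advertised latencies are hypothetical, and the cap at the default reflects the behaviour described for fixup_cede0_latency() in commit 71737a6c on this page.

```python
DEFAULT_CEDE0_LATENCY_US = 10  # default exit latency quoted in the text


def cede0_exit_latency_us(power10_or_later, advertised_us):
    """Model of the CEDE(0) exit-latency choice after this change.

    power10_or_later -- hypothetical flag for the platform generation
    advertised_us    -- extended CEDE latencies advertised by the platform
    """
    # Earlier platforms (or no advertised values): keep the default.
    if not power10_or_later or not advertised_us:
        return DEFAULT_CEDE0_LATENCY_US
    lowest = min(advertised_us)
    # Only apply the fixup when the advertised minimum is below the default.
    if lowest < DEFAULT_CEDE0_LATENCY_US:
        return lowest
    return DEFAULT_CEDE0_LATENCY_US


print(cede0_exit_latency_us(False, [2]))   # POWER9-style LPAR advertising 2us
print(cede0_exit_latency_us(True, [2]))    # POWER10 or later
print(9514 / 1000)                         # measured 99th percentile IPI latency, in us
```

The last line restates the measurement from the table: 9514ns is about 9.5us, close to the 10us default that earlier platforms keep.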
      
      Fixes: d947fb4c ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
      Reported-by: Enrico Joedecke <joedecke@de.ibm.com>
      Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1626676399-15975-2-git-send-email-ego@linux.vnet.ibm.com
  4. 26 Jul, 2021 2 commits
  5. 23 Jul, 2021 2 commits
    • KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state · d9c57d3e
      Nicholas Piggin authored
      The H_ENTER_NESTED hypercall is handled by the L0, and it is a request
      by the L1 to switch the context of the vCPU over to that of its L2
      guest, and return with an interrupt indication. The L1 is responsible
      for switching some registers to guest context, and the L0 switches
      others (including all the hypervisor privileged state).
      
      If the L2 MSR has TM active, then the L1 is responsible for
      recheckpointing the L2 TM state. Then the L1 exits to L0 via the
      H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit,
      and then it recheckpoints the TM state as part of the nested entry and
      finally HRFIDs into the L2 with TM active MSR. Not efficient, but about
      the simplest approach for something that's horrendously complicated.
      
      Problems arise if the L1 exits to the L0 with a TM state which does not
      match the L2 TM state being requested. For example if the L1 is
      transactional but the L2 MSR is non-transactional, or vice versa. The
      L0's HRFID can take a TM Bad Thing interrupt and crash.
      
      Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then
      ensuring that if the L1 is suspended then the L2 must have TM active,
      and if the L1 is not suspended then the L2 must not have TM active.
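
The rule in the paragraph above reduces to a short predicate. The following is a minimal Python sketch of that rule only, assuming a three-state view of the L1's TM state; `TmState`, `h_enter_nested_tm_ok` and the enum values are hypothetical and do not mirror the kernel's MSR handling.

```python
from enum import Enum


class TmState(Enum):
    # Values are arbitrary labels for this model.
    NON_TRANSACTIONAL = 0
    SUSPENDED = 1
    TRANSACTIONAL = 2


def h_enter_nested_tm_ok(l1_state, l2_tm_active):
    """Return True if the hcall may proceed under the rule described above."""
    # H_ENTER_NESTED is disallowed entirely in TM[T] state.
    if l1_state is TmState.TRANSACTIONAL:
        return False
    # L1 suspended <=> L2 MSR has TM active; any mismatch is rejected.
    return (l1_state is TmState.SUSPENDED) == l2_tm_active


print(h_enter_nested_tm_ok(TmState.SUSPENDED, True))
print(h_enter_nested_tm_ok(TmState.NON_TRANSACTIONAL, True))
```

Only two of the six combinations pass: suspended L1 with a TM-active L2, and non-suspended L1 with a TM-inactive L2.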
      
      Fixes: 360cae31 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
      Cc: stable@vger.kernel.org # v4.20+
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow · f62f3c20
      Nicholas Piggin authored
      The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on
      the rtas_args.nargs that was provided by the guest. That guest nargs
      value is not range checked, so the guest can cause the host rets pointer
      to be pointed outside the args array. The individual rtas function
      handlers check the nargs and nrets values to ensure they are correct,
      but if they are not, the handlers store a -3 (0xfffffffd) failure
      indication in rets[0] which corrupts host memory.
      
      Fix this by testing up front whether the guest supplied nargs and nret
      would exceed the array size, and fail the hcall directly without storing
      a failure indication to rets[0].
      
      Also expand on a comment about why we kill the guest and try not to
      return errors directly if we have a valid rets[0] pointer.
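      
      A minimal sketch of the up-front range check described above. The
      struct layout, the array size of 16 and the function name are
      assumptions for illustration and do not reproduce the kernel's
      definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTAS_ARGS_MAX 16 /* assumed size of the args array */

/* Hypothetical stand-in for the guest-supplied argument block. */
struct rtas_args_sketch {
	uint32_t nargs;
	uint32_t nret;
	uint32_t args[RTAS_ARGS_MAX];
	uint32_t *rets; /* host pointer derived from nargs */
};

/*
 * Validate the guest-controlled counts before deriving rets.
 * Returns 0 and sets rets on success. Returns -1 and leaves rets NULL
 * when the counts would place rets, or the return values, outside args[].
 * Each count is bounded on its own first, so the sum cannot wrap.
 */
int rtas_set_rets(struct rtas_args_sketch *a)
{
	assert(a != NULL);
	a->rets = NULL;
	if (a->nargs >= RTAS_ARGS_MAX ||
	    a->nret > RTAS_ARGS_MAX ||
	    a->nargs + a->nret > RTAS_ARGS_MAX)
		return -1;
	a->rets = &a->args[a->nargs];
	return 0;
}
```

      On failure nothing is written through rets, which matches the intent of
      failing the hcall without storing a failure indication to rets[0].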
      
      Fixes: 8e591cb7 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")
      Cc: stable@vger.kernel.org # v3.10+
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f62f3c20
  6. 18 Jul, 2021 13 commits
    • Linus Torvalds's avatar
      Linux 5.14-rc2 · 2734d6c1
      Linus Torvalds authored
      2734d6c1
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of... · 8c25c447
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Skip invalid hybrid PMU on hybrid systems when the atom (little) CPUs
         are offlined.
      
       - Fix 'perf test' problems related to the recently added hybrid
         (BIG/little) code.
      
       - Split ARM's coresight (hw tracing) decode by aux records to avoid
         fatal decoding errors.
      
       - Fix add event failure in 'perf probe' when running 32-bit perf in a
         64-bit kernel.
      
       - Fix 'perf sched record' failure when CONFIG_SCHEDSTATS is not set.
      
       - Fix memory and refcount leaks detected by ASan when running 'perf
         test', should be clean of warnings now.
      
       - Remove broken definition of __LITTLE_ENDIAN from tools'
         linux/kconfig.h, which was breaking the build in some systems.
      
       - Cast PTHREAD_STACK_MIN to int as it may turn into 'long
         sysconf(__SC_THREAD_STACK_MIN_VALUE)', breaking the build in some
         systems.
      
       - Fix libperf build error with LIBPFM4=1.
      
       - Sync UAPI files changed by the memfd_secret new syscall.
      
      * tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (35 commits)
        perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
        perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel
        perf data: Close all files in close_dir()
        perf probe-file: Delete namelist in del_events() on the error path
        perf test bpf: Free obj_buf
        perf trace: Free strings in trace__parse_events_option()
        perf trace: Free syscall tp fields in evsel->priv
        perf trace: Free syscall->arg_fmt
        perf trace: Free malloc'd trace fields on exit
        perf lzma: Close lzma stream on exit
        perf script: Fix memory 'threads' and 'cpus' leaks on exit
        perf script: Release zstd data
        perf session: Cleanup trace_event
        perf inject: Close inject.output on exit
        perf report: Free generated help strings for sort option
        perf env: Fix memory leak of cpu_pmu_caps
        perf test maps__merge_in: Fix memory leak of maps
        perf dso: Fix memory leak in dso__new_map()
        perf test event_update: Fix memory leak of unit
        perf test event_update: Fix memory leak of evlist
        ...
      8c25c447
    • Linus Torvalds's avatar
      Merge tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f0eb870a
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "A few fixes for issues in the new online shrink code, additional
        corrections for my recent bug-hunt w.r.t. extent size hints on
        realtime, and improved input checking of the GROWFSRT ioctl.
      
        IOW, the usual 'I somehow got bored during the merge window and
        resumed auditing the farther reaches of xfs':
      
         - Fix shrink eligibility checking when sparse inode clusters are
           enabled
      
         - Reset '..' directory entries when unlinking directories to prevent
           verifier errors if the fs is shrunk later
      
         - Don't report unusable extent size hints to FSGETXATTR
      
         - Don't warn when extent size hints are unusable because the sysadmin
           configured them that way
      
         - Fix insufficient parameter validation in GROWFSRT ioctl
      
         - Fix integer overflow when adding rt volumes to filesystem"
      
      * tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: detect misaligned rtinherit directory extent size hints
        xfs: fix an integer overflow error in xfs_growfs_rt
        xfs: improve FSGROWFSRT precondition checking
        xfs: don't expose misaligned extszinherit hints to userspace
        xfs: correct the narrative around misaligned rtinherit/extszinherit dirs
        xfs: reset child dir '..' entry when unlinking child
        xfs: check for sparse inode clusters that cross new EOAG when shrinking
      f0eb870a
    • Linus Torvalds's avatar
      Merge tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · fbf1bddc
      Linus Torvalds authored
      Pull iomap fixes from Darrick Wong:
       "A handful of bugfixes for the iomap code.
      
        There's nothing especially exciting here, just fixes for UBSAN (not
        KASAN as I erroneously wrote in the tag message) warnings about
        undefined behavior in the SEEK_DATA/SEEK_HOLE code, and some
        reshuffling of per-page block state info to fix some problems with
        gfs2.
      
         - Fix KASAN warnings due to integer overflow in SEEK_DATA/SEEK_HOLE
      
         - Fix assertion errors when using inlinedata files on gfs2"
      
      * tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor
        iomap: Don't create iomap_page objects for inline files
        iomap: Permit pages without an iop to enter writeback
        iomap: remove the length variable in iomap_seek_hole
        iomap: remove the length variable in iomap_seek_data
      fbf1bddc
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v5.14' of... · 6750691a
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Restore the original behavior of scripts/setlocalversion when
         LOCALVERSION is set to empty.
      
       - Show Kconfig prompts even for 'make -s'
      
       - Fix the combination of CONFIG_LTO_CLANG=y and CONFIG_MODVERSIONS=y
         for older GNU Make versions
      
      * tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        Documentation: Fix intiramfs script name
        Kbuild: lto: fix module versionings mismatch in GNU make 3.X
        kbuild: do not suppress Kconfig prompts for silent build
        scripts/setlocalversion: fix a bug when LOCALVERSION is empty
      6750691a
    • Robert Richter's avatar
      Documentation: Fix intiramfs script name · 5e60f363
      Robert Richter authored
      Documentation was not changed when renaming the script in commit
      80e715a0 ("initramfs: rename gen_initramfs_list.sh to
      gen_initramfs.sh"). Fixing this.
      
      Basically does:
      
       $ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh)
      
      Fixes: 80e715a0 ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh")
      Signed-off-by: Robert Richter <rrichter@amd.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      5e60f363
    • Lecopzer Chen's avatar
      Kbuild: lto: fix module versionings mismatch in GNU make 3.X · 1d11053d
      Lecopzer Chen authored
      When building modules (CONFIG_...=m), I found that some module versions
      are incorrect and set to 0.
      This shows up in the build log of the first clean build:
      
      WARNING: EXPORT symbol "XXXX" [drivers/XXX/XXX.ko] version generation failed,
      symbol will not be versioned.
      
      But in the second (incremental) build, the WARNING disappears and the
      module versions become valid CRCs, so anyone who wants to change
      modules without updating the kernel image cannot insert their modules.
      
      The problematic code is
      +	$(foreach n, $(filter-out FORCE,$^),				\
      +		$(if $(wildcard $(n).symversions),			\
      +			; cat $(n).symversions >> $@.symversions))
      
      For example:
        rm -f fs/notify/built-in.a.symversions    ; rm -f fs/notify/built-in.a; \
      llvm-ar cDPrST fs/notify/built-in.a fs/notify/fsnotify.o \
      fs/notify/notification.o fs/notify/group.o ...
      
      `foreach n` produces nothing to `cat` into $@.symversions because
      `if $(wildcard $(n).symversions)` returns nothing, although the files
      do exist when this line is executed.
      
      -rw-r--r-- 1 root root 168580 Jun 13 19:10 fs/notify/fsnotify.o
      -rw-r--r-- 1 root root    111 Jun 13 19:10 fs/notify/fsnotify.o.symversions
      
      The reason is that the $(n).symversions files are generated at run time,
      but the Makefile wildcard function is expanded, and checks whether the
      files exist, while the Makefile is being parsed.
      
      Fix this by using the `test` shell command to check for the files
      at run time.
      
      Rebase from both:
      1. [https://lore.kernel.org/lkml/20210616080252.32046-1-lecopzer.chen@mediatek.com/]
      2. [https://lore.kernel.org/lkml/20210702032943.7865-1-lecopzer.chen@mediatek.com/]
      
      Fixes: 38e89184 ("kbuild: lto: fix module versioning")
      Co-developed-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      1d11053d
    • Masahiro Yamada's avatar
      kbuild: do not suppress Kconfig prompts for silent build · d952cfaf
      Masahiro Yamada authored
      When a new CONFIG option is available, Kbuild shows a prompt to get
      the user input.
      
        $ make
        [ snip ]
        Core Scheduling for SMT (SCHED_CORE) [N/y/?] (NEW)
      
      This is the only interactive place in the build process.
      
      Commit 174a1dcc ("kbuild: sink stdout from cmd for silent build")
      suppressed Kconfig prompts as well because syncconfig is invoked by
      the 'cmd' macro. You cannot notice the fact that Kconfig is waiting
      for the user input.
      
      Use 'kecho' to show the equivalent short log without suppressing stdout
      from sub-make.
      
      Fixes: 174a1dcc ("kbuild: sink stdout from cmd for silent build")
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      d952cfaf
    • Mikulas Patocka's avatar
      scripts/setlocalversion: fix a bug when LOCALVERSION is empty · 5df99bec
      Mikulas Patocka authored
      The commit 042da426 ("scripts/setlocalversion: simplify the short
      version part") reduces indentation. Unfortunately, it also changes behavior
      in a subtle way - if the user has empty "LOCALVERSION" variable, the plus
      sign is appended to the kernel version. It wasn't appended before.
      
      This patch reverts to the old behavior - we append the plus sign only if
      the LOCALVERSION variable is not set.
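      
      The distinction between an unset and an empty variable is the whole
      point of this fix. The script itself is shell, so the following C
      sketch is only an analogy: it relies on getenv() returning NULL for
      an unset variable and "" for an empty one. The function names are
      hypothetical, and the other conditions the real script checks before
      appending the plus sign are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* True when the '+' suffix applies: only when the variable is unset. */
bool plus_suffix_wanted(const char *localversion)
{
	return localversion == NULL;
}

/* Same decision taken from the real environment. */
bool plus_suffix_wanted_from_env(void)
{
	return plus_suffix_wanted(getenv("LOCALVERSION"));
}

/* Unset gets the suffix; empty and non-empty values do not. */
void plus_suffix_examples(void)
{
	assert(plus_suffix_wanted(NULL));
	assert(!plus_suffix_wanted(""));
	assert(!plus_suffix_wanted("-custom"));
}
```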
      
      Fixes: 042da426 ("scripts/setlocalversion: simplify the short version part")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      5df99bec
    • Yang Jihong's avatar
      perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set · b0f00855
      Yang Jihong authored
      The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to the
      user if CONFIG_SCHEDSTATS is not set, but "perf sched record" still tries to
      record these three events. As a result, the command fails.
      
      Before:
      
        # perf sched record sleep 1
        event syntax error: 'sched:sched_stat_wait'
                             \___ unknown tracepoint
      
        Error:  File /sys/kernel/tracing/events/sched/sched_stat_wait not found.
        Hint:   Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
      
        Run 'perf list' for a list of valid events
      
         Usage: perf record [<options>] [<command>]
            or: perf record [<options>] -- <command> [<options>]
      
            -e, --event <event>   event selector. use 'perf list' to list available events
      
      Solution:
        Check whether schedstat tracepoints are exposed. If not, these events are not recorded.
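      
      One way such a check could look is a simple existence test on the
      tracepoint's directory, as in the sketch below. This is an assumption
      for illustration and not necessarily how perf does it; the function
      name and the directory-layout parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return true if <events_dir>/<sys>/<name> exists, e.g.
 * /sys/kernel/tracing/events/sched/sched_stat_wait.
 * An overlong path is treated as "not exposed".
 */
bool tracepoint_exposed(const char *events_dir, const char *sys,
			const char *name)
{
	char path[512];
	int len;

	assert(events_dir != NULL && sys != NULL && name != NULL);
	len = snprintf(path, sizeof(path), "%s/%s/%s", events_dir, sys, name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return false;
	return access(path, F_OK) == 0;
}
```

      A caller would then add each sched_stat_* event to the record command
      line only when the predicate returns true.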
      
      After:
        # perf sched record sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ]
        # perf sched report
        run measurement overhead: 4736 nsecs
        sleep measurement overhead: 9059979 nsecs
        the run test took 999854 nsecs
        the sleep test took 8945271 nsecs
        nr_run_events:        716
        nr_sleep_events:      785
        nr_wakeup_events:     0
        ...
        ------------------------------------------------------------
      
      Fixes: 2a09b5de ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      b0f00855
    • Yang Jihong's avatar
      perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel · 22a66551
      Yang Jihong authored
      The "address" member of "struct probe_trace_point" uses long data type.
      If kernel is 64-bit and perf program is 32-bit, size of "address"
      variable is 32 bits.
      
      As a result, the upper 32 bits of the address read from the kernel are
      truncated and an error occurs during address comparison in
      kprobe_warn_out_range().
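      
      The effect of the truncation on a range check can be sketched as
      follows. The addresses and function names are made up for the example;
      the range check only stands in for the comparison mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of storing a 64-bit kernel address in a 32-bit 'long'. */
uint64_t truncate_to_32bit(uint64_t addr)
{
	return (uint64_t)(uint32_t)addr;
}

/* Half-open range check: stext <= addr < etext. */
bool in_text_range(uint64_t addr, uint64_t stext, uint64_t etext)
{
	assert(stext <= etext);
	return stext <= addr && addr < etext;
}
```

      With a hypothetical text range of 0xffffffff81000000 to
      0xffffffff82000000, the address 0xffffffff81000100 is inside the range,
      but its truncated form 0x81000100 is not, which corresponds to the
      "out of .text" message shown below.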
      
      Before:
      
        # perf probe -a schedule
        schedule is out of .text, skip it.
          Error: Failed to add events.
      
      Solution:
        Change the data type of the "address" variable to u64 and update the
        corresponding address printing and value assignment.
      
      After:
      
        # perf.new.new probe -a schedule
        Added new event:
          probe:schedule       (on schedule)
      
        You can now use it in all perf tools, such as:
      
                perf record -e probe:schedule -aR sleep 1
      
        # perf probe -l
          probe:schedule       (on schedule@kernel/sched/core.c)
        # perf record -e probe:schedule -aR sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ]
        # perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'probe:schedule'
        # Event count (approx.): 1366
        #
        # Overhead  Command          Shared Object      Symbol
        # ........  ...............  .................  ............
        #
             6.22%  migration/0      [kernel.kallsyms]  [k] schedule
             6.22%  migration/1      [kernel.kallsyms]  [k] schedule
             6.22%  migration/2      [kernel.kallsyms]  [k] schedule
             6.22%  migration/3      [kernel.kallsyms]  [k] schedule
             6.15%  migration/10     [kernel.kallsyms]  [k] schedule
             6.15%  migration/11     [kernel.kallsyms]  [k] schedule
             6.15%  migration/12     [kernel.kallsyms]  [k] schedule
             6.15%  migration/13     [kernel.kallsyms]  [k] schedule
             6.15%  migration/14     [kernel.kallsyms]  [k] schedule
             6.15%  migration/15     [kernel.kallsyms]  [k] schedule
             6.15%  migration/4      [kernel.kallsyms]  [k] schedule
             6.15%  migration/5      [kernel.kallsyms]  [k] schedule
             6.15%  migration/6      [kernel.kallsyms]  [k] schedule
             6.15%  migration/7      [kernel.kallsyms]  [k] schedule
             6.15%  migration/8      [kernel.kallsyms]  [k] schedule
             6.15%  migration/9      [kernel.kallsyms]  [k] schedule
             0.22%  rcu_sched        [kernel.kallsyms]  [k] schedule
        ...
        #
        # (Cannot load tips.txt file, please install perf!)
        #
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jianlin Lv <jianlin.lv@arm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Li Huafei <lihuafei1@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      22a66551
    • Riccardo Mancini's avatar
      perf data: Close all files in close_dir() · d4b3eedc
      Riccardo Mancini authored
      When using 'perf report' in directory mode, the first file is not closed
      on exit, causing a memory leak.
      
      The problem is caused by the iterating variable never reaching 0.
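      
      An off-by-one of this kind can be sketched as below. The loop shapes
      are an assumption inferred from the description, and the boolean array
      is a hypothetical stand-in for the array of open files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stops at index 1, so entry 0 is never closed. Returns entries closed. */
int close_all_buggy(bool *closed, int nr)
{
	int n = 0;

	assert(closed != NULL);
	while (--nr >= 1) {
		closed[nr] = true;
		n++;
	}
	return n;
}

/* Runs down to index 0, so every entry is closed. */
int close_all_fixed(bool *closed, int nr)
{
	int n = 0;

	assert(closed != NULL);
	while (--nr >= 0) {
		closed[nr] = true;
		n++;
	}
	return n;
}
```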
      
      Fixes: 14552063 ("perf data: Add perf_data__(create_dir|close_dir) functions")
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      d4b3eedc
    • Riccardo Mancini's avatar
      perf probe-file: Delete namelist in del_events() on the error path · e0fa7ab4
      Riccardo Mancini authored
      ASan reports some memory leaks when running:
      
        # perf test "42: BPF filter"
      
      This second leak is caused by a strlist not being deallocated on error
      inside probe_file__del_events.
      
      This patch adds a goto label before the deallocation and makes the error
      path jump to it.
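      
      The goto-label cleanup pattern mentioned above looks roughly like this.
      It is a generic sketch with hypothetical names: a malloc'd buffer
      stands in for the strlist and a counter records that the cleanup ran.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * 'fail' simulates an error in the middle of the function.
 * '*freed' is incremented each time the cleanup label is reached.
 */
int del_events_sketch(int fail, int *freed)
{
	int ret = 0;
	char *namelist;

	assert(freed != NULL);
	namelist = malloc(16);
	if (!namelist)
		return -1;

	if (fail) {
		ret = -1;
		goto out; /* error path jumps to the shared cleanup */
	}

	/* ... normal processing would use namelist here ... */

out:
	free(namelist);
	(*freed)++;
	return ret;
}
```
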
      Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
      Fixes: e7895e42 ("perf probe: Split del_perf_probe_events()")
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343282.git.rickyman7@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e0fa7ab4
  7. 17 Jul, 2021 4 commits
    • Linus Torvalds's avatar
      Merge tag 'soc-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 1d67c8d9
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "Here are the patches for this week that came as the fallout of the
        merge window:
      
         - Two fixes for the NVidia memory controller driver
      
         - multiple defconfig files get patched to turn CONFIG_FB back on
           after that is no longer selected by CONFIG_DRM
      
         - ffa, scmi and scpi firmware driver fixes, mostly addressing compiler
           and documentation warnings
      
         - Platform specific fixes for device tree files on ASpeed, Renesas
           and NVidia SoC, mostly for recent regressions.
      
         - A workaround for a regression on the USB PHY with devlink when the
           usb-nop-xceiv driver is not available until the rootfs is mounted.
      
         - Device tree compiler warnings in Arm Versatile-AB"
      
      * tag 'soc-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
        ARM: dts: versatile: Fix up interrupt controller node names
        ARM: multi_v7_defconfig: Make NOP_USB_XCEIV driver built-in
        ARM: configs: Update u8500_defconfig
        ARM: configs: Update Vexpress defconfig
        ARM: configs: Update Versatile defconfig
        ARM: configs: Update RealView defconfig
        ARM: configs: Update Integrator defconfig
        arm: Typo s/PCI_IXP4XX_LEGACY/IXP4XX_PCI_LEGACY/
        firmware: arm_scmi: Fix range check for the maximum number of pending messages
        firmware: arm_scmi: Avoid padding in sensor message structure
        firmware: arm_scmi: Fix kernel doc warnings about return values
        firmware: arm_scpi: Fix kernel doc warnings
        firmware: arm_scmi: Fix kernel doc warnings
        ARM: shmobile: defconfig: Restore graphical consoles
        firmware: arm_ffa: Fix a possible ffa_linux_errmap buffer overflow
        firmware: arm_ffa: Fix the comment style
        firmware: arm_ffa: Simplify probe function
        firmware: arm_ffa: Ensure drivers provide a probe function
        firmware: arm_scmi: Fix possible scmi_linux_errmap buffer overflow
        firmware: arm_scmi: Ensure drivers provide a probe function
        ...
      1d67c8d9
    • Linus Torvalds's avatar
      Revert "mm/slub: use stackdepot to save stack trace in objects" · ae14c63a
      Linus Torvalds authored
      This reverts commit 78869146.
      
      It's not clear why, but it causes unexplained problems in entirely
      unrelated xfs code.  The most likely explanation is some slab
      corruption, possibly triggered due to CONFIG_SLUB_DEBUG_ON.  See [1].
      
      It ends up having a few other problems too, like build errors on
      arch/arc, and Geert reporting it using much more memory on m68k [3] (it
      probably does so elsewhere too, but it is probably just more noticeable
      on m68k).
      
      The architecture issues (both build and memory use) are likely just
      because this change effectively force-enabled STACKDEPOT (along with a
      very bad default value for the stackdepot hash size).  But together with
      the xfs issue, this all smells like "this commit was not ready" to me.
      
      Link: https://lore.kernel.org/linux-xfs/YPE3l82acwgI2OiV@infradead.org/ [1]
      Link: https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/ [2]
      Link: https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ [3]
      Reported-by: Christoph Hellwig <hch@infradead.org>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae14c63a
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 5d766d55
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "One core fix for an oops which can occur if the error handling thread
        fails to start for some reason and the driver is removed.
      
        The other fixes are all minor ones in drivers"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ufs: core: Add missing host_lock in ufshcd_vops_setup_xfer_req()
        scsi: mpi3mr: Fix W=1 compilation warnings
        scsi: pm8001: Clean up kernel-doc and comments
        scsi: zfcp: Report port fc_security as unknown early during remote cable pull
        scsi: core: Fix bad pointer dereference when ehandler kthread is invalid
        scsi: fas216: Fix a build error
        scsi: core: Fix the documentation of the scsi_execute() time parameter
      5d766d55
    • Linus Torvalds's avatar
      Merge tag '5.14-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 44cb60b4
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Eight cifs/smb3 fixes, including three for stable.
      
        Three are DFS related fixes, and two to fix problems pointed out by
        static checkers"
      
      * tag '5.14-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: do not share tcp sessions of dfs connections
        SMB3.1.1: fix mount failure to some servers when compression enabled
        cifs: added WARN_ON for all the count decrements
        cifs: fix missing null session check in mount
        cifs: handle reconnect of tcon when there is no cached dfs referral
        cifs: fix the out of range assignment to bit fields in parse_server_interfaces
        cifs: Do not use the original cruid when following DFS links for multiuser mounts
        cifs: use the expiry output of dns_query to schedule next resolution
      44cb60b4