1. 02 Feb, 2023 2 commits
  2. 01 Feb, 2023 1 commit
  3. 31 Jan, 2023 4 commits
    • powerpc/64s/radix: Fix RWX mapping with relocated kernel · 111bcb37
      Michael Ellerman authored
      If a relocatable kernel is loaded at a non-zero address and told not to
      relocate to zero (kdump or RELOCATABLE_TEST), the mapping of the
      interrupt code at zero is left with RWX permissions.
      
      That is a security weakness, and leads to a warning at boot if
      CONFIG_DEBUG_WX is enabled:
      
        powerpc/mm: Found insecure W+X mapping at address 00000000056435bc/0xc000000000000000
        WARNING: CPU: 1 PID: 1 at arch/powerpc/mm/ptdump/ptdump.c:193 note_page+0x484/0x4c0
        CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc1-00001-g8ae8e98aea82-dirty #175
        Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-dd0dca hv:linux,kvm pSeries
        NIP:  c0000000004a1c34 LR: c0000000004a1c30 CTR: 0000000000000000
        REGS: c000000003503770 TRAP: 0700   Not tainted  (6.2.0-rc1-00001-g8ae8e98aea82-dirty)
        MSR:  8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24000220  XER: 00000000
        CFAR: c000000000545a58 IRQMASK: 0
        ...
        NIP note_page+0x484/0x4c0
        LR  note_page+0x480/0x4c0
        Call Trace:
          note_page+0x480/0x4c0 (unreliable)
          ptdump_pmd_entry+0xc8/0x100
          walk_pgd_range+0x618/0xab0
          walk_page_range_novma+0x74/0xc0
          ptdump_walk_pgd+0x98/0x170
          ptdump_check_wx+0x94/0x100
          mark_rodata_ro+0x30/0x70
          kernel_init+0x78/0x1a0
          ret_from_kernel_thread+0x5c/0x64
      
      The fix has two parts. Firstly the pages from zero up to the end of
      interrupts need to be marked read-only, so that they are left with R-X
      permissions. Secondly the mapping logic needs to be taught to ensure
      there is a page boundary at the end of the interrupt region, so that the
      permission change only applies to the interrupt text, and not the region
      following it.
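
      A minimal sketch of why the second part matters. It assumes a simplified model where each boot-time mapping is a [start, end) range that can only change permissions as a whole. A permission change then stays confined to its range only if no mapping straddles the range's ends. The helper straddling() and the end-of-interrupts value 0x10000 are hypothetical, chosen only for illustration.

```python
def straddling(chunks, start, end):
    """Return the mapped chunks that [start, end) covers only partially.

    Changing permissions on such a chunk also changes the addresses outside
    [start, end) that share the same (large) page.
    """
    return [
        (lo, hi)
        for lo, hi in chunks
        if lo < end and start < hi            # the chunk overlaps the range...
        and not (start <= lo and hi <= end)   # ...but is not fully inside it
    ]


END_INTR = 0x10000  # hypothetical end of the interrupt text at address zero

# No page boundary at END_INTR: the interrupt text shares a 2MB page with
# whatever follows it, so it cannot be made R-X on its own.
before = [(0x0, 0x200000), (0x200000, 0x400000)]

# A page boundary at END_INTR keeps the read-only change confined.
after = [(0x0, END_INTR), (END_INTR, 0x200000), (0x200000, 0x400000)]

print(straddling(before, 0, END_INTR))
print(straddling(after, 0, END_INTR))
```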
      
      Fixes: c55d7b5e ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
      Reported-by: Sachin Sant <sachinp@linux.ibm.com>
      Tested-by: Sachin Sant <sachinp@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230110124753.1325426-2-mpe@ellerman.id.au
    • powerpc/64s/radix: Fix crash with unaligned relocated kernel · 98d0219e
      Michael Ellerman authored
      If a relocatable kernel is loaded at an address that is not 2MB aligned
      and told not to relocate to zero, the kernel can crash due to
      mark_rodata_ro() incorrectly changing some read-write data to read-only.
      
      The misalignment can occur when the kernel is loaded by kdump or built
      with the RELOCATABLE_TEST config option.
      
      Example crash with the kernel loaded at 5MB:
      
        Run /sbin/init as init process
        BUG: Unable to handle kernel data access on write at 0xc000000000452000
        Faulting instruction address: 0xc0000000005b6730
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
        CPU: 1 PID: 1 Comm: init Not tainted 6.2.0-rc1-00011-g349188be4841 #166
        Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,git-5b4c5a hv:linux,kvm pSeries
        NIP:  c0000000005b6730 LR: c000000000ae9ab8 CTR: 0000000000000380
        REGS: c000000004503250 TRAP: 0300   Not tainted  (6.2.0-rc1-00011-g349188be4841)
        MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44288480  XER: 00000000
        CFAR: c0000000005b66ec DAR: c000000000452000 DSISR: 0a000000 IRQMASK: 0
        ...
        NIP memset+0x68/0x104
        LR  zero_user_segments.constprop.0+0xa8/0xf0
        Call Trace:
          ext4_mpage_readpages+0x7f8/0x830
          ext4_readahead+0x48/0x60
          read_pages+0xb8/0x380
          page_cache_ra_unbounded+0x19c/0x250
          filemap_fault+0x58c/0xae0
          __do_fault+0x60/0x100
          __handle_mm_fault+0x1230/0x1a40
          handle_mm_fault+0x120/0x300
          ___do_page_fault+0x20c/0xa80
          do_page_fault+0x30/0xc0
          data_access_common_virt+0x210/0x220
      
      This happens because mark_rodata_ro() tries to change permissions on the
      range _stext..__end_rodata, but _stext sits in the middle of the 2MB
      page from 4MB to 6MB:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
        radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000000400000-0x0000000002400000 with 2.00 MiB pages (exec)
      
      The logic that changes the permissions assumes the linear mapping was
      split correctly at boot, so it marks the entire 2MB page read-only. That
      leads to the write fault above.
      
      To fix it, the boot time mapping logic needs to consider that if the
      kernel is running at a non-zero address then _stext is a boundary where
      it must split the mapping.
      
      That leads to the mapping being split correctly, so the rodata
      permission change is applied with no spillover:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
        radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000000400000-0x0000000000500000 with 64.0 KiB pages
        radix-mmu: Mapped 0x0000000000500000-0x0000000000600000 with 64.0 KiB pages (exec)
        radix-mmu: Mapped 0x0000000000600000-0x0000000002400000 with 2.00 MiB pages (exec)
      
      If the kernel is loaded at a 2MB aligned address, the mapping continues
      to use 2MB pages as before:
      
        radix-mmu: Mapped 0x0000000000000000-0x0000000000200000 with 2.00 MiB pages (exec)
        radix-mmu: Mapped 0x0000000000200000-0x0000000000400000 with 2.00 MiB pages
        radix-mmu: Mapped 0x0000000000400000-0x0000000002c00000 with 2.00 MiB pages (exec)
        radix-mmu: Mapped 0x0000000002c00000-0x0000000100000000 with 2.00 MiB pages
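
      A rough model of the splitting rule, not the kernel's radix mapping code. It assumes only 2MB and 64KB page sizes, ignores 1GB pages and exec flags, and treats _stext as an extra boundary that no page may cross. map_range() and show() are hypothetical helpers.

```python
SZ_64K = 0x10000   # base page size in this model (PAGE_SIZE=64K)
SZ_2M = 0x200000   # large page size in this model


def map_range(start, end, boundaries=()):
    """Split [start, end) into runs of 2MB or 64KB pages.

    A 2MB page is used only when the address is 2MB aligned and the page
    would not cross `end` or any boundary; otherwise a 64KB page is used.
    Adjacent pages of the same size merge into one run, except across a
    boundary. All inputs are assumed to be 64KB aligned.
    """
    cuts = sorted(b for b in boundaries if start < b < end)
    runs = []
    addr = start
    while addr < end:
        # The next address the current page must not extend past.
        limit = next((b for b in cuts if b > addr), end)
        size = SZ_2M if addr % SZ_2M == 0 and addr + SZ_2M <= limit else SZ_64K
        if runs and runs[-1][2] == size and addr not in cuts:
            runs[-1] = (runs[-1][0], addr + size, size)  # extend current run
        else:
            runs.append((addr, addr + size, size))      # start a new run
        addr += size
    return runs


def show(runs):
    for lo, hi, size in runs:
        label = "2.00 MiB" if size == SZ_2M else "64.0 KiB"
        print(f"Mapped 0x{lo:016x}-0x{hi:016x} with {label} pages")


# Kernel loaded at 5MB, _stext not treated as a boundary.
show(map_range(0x400000, 0x2400000))
# Kernel loaded at 5MB, _stext = 0x500000 treated as a boundary.
show(map_range(0x400000, 0x2400000, boundaries=[0x500000]))
```

      With the boundary, the model yields 0x400000-0x500000 and 0x500000-0x600000 with 64.0 KiB pages, then 2.00 MiB pages up to 0x2400000, matching the split shown in the fixed radix-mmu log. When the boundary is 2MB aligned, the range stays on 2MB pages throughout.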
      
      Fixes: c55d7b5e ("powerpc: Remove STRICT_KERNEL_RWX incompatibility with RELOCATABLE")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230110124753.1325426-1-mpe@ellerman.id.au
    • powerpc/kexec_file: Fix division by zero in extra size estimation · 7294194b
      Michael Ellerman authored
      In kexec_extra_fdt_size_ppc64() there's logic to estimate how much
      extra space will be needed in the device tree for some memory related
      properties.
      
      That logic uses the size of RAM divided by drmem_lmb_size() to do the
      estimation. However drmem_lmb_size() can be zero if the machine has no
      hotpluggable memory configured, which is the case when booting with qemu
      and no maxmem=x parameter is passed (the default).
      
      The division by zero is reported by UBSAN. It can also lead to an
      overflow and a warning from kvmalloc, causing kdump kernel loading to fail:
      
        WARNING: CPU: 0 PID: 133 at mm/util.c:596 kvmalloc_node+0x15c/0x160
        Modules linked in:
        CPU: 0 PID: 133 Comm: kexec Not tainted 6.2.0-rc5-03455-g07358bd97810 #223
        Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1200 0xf000005 of:SLOF,git-dd0dca pSeries
        NIP:  c00000000041ff4c LR: c00000000041fe58 CTR: 0000000000000000
        REGS: c0000000096ef750 TRAP: 0700   Not tainted  (6.2.0-rc5-03455-g07358bd97810)
        MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24248242  XER: 2004011e
        CFAR: c00000000041fed0 IRQMASK: 0
        ...
        NIP kvmalloc_node+0x15c/0x160
        LR  kvmalloc_node+0x68/0x160
        Call Trace:
          kvmalloc_node+0x68/0x160 (unreliable)
          of_kexec_alloc_and_setup_fdt+0xb8/0x7d0
          elf64_load+0x25c/0x4a0
          kexec_image_load_default+0x58/0x80
          sys_kexec_file_load+0x5c0/0x920
          system_call_exception+0x128/0x330
          system_call_vectored_common+0x15c/0x2ec
      
      To fix it, skip the calculation if drmem_lmb_size() is zero.
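
      A minimal sketch of the guard. It assumes, as described above, that the estimate divides the RAM size by drmem_lmb_size(). The function name kdump_extra_fdt_size(), the one-u64-per-entry sizing, and the example sizes are illustrative assumptions, not the kernel's exact formula.

```python
SZ_256M = 256 << 20
SZ_8G = 8 << 30


def kdump_extra_fdt_size(ram_size: int, lmb_size: int) -> int:
    """Hypothetical model: extra FDT bytes as one 64-bit cell per LMB.

    lmb_size is 0 when no hotpluggable memory is configured (for example
    qemu booted without maxmem=), so the estimate is skipped in that case.
    """
    if lmb_size == 0:
        return 0  # nothing to estimate, and no division by zero
    entries = ram_size // lmb_size
    return entries * 8  # assumed: sizeof(u64) bytes per usable-memory entry


print(kdump_extra_fdt_size(SZ_8G, 0))
print(kdump_extra_fdt_size(SZ_8G, SZ_256M))
```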
      
      Fixes: 2377c92e ("powerpc/kexec_file: fix FDT size estimation for kdump kernel")
      Cc: stable@vger.kernel.org # v5.12+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230130014707.541110-1-mpe@ellerman.id.au
    • powerpc/imc-pmu: Revert nest_init_lock to being a mutex · ad53db4a
      Michael Ellerman authored
      The recent commit 76d588dd ("powerpc/imc-pmu: Fix use of mutex in
      IRQs disabled section") fixed warnings (and possible deadlocks) in the
      IMC PMU driver by converting the locking to use spinlocks.
      
      It also converted the init-time nest_init_lock to a spinlock, even
      though it's not used at runtime in IRQ disabled sections or while
      holding other spinlocks.
      
      This leads to warnings such as:
      
        BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
        in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
        preempt_count: 1, expected: 0
        CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc2-14719-gf12cd061-dirty #1
        Hardware name: Mambo,Simulated-System POWER9 0x4e1203 opal:v6.6.6 PowerNV
        Call Trace:
          dump_stack_lvl+0x74/0xa8 (unreliable)
          __might_resched+0x178/0x1a0
          __cpuhp_setup_state+0x64/0x1e0
          init_imc_pmu+0xe48/0x1250
          opal_imc_counters_probe+0x30c/0x6a0
          platform_probe+0x78/0x110
          really_probe+0x104/0x420
          __driver_probe_device+0xb0/0x170
          driver_probe_device+0x58/0x180
          __driver_attach+0xd8/0x250
          bus_for_each_dev+0xb4/0x140
          driver_attach+0x34/0x50
          bus_add_driver+0x1e8/0x2d0
          driver_register+0xb4/0x1c0
          __platform_driver_register+0x38/0x50
          opal_imc_driver_init+0x2c/0x40
          do_one_initcall+0x80/0x360
          kernel_init_freeable+0x310/0x3b8
          kernel_init+0x30/0x1a0
          ret_from_kernel_thread+0x5c/0x64
      
      Fix it by converting nest_init_lock back to a mutex, so that we can call
      sleeping functions while holding it. There is no interaction between
      nest_init_lock and the runtime spinlocks used by the actual PMU routines.
      
      Fixes: 76d588dd ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section")
      Tested-by: Kajol Jain <kjain@linux.ibm.com>
      Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230130014401.540543-1-mpe@ellerman.id.au
  4. 30 Jan, 2023 4 commits
  5. 11 Jan, 2023 3 commits
    • powerpc/64s/hash: Make stress_hpt_timer_fn() static · f12cd061
      Yang Yingliang authored
      stress_hpt_timer_fn() is only used in hash_utils.c, so make it static.
      
      Fixes: 6b34a099 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20221228093603.3166599-1-yangyingliang@huawei.com
    • powerpc/imc-pmu: Fix use of mutex in IRQs disabled section · 76d588dd
      Kajol Jain authored
      The current imc-pmu code triggers a WARNING when running a thread_imc
      event with CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING enabled.
      
      Command to trigger the warning:
        # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5
      
         Performance counter stats for 'sleep 5':
      
                         0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/
      
               5.002117947 seconds time elapsed
      
               0.000131000 seconds user
               0.001063000 seconds sys
      
      Below is a snippet of the warning in dmesg:
      
        BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
        in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
        preempt_count: 2, expected: 0
        4 locks held by perf-exec/2869:
         #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
         #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
         #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
         #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
        irq event stamp: 4806
        hardirqs last  enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
        hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
        softirqs last  enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
        Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
        Call Trace:
          dump_stack_lvl+0x98/0xe0 (unreliable)
          __might_resched+0x2f8/0x310
          __mutex_lock+0x6c/0x13f0
          thread_imc_event_add+0xf4/0x1b0
          event_sched_in+0xe0/0x210
          merge_sched_in+0x1f0/0x600
          visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
          ctx_flexible_sched_in+0xcc/0x140
          ctx_sched_in+0x20c/0x2a0
          ctx_resched+0x104/0x1c0
          perf_event_exec+0x340/0x510
          begin_new_exec+0x730/0xef0
          load_elf_binary+0x3f8/0x1e10
        ...
        do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
        WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
        CPU: 36 PID: 2869 Comm: sleep Tainted: G        W          6.2.0-rc2-00011-g1247637727f2 #61
        Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
        NIP:  c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
        REGS: c00000004d2134e0 TRAP: 0700   Tainted: G        W           (6.2.0-rc2-00011-g1247637727f2)
        MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48002824  XER: 00000000
        CFAR: c00000000013fb64 IRQMASK: 1
      
      The above warning is triggered because the current imc-pmu code takes a
      mutex in IRQ-disabled sections. mutex_lock() internally calls
      __might_resched(), which checks whether IRQs are disabled and, if they
      are, triggers the warning.
      
      Fix the issue by changing the mutex to a spinlock.
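
      To illustrate the rule behind both imc-pmu fixes, here is a toy model (not kernel code) of the state that __might_resched() inspects. Taking a spinlock raises the preemption count, and the irqsave variant also disables interrupts. Anything that may sleep, including mutex_lock(), must not run in that state. ToyCpu, AtomicSleepError and the method names are hypothetical.

```python
from contextlib import contextmanager


class AtomicSleepError(RuntimeError):
    """Stands in for the 'sleeping function called from invalid context' BUG."""


class ToyCpu:
    """Toy model of the state checked before sleeping (not kernel code)."""

    def __init__(self):
        self.preempt_count = 0
        self.irqs_disabled = False

    def might_sleep(self, caller):
        # Sleeping is only allowed with preemption and interrupts enabled.
        if self.preempt_count > 0 or self.irqs_disabled:
            raise AtomicSleepError(
                f"{caller}: in_atomic(): {int(self.preempt_count > 0)}, "
                f"irqs_disabled(): {int(self.irqs_disabled)}"
            )

    @contextmanager
    def spin_lock(self, irqsave=False):
        # A spinlock disables preemption; the irqsave variant also disables
        # interrupts. The previous state is restored on release.
        saved = self.irqs_disabled
        self.preempt_count += 1
        if irqsave:
            self.irqs_disabled = True
        try:
            yield
        finally:
            self.irqs_disabled = saved
            self.preempt_count -= 1

    @contextmanager
    def mutex(self):
        self.might_sleep("mutex_lock")  # acquiring a mutex may sleep
        yield


cpu = ToyCpu()

# Taking a mutex with IRQs disabled is a bug (this commit).
try:
    with cpu.spin_lock(irqsave=True):
        with cpu.mutex():
            pass
except AtomicSleepError as err:
    print("BUG:", err)

# A sleeping call under a plain spinlock is also a bug (the nest_init_lock
# case), but the same call is fine while holding a mutex.
try:
    with cpu.spin_lock():
        cpu.might_sleep("__cpuhp_setup_state")
except AtomicSleepError as err:
    print("BUG:", err)

with cpu.mutex():
    cpu.might_sleep("__cpuhp_setup_state")
```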
      
      Fixes: 8f95faaa ("powerpc/powernv: Detect and create IMC device")
      Reported-by: Michael Petlan <mpetlan@redhat.com>
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      [mpe: Fix comments, trim oops in change log, add reported-by tags]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230106065157.182648-1-kjain@linux.ibm.com
    • powerpc/boot: Fix incorrect version calculation issue in ld_version · 3287ebd7
      Ojaswin Mujoo authored
      The ld_version() function computes the wrong version value for certain
      ld versions such as the following:
      
        $ ld --version
        GNU ld (GNU Binutils; SUSE Linux Enterprise 15)
        2.37.20211103-150100.7.37
      
      For input 2.37.20211103, the value computed is 202348030000, which is
      higher than the value for a later version like 2.39.0, which is
      239000000.
      
      The issue came to light because, with the above ld version, the powerpc
      kernel build started failing with the ld error "unrecognized option
      --no-warn-rwx-segments". The failure was caused by the recent commit
      579aee9f ("powerpc: suppress some linker warnings in recent linker
      versions"), which added the --no-warn-rwx-segments linker flag if the ld
      version is greater than 2.39.
      
      Due to the bug in ld_version(), ld version 2.37.20211103 is wrongly
      calculated to be greater than 2.39 and the unsupported flag is added.
      
      To fix it, if the version is of the form x.y.z and length(z) == 8, then
      z is most probably a date (yyyymmdd), as commonly used for release
      snapshots, rather than an actual new version. Hence, ignore the date
      part by replacing it with 0.
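
      A rough Python model of the version arithmetic, not the script itself. It assumes ld_version() combines x.y.z as x*100000000 + y*1000000 + z*10000, which reproduces the 202348030000 figure above. The fixed flag toggles the new date check.

```python
def ld_version(version: str, fixed: bool = True) -> int:
    """Model of ld_version(): turn "x.y.z" into x*10^8 + y*10^6 + z*10^4."""
    # Drop a distro packaging suffix such as "-150100.7.37".
    core = version.split("-", 1)[0]
    parts = core.split(".")
    # Missing components count as zero.
    parts += ["0"] * (3 - len(parts))
    major, minor, patch = (int(p) for p in parts[:3])
    if fixed and len(parts[2]) == 8:
        # An 8-digit third component is most likely a yyyymmdd snapshot
        # date rather than a real release number, so treat it as zero.
        patch = 0
    return major * 100_000_000 + minor * 1_000_000 + patch * 10_000


for v in ("2.37.20211103-150100.7.37", "2.39.0"):
    print(v, ld_version(v, fixed=False), ld_version(v))
```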
      
      Fixes: 579aee9f ("powerpc: suppress some linker warnings in recent linker versions")
      Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
      [mpe: Tweak change log wording/formatting, add Fixes tag]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230104202437.90039-1-ojaswin@linux.ibm.com
  6. 05 Jan, 2023 3 commits
    • powerpc/vmlinux.lds: Don't discard .comment · be5f95c8
      Michael Ellerman authored
      Although the powerpc linker script mentions .comment in the DISCARD
      section, that has never actually caused it to be discarded, because the
      earlier ELF_DETAILS macro (previously STABS_DEBUG) explicitly includes
      .comment.
      
      However commit 99cb0d91 ("arch: fix broken BuildID for arm64 and
      riscv") introduced an earlier use of DISCARD as part of the RO_DATA
      macro. With binutils < 2.36 that causes the DISCARD directives later in
      the script to be applied earlier, causing .comment to actually be
      discarded.
      
      It's confusing to explicitly include and discard .comment, and even more
      so if the behaviour depends on the toolchain version. So don't discard
      .comment in order to maintain the existing behaviour in all cases.
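
      The ordering effect can be pictured with a toy first-match model of section placement. As a simplification of the description above, it assumes binutils < 2.36 behaves as if every /DISCARD/ pattern sat at the position of the first /DISCARD/ statement. The rule list is illustrative rather than copied from the powerpc linker script. The same model covers the .rela* and .exit.text breakage described in the neighbouring vmlinux.lds commits.

```python
from fnmatch import fnmatchcase

DISCARD = "/DISCARD/"

# Illustrative rules in script order: (output section, input patterns).
SCRIPT = [
    (".rodata", [".rodata", ".rodata.*"]),
    (DISCARD, [".note.GNU-stack"]),   # an early DISCARD in RO_DATA (pattern illustrative)
    (".comment", [".comment"]),       # explicit inclusion via ELF_DETAILS
    (DISCARD, [".comment"]),          # late DISCARD in the powerpc script
]


def effective_rules(rules, old_binutils):
    """Return the rules in the order the linker effectively applies them."""
    if not old_binutils:
        return rules
    discard_idx = [i for i, (out, _) in enumerate(rules) if out == DISCARD]
    if not discard_idx:
        return rules
    # Model: all /DISCARD/ patterns are hoisted to the first /DISCARD/ statement.
    merged = (DISCARD, [p for out, pats in rules if out == DISCARD for p in pats])
    first = discard_idx[0]
    rest = [r for r in rules[first + 1:] if r[0] != DISCARD]
    return rules[:first] + [merged] + rest


def place(section, rules, old_binutils=False):
    """First matching rule wins; None means an orphan section."""
    for out, patterns in effective_rules(rules, old_binutils):
        if any(fnmatchcase(section, p) for p in patterns):
            return out
    return None


print(place(".comment", SCRIPT))                     # .comment
print(place(".comment", SCRIPT, old_binutils=True))  # /DISCARD/
```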
      
      Fixes: 83a092cf ("powerpc: Link warning for orphan sections")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230105132349.384666-3-mpe@ellerman.id.au
    • powerpc/vmlinux.lds: Don't discard .rela* for relocatable builds · 07b050f9
      Michael Ellerman authored
      Relocatable kernels must not discard relocations, because they need to
      be processed at runtime. As such they are included for CONFIG_RELOCATABLE
      builds in the powerpc linker script (line 340).
      
      However they are also unconditionally discarded later in the
      script (line 414). Previously that worked because the earlier inclusion
      superseded the discard.
      
      However commit 99cb0d91 ("arch: fix broken BuildID for arm64 and
      riscv") introduced an earlier use of DISCARD as part of the RO_DATA
      macro (line 137). With binutils < 2.36 that causes the DISCARD
      directives later in the script to be applied earlier, causing .rela* to
      actually be discarded at link time, leading to build warnings and a
      kernel that doesn't boot:
      
        ld: warning: discarding dynamic section .rela.init.rodata
      
      Fix it by conditionally discarding .rela* only when CONFIG_RELOCATABLE
      is disabled.
      
      Fixes: 99cb0d91 ("arch: fix broken BuildID for arm64 and riscv")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230105132349.384666-2-mpe@ellerman.id.au
    • powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT · 4b9880db
      Michael Ellerman authored
      The powerpc linker script explicitly includes .exit.text, because
      otherwise the link fails due to references from __bug_table and
      __ex_table. The code is freed (discarded) at runtime along with
      .init.text and data.
      
      That has worked in the past despite powerpc not defining
      RUNTIME_DISCARD_EXIT, because DISCARDS appears late in the powerpc linker
      script (line 410), and the explicit inclusion of .exit.text
      earlier (line 280) supersedes the discard.
      
      However commit 99cb0d91 ("arch: fix broken BuildID for arm64 and
      riscv") introduced an earlier use of DISCARD as part of the RO_DATA
      macro (line 136). With binutils < 2.36 that causes the DISCARD
      directives later in the script to be applied earlier [1], causing
      .exit.text to actually be discarded at link time, leading to build
      errors:
      
        '.exit.text' referenced in section '__bug_table' of crypto/algboss.o: defined in
        discarded section '.exit.text' of crypto/algboss.o
        '.exit.text' referenced in section '__ex_table' of drivers/nvdimm/core.o: defined in
        discarded section '.exit.text' of drivers/nvdimm/core.o
      
      Fix it by defining RUNTIME_DISCARD_EXIT, which causes the generic
      DISCARDS macro to not include .exit.text at all.
      
      1: https://lore.kernel.org/lkml/87fscp2v7k.fsf@igel.home/
      
      Fixes: 99cb0d91 ("arch: fix broken BuildID for arm64 and riscv")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20230105132349.384666-1-mpe@ellerman.id.au
  7. 01 Jan, 2023 6 commits
  8. 31 Dec, 2022 2 commits
  9. 30 Dec, 2022 15 commits