- 17 May, 2018 5 commits
-
-
Nicholas Piggin authored
A kernel crash in process context that calls emergency_restart from panic will end up calling opal_event_shutdown with interrupts disabled but not in interrupt. This causes a sleeping function to be called, which gives the following warning with sysrq+c:

  Rebooting in 10 seconds..
  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
  in_atomic(): 0, irqs_disabled(): 1, pid: 7669, name: bash
  CPU: 20 PID: 7669 Comm: bash Tainted: G D W 4.17.0-rc5+ #3
  Call Trace:
    dump_stack+0xb0/0xf4 (unreliable)
    ___might_sleep+0x174/0x1a0
    mutex_lock+0x38/0xb0
    __free_irq+0x68/0x460
    free_irq+0x70/0xc0
    opal_event_shutdown+0xb4/0xf0
    opal_shutdown+0x24/0xa0
    pnv_shutdown+0x28/0x40
    machine_shutdown+0x44/0x60
    machine_restart+0x28/0x80
    emergency_restart+0x30/0x50
    panic+0x2a0/0x328
    oops_end+0x1ec/0x1f0
    bad_page_fault+0xe8/0x154
    handle_page_fault+0x34/0x38
  --- interrupt: 300 at sysrq_handle_crash+0x44/0x60
      LR = __handle_sysrq+0xfc/0x260
    flag_spec.62335+0x12b844/0x1e8db4 (unreliable)
    __handle_sysrq+0xfc/0x260
    write_sysrq_trigger+0xa8/0xb0
    proc_reg_write+0xac/0x110
    __vfs_write+0x6c/0x240
    vfs_write+0xd0/0x240
    ksys_write+0x6c/0x110

Fixes: 9f0fd049 ("powerpc/powernv: Add a virtual irqchip for opal events")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
arch/powerpc/Makefile activates -mmultiple on BE PPC32 configs in order to use multiple-word instructions in function entry/exit. This patch does the same for the asm parts, for consistency. On processors like the 8xx, on which instruction fetching is pretty slow, this speeds up register save/restore. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: PPC32 is BE only, so drop the endian checks] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
Doing the test at exit of the function avoids an unnecessary test and branch inside longjmp(). Semantics are unchanged. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Christophe Leroy authored
This reverts commit 6ad966d7. That commit was pointless, because csum_add() sums two 32-bit values, so the sum is at most 0x1fffffffe. Adding the upper part (1) and the lower part (0xfffffffe) then gives 0xffffffff, which doesn't carry, and any lower value won't carry either. Beyond being useless, the commit also defeats the whole purpose of having an arch-specific inline csum_add(), because the resulting code is even worse than what the generic implementation of csum_add() produces:

  0000000000000240 <.csum_add>:
   240:   38 00 ff ff     li      r0,-1
   244:   7c 84 1a 14     add     r4,r4,r3
   248:   78 00 00 20     clrldi  r0,r0,32
   24c:   78 89 00 22     rldicl  r9,r4,32,32
   250:   7c 80 00 38     and     r0,r4,r0
   254:   7c 09 02 14     add     r0,r9,r0
   258:   78 09 00 22     rldicl  r9,r0,32,32
   25c:   7c 00 4a 14     add     r0,r0,r9
   260:   78 03 00 20     clrldi  r3,r0,32
   264:   4e 80 00 20     blr

In comparison, the generic implementation of csum_add() gives:

  0000000000000290 <.csum_add>:
   290:   7c 63 22 14     add     r3,r3,r4
   294:   7f 83 20 40     cmplw   cr7,r3,r4
   298:   7c 10 10 26     mfocrf  r0,1
   29c:   54 00 ef fe     rlwinm  r0,r0,29,31,31
   2a0:   7c 60 1a 14     add     r3,r0,r3
   2a4:   78 63 00 20     clrldi  r3,r3,32
   2a8:   4e 80 00 20     blr

And the reverted implementation for PPC64 gives:

  0000000000000240 <.csum_add>:
   240:   7c 84 1a 14     add     r4,r4,r3
   244:   78 80 00 22     rldicl  r0,r4,32,32
   248:   7c 80 22 14     add     r4,r0,r4
   24c:   78 83 00 20     clrldi  r3,r4,32
   250:   4e 80 00 20     blr

Fixes: 6ad966d7 ("powerpc/64: Fix checksum folding in csum_add()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
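For reference, the kept PPC64 inline roughly corresponds to the following C (a paraphrased sketch, not a verbatim copy of the kernel source):

  /*
   * Paraphrased sketch of the PPC64 csum_add() being reverted to:
   * fold once and truncate. A second fold is never needed because the
   * 64-bit sum of two 32-bit values is at most 0x1fffffffe, so adding
   * the high word (at most 1) to the low word cannot carry again.
   */
  static inline unsigned int csum_add_sketch(unsigned int csum, unsigned int addend)
  {
          unsigned long long sum = (unsigned long long)csum + addend;

          return (unsigned int)(sum + (sum >> 32));
  }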
-
Christophe Leroy authored
PMD_PAGE_SIZE() is nowhere used and _PMD_SIZE is only used by PMD_PAGE_SIZE(). This patch removes them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 15 May, 2018 14 commits
-
-
Jonathan Neuschäfer authored
The interrupt controller inside the Wii's Hollywood chip is connected to two masters, the "Broadway" PowerPC and the "Starlet" ARM926, each with their own interrupt status and mask registers. When booting the Wii with mini[1], interrupts from the SD card controller (IRQ 7) are handled by the ARM, because mini provides SD access over IPC. Linux however can't currently use or disable this IPC service, so both sides try to handle IRQ 7 without coordination. Let's instead make sure that all interrupts that are unmasked on the PPC side are masked on the ARM side; this will also make sure that Linux can properly talk to the SD card controller (and potentially other devices). If access to a device through IPC is desired in the future, interrupts from that device should not be handled by Linux directly.

[1]: https://github.com/lewurm/mini

Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Jonathan Neuschäfer authored
On the Wii, there is a secondary IRQ controller (hlwd-pic), so flipper-pic's match operation should not be hardcoded to return 1. In fact, the default matching logic is sufficient, and we can completely omit flipper_pic_match. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Testing with a threaded version of mmap_bench which allocates 1G chunks, and with a large number of threads, we find:

without patch:

  32.72%  mmap_bench  [kernel.vmlinux]  [k] do_raw_spin_lock
          |
          ---do_raw_spin_lock
             |
              --32.68%--0
                        |
                        |--15.82%--pte_fragment_alloc
                        |          |
                        |           --15.79%--do_huge_pmd_anonymous_page
                        |                     __handle_mm_fault
                        |                     handle_mm_fault
                        |                     __do_page_fault
                        |                     handle_page_fault
                        |                     test_mmap
                        |                     test_mmap
                        |                     start_thread
                        |                     __clone
                        |
                        |--14.95%--do_huge_pmd_anonymous_page
                        |          __handle_mm_fault
                        |          handle_mm_fault
                        |          __do_page_fault
                        |          handle_page_fault
                        |          test_mmap
                        |          test_mmap
                        |          start_thread
                        |          __clone
                        |

with patch:

  12.89%  mmap_bench  [kernel.vmlinux]  [k] do_raw_spin_lock
          |
          ---do_raw_spin_lock
             |
              --12.83%--0
                        |
                        |--3.21%--pagevec_lru_move_fn
                        |         __lru_cache_add
                        |         |
                        |          --2.74%--do_huge_pmd_anonymous_page
                        |                    __handle_mm_fault
                        |                    handle_mm_fault
                        |                    __do_page_fault
                        |                    handle_page_fault
                        |                    test_mmap
                        |                    test_mmap
                        |                    start_thread
                        |                    __clone
                        |
                        |--3.11%--do_huge_pmd_anonymous_page
                        |         __handle_mm_fault
                        |         handle_mm_fault
                        |         __do_page_fault
                        |         handle_page_fault
                        |         test_mmap
                        |         test_mmap
                        |         start_thread
                        |         __clone
                        .....
                        |
                         --0.55%--pte_fragment_alloc
                                   |
                                    --0.55%--do_huge_pmd_anonymous_page
                                              __handle_mm_fault
                                              handle_mm_fault
                                              __do_page_fault
                                              handle_page_fault
                                              test_mmap
                                              test_mmap
                                              start_thread
                                              __clone

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Instead of encoding the shift in the table address, use an enumerated index value. This allows us to do different things in the callback for pte and pmd. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
The 4K config uses one full page at level 4 of the page table. Add support for single-fragment allocation in the page table fragment code and use that for the 4K config. This makes both the 4K and 64K configs use the same code path. Later we will switch pmd to use the page table fragment code too. This is done only for 64-bit platforms, which are the ones using page table fragment support. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Now that we have removed 64K page size support, the RCU page table free can be much simpler for nohash. Make a copy of the RCU callback in the pgalloc.h header, similar to nohash 32. We could possibly merge the 32-bit and 64-bit versions there, but that is left for a later patch. We also move the book3s-specific handler to pgtable_book3s64.c; this will be updated in a later patch to handle the split pmd ptlock. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
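As a rough illustration of the pattern (hypothetical names, not the actual callback added here), RCU-deferred page table freeing queues the backing page and only returns it to the allocator after a grace period:

  #include <linux/mm.h>
  #include <linux/rcupdate.h>

  /*
   * Hypothetical sketch: free a page table page only after an RCU grace
   * period, so lockless walkers that may still hold a pointer to it
   * cannot observe it being reused.
   */
  static void pgtable_free_rcu(struct rcu_head *head)
  {
          struct page *page = container_of(head, struct page, rcu_head);

          __free_page(page);
  }

  static void pgtable_free_deferred(struct page *table)
  {
          call_rcu(&table->rcu_head, pgtable_free_rcu);
  }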
-
Aneesh Kumar K.V authored
We have in Kconfig:

  config PPC_64K_PAGES
          bool "64k page size"
          depends on !PPC_FSL_BOOK3E && (44x || PPC_BOOK3S_64 || PPC_BOOK3E_64)
          select HAVE_ARCH_SOFT_DIRTY if PPC_BOOK3S_64

The only supported 64-bit BOOK3E platform is FSL_BOOK3E. Remove the dead 64K page support code from 64-bit nohash.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
We rename the alloc and get_from_cache functions to indicate they operate on pte fragments. In a later patch we will add pmd fragment support. No functional change in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
In a later patch we switch pmd_lock from mm->page_table_lock to a split pmd ptlock. To avoid compilation issues, use the pmd_lockptr() helper. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
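A minimal sketch of the resulting pattern (illustrative caller, not code from this patch):

  #include <linux/mm.h>

  /*
   * Illustrative only: take the pmd lock via pmd_lockptr(), so the
   * caller does not care whether it is backed by mm->page_table_lock
   * or, later, by a split per-table ptlock.
   */
  static void modify_pmd_example(struct mm_struct *mm, pmd_t *pmd)
  {
          spinlock_t *ptl = pmd_lockptr(mm, pmd);

          spin_lock(ptl);
          /* ... update the pmd entry under the lock ... */
          spin_unlock(ptl);
  }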
-
Aneesh Kumar K.V authored
This is only code movement, and it avoids an #ifdef. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
This brings in one commit that we may want to share with the kvm-ppc tree, to avoid merge conflicts and get wider testing.
-
Aneesh Kumar K.V authored
In the next set of patches, we will switch the pmd allocator to use page fragments, and the locking will be updated to a split pmd ptlock. We want to avoid using fragments for the partition-scoped table, so use a slab cache, similar to the level 4 table. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 14 May, 2018 5 commits
-
-
Souptick Joarder authored
Use the new return type vm_fault_t for the fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See commit 1c8f4220 ("mm: change return type to vm_fault_t"). We are also fixing a minor bug: the error from vm_insert_pfn() was being ignored, and the effect of this is likely to be felt only in OOM situations. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
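As a sketch of what such a conversion looks like (hypothetical handler, not the exact code in this patch):

  #include <linux/mm.h>

  /*
   * Hypothetical fault handler after the conversion: the return type is
   * vm_fault_t, and vmf_insert_pfn() already returns a vm_fault_t, so
   * no errno can be silently dropped on the way out.
   */
  static vm_fault_t example_mmap_fault(struct vm_fault *vmf)
  {
          unsigned long pfn = 0;  /* illustrative: derived from vmf->pgoff */

          return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
  }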
-
Colin Ian King authored
Trivial fix to a spelling mistake in a structure field name used in debug messages. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
The exec_target binary could segfault calling _exit(2) because r13 is not set up properly (and libc looks at that when performing a syscall). Call SYS_exit using syscall(2) which doesn't seem to have this problem. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
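A minimal userspace sketch of the workaround (assumed shape, not the exact test code):

  #include <unistd.h>
  #include <sys/syscall.h>

  int main(void)
  {
          /*
           * Issue the exit syscall via syscall(2) instead of the libc
           * _exit() wrapper, which (per the description above) may look
           * at r13 on powerpc before r13 has been set up properly.
           */
          syscall(SYS_exit, 0);
          return 0;   /* not reached */
  }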
-
Alexey Kardashevskiy authored
At the moment we assume that IODA2 and newer PHBs can always do 4K/64K/16M IOMMU pages, however this is not the case for POWER9. Now that skiboot advertises the supported sizes via the device tree, use that instead of hard-coding the mask. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Currently memtrace doesn't build if NUMA=n:

  In function 'memtrace_alloc_node':
  arch/powerpc/platforms/powernv/memtrace.c:134:6: error:
  the address of 'contig_page_data' will always evaluate as 'true'
    if (!NODE_DATA(nid) || !node_spanned_pages(nid))
        ^

This is because for NUMA=n NODE_DATA(nid) points to an always-allocated structure, contig_page_data. But even in the NUMA=y case memtrace_alloc_node() is only called for online nodes, and we should always have a NODE_DATA() allocated for an online node. So remove the (hopefully) overly paranoid check, which also means we can build when NUMA=n.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
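For reference, the NUMA=n definitions look roughly like this (paraphrased from include/linux/mmzone.h), which is why the NULL check can never fail:

  /* Paraphrased sketch of the !CONFIG_NUMA case in include/linux/mmzone.h: */
  extern struct pglist_data contig_page_data;
  #define NODE_DATA(nid)          (&contig_page_data)

  /*
   * A check such as
   *
   *         if (!NODE_DATA(nid) || !node_spanned_pages(nid))
   *
   * therefore compares the address of a static object against NULL,
   * which is always false, and gcc flags it ("will always evaluate as
   * 'true'").
   */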
-
- 11 May, 2018 5 commits
-
-
Michael Ellerman authored
In commit e6a6928c ("of/fdt: Convert FDT functions to use libfdt") (Apr 2014), the generic flat device tree code dropped support for flat device trees older than version 0x10 (16). We still have code in our CPU scanning to cope with flat device tree versions earlier than 2, which can now never trigger, so drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Add a test of the relative branch patching logic in the alternate section feature fixup code. This tests that if we branch past the last instruction of the alternate section, the branch is not patched. That's because the assembler will have created a branch that already points to the first instruction after the patched section, which is correct and needs no further patching. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
We want this to remain the last test (because it's disabled by default), so give it a non-numbered name so we don't have to renumber it when adding new tests before it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
The code patching code has always been a bit confused about whether it's best to use void *, unsigned int *, char *, etc. to point to instructions. In fact in the feature fixups tests we use both unsigned int[] and u8[] in different places. Unfortunately the tests that use unsigned int[] calculate the size of the code blocks using subtraction of those unsigned int pointers, and then pass the result to memcmp(). This means we're only comparing 1/4 of the bytes we need to, because we need to multiply by sizeof(unsigned int) to get the number of *bytes*. The result is that the tests do all the patching and then only compare some of the resulting code, so patching bugs that only affect the last 3/4 of the code could slip through undetected. It turns out that hasn't been happening, although one test had a bad expected case (see previous commit). Fix it for now by multiplying the size by 4 in the affected functions. Fixes: 362e7701 ("powerpc: Add self-tests of the feature fixup code") Epic-brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
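A simplified sketch of the bug and the fix (hypothetical symbol names, mirroring the pattern described above):

  #include <linux/string.h>

  /*
   * Hypothetical sketch: subtracting two 'unsigned int *' pointers yields
   * a count of 32-bit words, not bytes, so passing it straight to
   * memcmp() compares only the first quarter of the patched code.
   */
  extern unsigned int test_code_start[], test_code_end[];
  extern unsigned int test_code_expected[];

  static int check_patched_code(void)
  {
          unsigned long size = test_code_end - test_code_start;

          /* Buggy: 'size' is a word count, so only 1/4 of the bytes are checked. */
          /* return memcmp(test_code_start, test_code_expected, size); */

          /* Fixed: convert to a byte count before comparing. */
          return memcmp(test_code_start, test_code_expected,
                        size * sizeof(unsigned int));
  }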
-
Michael Ellerman authored
The expected case for this test was wrong; the source of the alternate code sequence is:

  FTR_SECTION_ELSE
  2:    or      2,2,2
        PPC_LCMPI       r3,1
        beq     3f
        blt     2b
        b       3f
        b       1b
  ALT_FTR_SECTION_END(0, 1)
  3:    or      1,1,1
        or      2,2,2
  4:    or      3,3,3

So when it's patched the '3' label should still be on the 'or 1,1,1', and the 4 label is irrelevant and can be removed.

Fixes: 362e7701 ("powerpc: Add self-tests of the feature fixup code")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 10 May, 2018 11 commits
-
-
Michael Ellerman authored
If the systbl_chk.sh checks fail we print a message, but with no indication that it's an error. That makes it hard to find in build logs with e.g. grep. So prefix any output with "Error:". Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Al Viro authored
It had always been pointless: compat_sys_select() sign-extends the first argument just fine on its own. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [mpe: Use COMPAT_SPU_NEW() to keep systbl_chk.sh happy] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Currently the select system call is wired up with the SYSX_SPU() macro. The SYSX_SPU() macro is not handled by systbl_chk.c, which means the syscall number for select is not checked. That hides the fact that the syscall number for select is actually __NR__newselect, not __NR_select. In a following patch we'd like to drop ppc32_select(), which means select will become a regular COMPAT_SYS_SPU() syscall. But COMPAT_SYS_SPU() can't deal with the fact that the syscall number is actually __NR__newselect. We also can't just redefine __NR_select because that's still used for the old select call. So add a new COMPAT_SPU_NEW() macro that does the same thing as COMPAT_SYS_SPU(), except it encodes that we're using the new number. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [mpe: Update sys_ni.c for s/ppc_rtas/sys_rtas/] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [mpe: Fix sys_debug_setcontext() prototype to return long] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Al Viro authored
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Torsten Duwe authored
The "Power Architecture 64-Bit ELF V2 ABI" says in section 2.3.2.3: [...] There are several rules that must be adhered to in order to ensure reliable and consistent call chain backtracing: * Before a function calls any other function, it shall establish its own stack frame, whose size shall be a multiple of 16 bytes. – In instances where a function’s prologue creates a stack frame, the back-chain word of the stack frame shall be updated atomically with the value of the stack pointer (r1) when a back chain is implemented. (This must be supported as default by all ELF V2 ABI-compliant environments.) [...] – The function shall save the link register that contains its return address in the LR save doubleword of its caller’s stack frame before calling another function. To me this sounds like the equivalent of HAVE_RELIABLE_STACKTRACE. This patch may be unneccessarily limited to ppc64le, but OTOH the only user of this flag so far is livepatching, which is only implemented on PPCs with 64-LE, a.k.a. ELF ABI v2. Feel free to add other ppc variants, but so far only ppc64le got tested. This change also implements save_stack_trace_tsk_reliable() for ppc64le that checks for the above conditions, where possible. Signed-off-by: Torsten Duwe <duwe@suse.de> Signed-off-by: Nicolai Stange <nstange@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Provide the timebase and the timebase of the last heartbeat in watchdog lockup messages. Also provide a stack trace of when a CPU becomes un-stuck, which can be useful: it could be where irqs are re-enabled, so it may mark the end of the critical section responsible for the latency. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
The watchdog heartbeat timestamp is updated when the local heartbeat timer fires (or touch_nmi_watchdog() is called). This is an interesting data point, so don't overwrite it when the soft-NMI interrupt detects a hard lockup. That code came from a pre-merge version to prevent hard lockup messages flooding, but that's taken care of by the stuck-CPU logic now, so there is no reason to update the heartbeat timestamp here. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Cédric Le Goater authored
This is not the case for the moment, but future releases of pHyp might need to introduce some synchronisation routines under the hood which would make the XIVE hcalls longer to complete. As this was done for H_INT_RESET, let's wrap the other hcalls in a loop catching the H_LONG_BUSY_* codes. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
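A rough sketch of the retry pattern (illustrative only; helper names such as get_longbusy_msecs() are assumed from the existing pseries code, and the real patch wraps the specific XIVE hcalls):

  #include <linux/delay.h>
  #include <asm/hvcall.h>
  #include <asm/plpar_wrappers.h>

  /*
   * Illustrative sketch: retry an hcall while the hypervisor reports
   * one of the H_LONG_BUSY_* codes, sleeping for the hinted interval
   * between attempts.
   */
  static long hcall_with_long_busy_retry(unsigned long opcode,
                                         unsigned long arg1, unsigned long arg2)
  {
          long rc;

          do {
                  rc = plpar_hcall_norets(opcode, arg1, arg2);
                  if (H_IS_LONG_BUSY(rc))
                          msleep(get_longbusy_msecs(rc));
          } while (H_IS_LONG_BUSY(rc));

          return rc;
  }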
-