- 24 Aug, 2017 1 commit
-
-
Cédric Le Goater authored
When called from xive_irq_startup(), the size of the cpumask can be larger than nr_cpu_ids. This can result in a WARN_ON such as: WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0 ... NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0 LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0 Call Trace: xive_find_target_in_mask+0x74/0x2f0 (unreliable) xive_pick_irq_target.isra.1+0x200/0x230 xive_irq_startup+0x60/0x180 irq_startup+0x70/0xd0 __setup_irq+0x7bc/0x880 request_threaded_irq+0x14c/0x2c0 request_event_sources_irqs+0x100/0x180 __machine_initcall_pseries_init_ras_IRQ+0x104/0x134 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x290/0x374 kernel_init+0x24/0x170 ret_from_kernel_thread+0x5c/0x74 This happens because we're being called with our affinity mask set to irq_default_affinity. That in turn was populated using cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a mask which has > nr_cpu_ids bits set. Fix it by limiting the value returned by cpumask_weight(). Signed-off-by: Cédric Le Goater <clg@kaod.org> [mpe: Add change log details on actual cause] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 23 Aug, 2017 19 commits
-
-
Nicholas Piggin authored
On modern CPUs the CTRL register is read-only except bit 63 which is the run latch control. This means it can be updated with a mtspr rather than mfspr/mtspr. To accomodate older CPUs (Cell at least), where there are other bits in the register, we still do a read/modify/write on pre 2.06 CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Update change log to mention 2.06 workaround] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
HVI interrupts have always used 0x500, so remove the dead branch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
POWER9 host external interrupts use the h_virt_irq_common handler, so use that to replay them rather than using the hardware_interrupt_common handler. Both call do_IRQ, but using the correct handler reduces i-cache footprint. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
This results in smaller code, and fewer branches. This relies on the fact that both the 0xe80 and 0xa00 handlers call the same upper level code, namely doorbell_exception(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Move the clearing of irq_happened bits into the condition where they were found to be set. This reduces instruction count slightly, and reduces stores into irq_happened. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Places in the kernel where r13 is not the PACA pointer must have maskable interrupts disabled, so r13 does not have to be restored when returning from a soft-masked interrupt. We should never have interrupts soft disabled when we're in user space. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
MSR_EE is always enabled in SRR1 for masked interrupts, so we can use xor to clear it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Nicholas Piggin authored
Interrupts which do not require EE to be cleared can all be tested with a single bitwise test. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
It's too big to be inline, there is no reason to keep it that way. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Rework to incorporate the comment changes via fixes branch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
Instead of comparing the whole CPU mask every time, let's keep a counter of how many bits are set in the mask. Thus testing for a local mm only requires testing if that counter is 1 and the current CPU bit is set in the mask. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
We open-code testing for the mm being local to the current CPU in a few places. Use our existing helper instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
It calls switch_mm() which already does the irq save/restore these days. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
Makes switch_mm_irqs_off() a bit more readable Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
In __replay_interrupt() we take the address of a local label so we can return to it later. However the assembler turns the local label into a symbol with a name like ".L1^B42" - where "^B" is literally "\002". This does not make for pleasant stack traces. Fix it by giving the label a sensible name. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Rob Herring authored
In preparation to stop storing the full node path in full_name, remove the dependency on full_name from dlpar_attach_node(). Callers of dlpar_attach_node() already have the parent device_node, so just pass the parent node into dlpar_attach_node instead of the path. This avoids doing a lookup of the parent node by the path. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Rob Herring authored
Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anatolij Gustschin <agust@denx.de> Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Currently in the vio.c code we use a comparision against the parent device node's full path to decide if the device is a PFO or VIO family device. Both the ibm,platform-facilities and vdevice nodes are defined by PAPR, and must have a matching device_type. So instead of using the path we can instead compare the device_type. I've checked Qemu and kvmtool both do this correctly, and all the PowerVM systems I have access to do also. So it seems to be safe. This removes the dependency on full_name, which is being removed upstream. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
There's a non-trivial dependency between some commits we want to put in next and the KVM prefetch work around that went into fixes. So merge fixes into next.
-
- 21 Aug, 2017 1 commit
-
-
Arvind Yadav authored
of_device_ids are not supposed to change at runtime. All functions working with of_device_ids provided by <linux/of.h> work with const of_device_ids. So mark the non-const structs as const. File size before: text data bss dec hex filename 407 576 0 983 3d7 drivers/macintosh/rack-meter.o File size after constify rackmeter_match. text data bss dec hex filename 807 176 0 983 3d7 drivers/macintosh/rack-meter.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 18 Aug, 2017 1 commit
-
-
Benjamin Herrenschmidt authored
There is no guarantee that the various isync's involved with the context switch will order the update of the CPU mask with the first TLB entry for the new context being loaded by the HW. Be safe here and add a memory barrier to order any subsequent load/store which may bring entries into the TLB. The corresponding barrier on the other side already exists as pte updates use pte_xchg() which uses __cmpxchg_u64 which has a sync after the atomic operation. Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add comments in the code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 17 Aug, 2017 11 commits
-
-
Aneesh Kumar K.V authored
We use mm cpumask for serializing against lockless page table walk. Anybody who is doing a lockless page table walk is expected to disable irq and only cpus in mm cpumask is expected do the lockless walk. This ensure that a THP split can send IPI to only cpus in the mm cpumask, to make sure there are no parallel lockless page table walk. Add the CAPI fault handling cpu to the mm cpumask so that we can do the lockless page table walk while inserting hash page table entries. Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Now that we made sure that lockless walk of linux page table is mostly limitted to current task(current->mm->pgdir) we can update the THP update sequence to only send IPI to CPUs on which this task has run. This helps in reducing the IPI overload on systems with large number of CPUs. WRT kvm even though kvm is walking page table with vpc->arch.pgdir, it is done only on secondary CPUs and in that case we have primary CPU added to task's mm cpumask. Sending an IPI to primary will force the secondary to do a vm exit and hence this mm cpumask usage is safe here. WRT CAPI, we still end up walking linux page table with capi context MM. For now the pte lookup serialization sends an IPI to all CPUs in CPI is in use. We can further improve this by adding the CAPI interrupt handling CPU to task mm cpumask. That will be done in a later patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Michael Ellerman authored
Bring in the commit to rename find_linux_pte_or_hugepte() which touches arch and KVM code, and might need to be merged with the kvmppc tree to avoid conflicts.
-
Aneesh Kumar K.V authored
Add newer helpers to make the function usage simpler. It is always recommended to use find_current_mm_pte() for walking the page table. If we cannot use find_current_mm_pte(), it should be documented why the said usage of __find_linux_pte() is safe against a parallel THP split. For now we have KVM code using __find_linux_pte(). This is because kvm code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but with PACA soft_enabled = 1. We may want to fix that later and make sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that we can switch kvm to use find_linux_pte(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
Use the newly introduced memset32() to pre-fill BPF page(s) with trap instructions. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Naveen N. Rao authored
Based on Matthew Wilcox's patches for other architectures. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Geoff Levand authored
Cc: Markus Elfring <elfring@users.sourceforge.net> Cc: Jim Paris <jim@jtan.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Markus Elfring authored
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdfSigned-off-by: Markus Elfring <elfring@users.sourceforge.net> Cc: Jim Paris <jim@jtan.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Sam Bobroff authored
The tm-resched-dscr self test can, in some situations, run for several minutes before being successfully interrupted by the context switch it needs in order to perform the test. This often seems to occur when the test is being run in a virtual machine. Improve the test by running it under eat_cpu() to guarantee contention for the CPU and increase the chance of a context switch. In practice this seems to reduce the test time, in some cases, from more than two minutes to under a second. Also remove the "progress dots" so that if the test does run for a long time, it doesn't produce large amounts of unnecessary output. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Madhavan Srinivasan authored
nest_imc_refc is a reference count struct, used to track number of active perf sessions using the nest units. Currently the code accesses nest_imc_refc using node_id, which is incorrect, the array is indexed by node number. Meaning in the case of sparse node ids we index off the end of the array. Fix it to use get_nest_pmu_ref() which uses the existing per-cpu variable local_nest_imc_refc. Fixes: 885dcd70 ('powerpc/perf: Add nest IMC PMU support') Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Tweak change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Bhumika Goyal authored
Declare bin_attribute structures as const as they are only passed as an argument to the function sysfs_create_bin_file. This argument is of type const, so declare the structure as const. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 16 Aug, 2017 7 commits
-
-
Benjamin Herrenschmidt authored
__giveup_vsx/save_vsx are completely equivalent to testing MSR_FP and MSR_VEC and calling the corresponding giveup/save function so just remove the spurious VSX cases. Also add WARN_ONs checking that we never have VSX enabled without the two other. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
__giveup_fpu() already does it and we cannot have MSR_VSX set without having MSR_FP also set. This also adds a warning to check we indeed do Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
__giveup_vsx() already calls those two functions. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Dou Liyang authored
Commit a7be6e5a ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in POWERPC platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Benjamin Herrenschmidt authored
VSX uses a combination of the old vector registers, the old FP registers and new "second halves" of the FP registers. Thus when we need to see the VSX state in the thread struct (flush_vsx_to_thread()) or when we'll use the VSX in the kernel (enable_kernel_vsx()) we need to ensure they are all flushed into the thread struct if either of them is individually enabled. Unfortunately we only tested if the whole VSX was enabled, not if they were individually enabled. Fixes: 72cd7b44 ("powerpc: Uncomment and make enable_kernel_vsx() routine available") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Aneesh Kumar K.V authored
Now that we have GIGANTIC_PAGE enabled on powerpc, use this for 16G hugepages with hash translation mode. Depending on the total system memory we have, we may be able to allocate 16G hugepages runtime. This also remove the hugetlb setup difference between hash/radix translation mode. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-