- 06 Apr, 2016 40 commits
-
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 Track msix of each IO and use the same msix for issuing abort to timed out IO. With this driver will process IO's reply first followed by TM. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit 06071c8bae271ea2f8af3951c40ffad340b2cbbd) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 Updated MPI version and MPI header files. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit ff711f645e0de9adec7fff52f2a5f0a1959d5be6) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 Added support for configurable Chain Frame Size. Calculate the Chain Message Frame size from the IOCMaxChainSegementSize (iocfacts). Applicable only for mpt3sas/SAS3.0 HBA's. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit 84619ca86eb0797aad61f962dafa82788861c4e0) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath Prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 Module parameter to enable/disable configuring affinity hint for msix vector. SMP affinity feature can be enabled/disabled by setting module parameter "smp_affinity_enable" to 1/0. By default this feature is enabled. (smp_affinity_enable = 1 enabled). Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit be65e666abdd21865b3ea2713257a66e624eeaec) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 Driver assumes HighPriority credit as part of Global credit. But, Firmware treats HighPriority credit value and global cedits as two different values. Changed host queue algorithm to treat global credits and highPriority credits as two different values. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit 3ffa7c60b71fcd1070a68a74c380c05c4c161710) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 Never block the SEP device (i.e. Never invoke the scsi_internal_device_block() API for SEP device) even for the delay not responding events. Blocking the SEP device will create a deadlock while adding any device to the OS. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit b17d0b7ff0768070180e1021c0f32988445644d8) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 1.Wrong size of argument is being passed The size of struct being passed as an argument to memset func and area of memory being pointed by an instance of struct in memset func should be of same structure type. 2.Dereference null return value 3.Array compared against '0' Check whether value pointed by particular index of an array is null or not in "if" statement. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit dc2ed1660060a04ae9857047ace1169bddeb2ef6) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 As driver was using MPI SGL while framing the SMP Passthrough request message due to which firmware unable to post the Reply Data in the host memory and timeout is observed for this SMP Passthrough request message and so unable to perform phy disable operation. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit 415f1c3fe636022dd2d91fc152c0665adf588e44) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Suganath prabu Subramani authored
BugLink: http://bugs.launchpad.net/bugs/1512221 Updated hardware description headers with MPI v2.6 and mpt3sas_pci_table[] with vendor_ids, device_ids of Cutlass and Intruder HBA which have support for 4 ports. Signed-off-by: Suganath prabu Subramani <suganath-prabu.subramani@avagotech.com> Signed-off-by: Chaitra P B <chaitra.basappa@avagotech.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit 1abeff9c9ab2384a0da5019956d52944350301e6) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Tomas Henzl authored
BugLink: http://bugs.launchpad.net/bugs/1512221 It might happen that we try to free an already freed pointer. Reported-by: Maurizio Lombardi <mlombard@redhat.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Acked-by: Chaitra P B <chaitra.basappa@avagotech.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from commit 5f985d88) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Tyrel Datwyler authored
BugLink: http://bugs.launchpad.net/bugs/1547153 The values returned by the show functions for the host os_type, mad_version, and partition_number attributes get their values directly from the madapter_info struct whose associated fields are __be32 typed. Added endian conversion to ensure these values are sane on LE platforms. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> (cherry picked from linux-next commit f9e3de0d6053ae89828188291a405c64d8d95a0e) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Tim Gardner authored
Ignore: yes Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Tim Gardner authored
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Lorenzo Pieralisi authored
BugLink: http://bugs.launchpad.net/bugs/1547047 The SBBR and ACPI specifications allow ACPI based systems that do not implement PSCI (eg systems with no EL3) to boot through the ACPI parking protocol specification[1]. This patch implements the ACPI parking protocol CPU operations, and adds code that eases parsing the parking protocol data structures to the ARM64 SMP initializion carried out at the same time as cpus enumeration. To wake-up the CPUs from the parked state, this patch implements a wakeup IPI for ARM64 (ie arch_send_wakeup_ipi_mask()) that mirrors the ARM one, so that a specific IPI is sent for wake-up purpose in order to distinguish it from other IPI sources. Given the current ACPI MADT parsing API, the patch implements a glue layer that helps passing MADT GICC data structure from SMP initialization code to the parking protocol implementation somewhat overriding the CPU operations interfaces. This to avoid creating a completely trasparent DT/ACPI CPU operations layer that would require creating opaque structure handling for CPUs data (DT represents CPU through DT nodes, ACPI through static MADT table entries), which seems overkill given that ACPI on ARM64 mandates only two booting protocols (PSCI and parking protocol), so there is no need for further protocol additions. Based on the original work by Mark Salter <msalter@redhat.com> [1] https://acpica.org/sites/acpica/files/MP%20Startup%20for%20ARM%20platforms.docxSigned-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Loc Ho <lho@apm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mark Salter <msalter@redhat.com> Cc: Al Stone <ahs3@redhat.com> [catalin.marinas@arm.com: Added WARN_ONCE(!acpi_parking_protocol_valid() on the IPI] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (backported from linux-next commit b518dc969cea61100ec7a7692716a0e82a189b2d) Signed-off-by: Craig Magina <craig.magina@canonical.com>
-
Craig Magina authored
BugLink: http://bugs.launchpad.net/bugs/1547047Signed-off-by: Craig Magina <craig.magina@canonical.com>
-
Josh Poimboeuf authored
Calling set_memory_rw() and set_memory_ro() for every iteration of the loop in klp_write_object_relocations() is messy, inefficient, and error-prone. Change all the read-only pages to read-write before the loop and convert them back to read-only again afterwards. Suggested-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit b56b36ee) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Miroslav Benes authored
Currently, percpu symbols from .data..percpu ELF section of a module are not copied over and stored in final symtab array of struct module. Consequently such symbol cannot be returned via kallsyms API (for example kallsyms_lookup_name). This can be especially confusing when the percpu symbol is exported. Only its __ksymtab et al. are present in its symtab. The culprit is in layout_and_allocate() function where SHF_ALLOC flag is dropped for .data..percpu section. There is in fact no need to copy the section to final struct module, because kernel module loader allocates extra percpu section by itself. Unfortunately only symbols from SHF_ALLOC sections are copied due to a check in is_core_symbol(). The patch changes is_core_symbol() function to copy over also percpu symbols (their st_shndx points to .data..percpu ELF section). We do it only if CONFIG_KALLSYMS_ALL is set to be consistent with the rest of the function (ELF section is SHF_ALLOC but !SHF_EXECINSTR). Finally elf_type() returns type 'a' for a percpu symbol because its address is absolute. Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit e0224418) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Rusty Russell authored
Modules have three sections: text, rodata and writable data. The code handled the case where these overlapped, however they never can: debug_align() ensures they are always page-aligned. This is why we got away with manually traversing the pages in set_all_modules_text_rw() without rounding. We create three helper functions: frob_text(), frob_rodata() and frob_writable_data(). We then call these explicitly at every point, so it's clear what we're doing. We also expose module_enable_ro() and module_disable_ro() for livepatch to use. Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 85c898db) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Rusty Russell authored
Makes it easier to handle init vs core cleanly, though the change is fairly invasive across random architectures. It simplifies the rbtree code immediately, however, while keeping the core data together in the same cachline (now iff the rbtree code is enabled). Acked-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 7523e4dc) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Rusty Russell authored
An exact mapping would be within_module_core(), but at this stage (MODULE_STATE_GOING) the init section is empty, and this is clearer. Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit c65abf35) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Josh Poimboeuf authored
When setting a module's RO and NX permissions, set_section_ro_nx() is used, but when clearing them, unset_module_{init,core}_ro_nx() are used. The unset functions don't have the same checks the set function has for partial page protections. It's probably harmless, but it's still confusingly asymmetrical. Instead, use the same logic to do both. Also add some new set_module_{init,core}_ro_nx() helper functions for more symmetry with the unset functions. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 20ef10c1) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Chris J Arges authored
The following directory structure will allow for cases when the same function name exists in a single object. /sys/kernel/livepatch/<patch>/<object>/<function,sympos> The sympos number corresponds to the nth occurrence of the symbol name in kallsyms for the patched object. An example of patching multiple symbols can be found here: https://github.com/dynup/kpatch/issues/493Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 444f9e99) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Chris J Arges authored
In cases of duplicate symbols, sympos will be used to disambiguate instead of val. By default sympos will be 0, and patching will only succeed if the symbol is unique. Specifying a positive value will ensure that occurrence of the symbol in kallsyms for the patched object will be used for patching if it is valid. For external relocations sympos is not supported. Remove klp_verify_callback, klp_verify_args and klp_verify_vmlinux_symbol as they are no longer used. From the klp_reloc structure remove val, as it can be refactored as a local variable in klp_write_object_relocations. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit 064c89df) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Chris J Arges authored
Currently, patching objects with duplicate symbol names fail because the creation of the sysfs function directory collides with the previous attempt. Appending old_addr to the function name is problematic as it reveals the address of the function being patch to a normal user. Using the symbol's occurrence in kallsyms to postfix the function name in the sysfs directory solves the issue of having consistent unique names and ensuring that the address is not exposed to a normal user. In addition, using the symbol position as the user's method to disambiguate symbols instead of addr allows for disambiguating symbols in modules as well for both function addresses and for relocs. This also simplifies much of the code. Special handling for kASLR is no longer needed and can be removed. The klp_find_verify_func_addr function can be replaced by klp_find_object_symbol, and klp_verify_vmlinux_symbol and its callback can be removed completely. In cases of duplicate symbols, old_sympos will be used to disambiguate instead of old_addr. By default old_sympos will be 0, and patching will only succeed if the symbol is unique. Specifying a positive value will ensure that occurrence of the symbol in kallsyms for the patched object will be used for patching if it is valid. In addition, make old_addr an internal structure field not to be specified by the user. Finally, remove klp_find_verify_func_addr as it can be replaced by klp_find_object_symbol directly. Support for symbol position disambiguation for relocations is added in the next patch in this series. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> (cherry picked from commit b2b018ef) Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Colin Ian King authored
smatch detected a suspicious looking bitop condition: drivers/net/ethernet/cavium/liquidio/lio_main.c:2529 handle_timestamp() warn: suspicious bitop condition (skb_shinfo(skb)->tx_flags | SKBTX_IN_PROGRESS is always non-zero, so the logic is definitely not correct. Use & to mask the correct bit. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit 19a6d156) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Amitoj Kaur Chawla authored
The return value of vmalloc on failure of allocation of memory should be -ENOMEM and not -1. Found using Coccinelle. A simplified version of the semantic patch used is: //<smpl> @@ expression *e; identifier l1; position p,q; @@ e@q = vmalloc(...); if@p (e == NULL) { ... goto l1; } l1: ... return -1 + -ENOMEM ; //</smpl The single call site of the containing function checks whether the returned value is -1, so this check is changed as well. The single call site of this call site, however, only checks whether the value is not 0, so no further change was required. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit 08a965ec) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Sunil Goutham authored
Allocate higher order pages when pagesize is small, this will reduce number of calls to page allocator and wastage of memory. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit 6e4be8d6) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Robert Richter authored
Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit 1d82efac) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
David Daney authored
In the case of OF device tree, the firmware information is attached to the BGX device structure in the standard manner, so use the firmware iterators and accessors where possible. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit eee326fd) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Sunil Goutham authored
This affinity hint can be used by user space irqbalance tool to set preferred CPU mask for irqs registered by this VF. Irqbalance needs to be in 'exact' mode to set irq affinity same as indicated by affinity hint. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit fb4b7d98) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Sunil Goutham authored
napi_schedule is being called from hard irq context, hence switch to napi_schedule_irqoff which avoids unneeded call to local_irq_save and local_irq_restore. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit ef0a4d86) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Thanneeru Srinivasulu authored
When system is low on atomic memory, too many error messages are logged. Since this is not a total failure but a simple switch to non-atomic allocation better to have a stat. Also add a stat for reset, kicked due to transmit watchdog timeout. Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from linux-next commit a05d4845) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Will Deacon authored
As of 52e662326e1e ("arm64: prefetch: don't provide spin_lock_prefetch with LSE"), spin_lock_prefetch is patched at runtime when the LSE atomics are in use. This relies on the ARM64_LSE_ATOMIC_INSN macro to drive the alternatives framework, but that macro is only available via asm/lse.h, which isn't explicitly included in processor.h. Consequently, drivers can run into build failures such as: In file included from include/linux/prefetch.h:14:0, from drivers/net/ethernet/intel/i40e/i40e_txrx.c:27: arch/arm64/include/asm/processor.h: In function 'spin_lock_prefetch': arch/arm64/include/asm/processor.h:183:15: error: expected string literal before 'ARM64_LSE_ATOMIC_INSN' asm volatile(ARM64_LSE_ATOMIC_INSN( This patch add the missing include and gets things building again. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (cherry picked from linux-next commit afb83cc3) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Andrew Pinski authored
On ThunderX T88 pass 1 and pass 2, there is no hardware prefetching so we need to patch in explicit software prefetching instructions Prefetching improves this code by 60% over the original code and 2x over the code without prefetching for the affected hardware using the benchmark code at https://github.com/apinski-cavium/copy_page_benchmarkSigned-off-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (cherry picked from linux-next commit 60e0a09d) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Will Deacon authored
We want to avoid lots of different copy_page implementations, settling for something that is "good enough" everywhere and hopefully easy to understand and maintain whilst we're at it. This patch reworks our copy_page implementation based on discussions with Cavium on the list and benchmarking on Cortex-A processors so that: - The loop is unrolled to copy 128 bytes per iteration - The reads are offset so that we read from the next 128-byte block in the same iteration that we store the previous block - Explicit prefetch instructions are removed for now, since they hurt performance on CPUs with hardware prefetching - The loop exit condition is calculated at the start of the loop Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (cherry picked from linux-next commit 223e23e8) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Will Deacon authored
Most CPUs have a hardware prefetcher which generally performs better without explicit prefetch instructions issued by software, however some CPUs (e.g. Cavium ThunderX) rely solely on explicit prefetch instructions. This patch adds an alternative pattern (ARM64_HAS_NO_HW_PREFETCH) to allow our library code to make use of explicit prefetch instructions during things like copy routines only when the CPU does not have the capability to perform the prefetching itself. Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (cherry picked from linux-next commit d5370f75) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Will Deacon authored
The LSE atomics rely on us not dirtying data at L1 if we can avoid it, otherwise many of the potential scalability benefits are lost. This patch replaces spin_lock_prefetch with a nop when the LSE atomics are in use, so that users don't shoot themselves in the foot by causing needless coherence traffic at L1. Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (cherry picked from linux-next commit cd5e10bd) Signed-off-by: Dann Frazier <dann.frazier@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Tirumalesh Chalamarla authored
Setting TCR_EL2.PS to 40 bits is wrong on systems with less that less than 40 bits of physical addresses. and breaks KVM on systems where the RAM is above 40 bits. This patch uses ID_AA64MMFR0_EL1.PARange to set TCR_EL2.PS dynamically, just like we already do for VTCR_EL2.PS. [Marc: rewrote commit message, patch tidy up] Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit 3c5b1d92) [ dannf: backported to v4.4 ] Signed-off-by: dann frazier <dann.frazier@canonical.com>
-
Tirumalesh Chalamarla authored
The ARM GICv3 specification mentions the need for dsb after a read from the ICC_IAR1_EL1 register: 4.1.1 Physical CPU Interface: The effects of reading ICC_IAR0_EL1 and ICC_IAR1_EL1 on the state of a returned INTID are not guaranteed to be visible until after the execution of a DSB. Not having this could result in missed interrupts, so let's add the required barrier. [Marc: fixed commit message] Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit 1a1ebd5f) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-
Kefeng Wang authored
Convert the driver to use ns_to_timespec64() to keep consistency with timespec64_to_ns() instead of open coding the same logic. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 286af315) Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-