- 15 May, 2020 1 commit
-
-
Mauro Carvalho Chehab authored
There is a special chapter inside the core-api book about debug infrastructure such as tracepoints and debug objects. It sounded to me like the best place to add a chapter explaining how to use a FireWire controller for remote kernel debugging, as described in this document. Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/9b489d36d08ad89d3ad5aefef1f52a0715b29716.1588345503.git.mchehab+huawei@kernel.org Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 10 Apr, 2020 1 commit
-
-
Roman Gushchin authored
Commit 944d9fec ("hugetlb: add support for gigantic page allocation at runtime") added run-time allocation of gigantic pages. However, it only really works during the early stages of system boot, when the majority of memory is free. After some time the memory gets fragmented by non-movable pages, so the chances of finding a contiguous 1GB block approach zero. Even dropping caches manually doesn't help much.

At large scale, rebooting servers in order to allocate gigantic hugepages is expensive and complex. At the same time, keeping some constant percentage of memory reserved in hugepages even if the workload isn't using it is a big waste: not all workloads can benefit from 1GB pages.

The following solution solves the problem:
1) At boot time a dedicated CMA area* is reserved. The size is passed as a kernel argument.
2) Run-time allocations of gigantic hugepages are performed from the dedicated CMA area using the CMA allocator.

In this case gigantic hugepages can be allocated successfully with high probability, yet the memory isn't wasted if nobody is using 1GB hugepages: it can still be used for pagecache, anonymous memory, THPs, etc.

* On a multi-node machine a per-node CMA area is allocated on each node. Subsequent gigantic hugetlb allocations use the first available NUMA node if a mask isn't specified by the user.

Usage:
1) Configure the kernel to allocate a CMA area for hugetlb allocations: pass hugetlb_cma=10G as a kernel argument.
2) Allocate hugetlb pages as usual, e.g. echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

If the option isn't enabled or the allocation of the CMA area fails, the current behavior of the system is preserved. x86 and arm64 are covered by this patch; other architectures can be trivially added later.

The patch contains clean-ups and fixes proposed and implemented by Aslan Bakirov and Randy Dunlap. It also contains ideas and suggestions proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks! Signed-off-by:
Roman Gushchin <guro@fb.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Tested-by:
Andreas Schaufler <andreas.schaufler@gmx.de> Acked-by:
Mike Kravetz <mike.kravetz@oracle.com> Acked-by:
Michal Hocko <mhocko@kernel.org> Cc: Aslan Bakirov <aslan@fb.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Joonsoo Kim <js1304@gmail.com> Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
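For illustration, a minimal sketch of the boot-time and run-time usage described above; the hugetlb_cma= argument and the sysfs path come from the commit message, while the verification step is an added assumption:

    # kernel command line: reserve a per-node CMA area for 1GB pages
    #   hugetlb_cma=10G
    # at run time, allocate gigantic pages from that area and verify
    echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages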
-
- 07 Apr, 2020 3 commits
-
-
Jimmy Assarsson authored
Fix remaining broken references in kernel-parameters.txt. Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by:
Jimmy Assarsson <jimmyassarsson@gmail.com> Link: https://lore.kernel.org/r/20200402172614.3020-2-jimmyassarsson@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Jimmy Assarsson authored
x86/mpx was removed in commit 45fc24e8 ("x86/mpx: remove MPX from arch/x86"); remove the documentation of the nompx parameter accordingly. Fixes: 45fc24e8 ("x86/mpx: remove MPX from arch/x86") Signed-off-by:
Jimmy Assarsson <jimmyassarsson@gmail.com> Acked-by:
Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/r/20200402172614.3020-1-jimmyassarsson@gmail.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Baoquan He authored
In commit 357b4da5 ("x86: respect memory size limiting via mem= parameter") a global variable max_mem_size was added to store the value parsed from 'mem=', which is then checked when a memory region is added. This truly stops those DIMMs from being added into system memory during boot time. However, it also limits later memory hotplug functionality: any DIMM whose region lies beyond max_mem_size can no longer be hotplugged. We get errors like:

[ 216.387164] acpi PNP0C80:02: add_memory failed
[ 216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
[ 216.392187] acpi PNP0C80:02: Enumeration failure

This causes trouble in a known use case where 'mem=' is added to the hypervisor's command line: the memory that lies beyond the 'mem=' boundary is assigned to KVM guests. After commit 357b4da5 was merged, memory can't be extended dynamically if system memory on the hypervisor is not sufficient.

So fix it by applying the restriction only while memory is added during boot time, and skipping it afterwards. Also add this use case to the documentation of the 'mem=' kernel parameter. Fixes: 357b4da5 ("x86: respect memory size limiting via mem= parameter") Signed-off-by:
Baoquan He <bhe@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Reviewed-by:
Juergen Gross <jgross@suse.com> Acked-by:
Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Link: http://lkml.kernel.org/r/20200204050643.20925-1-bhe@redhat.com Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 02 Apr, 2020 1 commit
-
-
Chen Yu authored
Debug messages from the system suspend/hibernation infrastructure are disabled by default and can only be enabled after the system has booted, via /sys/power/pm_debug_messages. This makes hibernation resume hard to track, because resume itself involves booting the system: there's no chance for software_resume() to be tracked, for example. Add a kernel command line option to set pm_debug_messages during boot. Signed-off-by:
Chen Yu <yu.c.chen@intel.com> [ rjw: Subject & changelog ] Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
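A minimal sketch of the two ways to enable these messages; the boot option name is assumed to match the sysfs knob mentioned in the commit:

    # kernel command line: enable PM debug messages from early boot
    #   pm_debug_messages
    # equivalent runtime toggle once the system is up
    echo 1 > /sys/power/pm_debug_messages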
-
- 01 Apr, 2020 1 commit
-
-
Randy Dunlap authored
Update the Documentation for "acpi_backlight" by adding 2 new options (native and none). Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Reviewed-by:
Hans de Goede <hdegoede@redhat.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
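A sketch of the two new values on the kernel command line; the one-line glosses paraphrase their usual meaning and are not taken verbatim from the commit:

    #   acpi_backlight=native   # prefer the device's native backlight interface
    #   acpi_backlight=none     # do not register any ACPI backlight device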
-
- 24 Mar, 2020 1 commit
-
-
Alexey Makhalov authored
Set paravirt_steal_rq_enabled if a steal clock is present. paravirt_steal_rq_enabled is used in sched/core.c to adjust task progress by offsetting stolen time. The 'no-steal-acc' switch (sharing the same name with KVM) disables steal time accounting. Signed-off-by:
Alexey Makhalov <amakhalov@vmware.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20200323195707.31242-5-amakhalov@vmware.com
-
- 14 Mar, 2020 1 commit
-
-
Alex Hung authored
BGRT is for displaying a seamless OEM logo from boot to the login screen; however, this mechanism does not always work well on all configurations, and the OEM logo can end up displayed multiple times, which looks worse than having BGRT disabled. This patch adds a kernel parameter to disable BGRT at boot time, which is easier than re-compiling the kernel with CONFIG_ACPI_BGRT disabled. Signed-off-by:
Alex Hung <alex.hung@canonical.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 10 Mar, 2020 1 commit
-
-
Guilherme G. Piccoli authored
Commit 9c44bc03 ("softlockup: allow panic on lockup") added the softlockup_panic sysctl, but didn't add information about it to Documentation/admin-guide/sysctl/kernel.rst (which at that time certainly wasn't RST and had a different name!). This patch just adds the respective documentation and references it from the corresponding entry in Documentation/admin-guide/kernel-parameters.txt. It is strongly based on Scott Wood's commit d22881dc ("Documentation: Better document the hardlockup_panic sysctl"). Reviewed-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Guilherme G. Piccoli <gpiccoli@canonical.com> Link: https://lore.kernel.org/r/20200310183649.23163-1-gpiccoli@canonical.com Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
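For illustration, the sysctl documented here and its boot-time counterpart; both knobs are named in the documentation this commit touches:

    # panic when a soft lockup is detected
    #   softlockup_panic=1                  # kernel command line
    sysctl -w kernel.softlockup_panic=1     # runtime equivalent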
-
- 06 Mar, 2020 1 commit
-
-
Thara Gopinath authored
Thermal pressure follows PELT signals, which means its decay period is the default PELT decay period. Depending on SoC characteristics and thermal activity, it might be beneficial to decay thermal pressure more slowly, but still in tune with the PELT signals. One way to achieve this is to provide a command line parameter that sets a decay shift to an integer between 0 and 10. Signed-off-by:
Thara Gopinath <thara.gopinath@linaro.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20200222005213.3873-10-thara.gopinath@linaro.org
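The commit text does not name the parameter; assuming it is the sched_thermal_decay_shift option documented in kernel-parameters.txt, usage would look roughly like this (the 2^shift scaling is also an assumption based on that documentation):

    #   sched_thermal_decay_shift=3   # decay thermal pressure ~2^3 times more slowly than other PELT signals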
-
- 04 Mar, 2020 2 commits
-
-
Saravana Kannan authored
With the addition of the fw_devlink kernel command line option, of_devlink is redundant and no longer useful. So, delete it. Acked-by:
Rob Herring <robh@kernel.org> Signed-off-by:
Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20200222014038.180923-6-saravanak@google.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Saravana Kannan authored
fwnode_operations.add_links allows creating device links from information provided by firmware. It is currently implemented only by OF/devicetree code and one specific EFI case. However, there's nothing preventing ACPI or other firmware types from implementing it. The OF implementation is currently controlled by a kernel command line parameter called of_devlink. Since this feature is generic and isn't limited to OF, add a generic fw_devlink kernel command line parameter to control it across firmware types. Signed-off-by:
Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20200222014038.180923-3-saravanak@google.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
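A sketch of the new switch; the commit names only fw_devlink itself, so the specific values below are assumptions based on how the option is documented:

    #   fw_devlink=on           # create and enforce firmware-described device links
    #   fw_devlink=permissive   # create links but don't block probing on them
    #   fw_devlink=off          # disable the feature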
-
- 02 Mar, 2020 2 commits
-
-
Niklas Söderlund authored
When nfsroot.txt was converted and moved to nfsroot.rst, the references to the old text file were not updated to match the change; fix this. Fixes: f9a93498 ("Documentation: nfsroot.txt: convert to ReST") Signed-off-by:
Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by:
Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20200212181332.520545-1-niklas.soderlund+renesas@ragnatech.se Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
Jonathan Neuschäfer authored
drivers/tty/serial/imx.c implements these earlycon options. Signed-off-by:
Jonathan Neuschäfer <j.neuschaefer@gmx.net> Link: https://lore.kernel.org/r/20200229132750.2783-1-j.neuschaefer@gmx.net Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 27 Feb, 2020 1 commit
-
-
Vasily Gorbik authored
Add "prot_virt" command line option which controls if the kernel protected VMs support is enabled at early boot time. This has to be done early, because it needs large amounts of memory and will disable some features like STP time sync for the lpar. Extend ultravisor info definitions and expose it via uv_info struct filled in during startup. Signed-off-by:
Vasily Gorbik <gor@linux.ibm.com> Reviewed-by:
Thomas Huth <thuth@redhat.com> Acked-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Cornelia Huck <cohuck@redhat.com> Acked-by:
Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com>
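For illustration, enabling the support on the kernel command line; the boolean form is an assumption based on how prot_virt is documented in kernel-parameters.txt:

    #   prot_virt=1   # enable hosting protected virtual machines (s390)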
-
- 25 Feb, 2020 1 commit
-
-
Alex Hung authored
"untrusted" was mis-spelled as "unstrusted" Signed-off-by:
Alex Hung <alex.hung@canonical.com> Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 21 Feb, 2020 3 commits
-
-
Paul E. McKenney authored
In theory, RCU-hotplug operations are supposed to work as soon as there is more than one CPU online. However, in practice, in normal production there is no way to make them happen until userspace is up and running. Besides which, on smaller systems, rcutorture doesn't start doing hotplug operations until 30 seconds after the start of boot, which on most systems also means the better part of 30 seconds after the end of boot. This commit therefore provides a new torture.disable_onoff_at_boot kernel boot parameter that suppresses CPU-hotplug torture operations until about the time that init is spawned. Of course, if you know of a need for boottime CPU-hotplug operations, then you should avoid passing this argument to any of the torture tests. You might also want to look at the splats linked to below. Link: https://lore.kernel.org/lkml/20191206185208.GA25636@paulmck-ThinkPad-P72/ Signed-off-by:
Paul E. McKenney <paulmck@kernel.org>
-
Paul E. McKenney authored
In normal production, an RCU CPU stall warning at boottime is often just as bad as at any other time. In fact, given the desire for fast boot, any sort of long-term stall at boot is a bad idea. However, heavy rcutorture testing on large hyperthreaded systems can generate boottime RCU CPU stalls as a matter of course. This commit therefore provides a kernel boot parameter that suppresses reporting of boottime RCU CPU stall warnings and similarly of rcutorture writer stalls. Signed-off-by:
Paul E. McKenney <paulmck@kernel.org>
-
Paul E. McKenney authored
In default configurations, RCU currently waits at least 100 milliseconds before asking cond_resched() and/or resched_rcu() for help seeking quiescent states to end a grace period. But 100 milliseconds can be one good long time during an RCU callback flood, for example, as can happen when user processes repeatedly open and close files in a tight loop. These 100-millisecond gaps in successive grace periods during a callback flood can result in excessive numbers of callbacks piling up, unnecessarily increasing memory footprint. This commit therefore asks cond_resched() and/or resched_rcu() for help as early as the first FQS scan when at least one of the CPUs has more than 20,000 callbacks queued, a number that can be changed using the new rcutree.qovld kernel boot parameter. An auxiliary qovld_calc variable is used to avoid acquisition of locks that have not yet been initialized. Early tests indicate that this reduces the RCU-callback memory footprint during rcutorture floods by anywhere from 50% to 4x, depending on configuration. Reported-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Reported-by:
Tejun Heo <tj@kernel.org> [ paulmck: Fix bug located by Qian Cai. ] Signed-off-by:
Paul E. McKenney <paulmck@kernel.org> Tested-by:
Dexuan Cui <decui@microsoft.com> Tested-by:
Qian Cai <cai@lca.pw>
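A sketch of tuning the threshold named above; the value shown is simply the default mentioned in the commit:

    #   rcutree.qovld=20000   # enlist quiescent-state help at the first FQS scan once a CPU queues more than 20,000 callbacks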
-
- 20 Feb, 2020 1 commit
-
-
Peter Zijlstra (Intel) authored
A split lock occurs when an atomic instruction operates on data that spans two cache lines. In order to maintain atomicity the core takes a global bus lock. This is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). For real-time systems this may mean missing deadlines. For other systems it may just be very annoying.

Some CPUs have the capability to raise an #AC trap when a split lock is attempted. Provide a command line option to give the user choices on how to handle this:

split_lock_detect=
    off   - not enabled (no traps for split locks)
    warn  - warn once when an application does a split lock, but allow it to continue running
    fatal - send SIGBUS to applications that cause a split lock

On systems that support split lock detection the default is "warn". Note that if the kernel hits a split lock in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the split lock detection is per-core, not per-thread. This might result in some short-lived races on HT systems in "warn" mode if Linux tries to enable on one thread while disabling on the other. Race analysis by Sean Christopherson:

 - Toggling of split-lock is only done in "warn" mode. The worst case scenario of a race is that a misbehaving task will generate multiple #AC exceptions on the same instruction. And this race will only occur if both siblings are running tasks that generate split-lock #ACs, e.g. a race where sibling threads are writing different values will only occur if CPUx is disabling split-lock after an #AC and CPUy is re-enabling split-lock after *its* previous task generated an #AC.

 - Transitioning between off/warn/fatal modes at runtime isn't supported and disabling is tracked per task, so hardware will always reach a steady state that matches the configured mode. I.e. split-lock is guaranteed to be enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by:
Fenghua Yu <fenghua.yu@intel.com> Signed-off-by:
Fenghua Yu <fenghua.yu@intel.com> Co-developed-by:
Tony Luck <tony.luck@intel.com> Signed-off-by:
Tony Luck <tony.luck@intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com
-
- 19 Feb, 2020 1 commit
-
-
Jonathan Neuschäfer authored
cmdlinepart.c has been moved to drivers/mtd/parsers/. Fixes: a3f12a35 ("mtd: parsers: Move CMDLINE parser") Signed-off-by:
Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 10 Feb, 2020 2 commits
-
-
Stephen Smalley authored
Deprecate setting the SELinux checkreqprot tunable to 1 via kernel parameter or /sys/fs/selinux/checkreqprot. Setting it to 0 is left intact for compatibility since Android and some Linux distributions do so for security and treat an inability to set it as a fatal error. Eventually setting it to 0 will become a no-op and the kernel will stop using checkreqprot's value internally altogether. checkreqprot was originally introduced as a compatibility mechanism for legacy userspace and the READ_IMPLIES_EXEC personality flag. However, if set to 1, it weakens security by allowing mappings to be made executable without authorization by policy. The default value for the SECURITY_SELINUX_CHECKREQPROT_VALUE config option was changed from 1 to 0 in commit 2a35d196 ("selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default") and both Android and Linux distributions began explicitly setting /sys/fs/selinux/checkreqprot to 0 some time ago. Signed-off-by:
Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by:
Paul Moore <paul@paul-moore.com>
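For illustration, the two interfaces named above; note that 0 is the only non-deprecated value:

    #   checkreqprot=0                          # kernel command line
    echo 0 > /sys/fs/selinux/checkreqprot       # runtime equivalent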
-
Jean Delvare authored
In case the WDAT interface is broken, give the user an option to ignore it to let a native driver bind to the watchdog device instead. Signed-off-by:
Jean Delvare <jdelvare@suse.de> Acked-by:
Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
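The commit does not name the option here; assuming it is the acpi_no_watchdog parameter documented in kernel-parameters.txt, usage would be:

    #   acpi_no_watchdog   # ignore the ACPI WDAT table and let a native watchdog driver bind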
-
- 05 Feb, 2020 1 commit
-
-
Steven Rostedt (VMware) authored
As the bootconfig is appended to the initrd, it is not as easy to modify as the kernel command line. If there's some issue with the kernel and the developer wants to boot a pristine kernel, it should not be necessary to modify the initrd to remove the bootconfig for a single boot. Moreover, the bootconfig is applied silently; an admin who does not know where to look may not realize it is being loaded, so it should have to be explicitly requested. Therefore, load the bootconfig only if "bootconfig" is present on the kernel command line. This lets admins know that the kernel command line is being extended. Note: adding printk()s for when the size is too great or the checksum is wrong exposed that the current method always looked for the bootconfig, and if the size and checksum matched it would parse it (if either is wrong, a printk has been added to show this). It's better to perform this check only when the bootconfig is explicitly asked for. Link: https://lore.kernel.org/r/CAHk-=wjfjO+h6bQzrTf=YCZA53Y3EDyAs3Z4gEsT7icA3u_Psw@mail.gmail.com Acked-by:
Masami Hiramatsu <mhiramat@kernel.org> Suggested-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Steven Rostedt (VMware) <rostedt@goodmis.org>
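A minimal sketch of opting in; everything other than the "bootconfig" keyword is illustrative:

    # kernel command line: request parsing of the bootconfig appended to the initrd
    #   root=/dev/sda1 ro bootconfig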
-
- 31 Jan, 2020 1 commit
-
-
Mikhail Zaslonko authored
Add the new kernel command line parameter 'dfltcc=' to configure s390 zlib hardware support.

Format: { on | off | def_only | inf_only | always }
  on:       s390 zlib hardware support for compression on level 1 and decompression (default)
  off:      no s390 zlib hardware support
  def_only: s390 zlib hardware support for deflate only (compression on level 1)
  inf_only: s390 zlib hardware support for inflate only (decompression)
  always:   same as 'on', but ignores the selected compression level, always using hardware support (used for debugging)

Link: http://lkml.kernel.org/r/20200103223334.20669-5-zaslonko@linux.ibm.com Signed-off-by:
Mikhail Zaslonko <zaslonko@linux.ibm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David Sterba <dsterba@suse.com> Cc: Eduard Shishkin <edward6@linux.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 24 Jan, 2020 2 commits
-
-
Joel Fernandes (Google) authored
Now that the kfree_rcu() special-casing has been removed from tree RCU, this commit removes kfree_call_rcu_nobatch() since it is no longer needed. Signed-off-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by:
Paul E. McKenney <paulmck@kernel.org>
-
Joel Fernandes (Google) authored
This test runs kfree_rcu() in a loop to measure performance of the new kfree_rcu() batching functionality. The following table shows results when booting with arguments: rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000 rcuperf.kfree_rcu_test=1 rcuperf.kfree_no_batch=X

  rcuperf.kfree_no_batch=X   # Grace Periods   Test Duration (s)
  X=1 (old behavior)         9133              11.5
  X=0 (new behavior)         1732              12.5

On a 16 CPU system with the above boot parameters, we see that the total number of grace periods that elapse during the test drops from 9133 when not batching to 1732 when batching (a 5X improvement). The kfree_rcu() flood itself slows down a bit when batching, though, as shown. Note that the active memory consumption during the kfree_rcu() flood does increase to around 200-250MB due to the batching (from around 50MB without batching). However, this memory consumption is relatively constant. In other words, the system is able to keep up with the kfree_rcu() load. The memory consumption comes down considerably if KFREE_DRAIN_JIFFIES is increased from HZ/50 to HZ/80. A later patch will reduce memory consumption further by using multiple lists. Also, when running the test, please disable CONFIG_DEBUG_PREEMPT and CONFIG_PROVE_RCU for realistic comparisons with/without batching. Signed-off-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by:
Paul E. McKenney <paulmck@kernel.org>
-
- 22 Jan, 2020 1 commit
-
-
Ming Lei authored
The affinity of managed interrupts is completely handled in the kernel and cannot be changed via the /proc/irq/* interfaces from user space. As the kernel tries to spread out interrupts evenly across CPUs on x86 to prevent vector exhaustion, it can happen that a managed interrupt whose affinity mask contains both isolated and housekeeping CPUs is routed to an isolated CPU. As a consequence, IO submitted on a housekeeping CPU causes interrupts on the isolated CPU.

Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding logic in the interrupt affinity selection code. The sub-parameter indicates to the interrupt affinity selection logic that it should try to avoid the above scenario.

This isolation is best effort and only effective if the automatically assigned interrupt mask of a device queue contains isolated and housekeeping CPUs. If housekeeping CPUs are online then such interrupts are directed to a housekeeping CPU so that IO submitted on the housekeeping CPU cannot disturb the isolated CPU.

If a queue's affinity mask contains only isolated CPUs then this parameter has no effect on the interrupt routing decision, though interrupts only happen when tasks running on those isolated CPUs submit IO. IO submitted on housekeeping CPUs has no influence on those queues.

If the affinity mask contains both housekeeping and isolated CPUs, but none of the contained housekeeping CPUs is online, then the interrupt is also routed to an isolated CPU. Interrupts are only delivered when one of the isolated CPUs in the affinity mask submits IO. If one of the contained housekeeping CPUs comes online, the CPU hotplug logic migrates the interrupt automatically back to the upcoming housekeeping CPU. Depending on the type of interrupt controller, this can require that at least one interrupt is delivered to the isolated CPU in order to complete the migration.

[ tglx: Removed unused parameter, added and edited comments/documentation and rephrased the changelog so it contains more details. ] Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200120091625.17912-1-ming.lei@redhat.com
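A sketch of the new sub-parameter; the commit names only 'managed_irq', so the combination with other isolcpus flags and the CPU list are illustrative assumptions:

    #   isolcpus=managed_irq,domain,2-11   # keep managed-IRQ effects away from CPUs 2-11 where possible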
-
- 20 Jan, 2020 1 commit
-
-
Ard Biesheuvel authored
We carry a quirk in the x86 EFI code to switch back to an older method of mapping the EFI runtime services memory regions, because it was deemed risky at the time to implement a new method without providing a fallback to the old method in case problems arose. Such problems did arise, but they appear to be limited to SGI UV1 machines, and so these are the only ones for which the fallback gets enabled automatically (via a DMI quirk). The fallback can be enabled manually as well, by passing efi=old_map, but there is very little evidence that suggests that this is something that is being relied upon in the field. Given that UV1 support is not enabled by default by the distros (Ubuntu, Fedora), there is no point in carrying this fallback code all the time if there are no other users. So let's move it into the UV support code, and document that efi=old_map now requires this support code to be enabled. Note that efi=old_map has been used in the past on other SGI UV machines to work around kernel regressions in production, so we keep the option to enable it by hand, but only if the kernel was built with UV support. Signed-off-by:
Ard Biesheuvel <ardb@kernel.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20200113172245.27925-8-ardb@kernel.org
-
- 10 Jan, 2020 1 commit
-
-
Matthew Garrett authored
Add an option to disable the busmaster bit in the control register on all PCI bridges before calling ExitBootServices() and passing control to the runtime kernel. System firmware may configure the IOMMU to prevent malicious PCI devices from being able to attack the OS via DMA. However, since firmware can't guarantee that the OS is IOMMU-aware, it will tear down the IOMMU configuration when ExitBootServices() is called. This leaves a window during which a hostile device could still cause damage before Linux configures the IOMMU again. If CONFIG_EFI_DISABLE_PCI_DMA is enabled or "efi=disable_early_pci_dma" is passed on the command line, the EFI stub will clear the busmaster bit on all PCI bridges before ExitBootServices() is called. This will prevent any malicious PCI devices from being able to perform DMA until the kernel reenables busmastering after configuring the IOMMU. This option may cause failures with some poorly behaved hardware and should not be enabled without testing. The kernel command line options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be used to override the default. Note that PCI devices downstream from PCI bridges are disconnected from their drivers first, using the UEFI driver model API, so that DMA can be disabled safely at the bridge level. [ardb: disconnect PCI I/O handles first, as suggested by Arvind] Co-developed-by:
Matthew Garrett <mjg59@google.com> Signed-off-by:
Matthew Garrett <mjg59@google.com> Signed-off-by:
Ard Biesheuvel <ardb@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Matthew Garrett <matthewgarrett@google.com> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20200103113953.9571-18-ardb@kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org>
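For illustration, the two command line overrides named above:

    #   efi=disable_early_pci_dma      # clear the busmaster bit on PCI bridges before ExitBootServices()
    #   efi=no_disable_early_pci_dma   # keep busmastering enabled even if CONFIG_EFI_DISABLE_PCI_DMA=y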
-
- 07 Jan, 2020 1 commit
-
-
Stephen Smalley authored
selinuxfs was originally mounted on /selinux, and various docs and kconfig help texts referred to nodes under it. In Linux 3.0, /sys/fs/selinux was introduced as the preferred mount point for selinuxfs. Update all the old references from /selinux/ to /sys/fs/selinux/. While we are there, update the description of the selinux boot parameter to reflect the fact that the default value is always 1 since commit be6ec88f ("selinux: Remove SECURITY_SELINUX_BOOTPARAM_VALUE") and drop discussion of runtime disable since it is deprecated. Signed-off-by:
Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by:
Paul Moore <paul@paul-moore.com>
-
- 22 Nov, 2019 1 commit
-
-
Masami Hiramatsu authored
Remove the bootmem_debug kernel parameter because it has been replaced by memblock=debug. Signed-off-by:
Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/157443061745.20995.9432492850513217966.stgit@devnote2 Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 19 Nov, 2019 1 commit
-
-
Yunfeng Ye authored
Commit 0f27cff8 ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs") says: "Use a bitmap of size 0xFF instead of a u64 for the GPE mask so 256 GPEs can be masked". But masking GPE 0xFF is not supported, and the check condition "gpe > ACPI_MASKABLE_GPE_MAX" is never true because the type of gpe is u8. So modify the macro ACPI_MASKABLE_GPE_MAX to 0x100 and drop the "gpe > ACPI_MASKABLE_GPE_MAX" check. In addition, update the documented "Format" for the acpi_mask_gpe parameter. Fixes: 0f27cff8 ("ACPI: sysfs: Make ACPI GPE mask kernel parameter cover all GPEs") Signed-off-by:
Yunfeng Ye <yeyunfeng@huawei.com> [ rjw: Use u16 as gpe data type in acpi_gpe_apply_masked_gpes() ] Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
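A sketch of masking a GPE on the command line; the specific GPE number is illustrative:

    #   acpi_mask_gpe=0x6E   # prevent GPE 0x6E from being enabled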
-
- 18 Nov, 2019 1 commit
-
-
Oliver Neukum authored
Document which flags apply to storage, UAS, or both. Signed-off-by:
Oliver Neukum <oneukum@suse.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191114112758.32747-4-oneukum@suse.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 16 Nov, 2019 1 commit
-
-
Waiman Long authored
For MDS vulnerable processors with TSX support, enabling either MDS or TAA mitigations will enable the use of VERW to flush internal processor buffers at the right code path. IOW, they are either both mitigated or both not. However, if the command line options are inconsistent, the vulnerabilities sysfs files may not report the mitigation status correctly. For example, with only the "mds=off" option:

vulnerabilities/mds:Vulnerable; SMT vulnerable
vulnerabilities/tsx_async_abort:Mitigation: Clear CPU buffers; SMT vulnerable

The mds vulnerabilities file has the wrong status in this case. Similarly, the taa vulnerability file will be wrong with mds mitigation on, but taa off. Change taa_select_mitigation() to sync up the two mitigation statuses and have them turned off if both "mds=off" and "tsx_async_abort=off" are present. Update documentation to emphasize the fact that both "mds=off" and "tsx_async_abort=off" have to be specified together for processors that are affected by both TAA and MDS to be effective. [ bp: Massage and add kernel-parameters.txt change too. ] Fixes: 1b42f017 ("x86/speculation/taa: Add mitigation for TSX Async Abort") Signed-off-by:
Waiman Long <longman@redhat.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: linux-doc@vger.kernel.org Cc: Mark Gross <mgross@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191115161445.30809-2-longman@redhat.com
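For illustration, the only combination that actually turns the shared VERW-based mitigation off on a part affected by both issues, as documented here:

    #   mds=off tsx_async_abort=off   # both switches must be given together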
-
- 12 Nov, 2019 1 commit
-
-
Jonathan Corbet authored
This reverts commit 7f70ae56. Christoph H. notes that the information is redundant, and Paul W. agrees with reverting. Signed-off-by:
Jonathan Corbet <corbet@lwn.net>
-
- 07 Nov, 2019 2 commits
-
-
Dan Williams authored
Given that EFI_MEMORY_SP is a platform BIOS policy decision for marking memory ranges as "reserved for a specific purpose", there will inevitably be scenarios where the BIOS omits the attribute in situations where it is desired. Unlike other attributes, if the OS wants to reserve this memory from the kernel, the reservation needs to happen early in init. So early, in fact, that it needs to happen before e820__memblock_setup(), which is a pre-requisite for efi_fake_memmap() that wants to allocate memory for the updated table. Introduce an x86-specific efi_fake_memmap_early() that can search for attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table accordingly. The KASLR code that scans the command line looking for user-directed memory reservations also needs to be updated to consider "efi_fake_mem=nn@ss:0x40000" requests. Acked-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
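A sketch of the efi_fake_mem request mentioned above; the size and base address are illustrative, while the 0x40000 attribute value (EFI_MEMORY_SP) comes from the commit text:

    #   efi_fake_mem=4G@0x100000000:0x40000   # mark a 4G range as "specific purpose" memory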
-
Dan Williams authored
UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the interpretation of the EFI Memory Types as "reserved for a specific purpose". The proposed Linux behavior for specific purpose memory is that it is reserved for direct-access (device-dax) by default and not available for any kernel usage, not even as an OOM fallback. Later, through udev scripts or another init mechanism, these device-dax claimed ranges can be reconfigured and hot-added to the available System-RAM with a unique node identifier. This device-dax management scheme implements "soft" in the "soft reserved" designation by allowing some or all of the reservation to be recovered as typical memory. This policy can be disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with efi=nosoftreserve. As for this patch, define the common helpers to determine if the EFI_MEMORY_SP attribute should be honored. The determination needs to be made early to prevent the kernel from being loaded into soft-reserved memory, or otherwise allowing early allocations to land there. Follow-on changes are needed per architecture to leverage these helpers in their respective mem-init paths. Reviewed-by:
Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by:
Dan Williams <dan.j.williams@intel.com> Acked-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 04 Nov, 2019 1 commit
-
-
Junaid Shahid authored
The page table pages corresponding to broken down large pages are zapped in FIFO order, so that the large page can potentially be recovered if it is no longer being used for execution. This removes the performance penalty for walking deeper EPT page tables. By default, one large page will last about one hour once the guest reaches a steady state. Signed-off-by:
Junaid Shahid <junaids@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-