-
Guilherme G. Piccoli authored
BugLink: https://bugs.launchpad.net/bugs/1797990 We observed a kdump failure in x86 that was narrowed down to MSI irq storm coming from a PCI network device. The bug manifests as a lack of progress in the boot process of kdump kernel, and a flood of kernel messages like: [...] [ 342.265294] do_IRQ: 0.155 No irq handler for vector [ 342.266916] do_IRQ: 0.155 No irq handler for vector [ 347.258422] do_IRQ: 14053260 callbacks suppressed [...] The root cause of the issue is that kexec process of the kdump kernel doesn't ensure PCI devices are reset or MSI capabilities are disabled, so a PCI adapter could produce a huge amount of irqs which would steal all the processing time for the CPU (specially since we usually restrict kdump kernel to use a single CPU only). This patch implements the kernel parameter "pci=clearmsi" to clear the MSI/MSI-X enable bits in the Message Control register for all PCI devices during early boot time, thus preventing potential issues in the kexec'ed kernel. PCI spec also supports/enforces this need (see PCI Local Bus spec sections 6.8.1.3 and 6.8.2.3). Suggested-by: Dan Streetman <ddstreet@canonical.com> Suggested-by: Gavin Shan <shan.gavin@linux.alibaba.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> [mfo: backport to ubuntu-xenial: - different path for Documentation/.../kernel-parameters.txt - update context lines in pci-direct.h and early-quirks.c] Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
e3ce78c4