Commits · 8371696a975a52eb055dcf36ac1e562bfda493cc · Kirill Smelkov / linux

22 Apr, 2024 3 commits

irqchip/mxs: Declare icoll_handle_irq() as static · 8371696a

Stefan Wahren authored Apr 12, 2024

After commit 5bb578a0 ("ARM: 9298/1: Drop custom mdesc->handle_irq()")
the function icoll_handle_irq() is only used within irq-mxs.c.  So declare
it as static to fix the warning about a missing prototype when building
with W=1.
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

irqchip/loongson-pch-pic: Update interrupt registration policy · 234a557e

Baoqi Zhang authored Apr 22, 2024

The current code uses a fixed mapping between the LS7A interrupt source
and the HT interrupt vector. This prevents the utilization of the full
interrupt vector space and therefore limits the number of interrupt sources
in a system.

Replace the fixed mapping with a dynamic mapping which allocates a
vector when an interrupt source is set up. This keeps unused sources
from occupying vectors that other devices could use.

Introduce a mapping table in struct pch_pic. Each interrupt source
allocates an index from the table, in order of request, to use as its
'hwirq' number, and the table entry at that index is set to the interrupt
source number. This hwirq number is then configured as the vector in the HT
interrupt controller. Once obtained, the hwirq of an interrupt source
remains valid until system reset.
Co-developed-by: Biao Dong <dongbiao@loongson.cn>
Signed-off-by: Biao Dong <dongbiao@loongson.cn>
Co-developed-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn>
Signed-off-by: Baoqi Zhang <zhangbaoqi@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240422093830.27212-1-zhangtianyang@loongson.cn
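
The dynamic mapping described in this commit message can be illustrated with a small user-space sketch. It is not the driver code: struct pic_map, PIC_COUNT, PIC_UNDEF and both helper names are hypothetical, and locking is left out. It only shows the idea of handing out table indices in order of request and storing the source number in the table.

```c
#include <assert.h>
#include <stdint.h>

#define PIC_COUNT 64   /* assumed number of table slots */
#define PIC_UNDEF 0xff /* assumed marker for a free slot */

/* Hypothetical, simplified stand-in for the table added to struct pch_pic. */
struct pic_map {
	uint8_t table[PIC_COUNT]; /* table[hwirq] = interrupt source number */
	int inuse;                /* number of slots handed out so far */
};

static void pic_map_init(struct pic_map *m)
{
	for (int i = 0; i < PIC_COUNT; i++)
		m->table[i] = PIC_UNDEF;
	m->inuse = 0;
}

/*
 * Return the hwirq (table index) for an interrupt source. The next free
 * slot is allocated on first use and then kept; nothing is ever released,
 * matching "valid until system reset". Returns -1 when the table is full.
 */
static int pic_map_get_hwirq(struct pic_map *m, uint8_t source)
{
	assert(source != PIC_UNDEF);

	for (int i = 0; i < m->inuse; i++)
		if (m->table[i] == source)
			return i;

	if (m->inuse == PIC_COUNT)
		return -1;

	m->table[m->inuse] = source;
	return m->inuse++;
}
```

In this sketch, sources 40 and 3 requested in that order receive hwirq 0 and 1, regardless of their source numbers, and a repeated request for source 40 returns 0 again.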

genirq: Simplify the checks for irq_set_percpu_devid_partition() · bb58c1ba

Jinjie Ruan authored Apr 17, 2024

The function returns -EINVAL both when desc is NULL and when
desc->percpu_enabled is true, so check the two conditions together, and
assign desc->percpu_affinity using a ternary to simplify the code.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240417085356.3785381-1-ruanjinjie@huawei.com
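
The shape of the simplification can be sketched in user-space C as follows. This is an illustration only: struct demo_desc, demo_default_mask and demo_set_percpu_partition() are hypothetical stand-ins, the affinity mask is reduced to a string pointer, and the allocation done by the real function is replaced by setting a flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the interrupt descriptor. */
struct demo_desc {
	bool percpu_enabled;
	const char *percpu_affinity;
};

/* Assumed default used when the caller passes no affinity. */
static const char *const demo_default_mask = "all-possible-cpus";

static int demo_set_percpu_partition(struct demo_desc *desc,
				     const char *affinity)
{
	/* Both conditions yield -EINVAL, so test them in one place. */
	if (!desc || desc->percpu_enabled)
		return -EINVAL;

	desc->percpu_enabled = true;

	/* A ternary replaces the former if/else assignment. */
	desc->percpu_affinity = affinity ? affinity : demo_default_mask;
	return 0;
}
```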

14 Apr, 2024 1 commit

irqchip/riscv-imsic: Fix boot time update effective affinity warning · 35d77eb7

Anup Patel authored Apr 13, 2024

Currently, the following warning is observed on the QEMU virt machine:
genirq: irq_chip APLIC-MSI-d000000.aplic did not update eff. affinity mask of irq 12

The above warning is because the IMSIC driver does not set the initial
value of effective affinity in the interrupt descriptor. To address this,
initialize the effective affinity in imsic_irq_domain_alloc().

Fixes: 027e125a ("irqchip/riscv-imsic: Add device MSI domain support for platform devices")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240413065210.315896-1-apatel@ventanamicro.com

12 Apr, 2024 5 commits

watchdog/softlockup: Report the most frequent interrupts · e9a9292e

Bitao Hu authored Apr 11, 2024

When the watchdog determines that the current soft lockup is due to an
interrupt storm based on CPU utilization, reporting the most frequent
interrupts could be good enough for further troubleshooting.

Below is an example of an interrupt storm. The call tree does not provide
useful information, but comparing the interrupt counts during the lockup
period makes it possible to identify which interrupt caused the soft
lockup.

[  638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
[  638.870825] CPU#9 Utilization every 4s during lockup:
[  638.871194]  #1:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.871652]  #2:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.872107]  #3:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.872563]  #4:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.873018]  #5:   0% system,          0% softirq,   100% hardirq,     0% idle
[  638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
[  638.873994]  #1: 330945      irq#7
[  638.874236]  #2: 31          irq#82
[  638.874493]  #3: 10          irq#10
[  638.874744]  #4: 2           irq#89
[  638.874992]  #5: 1           irq#102
...
[  638.875313] Call trace:
[  638.875315]  __do_softirq+0xa8/0x364
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240411074134.30922-6-yaoma@linux.alibaba.com

watchdog/softlockup: Low-overhead detection of interrupt storm · d7037381

Bitao Hu authored Apr 11, 2024

The following softlockup is caused by an interrupt storm, but this cannot
be identified from the call tree, because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  ...
  Call trace:
    __do_softirq+0xa0/0x37c
    __irq_exit_rcu+0x108/0x140
    irq_exit+0x14/0x20
    __handle_domain_irq+0x84/0xe0
    gic_handle_irq+0x80/0x108
    el0_irq_naked+0x50/0x58

Therefore, it is necessary to report CPU utilization during the
softlockup_threshold period (one report every sample_period, five reports
in total), like this:
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  CPU#28 Utilization every 4s during lockup:
    #1: 0% system, 0% softirq, 100% hardirq, 0% idle
    #2: 0% system, 0% softirq, 100% hardirq, 0% idle
    #3: 0% system, 0% softirq, 100% hardirq, 0% idle
    #4: 0% system, 0% softirq, 100% hardirq, 0% idle
    #5: 0% system, 0% softirq, 100% hardirq, 0% idle
  ...

This is helpful in determining whether an interrupt storm has occurred or
in identifying the cause of the softlockup. The criteria for determination
are as follows:

  a. If the hardirq utilization is high, then an interrupt storm should be
     considered and the root cause cannot be determined from the call tree.
  b. If the softirq utilization is high, then the call tree might not
     necessarily point at the root cause.
  c. If the system utilization is high, then analyzing the root
     cause from the call tree is possible in most cases.

The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, add a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the max number of CPUs is <= 128.
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
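
The criteria a to c above can be read as a simple decision order, sketched below. All names are hypothetical, and the 50% threshold for "high" is an assumption: this log only shows a 50% threshold for hardirq time, in the output of the follow-up commit.

```c
/* Hypothetical classification of one utilization report. */
enum demo_hint {
	DEMO_HINT_IRQ_STORM, /* a. call tree cannot show the root cause */
	DEMO_HINT_SOFTIRQ,   /* b. call tree may not point at the root cause */
	DEMO_HINT_CALL_TREE, /* c. call tree analysis is usually possible */
	DEMO_HINT_UNKNOWN,   /* none of the three categories dominates */
};

#define DEMO_HIGH_PCT 50 /* assumed meaning of "high" utilization */

static enum demo_hint demo_classify(unsigned int system_pct,
				    unsigned int softirq_pct,
				    unsigned int hardirq_pct)
{
	if (hardirq_pct > DEMO_HIGH_PCT)
		return DEMO_HINT_IRQ_STORM;
	if (softirq_pct > DEMO_HIGH_PCT)
		return DEMO_HINT_SOFTIRQ;
	if (system_pct > DEMO_HIGH_PCT)
		return DEMO_HINT_CALL_TREE;
	return DEMO_HINT_UNKNOWN;
}
```

Applied to the report shown in the message (0% system, 0% softirq, 100% hardirq), the sketch yields DEMO_HINT_IRQ_STORM.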

genirq: Avoid summation loops for /proc/interrupts · 25a4a015

Bitao Hu authored Apr 11, 2024

show_interrupts() unconditionally accumulates the per CPU interrupt
statistics to determine whether an interrupt was ever raised.

This can be avoided for all interrupts which are not strictly per CPU
and not of type NMI because those interrupts already provide an
accumulated counter. The required logic is already implemented in
kstat_irqs().

Split the inner access logic out of kstat_irqs() and use it for
kstat_irqs() and show_interrupts() to avoid the accumulation loop
when possible.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240411074134.30922-4-yaoma@linux.alibaba.com
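
A reduced user-space sketch of the shortcut described above: when an interrupt is neither strictly per CPU nor an NMI, the accumulated counter is returned directly and the per-CPU loop is skipped. The struct layout, DEMO_NR_CPUS and the function name are assumptions made for illustration, not the kernel's definitions.

```c
#include <stdbool.h>

#define DEMO_NR_CPUS 4 /* assumed CPU count for the sketch */

/* Hypothetical, reduced stand-in for the interrupt descriptor. */
struct demo_irq_desc {
	bool per_cpu;                   /* strictly per-CPU interrupt */
	bool nmi;                       /* delivered as NMI */
	unsigned int tot_count;         /* accumulated counter */
	unsigned int cnt[DEMO_NR_CPUS]; /* per-CPU counters */
};

static unsigned int demo_kstat_irqs(const struct demo_irq_desc *desc)
{
	unsigned int sum = 0;

	/* Fast path: the accumulated counter is maintained for these. */
	if (!desc->per_cpu && !desc->nmi)
		return desc->tot_count;

	/* Slow path: only per-CPU and NMI interrupts need the loop. */
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		sum += desc->cnt[cpu];

	return sum;
}
```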

genirq: Provide a snapshot mechanism for interrupt statistics · 99cf63c5

Bitao Hu authored Apr 11, 2024

The soft lockup detector lacks a mechanism to identify interrupt storms as
root cause of a lockup. To enable this the detector needs a mechanism to
snapshot the interrupt count statistics on a CPU when the detector observes
a potential lockup scenario and compare that against the interrupt count
when it warns about the lockup later on. The number of interrupts in that
period gives a hint whether the lockup might have been caused by an
interrupt storm.

Instead of having extra storage in the lockup detector and accessing the
internals of the interrupt descriptor directly, add a snapshot member to
the per CPU irq_desc::kstat_irq structure and provide interfaces to take a
snapshot of all interrupts on the current CPU and to retrieve the delta of
a specific interrupt later on.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-3-yaoma@linux.alibaba.com
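
The snapshot-and-delta idea can be sketched for a single CPU as follows. The names demo_irqstat, demo_snapshot_irqs() and demo_irqs_since_snapshot() are hypothetical, as is the fixed-size array; the sketch only shows a count stored next to a reference value that is refreshed by the snapshot.

```c
#define DEMO_NR_IRQS 4 /* assumed number of interrupts in the sketch */

/* Hypothetical per-CPU statistics entry: running count plus snapshot. */
struct demo_irqstat {
	unsigned int cnt; /* incremented on every interrupt */
	unsigned int ref; /* value of cnt at the last snapshot */
};

/* Statistics of one CPU; the kernel keeps such data per CPU. */
static struct demo_irqstat demo_stats[DEMO_NR_IRQS];

/* Record the current count of every interrupt as the reference. */
static void demo_snapshot_irqs(void)
{
	for (int irq = 0; irq < DEMO_NR_IRQS; irq++)
		demo_stats[irq].ref = demo_stats[irq].cnt;
}

/*
 * Interrupts seen since the last snapshot. Unsigned subtraction keeps
 * the delta correct even if cnt wrapped around in between.
 */
static unsigned int demo_irqs_since_snapshot(unsigned int irq)
{
	return demo_stats[irq].cnt - demo_stats[irq].ref;
}
```

A detector built on this would take the snapshot when it first suspects a lockup and read the deltas when it finally warns, so a very large delta for one interrupt stands out.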

genirq: Convert kstat_irqs to a struct · 86d2a2f5

Bitao Hu authored Apr 11, 2024

The irq_desc::kstat_irqs member is a per-CPU variable of type int, which is
only capable of counting. A snapshot mechanism for interrupt statistics
will be added soon, which requires an additional variable to store the
snapshot.

To facilitate expansion, convert kstat_irqs here to a struct containing
only the count.
Originally-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240411074134.30922-2-yaoma@linux.alibaba.com

11 Apr, 2024 2 commits

genirq: Update MAINTAINERS to include interrupt related header files · 81e4cb0f

Andy Shevchenko authored Apr 05, 2024

Interrupt related header files seem orphaned. Add them to the respective
subsystem records.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405185726.3931703-2-andriy.shevchenko@linux.intel.com

genirq: Fix trivial typo in the comment CPY ==> COPY · 63752ad1

Andy Shevchenko authored Apr 05, 2024

IRQ_SET_MASK_NOCOPY is spelled with the letter 'O'. Fix the comment.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405185726.3931703-3-andriy.shevchenko@linux.intel.com

09 Apr, 2024 5 commits

irqchip/loongson: Select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP for IRQ_LOONGARCH_CPU · 42a7d887

Tiezhu Yang authored Mar 26, 2024

An interrupt's effective affinity can only be different from its configured
affinity if there are multiple CPUs. Make it clear that this option is only
meaningful when SMP is enabled. Otherwise, "make menuconfig" reports
"WARNING: unmet direct dependencies detected for
GENERIC_IRQ_EFFECTIVE_AFF_MASK" if CONFIG_SMP is not set on LoongArch.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240326121130.16622-3-yangtiezhu@loongson.cn

irqchip/loongson-eiointc: Set CPU affinity only on SMP machines for LoongArch · a64003da

Tiezhu Yang authored Mar 26, 2024

According to the code comment of "struct irq_chip", the member
"irq_set_affinity" is to set the CPU affinity on SMP machines, so define
and call eiointc_set_irq_affinity() only under CONFIG_SMP.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240326121130.16622-4-yangtiezhu@loongson.cn

irqchip/loongson-pch-msi: Fix off-by-one on allocation error path · b3277087

Zenghui Yu authored Mar 27, 2024

When pch_msi_parent_domain_alloc() returns an error, there is an off-by-one
in the number of interrupts to be freed.

Fix it by passing the number of successfully allocated interrupts, instead
of the relative index of the last allocated one.

Fixes: 632dcc2c ("irqchip: Add Loongson PCH MSI controller")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Link: https://lore.kernel.org/r/20240327142334.1098-1-yuzenghui@huawei.com
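
The off-by-one is easy to see in a reduced sketch of such an allocation loop. Everything here is hypothetical (demo_live, demo_alloc_one(), demo_free(), demo_alloc_all()); it models only the counting, not the irqdomain API. When iteration i fails, entries 0 to i - 1 were allocated, which is i interrupts, so the error path has to free i of them.

```c
static int demo_live; /* number of interrupts currently allocated */

/* Pretend allocation that fails at index fail_at (-1: never fails). */
static int demo_alloc_one(int index, int fail_at)
{
	if (index == fail_at)
		return -1;
	demo_live++;
	return 0;
}

/* Free the first 'count' interrupts of the block. */
static void demo_free(int count)
{
	demo_live -= count;
}

static int demo_alloc_all(int nr_irqs, int fail_at)
{
	int i;

	for (i = 0; i < nr_irqs; i++) {
		if (demo_alloc_one(i, fail_at))
			goto err;
	}
	return 0;

err:
	/*
	 * i is the number of successful allocations. Passing i - 1, the
	 * index of the last allocated entry, would leak one interrupt.
	 */
	demo_free(i);
	return -1;
}
```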

irqchip/alpine-msi: Fix off-by-one in allocation error path · ff3669a7

Zenghui Yu authored Mar 27, 2024

When alpine_msix_gic_domain_alloc() fails, there is an off-by-one in the
number of interrupts to be freed.

Fix it by passing the number of successfully allocated interrupts, instead
of the relative index of the last allocated one.

Fixes: 3841245e ("irqchip/alpine-msi: Fix freeing of interrupts on allocation error path")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240327142305.1048-1-yuzenghui@huawei.com

irqchip/riscv-aplic: Fix spelling mistake "forwared" -> "forwarded" · 14ced475

Colin Ian King authored Mar 27, 2024

There is a spelling mistake in a dev_info message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240327110516.283738-1-colin.i.king@gmail.com

08 Apr, 2024 1 commit

irqdomain: Check virq for 0 before use in irq_dispose_mapping() · a2ea3cd7

Andy Shevchenko authored Apr 05, 2024

The logic is a bit hard to read because virq is used before it is checked
for 0. Rearrange the code to make it easier to understand.

In particular, this should clearly answer the question of whether the caller
needs to perform this check or not. There are plenty of call sites of both
variants, which confirms the confusion.

Fun fact: the new code is shorter:

  Function                                     old     new   delta
  irq_dispose_mapping                          278     271      -7
  Total: Before=11625, After=11618, chg -0.06%

when compiled by GCC on Debian for x86_64.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240405190105.3932034-1-andriy.shevchenko@linux.intel.com

25 Mar, 2024 11 commits

irqchip: Remove redundant irq_chip::name initialization · 7b6f0f27

Keguang Zhang authored Mar 11, 2024

Since commit 021a8ca2 ("genirq/generic-chip: Fix the irq_chip name for
/proc/interrupts"), irq_init_generic_chip() sets the chip name of all chip
types to the same name. The per-type initializations to that same irq_chip
name are therefore no longer needed. Drop them.
Signed-off-by: Keguang Zhang <keguang.zhang@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20240311115344.72567-1-keguang.zhang@gmail.com