    arm64: fix kernel stack overflow in kdump capture kernel · e1d22385
    Wei Li authored
    When the ARM64_PSEUDO_NMI feature is enabled in the kdump capture
    kernel, boot fails with a kernel stack overflow exception:
    
    [    0.000000] CPU features: detected: IRQ priority masking
    [    0.000000] alternatives: patching kernel code
    [    0.000000] Insufficient stack space to handle exception!
    [    0.000000] ESR: 0x96000044 -- DABT (current EL)
    [    0.000000] FAR: 0x0000000000000040
    [    0.000000] Task stack:     [0xffff0000097f0000..0xffff0000097f4000]
    [    0.000000] IRQ stack:      [0x0000000000000000..0x0000000000004000]
    [    0.000000] Overflow stack: [0xffff80002b7cf290..0xffff80002b7d0290]
    [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
    [    0.000000] pstate: 400003c5 (nZcv DAIF -PAN -UAO)
    [    0.000000] pc : el1_sync+0x0/0xb8
    [    0.000000] lr : el1_irq+0xb8/0x140
    [    0.000000] sp : 0000000000000040
    [    0.000000] pmr_save: 00000070
    [    0.000000] x29: ffff0000097f3f60 x28: ffff000009806240
    [    0.000000] x27: 0000000080000000 x26: 0000000000004000
    [    0.000000] x25: 0000000000000000 x24: ffff000009329028
    [    0.000000] x23: 0000000040000005 x22: ffff000008095c6c
    [    0.000000] x21: ffff0000097f3f70 x20: 0000000000000070
    [    0.000000] x19: ffff0000097f3e30 x18: ffffffffffffffff
    [    0.000000] x17: 0000000000000000 x16: 0000000000000000
    [    0.000000] x15: ffff0000097f9708 x14: ffff000089a382ef
    [    0.000000] x13: ffff000009a382fd x12: ffff000009824000
    [    0.000000] x11: ffff0000097fb7b0 x10: ffff000008730028
    [    0.000000] x9 : ffff000009440018 x8 : 000000000000000d
    [    0.000000] x7 : 6b20676e69686374 x6 : 000000000000003b
    [    0.000000] x5 : 0000000000000000 x4 : ffff000008093600
    [    0.000000] x3 : 0000000400000008 x2 : 7db2e689fc2b8e00
    [    0.000000] x1 : 0000000000000000 x0 : ffff0000097f3e30
    [    0.000000] Kernel panic - not syncing: kernel stack overflow
    [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.19.34-lw+ #3
    [    0.000000] Call trace:
    [    0.000000]  dump_backtrace+0x0/0x1b8
    [    0.000000]  show_stack+0x24/0x30
    [    0.000000]  dump_stack+0xa8/0xcc
    [    0.000000]  panic+0x134/0x30c
    [    0.000000]  __stack_chk_fail+0x0/0x28
    [    0.000000]  handle_bad_stack+0xfc/0x108
    [    0.000000]  __bad_stack+0x90/0x94
    [    0.000000]  el1_sync+0x0/0xb8
    [    0.000000]  init_gic_priority_masking+0x4c/0x70
    [    0.000000]  smp_prepare_boot_cpu+0x60/0x68
    [    0.000000]  start_kernel+0x1e8/0x53c
    [    0.000000] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
    
    This happens because init_gic_priority_masking() may unmask PSR.I
    before the IRQ stacks have been initialised. If an "NMI" is raised in
    that window, it is taken without a valid IRQ stack and ends up in the
    exception shown above.
    
    Fix this by only writing the PMR in smp_prepare_boot_cpu() and
    deferring the PSR.I unmask to init_IRQ(), after the IRQ stacks have
    been initialised.
    
    Fixes: e7932188 ("arm64: Switch to PMR masking when starting CPUs")
    Cc: Will Deacon <will.deacon@arm.com>
    Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
    Signed-off-by: Wei Li <liwei391@huawei.com>
    [JT: make init_gic_priority_masking() not modify daif, rebase on other
         priority masking fixes]
    Signed-off-by: Julien Thierry <julien.thierry@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
smp.c 23.8 KB