    powerpc/64s/radix: Don't warn on copros in radix__tlb_flush() · 20045f01
    Michael Ellerman authored
    Sachin reported a warning when running the inject-ra-err selftest:
    
      # selftests: powerpc/mce: inject-ra-err
      Disabling lock debugging due to kernel taint
      MCE: CPU19: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
      MCE: CPU19: PID: 5254 Comm: inject-ra-err NIP: [0000000010000e48]
      MCE: CPU19: Initiator CPU
      MCE: CPU19: Unknown
      ------------[ cut here ]------------
      WARNING: CPU: 19 PID: 5254 at arch/powerpc/mm/book3s64/radix_tlb.c:1221 radix__tlb_flush+0x160/0x180
      CPU: 19 PID: 5254 Comm: inject-ra-err Kdump: loaded Tainted: G   M        E      6.6.0-rc3-00055-g9ed22ae6 #4
      Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
      ...
      NIP radix__tlb_flush+0x160/0x180
      LR  radix__tlb_flush+0x104/0x180
      Call Trace:
        radix__tlb_flush+0xf4/0x180 (unreliable)
        tlb_finish_mmu+0x15c/0x1e0
        exit_mmap+0x1a0/0x510
        __mmput+0x60/0x1e0
        exit_mm+0xdc/0x170
        do_exit+0x2bc/0x5a0
        do_group_exit+0x4c/0xc0
        sys_exit_group+0x28/0x30
        system_call_exception+0x138/0x330
        system_call_vectored_common+0x15c/0x2ec
    
    And bisected it to commit e43c0a0c ("powerpc/64s/radix: combine
    final TLB flush and lazy tlb mm shootdown IPIs"), which added a warning
    in radix__tlb_flush() if mm->context.copros is still elevated.
    
    However it's possible for the copros count to be elevated if a process
    exits without first closing file descriptors that are associated with a
    copro, e.g. VAS.
    
    If the process exits with a VAS file still open, the release callback
    is queued up for exit_task_work() via:
      exit_files()
        put_files_struct()
          close_files()
            filp_close()
              fput()
    
    And called via:
      exit_task_work()
        ____fput()
          __fput()
            file->f_op->release(inode, file)
              coproc_release()
                vas_user_win_ops->close_win()
                  vas_deallocate_window()
                    mm_context_remove_vas_window()
                      mm_context_remove_copro()
    
    But that is after exit_mm() has been called from do_exit() and triggered
    the warning.
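
    The ordering described above can be sketched as a small user-space
    model. This is only an illustration: every model_* name is
    hypothetical, nothing here is kernel code, and the model assumes only
    what this message states, namely that the copro release callback runs
    from exit_task_work(), after exit_mm() has already done the final
    flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are NOT the kernel's structures. */
struct model_mm {
	int copros;	/* stands in for mm->context.copros */
	bool warned;	/* set if the removed warning would have fired */
};

struct model_task {
	struct model_mm *mm;
	int open_copro_files;	/* copro-backed files still open */
	int pending_releases;	/* release callbacks queued as task work */
};

/* Opening a copro window elevates the copros count. */
static void model_open_copro(struct model_task *t)
{
	t->open_copro_files++;
	t->mm->copros++;
}

/* Closing before exit: in this model the release has run by exit time. */
static void model_close_copro(struct model_task *t)
{
	assert(t->open_copro_files > 0);
	t->open_copro_files--;
	t->mm->copros--;
}

/* Final flush: the removed warning fired if copros was still elevated. */
static void model_exit_mm(struct model_task *t)
{
	if (t->mm->copros > 0)
		t->mm->warned = true;
}

/* Still-open files are closed; their release is only queued here. */
static void model_exit_files(struct model_task *t)
{
	t->pending_releases += t->open_copro_files;
	t->open_copro_files = 0;
}

/* The queued release callbacks finally drop the copros count. */
static void model_exit_task_work(struct model_task *t)
{
	while (t->pending_releases > 0) {
		t->pending_releases--;
		t->mm->copros--;
	}
}

/* Exit path in the order the message describes: flush first, release last. */
static bool model_do_exit(struct model_task *t)
{
	model_exit_mm(t);
	model_exit_files(t);
	model_exit_task_work(t);
	return t->mm->warned;
}

/* Run one scenario; returns true if the old warning would have fired. */
static bool model_exit_warns(bool close_first)
{
	struct model_mm mm = { 0, false };
	struct model_task t = { &mm, 0, 0 };

	model_open_copro(&t);
	if (close_first)
		model_close_copro(&t);
	return model_do_exit(&t);
}
```

    In this model, model_exit_warns(false) reports the warning because the
    count is still elevated at flush time, while model_exit_warns(true)
    does not.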
    
    Fix it by dropping the warning, and always calling __flush_all_mm().
    
    In the normal case of no copros, that will result in a call to
    _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) just as the current code
    does.
    
    If the copros count is elevated then it will cause a global flush, which
    should flush translations from any copros. Note that the process table
    entry was cleared in arch_exit_mmap(), so copros should not be able to
    fetch any new translations.
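
    As a rough sketch of the behaviour described above (not the actual
    kernel change), the decision can be modelled in user space as follows.
    The model_* names and the enum are hypothetical, and the real
    __flush_all_mm() may take more conditions into account than the copros
    count alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two outcomes this message describes. */
enum model_flush {
	MODEL_FLUSH_LOCAL_PID,	/* like _tlbiel_pid(mm->context.id, RIC_FLUSH_ALL) */
	MODEL_FLUSH_GLOBAL,	/* global flush, expected to reach copros too */
};

/* Before the fix: the warning fired whenever copros was still elevated. */
static bool model_old_would_warn(int copros)
{
	assert(copros >= 0);
	return copros > 0;
}

/* After the fix: no warning, the flush type follows the copros count. */
static enum model_flush model_new_flush(int copros)
{
	assert(copros >= 0);
	if (copros > 0)
		return MODEL_FLUSH_GLOBAL;
	return MODEL_FLUSH_LOCAL_PID;
}
```

    With no copros the model picks the local PID flush, matching the
    normal case; with an elevated count it picks the global flush instead
    of warning.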
    
    Fixes: e43c0a0c ("powerpc/64s/radix: combine final TLB flush and lazy tlb mm shootdown IPIs")
    Reported-by: Sachin Sant <sachinp@linux.ibm.com>
    Closes: https://lore.kernel.org/all/A8E52547-4BF1-47CE-8AEA-BC5A9D7E3567@linux.ibm.com/
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Tested-by: Sachin Sant <sachinp@linux.ibm.com>
    Link: https://msgid.link/20231017121527.1574104-1-mpe@ellerman.id.au