    sparc64: sun4v TLB error power off events · ac1addf5
    bob picco authored
    [ Upstream commit 4ccb9272 ]
    
    We've witnessed a few TLB events causing the machine to power off because
    of prom_halt(). In one case it was an NFS-related area during rmmod. Another
    was an mmapper of /dev/mem. A more recent one is an ITLB issue with
    a bad pagesize, which could be a hardware bug. Bugs happen, but we should
    attempt not to power off the machine and/or hang it when possible.
    
    This is a DTLB error from an mmapper of /dev/mem:
    [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
    SUN4V-DTLB: TPC<0xfffff80100903e6c>
    SUN4V-DTLB: O7[fffff801081979d0]
    SUN4V-DTLB: O7<0xfffff801081979d0>
    SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
    .
    
    This is recent mainline for ITLB:
    [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
    [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
    [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
    [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
    .
    
    Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
    prom_halt() and drop us to the OF command prompt "ok". This isn't the case
    for LDOMs, where the machine powers off instead.
    
    For the HV-reported error HV_ENORADDR for HV_MMU_MAP_ADDR_TRAP, we cause
    a SIGBUS error by qualifying it within do_sparc64_fault() for the fault code
    mask FAULT_CODE_BAD_RA. This is done when the trap level (%tl) is less than
    or equal to one. Otherwise, for %tl > 1, we eventually proceed to
    die_if_kernel().
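
The decision described above can be modeled in a few lines of userspace C. This is only a sketch, not the kernel code from the patch: the struct, the enum, and classify_tlb_error() are hypothetical names, the numeric values of HV_ENORADDR and FAULT_CODE_BAD_RA are assumed for illustration, and sending every other error down the die_if_kernel() path is also an assumption.

```c
#include <stdbool.h>

/* Assumed values, for illustration only; check the kernel headers. */
#define HV_ENORADDR        2UL
#define FAULT_CODE_BAD_RA  0x20U

/* Hypothetical outcome of a sun4v TLB error in this model. */
enum tlb_action {
	TLB_ACT_SIGBUS,	/* mark the fault as bad RA; the task gets SIGBUS */
	TLB_ACT_DIE	/* report the error and go to die_if_kernel() */
};

/* Hypothetical summary of what the hypervisor reported. */
struct tlb_error {
	unsigned long hv_error;	/* error code returned by the hypervisor */
	unsigned int tl;	/* trap level (%tl) at the time of the error */
};

/*
 * A bad real address at %tl <= 1 is qualified with FAULT_CODE_BAD_RA
 * so the fault path can raise SIGBUS. Everything else is treated as
 * fatal in this model (assumption).
 */
enum tlb_action classify_tlb_error(const struct tlb_error *e,
				   unsigned int *fault_code)
{
	bool bad_ra = (e->hv_error == HV_ENORADDR);

	if (bad_ra && e->tl <= 1) {
		*fault_code |= FAULT_CODE_BAD_RA;
		return TLB_ACT_SIGBUS;
	}
	return TLB_ACT_DIE;
}
```

In this model, the same hypervisor error code yields SIGBUS at %tl 1 but the die path at %tl 2, and the fault code is only modified in the SIGBUS case.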
    
    The logic of this patch was partially inspired by David Miller's feedback.
    
    Powering off large sparc64 machines is painful, and die_if_kernel() provides
    more context. A reset sequence isn't brief on large sparc64 machines either,
    but it is better than a power-off/power-on sequence.
    
    Cc: sparclinux@vger.kernel.org
    Signed-off-by: Bob Picco <bob.picco@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
traps_64.c 79.3 KB