    powerpc/64/signal: Balance return predictor stack in signal trampoline · 0138ba57
    Nicholas Piggin authored
    Returning from an interrupt or syscall to a signal handler currently
    begins execution directly at the handler's entry point, with LR set to
    the address of the sigreturn trampoline. When the signal handler
    function returns, it runs the trampoline. It looks like this:
    
        # interrupt at user address xyz
        # kernel stuff... signal is raised
        rfid
        # void handler(int sig)
        addis 2,12,.TOC.-.LCF0@ha
        addi 2,2,.TOC.-.LCF0@l
        mflr 0
        std 0,16(1)
        stdu 1,-96(1)
        # handler stuff
        ld 0,16(1)
        mtlr 0
        blr
        # __kernel_sigtramp_rt64
        addi    r1,r1,__SIGNAL_FRAMESIZE
        li      r0,__NR_rt_sigreturn
        sc
        # kernel executes rt_sigreturn
        rfid
        # back to user address xyz
    
    Note the blr with no matching bl. The return predictor pops an entry
    that an earlier, unrelated bl pushed, so this can corrupt the return
    predictor.
    
    Solve this by instead resuming execution at the signal trampoline
    which then calls the signal handler. qtrace-tools link_stack checker
    confirms the entire user/kernel/vdso cycle is balanced after this
    patch, whereas it's not upstream.
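
    The imbalance and the fix can be illustrated with a toy model of a
    return predictor stack. This is only a sketch under stated
    assumptions, not the kernel or hardware implementation: the
    addresses, the stack depth and the names ls_call, ls_return,
    old_flow and new_flow are all hypothetical, the trampoline is
    assumed to call the handler with a branch-and-link instruction, and
    sc and rfid are assumed not to touch the predictor.

```c
#include <assert.h>
#include <stddef.h>

/* Toy return predictor: a branch-and-link pushes its return address,
 * blr pops a prediction and compares it with the real target.
 * Depth and addresses below are made up for illustration. */
#define LINK_STACK_DEPTH 16

struct link_stack {
    unsigned long entry[LINK_STACK_DEPTH];
    size_t depth;          /* number of valid entries */
    unsigned mispredicts;  /* blr with a wrong or missing prediction */
};

/* Model a bl/bctrl: remember where the matching blr should go. */
void ls_call(struct link_stack *ls, unsigned long return_addr)
{
    assert(ls->depth < LINK_STACK_DEPTH);
    ls->entry[ls->depth++] = return_addr;
}

/* Model a blr: pop the prediction and check it against reality. */
void ls_return(struct link_stack *ls, unsigned long actual_target)
{
    if (ls->depth == 0 || ls->entry[--ls->depth] != actual_target)
        ls->mispredicts++;
}

/* Hypothetical addresses. */
#define RET_IN_CALLER   0x1000UL  /* after "bl foo" in the caller */
#define TRAMP_SIGRETURN 0x2004UL  /* sigreturn code in the trampoline */

/* Old flow: rfid lands in the handler, LR points at the trampoline. */
unsigned old_flow(void)
{
    struct link_stack ls = { .depth = 0, .mispredicts = 0 };

    ls_call(&ls, RET_IN_CALLER);      /* caller: bl foo */
    /* foo is interrupted, the handler is entered without any bl */
    ls_return(&ls, TRAMP_SIGRETURN);  /* handler: blr, unmatched */
    /* sigreturn, execution resumes inside foo */
    ls_return(&ls, RET_IN_CALLER);    /* foo: blr, its entry is gone */
    return ls.mispredicts;
}

/* New flow: rfid lands in the trampoline, which calls the handler. */
unsigned new_flow(void)
{
    struct link_stack ls = { .depth = 0, .mispredicts = 0 };

    ls_call(&ls, RET_IN_CALLER);      /* caller: bl foo */
    ls_call(&ls, TRAMP_SIGRETURN);    /* trampoline calls the handler */
    ls_return(&ls, TRAMP_SIGRETURN);  /* handler: blr, matched */
    /* sigreturn, execution resumes inside foo */
    ls_return(&ls, RET_IN_CALLER);    /* foo: blr, matched */
    return ls.mispredicts;
}
```

    In this model old_flow() counts two mispredicted returns (the
    handler's blr and the interrupted function's blr), while new_flow()
    counts none, because every pop is paired with a push.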
    
    Alan confirms the DWARF unwind info still looks good. gdb still
    recognises the signal frame and can step into parent frames if it
    breaks inside a signal handler.
    
    Performance is pretty noisy, and the overall change is not very
    significant on a POWER9 here, but branch misses are consistently a
    lot lower on a microbenchmark:
    
     Performance counter stats for './signal':
    
           13,085.72 msec task-clock                #    1.000 CPUs utilized
      45,024,760,101      cycles                    #    3.441 GHz
      65,102,895,542      instructions              #    1.45  insn per cycle
      11,271,673,787      branches                  #  861.372 M/sec
          59,468,979      branch-misses             #    0.53% of all branches
    
           12,989.09 msec task-clock                #    1.000 CPUs utilized
      44,692,719,559      cycles                    #    3.441 GHz
      65,109,984,964      instructions              #    1.46  insn per cycle
      11,282,136,057      branches                  #  868.585 M/sec
          39,786,942      branch-misses             #    0.35% of all branches
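
    The source of ./signal is not part of this message. A minimal
    sketch of a comparable microbenchmark, assuming it does nothing
    more than deliver a signal to itself in a loop, could look like
    the following. The function name run_signal_loop, the choice of
    SIGUSR1 and the empty handler are assumptions made for
    illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Counts handler invocations; sig_atomic_t is safe to touch here. */
static volatile sig_atomic_t handled;

/* Deliberately trivial so the signal entry/return path dominates. */
static void handler(int sig)
{
    (void)sig;
    handled++;
}

/* Raise SIGUSR1 `iterations` times in the calling thread.
 * Returns the number of handler invocations, or -1 on error.
 * Keep `iterations` within the range of sig_atomic_t. */
long run_signal_loop(long iterations)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    handled = 0;
    for (long i = 0; i < iterations; i++) {
        /* raise() returns only after the handler has returned */
        if (raise(SIGUSR1) != 0)
            return -1;
    }
    return (long)handled;
}
```

    Calling run_signal_loop() from main() with a large iteration count
    and running the binary under perf stat would give counters in the
    same format as above. Each iteration goes through one handler
    entry, one handler return and one rt_sigreturn.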
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/20200511101952.1463138-1-npiggin@gmail.com