    x86/retpoline: Simplify retpolines · 11925185
    Peter Zijlstra authored
    Due to:
    
      c9c324dc ("objtool: Support stack layout changes in alternatives")
    
    it is now possible to simplify the retpolines.
    
    Currently our retpolines consist of 2 symbols:
    
     - __x86_indirect_thunk_\reg: the compiler target
     - __x86_retpoline_\reg:  the actual retpoline.
    
    Both are consecutive in code and aligned such that for any one register
    they both live in the same cacheline:
    
      0000000000000000 <__x86_indirect_thunk_rax>:
       0:   ff e0                   jmpq   *%rax
       2:   90                      nop
       3:   90                      nop
       4:   90                      nop
    
      0000000000000005 <__x86_retpoline_rax>:
       5:   e8 07 00 00 00          callq  11 <__x86_retpoline_rax+0xc>
       a:   f3 90                   pause
       c:   0f ae e8                lfence
       f:   eb f9                   jmp    a <__x86_retpoline_rax+0x5>
      11:   48 89 04 24             mov    %rax,(%rsp)
      15:   c3                      retq
      16:   66 2e 0f 1f 84 00 00 00 00 00   nopw   %cs:0x0(%rax,%rax,1)
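The branch targets in the listing above can be recomputed from the raw bytes. The sketch below is illustrative only: it assumes the standard x86 rule that a relative displacement is added to the address of the next instruction, and the helper name branch_target() is made up for this example. The CALL at offset 5 lands on the mov/retq pair at 0x11, while the JMP at 0xf loops back to the pause at 0xa, which is where a mispredicted return would spin.

```python
# Illustrative sketch: recompute x86 relative-branch targets from raw bytes.
# Offsets and encodings are copied from the listing above; branch_target()
# is a hypothetical helper, not a kernel or objtool function.

def branch_target(offset: int, insn: bytes) -> int:
    """Return the target of a CALL rel32 (e8) or JMP rel8 (eb) at `offset`."""
    opcode = insn[0]
    if opcode == 0xE8:
        # CALL rel32: 1 opcode byte + 4-byte little-endian signed displacement
        disp = int.from_bytes(insn[1:5], "little", signed=True)
        length = 5
    elif opcode == 0xEB:
        # JMP rel8: 1 opcode byte + 1-byte signed displacement
        disp = int.from_bytes(insn[1:2], "little", signed=True)
        length = 2
    else:
        raise ValueError(f"unsupported opcode {opcode:#x}")
    # The displacement is relative to the address of the *next* instruction.
    return offset + length + disp

call_target = branch_target(0x05, bytes.fromhex("e807000000"))
jmp_target = branch_target(0x0F, bytes.fromhex("ebf9"))
print(hex(call_target), hex(jmp_target))  # 0x11 0xa
```

The same helper applied at offsets 0 and 0xa reproduces the targets (0xc and 0x5) of the relocated sequence in .altinstr_replacement.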
    
    The thunk is an alternative_2, where one option is a JMP to the
    retpoline. This was done so that objtool didn't need to deal with
    alternatives with stack ops. But that problem has been solved, so now
    it is possible to fold the entire retpoline into the alternative to
    simplify and consolidate unused bytes:
    
      0000000000000000 <__x86_indirect_thunk_rax>:
       0:   ff e0                   jmpq   *%rax
       2:   90                      nop
       3:   90                      nop
       4:   90                      nop
       5:   90                      nop
       6:   90                      nop
       7:   90                      nop
       8:   90                      nop
       9:   90                      nop
       a:   90                      nop
       b:   90                      nop
       c:   90                      nop
       d:   90                      nop
       e:   90                      nop
       f:   90                      nop
      10:   90                      nop
      11:   66 66 2e 0f 1f 84 00 00 00 00 00        data16 nopw %cs:0x0(%rax,%rax,1)
      1c:   0f 1f 40 00             nopl   0x0(%rax)
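A simplified model of why the default jmpq is followed by single-byte NOPs up to offset 0x10: the original instruction has to reserve as many bytes as the longest replacement that may be patched over it. This is a sketch under stated assumptions, not the kernel's alternatives code; build_slot() and patch() are hypothetical names, and the retpoline bytes are the ones shown in this commit message.

```python
# Illustrative model of an alternatives-style slot; NOT the kernel implementation.
# build_slot() and patch() are hypothetical helpers for this example.

NOP = b"\x90"

DEFAULT = bytes.fromhex("ffe0")  # jmpq *%rax
RETPOLINE = bytes.fromhex(
    "e807000000"  # callq  (to the mov below)
    "f390"        # pause
    "0faee8"      # lfence
    "ebf9"        # jmp    (back to pause)
    "48890424"    # mov    %rax,(%rsp)
    "c3"          # retq
)

def build_slot(default: bytes, *replacements: bytes) -> bytes:
    """Pad the default instruction with NOPs to the longest replacement."""
    width = max([len(default), *(len(r) for r in replacements)])
    return default + NOP * (width - len(default))

def patch(slot: bytes, replacement: bytes) -> bytes:
    """Overwrite the slot with a replacement, NOP-filling any remainder."""
    if len(replacement) > len(slot):
        raise ValueError("replacement does not fit in slot")
    return replacement + NOP * (len(slot) - len(replacement))

slot = build_slot(DEFAULT, RETPOLINE)
patched = patch(slot, RETPOLINE)  # assumption: the retpoline option is selected

print(len(slot), slot.hex())
print(len(patched), patched.hex())
```

In this model the unpatched slot is `ff e0` followed by 15 NOP bytes, matching offsets 0 through 0x10 of the listing. The multi-byte NOPs from 0x11 onward are outside the patched region and are presumably alignment fill.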
    
    Notice that since the longest alternative sequence is now:
    
       0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
       5:   f3 90                   pause
       7:   0f ae e8                lfence
       a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
       c:   48 89 04 24             mov    %rax,(%rsp)
      10:   c3                      retq
    
    17 bytes, we have 15 bytes of NOP padding at the end of our 32-byte
    slot. (IOW, if we can shrink the retpoline by 1 byte, we can pack it
    more densely.)
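The size arithmetic can be checked directly from the encodings. The sketch below is illustrative: the byte strings are copied from the listing above, SLOT_SIZE is the 32-byte figure stated in the text, and smallest_pow2_slot() is a hypothetical helper that assumes thunks are laid out in power-of-two sized slots.

```python
# Illustrative sketch: size arithmetic for the folded retpoline.
# smallest_pow2_slot() is a hypothetical helper; power-of-two slots are an assumption.

RETPOLINE = [
    ("callq", "e807000000"),
    ("pause", "f390"),
    ("lfence", "0faee8"),
    ("jmp", "ebf9"),
    ("mov %rax,(%rsp)", "48890424"),
    ("retq", "c3"),
]

SLOT_SIZE = 32  # per-register slot size stated in the commit message

retpoline_len = sum(len(bytes.fromhex(enc)) for _, enc in RETPOLINE)
padding = SLOT_SIZE - retpoline_len

def smallest_pow2_slot(n: int) -> int:
    """Smallest power-of-two slot size that holds n bytes."""
    slot = 1
    while slot < n:
        slot *= 2
    return slot

print(retpoline_len, padding)                 # 17 15
print(smallest_pow2_slot(retpoline_len))      # 32
print(smallest_pow2_slot(retpoline_len - 1))  # 16
```

Under that assumption, a 16-byte retpoline would fit a 16-byte slot, which is the denser packing the parenthetical refers to.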
    
     [ bp: Massage commit message. ]
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lkml.kernel.org/r/20210326151259.506071949@infradead.org
nospec-branch.h 9.75 KB