    x86: move entry_64.S register saving out of the macros · d99015b1
    Alexander van Heukelum authored
    Here is a combined patch that moves "save_args" out-of-line for
    the interrupt macro and moves "error_entry" mostly out-of-line
    for the zeroentry and errorentry macros.
    
    The save_args function becomes really straightforward and easy
    to understand, with the possible exception of the stack switch
    code, which now needs to copy the return address to the
    calling function. Normal interrupts arrive with ((~vector)+0x80)
    on the stack, which gets adjusted in common_interrupt:
    
    <common_interrupt>:
    (5)  addq   $0xffffffffffffff80,(%rsp)		/* -> ~(vector) */
    (4)  sub    $0x50,%rsp				/* space for registers */
    (5)  callq  ffffffff80211290 <save_args>
    (5)  callq  ffffffff80214290 <do_IRQ>
    <ret_from_intr>:
         ...
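
The adjustment in common_interrupt can be checked with a little
arithmetic. The sketch below is only an illustration, not code from the
patch: the helper names are made up, and it assumes that the per-vector
stub pushes ~vector + 0x80, which is the value that the addq of
0xffffffffffffff80 (that is, -0x80) turns back into ~vector.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value assumed to be pushed by a per-vector stub. */
static int64_t stub_push_value(unsigned vector)
{
    assert(vector <= 0xff);
    return ~(int64_t)vector + 0x80;
}

/* Models "addq $0xffffffffffffff80,(%rsp)", i.e. adding -0x80. */
static int64_t adjust_in_common_interrupt(int64_t on_stack)
{
    return on_stack - 0x80;
}

/* True if the value can be encoded as a sign-extended 8-bit immediate. */
static int fits_signed_byte(int64_t value)
{
    return value >= INT8_MIN && value <= INT8_MAX;
}
```

For vectors 0x20 through 0xff the offset value stays within a signed
byte, so it can be pushed with the short immediate form. A bare ~vector
such as ~0xfa does not fit, which matches the 5-byte pushq in the apic
stub.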
    
    An apic interrupt stub now looks like this:
    
    <thermal_interrupt>:
    (5)  pushq  $0xffffffffffffff05			/* ~(vector) */
    (4)  sub    $0x50,%rsp				/* space for registers */
    (5)  callq  ffffffff80211290 <save_args>
    (5)  callq  ffffffff80212b8f <smp_thermal_interrupt>
    (5)  jmpq   ffffffff80211f93 <ret_from_intr>
    
    Similarly the exception handler register saving function becomes
    simpler, without the need of any parameter shuffling. The stub
    for an exception without errorcode looks like this:
    
    <overflow>:
    (6)  callq  *0x1cad12(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
    (2)  pushq  $0xffffffffffffffff			/* no syscall */
    (4)  sub    $0x78,%rsp				/* space for registers */
    (5)  callq  ffffffff8030e3b0 <error_entry>
    (3)  mov    %rsp,%rdi				/* pt_regs pointer */
    (2)  xor    %esi,%esi				/* no error code */
    (5)  callq  ffffffff80213446 <do_overflow>
    (5)  jmpq   ffffffff8030e460 <error_exit>
    
    And one for an exception with errorcode like this:
    
    <segment_not_present>:
    (6)  callq  *0x1cab92(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
    (4)  sub    $0x78,%rsp				/* space for registers */
    (5)  callq  ffffffff8030e3b0 <error_entry>
    (3)  mov    %rsp,%rdi				/* pt_regs pointer */
    (5)  mov    0x78(%rsp),%rsi			/* load error code */
    (9)  movq   $0xffffffffffffffff,0x78(%rsp)	/* no syscall */
    (5)  callq  ffffffff80213209 <do_segment_not_present>
    (5)  jmpq   ffffffff8030e460 <error_exit>
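
The constant 0x78 in both exception stubs can be read off the register
frame layout. The struct below is a sketch that mirrors the field order
of the x86-64 pt_regs as I understand it; the struct name is made up
for this illustration and the real definition lives in the kernel
headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copy of the frame layout; not the kernel's definition. */
struct frame_sketch {
    /* Fifteen general purpose registers saved by the entry code. */
    uint64_t r15, r14, r13, r12, rbp, rbx;
    uint64_t r11, r10, r9, r8, rax, rcx, rdx, rsi, rdi;
    /* Error code or syscall number slot; -1 means "no syscall". */
    uint64_t orig_rax;
    /* Pushed by the CPU on exception entry. */
    uint64_t rip, cs, eflags, rsp, ss;
};

/* "sub $0x78,%rsp" reserves exactly the fifteen register slots. */
static size_t register_area_size(void)
{
    return offsetof(struct frame_sketch, orig_rax);
}
```

With this layout, 0x78(%rsp) is the orig_rax slot. The errorcode stub
reads the hardware error code from there and then overwrites it with
-1, while the stub without errorcode pushes -1 into that slot directly.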
    
    Unfortunately, this last type is more than 32 bytes. But the total space
    savings due to this patch is about 2500 bytes on an smp-configuration,
    and I think the code is clearer than it was before. The tested kernels
    were non-paravirt ones (i.e., without the indirect call at the top of
    the exception handlers).
    
    Anyhow, I tested this patch on top of a recent -tip. The machine
    was a 2x4-core Xeon at 2333MHz. Measured were the delays between
    (almost-)adjacent rdtsc instructions. The graphs show how much
    time is spent outside of the program as a function of the measured
    delay. The area under the graph represents the total time spent
    outside the program. Eight instances of the rdtsctest were
    started, each pinned to a single cpu. The histograms are added.
    For each kernel two measurements were done: one in mostly idle
    condition, the other while running "bonnie++ -f", bound to cpu 0.
    Each measurement took 40 minutes runtime. See the attached graphs
    for the results. The graphs overlap almost everywhere, but there
    are small differences.
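
For readers who want to reproduce this kind of measurement, here is a
minimal sketch of the idea. It is not the rdtsctest program used above.
All names are made up, the power-of-two bucketing is an assumption, and
pinning to a cpu and printing the histogram are left out. Each gap
between two adjacent rdtsc reads is added to its bucket weighted by its
own length, so that the sum over all buckets is the total time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64

/* Bucket index is floor(log2(delta)); a delta of 0 lands in bucket 0. */
static unsigned bucket_of(uint64_t delta)
{
    unsigned b = 0;

    while (delta >>= 1)
        b++;
    return b;
}

/* Add the gap to its bucket, weighted by the gap length in cycles. */
static void record_gap(uint64_t hist[NBUCKETS], uint64_t delta)
{
    assert(bucket_of(delta) < NBUCKETS);
    hist[bucket_of(delta)] += delta;
}

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Sample gaps between (almost-)adjacent rdtsc reads. */
static void sample(uint64_t hist[NBUCKETS], size_t iterations)
{
    uint64_t prev = __rdtsc();

    for (size_t i = 0; i < iterations; i++) {
        uint64_t now = __rdtsc();

        record_gap(hist, now - prev);
        prev = now;
    }
}
#endif
```

Small gaps are the loop's own overhead; large gaps are the times when
the program was interrupted, so the upper buckets approximate the time
spent outside the program.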
    
    Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>