    [PATCH] Fast path context switch - microoptimize FPU reload · 916d2b26
    Andi Kleen authored
    This follows similar changes made on x86-64.
    
    When cpu_has_fxsr is defined to 1, as it is in many kernels, unlazy_fpu
    can collapse to three instructions, so inlining it is a very good idea.
    Otherwise it is about 10 instructions, which can still be inlined.
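
As a rough illustration of why a constant cpu_has_fxsr shrinks the inlined path, here is a minimal user-space sketch. Every name ending in _sketch is hypothetical and not taken from i387.h; the real save routines execute FXSAVE or FNSAVE, which this model replaces by recording which path was taken.

```c
/* Hypothetical user-space model; this is not the kernel's i387.h. */

/* Assumption: the feature test is a build-time constant, as when
 * cpu_has_fxsr is defined to 1. */
#define CPU_HAS_FXSR_SKETCH 1

enum save_kind_sketch { SAVE_NONE_SKETCH, SAVE_FNSAVE_SKETCH, SAVE_FXSAVE_SKETCH };

struct task_sketch {
    int used_fpu;                     /* nonzero if the task touched the FPU */
    enum save_kind_sketch last_save;  /* which save path ran last */
};

static inline void save_init_fpu_sketch(struct task_sketch *tsk)
{
    /* The condition is constant, so the compiler can discard the dead
     * branch and leave only one save path in the inlined code. */
    if (CPU_HAS_FXSR_SKETCH)
        tsk->last_save = SAVE_FXSAVE_SKETCH;
    else
        tsk->last_save = SAVE_FNSAVE_SKETCH;
    tsk->used_fpu = 0;
}

static inline void unlazy_fpu_sketch(struct task_sketch *tsk)
{
    /* Only tasks that actually used the FPU pay for a save. */
    if (tsk->used_fpu)
        save_init_fpu_sketch(tsk);
}
```

With the constant set to 1, the inlined body reduces to one flag test, one save, and one flag clear, which is the kind of collapse described above.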
    
    We don't need the lock prefix to test our local thread flags state.
    Unfortunately test_thread_flag currently always uses test_bit which
    has a LOCK on SMP, but that's unnecessary. LOCK is costly on P4,
    so it's a good idea to avoid it.
    
    Work around this for now by testing the flag directly. A better fix
    would probably be to define __set_bit on all architectures as not
    guaranteeing atomicity, and then always use it for local thread_info
    accesses in linux/thread_info.h.
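
A minimal sketch of what testing directly can look like, under stated assumptions: the struct, the helper names, and the bit number below are hypothetical stand-ins, not the kernel's definitions and not the code in this patch. The point is only that reading a flag word is a plain load and mask, and that a non-atomic setter in the spirit of __set_bit needs no LOCK prefix.

```c
/* Hypothetical stand-ins; not the kernel's thread_info or bitops. */

#define TIF_USEDFPU_SKETCH 16   /* assumed bit number, for illustration only */

struct thread_info_sketch {
    unsigned long flags;
};

/* Plain load, shift and mask: there is no read-modify-write here,
 * so nothing needs to be locked. */
static inline int test_local_flag_sketch(const struct thread_info_sketch *ti, int nr)
{
    return (int)((ti->flags >> nr) & 1UL);
}

/* Non-atomic set. Only safe under the assumption that no other CPU
 * modifies the same word concurrently. */
static inline void set_local_flag_sketch(struct thread_info_sketch *ti, int nr)
{
    ti->flags |= 1UL << nr;
}

/* Non-atomic clear, with the same assumption as the setter. */
static inline void clear_local_flag_sketch(struct thread_info_sketch *ti, int nr)
{
    ti->flags &= ~(1UL << nr);
}
```

The non-atomic setter and clearer are correct only if the word is never written from another CPU at the same time. Whether that holds for every thread flag is not established by this message, so the sketch treats it as an assumption.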
i387.h 2.8 KB