    x86/mm/tlb: Avoid reading mm_tlb_gen when possible · aa442849
    Nadav Amit authored
    
    
    Under extreme TLB shootdown storms, the mm's tlb_gen cacheline is highly
    contended and reading it should (arguably) be avoided as much as
    possible.
    
    Currently, flush_tlb_func() reads the mm's tlb_gen unconditionally,
    even when it is not necessary (e.g., the mm was already switched).
    This is wasteful.
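    
    For illustration, here is a condensed sketch of the old flow being
    described, not the literal upstream code; the surrounding logic of
    flush_tlb_func() in arch/x86/mm/tlb.c is simplified and elided:
    
    	static void flush_tlb_func(void *info)
    	{
    		const struct flush_tlb_info *f = info;
    		struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
    
    		/* Unconditional read of the contended cacheline ... */
    		u64 mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);
    
    		/* ... even though this path may bail out without flushing,
    		 * e.g. when the mm was already switched away: */
    		if (f->mm && f->mm != loaded_mm)
    			return;
    
    		/* ... rest of the flush logic ... */
    	}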
    
    Moreover, one of the existing optimizations is to read mm's tlb_gen to
    see if there are additional in-flight TLB invalidations and flush the
    entire TLB in such a case. However, if the request's tlb_gen was already
    flushed, the benefit of checking the mm's tlb_gen is likely to be offset
    by the overhead of the check itself.
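    
    A sketch of the reworked flow these two points imply (simplified; the
    exact upstream code may differ):
    
    	/*
    	 * The TLB is already up to date with respect to f->new_tlb_gen;
    	 * checking mm_tlb_gen here would likely cost more than it saves.
    	 */
    	if (unlikely(f->new_tlb_gen <= local_tlb_gen))
    		return;
    
    	/* Defer the contended read until it is actually needed. */
    	mm_tlb_gen = atomic64_read(&loaded_mm->context.tlb_gen);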
    
    Running will-it-scale with tlb_flush1_threads shows a considerable
    benefit on 56-core Skylake (up to +24%):
    
    threads		Baseline (v5.17+)	+Patch
    1		159960			160202
    5		310808			308378 (-0.7%)
    10		479110			490728
    15		526771			562528
    20		534495			587316
    25		547462			628296
    30		579616			666313
    35		594134			701814
    40		612288			732967
    45		617517			749727
    50		637476			735497
    55		614363			778913 (+24%)
    
    Signed-off-by: Nadav Amit <namit@vmware.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Andy Lutomirski <luto@kernel.org>
    Link: https://lkml.kernel.org/r/20220606180123.2485171-1-namit@vmware.com