    x86/asm/bitops: Force inlining of test_and_set_bit and friends · 8dd5032d
    Denys Vlasenko authored
    Sometimes GCC mysteriously doesn't inline very small functions
    we expect to be inlined, see:
    
      https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
    
    Arguably, GCC should do better, but GCC people aren't willing
    to invest time into it and are asking to use __always_inline
    instead.
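
    For reference, the kernel's __always_inline wraps GCC's always_inline
    function attribute, which overrides the size heuristics entirely. A
    minimal standalone sketch (the sample helper below is illustrative,
    not kernel code):

      /* The kernel's compiler headers define __always_inline essentially
       * like this: plain "inline" is only a hint, the attribute makes
       * inlining mandatory. */
      #define __always_inline inline __attribute__((always_inline))

      /* Illustrative helper: marked "static inline", GCC under -Os may
       * emit one out-of-line copy per object file that uses it; with
       * __always_inline the body lands in every caller instead. */
      static __always_inline unsigned long test_flag(const unsigned long *word, int bit)
      {
              return (*word >> bit) & 1UL;
      }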
    
    With this .config:
    
      http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os
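
    Judging by the file name, the relevant options are presumably this
    pair (inferred from the URL, not from inspecting the config itself):

      # CONFIG_CC_OPTIMIZE_FOR_SIZE=y builds with -Os;
      # CONFIG_OPTIMIZE_INLINING=y lets GCC ignore "inline" hints on x86.
      CONFIG_CC_OPTIMIZE_FOR_SIZE=y
      CONFIG_OPTIMIZE_INLINING=y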
    
    Here's an example of functions getting deinlined many times:
    
      test_and_set_bit (166 copies, ~1260 calls)
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             f0 48 0f ab 3e          lock bts %rdi,(%rsi)
             72 04                   jb     <test_and_set_bit+0xf>
             31 c0                   xor    %eax,%eax
             eb 05                   jmp    <test_and_set_bit+0x14>
             b8 01 00 00 00          mov    $0x1,%eax
             5d                      pop    %rbp
             c3                      retq
    
      test_and_clear_bit (124 copies, ~1000 calls)
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             f0 48 0f b3 3e          lock btr %rdi,(%rsi)
             72 04                   jb     <test_and_clear_bit+0xf>
             31 c0                   xor    %eax,%eax
             eb 05                   jmp    <test_and_clear_bit+0x14>
             b8 01 00 00 00          mov    $0x1,%eax
             5d                      pop    %rbp
             c3                      retq
    
      change_bit (3 copies, 8 calls)
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             f0 48 0f bb 3e          lock btc %rdi,(%rsi)
             5d                      pop    %rbp
             c3                      retq
    
      clear_bit_unlock (2 copies, 11 calls)
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             f0 48 0f b3 3e          lock btr %rdi,(%rsi)
             5d                      pop    %rbp
             c3                      retq
    
    This patch works it around via s/inline/__always_inline/.
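
    The change itself is mechanical. Sketched on test_and_set_bit with a
    simplified asm body (the in-tree implementation differs in detail, so
    treat this as the shape of the fix, not the exact code):

      /* Only the qualifier changes: "static inline" becomes
       * "static __always_inline", forcing the two-instruction sequence
       * into every caller instead of 166 out-of-line copies. */
      static __always_inline int test_and_set_bit(long nr, volatile unsigned long *addr)
      {
              int oldbit;

              /* "lock bts" atomically sets bit nr and leaves its old
               * value in CF; "sbb %0,%0" turns CF into 0 or -1. */
              asm volatile("lock; btsq %2, %1\n\t"
                           "sbb %0, %0"
                           : "=r" (oldbit), "+m" (*addr)
                           : "Jr" (nr)
                           : "memory");
              return oldbit != 0;
      }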
    
    Code size decreases by ~13.5k after the patch:
    
          text     data      bss       dec    filename
      92110727 20826144 36417536 149354407    vmlinux.before
      92097234 20826176 36417536 149340946    vmlinux.after
    Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Thomas Graf <tgraf@suug.ch>
    Link: http://lkml.kernel.org/r/1454881887-1367-1-git-send-email-dvlasenk@redhat.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>