1. 21 Jun, 2008 1 commit
    • x86, bitops: make constant-bit set/clear_bit ops faster, gcc workaround · 437a0a54
      Ingo Molnar authored
      Jeremy Fitzhardinge reported a compiler bug; the failing build output
      is quoted below.
      
      Suggestion from Linus: add "r" to the input constraint of the
      constant 'nr' branch of set_bit()/clear_bit().
      
      The build blows up on "gcc version 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)":
      
       CC      init/main.o
      include2/asm/bitops.h: In function `start_kernel':
      include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
      include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
      include2/asm/bitops.h:59: warning: asm operand 1 probably doesn't match constraints
      include2/asm/bitops.h:59: error: impossible constraint in `asm'
      include2/asm/bitops.h:59: error: impossible constraint in `asm'
      include2/asm/bitops.h:59: error: impossible constraint in `asm'
      Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 20 Jun, 2008 1 commit
  3. 19 Jun, 2008 1 commit
    • x86, bitops: make constant-bit set/clear_bit ops faster · 1a750e0c
      Linus Torvalds authored
      On Wed, 18 Jun 2008, Linus Torvalds wrote:
      >
      > And yes, the "lock andl" should be noticeably faster than the xchgl.
      
      I dunno. Here's an untested (!!) patch that turns constant-bit
      set/clear_bit ops into byte mask ops (lock orb/andb).
      
      It's not exactly pretty. The reason for using the byte versions is that a
      locked op is serialized in the memory pipeline anyway, so there are no
      forwarding issues (that could slow down things when we access things with
      different sizes), and the byte ops are a lot smaller than 32-bit and
      particularly 64-bit ops (big constants, and the 64-bit ops need the REX
      prefix byte too).
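
      The byte-mask idea can be sketched in portable C as follows. This is
      only an illustration, not the patch itself: the helper names
      (const_mask_addr, const_mask, set_bit_byte, clear_bit_byte) are
      hypothetical, and the GCC/Clang __atomic builtins stand in for the
      hand-written "lock orb"/"lock andb" the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers illustrating the byte-mask idea; not the kernel's code. */

/* Address of the byte that holds bit nr, counting bits from the bitmap start. */
static inline unsigned char *const_mask_addr(void *bitmap, unsigned long nr)
{
	return (unsigned char *)bitmap + (nr >> 3);
}

/* Mask selecting bit nr within its byte. */
static inline unsigned char const_mask(unsigned long nr)
{
	return (unsigned char)(1u << (nr & 7));
}

/* Atomic byte-wide OR: only one byte of the bitmap is touched. */
static inline void set_bit_byte(unsigned long nr, void *bitmap)
{
	assert(bitmap != NULL);
	__atomic_fetch_or(const_mask_addr(bitmap, nr), const_mask(nr),
			  __ATOMIC_SEQ_CST);
}

/* Atomic byte-wide AND with the inverted mask. */
static inline void clear_bit_byte(unsigned long nr, void *bitmap)
{
	assert(bitmap != NULL);
	__atomic_fetch_and(const_mask_addr(bitmap, nr),
			   (unsigned char)~const_mask(nr), __ATOMIC_SEQ_CST);
}
```

      When nr is a compile-time constant, both the byte offset and the
      8-bit mask fold to constants, which is where the size saving over
      32-bit and 64-bit immediates would come from.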
      
      [ Side note: I wonder if we should turn the "test_bit()" C version into a
        "char *" version too.. It could actually help with alias analysis, since
        char pointers can alias anything. So it might be the RightThing(tm) to
        do for multiple reasons. I dunno. It's a separate issue. ]
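
      A byte-pointer test_bit() along the lines of the side note might look
      like the sketch below. The name test_bit_bytewise is hypothetical, and
      the byte/bit numbering assumes the bitmap is laid out as on
      little-endian x86.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical byte-wise test_bit(): reads the bitmap through an
 * unsigned char pointer, which is allowed to alias any object type.
 */
static inline int test_bit_bytewise(unsigned long nr, const volatile void *addr)
{
	const volatile unsigned char *bytes = addr;

	assert(addr != NULL);
	return (bytes[nr >> 3] >> (nr & 7)) & 1;
}
```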
      
      It does actually shrink the kernel image a bit (a couple of hundred bytes
      on the text segment for my everything-compiled-in image), and while it's
      totally untested the (admittedly few) code generation points I looked at
      seemed sane. And "lock orb" should be noticeably faster than "lock bts".
      
      If somebody wants to play with it, go wild. I didn't do "change_bit()",
      because nobody sane uses that thing anyway. I guarantee nothing. And if it
      breaks, nobody saw me do anything.  You can't prove this email wasn't sent
      by somebody who is good at forging smtp.
      
      This does require a gcc that is recent enough for "__builtin_constant_p()"
      to work in an inline function, but I suspect our kernel requirements are
      already higher than that. And if you do have an old gcc that is supported,
      the worst that would happen is that the optimization doesn't trigger.
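
      The dispatch described above can be sketched like this. All function
      names are hypothetical, the __atomic builtins replace the real inline
      asm, and the byte path assumes a little-endian layout as on x86. If
      the compiler cannot prove nr constant inside the inline function,
      __builtin_constant_p() evaluates to 0 and the generic path is taken,
      so the result is the same and only the optimization is lost.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic path: word-sized atomic OR. */
static inline void set_bit_generic(unsigned long nr, unsigned long *addr)
{
	const unsigned long bits = 8 * sizeof(unsigned long);

	__atomic_fetch_or(&addr[nr / bits], 1UL << (nr % bits), __ATOMIC_SEQ_CST);
}

/* Hypothetical constant path: byte-sized atomic OR (little-endian layout). */
static inline void set_bit_const(unsigned long nr, unsigned long *addr)
{
	unsigned char *byte = (unsigned char *)addr + (nr >> 3);

	__atomic_fetch_or(byte, (unsigned char)(1u << (nr & 7)), __ATOMIC_SEQ_CST);
}

/* Pick the byte path only when the compiler knows nr at compile time. */
static inline void set_bit_sketch(unsigned long nr, unsigned long *addr)
{
	assert(addr != NULL);
	if (__builtin_constant_p(nr))
		set_bit_const(nr, addr);
	else
		set_bit_generic(nr, addr);
}
```
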
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 25 May, 2008 1 commit
  5. 22 May, 2008 14 commits
  6. 21 May, 2008 22 commits