1. 17 Jul, 2018 1 commit
  2. 27 Jul, 2015 1 commit
  3. 09 May, 2012 1 commit
  4. 21 Jan, 2012 2 commits
    • x86: atomic64 assembly improvements · cb8095bb
      Jan Beulich authored
      
      In the "xchg" implementation, %ebx and %ecx don't need to be copied
      into %eax and %edx respectively (this is only necessary when desiring
      to only read the stored value).
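
      A minimal inline-asm sketch of the resulting exchange loop (the
      wrapper name is hypothetical; the real code lives in
      arch/x86/lib/atomic64_cx8_32.S and uses its own calling convention).
      The new value sits in %ebx:%ecx, and %eax:%edx only needs some guess
      of the old value, because a failed cmpxchg8b reloads the actual
      value into %eax:%edx before the retry:

        static inline long long atomic64_xchg_sketch(long long *p,
                                                     long long new)
        {
                long long old = *p;  /* any guess; a failed cmpxchg8b corrects it */

                asm volatile("1:     lock; cmpxchg8b %0\n\t"
                             "       jne 1b"          /* retry until the swap lands */
                             : "+m" (*p), "+A" (old)  /* "A" = the %edx:%eax pair */
                             : "b" ((unsigned)new),   /* new value: inputs only */
                               "c" ((unsigned)(new >> 32)));
                return old;                           /* previous 64-bit contents */
        }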
      
      In the "add_unless" implementation, swapping the use of %ecx and %esi
      for passing arguments allows %esi to become an input only (i.e.
      permitting the register to be re-used to address the same object
      without reload).
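
      For reference, a portable C sketch of the add_unless semantics (the
      real routine is a hand-scheduled cmpxchg8b loop; gcc's __sync
      builtin compiles to lock cmpxchg8b on 586+):

        /* Add @a to *@v unless *@v == @u; return nonzero iff the add happened. */
        static inline int atomic64_add_unless_sketch(long long *v, long long a,
                                                     long long u)
        {
                long long old = *v;

                while (old != u) {
                        long long seen = __sync_val_compare_and_swap(v, old, old + a);
                        if (seen == old)
                                return 1;  /* swap succeeded, add done */
                        old = seen;        /* lost a race; retry with fresh value */
                }
                return 0;                  /* hit the forbidden value @u */
        }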
      
      In "{add,sub}_return", doing the initial read64 through the passed in
      %ecx decreases a register dependency.
      
      In "inc_not_zero", a branch can be eliminated by or-ing together the
      two halves of the current (64-bit) value, and code size can be further
      reduced by adjusting the arithmetic slightly.
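
      The trick, sketched below for a value already split into 32-bit
      halves (illustrative function, not the file's code): a 64-bit
      quantity is zero exactly when the OR of its halves is zero, so one
      orl plus one conditional branch replaces two compare-and-branch
      pairs:

        static inline int is_zero64_sketch(unsigned int lo, unsigned int hi)
        {
                unsigned char zero;

                asm("orl %2, %1\n\t"   /* ZF is set iff (lo | hi) == 0 */
                    "setz %0"
                    : "=q" (zero), "+r" (lo)
                    : "r" (hi));
                return zero;
        }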
      
      v2: Undo the folding of "xchg" and "set".
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/4F19A2BC020000780006E0DC@nat28.tlf.novell.com
      Cc: Luca Barbieri <luca@luca-barbieri.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Adjust asm constraints in atomic64 wrappers · 819165fb
      Jan Beulich authored
      
      Eric pointed out overly restrictive constraints in atomic64_set(), but
      there are issues throughout the file. In the cited case, %ebx and %ecx
      are inputs only (neither of the two low-level implementations changes
      them); the same over-restriction appears elsewhere in the file.
      
      Further, in many cases early-clobber indicators were missing.
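
      A hedged illustration of both points (hypothetical wrapper; the
      callee name follows arch/x86/lib/atomic64_cx8_32.S): the value
      halves are plain "b"/"c" inputs because the callee never changes
      them, and what the callee does clobber is listed explicitly. An
      output written before all inputs are consumed would additionally
      need an early-clobber marker such as "=&A":

        static inline void atomic64_set_sketch(long long *v, long long i)
        {
                asm volatile("call atomic64_set_cx8"
                             : "+m" (*v)
                             : "S" (v),                   /* pointer, input only */
                               "b" ((unsigned)i),         /* low half, input only */
                               "c" ((unsigned)(i >> 32))  /* high half, input only */
                             : "eax", "edx", "memory");   /* callee clobbers */
        }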
      
      Finally, the previous implementation rolled a custom
      alternative-instruction macro from scratch rather than using
      alternative_call() (which was introduced by the very commit that the
      description of the change in question refers to). Adjusting this has
      the benefit of not hiding the referenced symbols from the compiler,
      though it requires declaring them beyond just the exporting source
      file; as a desirable side effect, that in turn allows the exporting
      file to become a real 5-line stub.
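
      A sketch of the pattern the file moves to, assuming the
      alternative_call()/ASM_OUTPUT2 macros from <asm/alternative.h>
      (kernel context; the _386/_cx8 symbol names follow arch/x86/lib).
      At patching time the call is redirected to the cmpxchg8b routine on
      CPUs with the CX8 feature, and since both callees are ordinary "i"
      operands the compiler now sees the referenced symbols:

        /* Kernel-context sketch, not the file's literal code. */
        static inline void atomic64_set_alt(long long *v, unsigned lo, unsigned hi)
        {
                alternative_call(atomic64_set_386,       /* pre-CX8 fallback  */
                                 atomic64_set_cx8,       /* cmpxchg8b version */
                                 X86_FEATURE_CX8,
                                 ASM_OUTPUT2("+m" (*v)), /* outputs */
                                 "S" (v), "b" (lo), "c" (hi)
                                 : "eax", "edx", "memory");
        }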
      
      This patch does not eliminate the overly restrictive memory clobbers,
      however: Doing so would occasionally make the compiler set up a second
      register for accessing the memory object (to satisfy the added "m"
      constraint), and it's not clear which of the two non-optimal
      alternatives is better.
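
      The two non-optimal alternatives side by side (the helper name is
      purely illustrative):

        static inline void clobber_vs_operand(long long *v)
        {
                /* coarse: a bare "memory" clobber makes the compiler discard
                 * everything it knows about memory around the call */
                asm volatile("call some_atomic64_helper"
                             : : "S" (v) : "eax", "ecx", "edx", "memory");

                /* precise: "+m" names the one object that changes, but
                 * satisfying it may cost a second register for the address */
                asm volatile("call some_atomic64_helper"
                             : "+m" (*v) : "S" (v) : "eax", "ecx", "edx");
        }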
      
      v2: Re-do the declaration and exporting of the internal symbols.
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/4F19A2A5020000780006E0D9@nat28.tlf.novell.com
      Cc: Luca Barbieri <luca@luca-barbieri.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  5. 10 Jan, 2012 1 commit
  6. 26 Jul, 2011 1 commit
  7. 26 Feb, 2010 1 commit
    • x86-32: Rewrite 32-bit atomic64 functions in assembly · a7e926ab
      Luca Barbieri authored

      This patch replaces atomic64_32.c with two assembly implementations,
      one for 386/486 machines using pushf/cli/popf and one for 586+ machines
      using cmpxchg8b.
      
      The cmpxchg8b implementation provides the following advantages over the
      current one:
      
      1. Implements atomic64_add_unless, atomic64_dec_if_positive and
         atomic64_inc_not_zero
      
      2. Uses the ZF flag changed by cmpxchg8b instead of doing a comparison
         (see the sketch after this list)
      
      3. Uses custom register calling conventions that reduce or eliminate
         register moves to suit cmpxchg8b
      
      4. Reads the initial value instead of using cmpxchg8b to do that.
         Currently we use lock xaddl and movl, which seems to be the fastest.
      
      5. Does not use the lock prefix for atomic64_set
         64-bit writes are already atomic, so we don't need that.
         We still need it for atomic64_read to avoid restoring a value
         changed in the meantime.
      
      6. Allocates registers as well as or better than gcc
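
      A sketch of the loop pattern points 2-4 describe, written as inline
      asm rather than the .S files' exact code (names and constraints are
      illustrative): the initial value is read up front rather than via
      cmpxchg8b, the new value is built in %ebx:%ecx, and the retry branch
      keys directly off the ZF that lock cmpxchg8b leaves behind:

        static inline long long add_return_sketch(long long *p,
                                                  unsigned a_lo, unsigned a_hi)
        {
                long long old = *p;  /* initial value read up front (point 4) */
                unsigned new_lo, new_hi;

                asm volatile("1:     movl %%eax, %[lo]\n\t"
                             "       movl %%edx, %[hi]\n\t"
                             "       addl %[alo], %[lo]\n\t"
                             "       adcl %[ahi], %[hi]\n\t"
                             "       lock; cmpxchg8b %[mem]\n\t"
                             "       jne 1b"   /* point 2: ZF, no comparison */
                             : "+A" (old), [mem] "+m" (*p),
                               [lo] "=&b" (new_lo), [hi] "=&c" (new_hi)
                             : [alo] "g" (a_lo), [ahi] "g" (a_hi));
                return ((long long)new_hi << 32) | new_lo;
        }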
      
      The 386 implementation provides support for 386 and 486 machines.
      386/486 SMP is not supported (we dropped it), but such support can be
      added easily if desired.
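
      A C rendering of the 386/486 fallback idea (the real code is
      hand-written assembly in arch/x86/lib/atomic64_386_32.S; this spells
      out the pushf/cli/popf pattern, which is only safe without SMP):

        static inline long long add_return_386_sketch(long long *p, long long a)
        {
                unsigned long flags;
                long long new;

                /* save EFLAGS (including IF) and disable interrupts */
                asm volatile("pushfl; popl %0; cli" : "=r" (flags) : : "memory");
                new = *p + a;  /* ordinary RMW; nothing can interleave under cli */
                *p = new;
                /* restore EFLAGS, re-enabling interrupts iff previously enabled */
                asm volatile("pushl %0; popfl" : : "r" (flags) : "memory");
                return new;
        }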
      
      A pure assembly implementation is required due to the custom calling
      conventions, the desire to use %ebp in atomic64_add_return (we need
      7 registers...), and the need to use pushf/popf in the 386 code
      without an intermediate pop/push.
      
      The parameter names are changed to match the convention in atomic_64.h.
      
      Changes in v3 (due to rebasing to tip/x86/asm):
      - Patches atomic64_32.h instead of atomic_32.h
      - Uses the CALL alternative mechanism from commit 1b1d9258

      Changes in v2:
      - Merged 386 and cx8 support in the same patch
      - 386 support now done in assembly, C code no longer used at all
      - cmpxchg64 is used for atomic64_cmpxchg
      - Stopped using macros; one-line inline functions are used instead
      - Miscellaneous changes and improvements
      Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-5-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  8. 07 Jan, 2010 1 commit