1. 26 Feb, 2010 2 commits
    • x86-32: Allow UP/SMP lock replacement in cmpxchg64 · 9c76b384
      Luca Barbieri authored
      Use the functionality just introduced in the previous patch: mark the
      lock prefixes in cmpxchg64 alternatives for UP removal.
      
      Changes in v2:
      - Naming change
      Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-3-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: Add support for lock prefix in alternatives · b3ac891b
      Luca Barbieri authored
      The current lock prefix UP/SMP alternative code doesn't allow
      LOCK_PREFIX to be used in alternatives code.
      
      This patch solves the problem by adding a new LOCK_PREFIX_ALTERNATIVE_PATCH
      macro that only records the lock prefix location but does not emit
      the prefix.
      
      The user of this macro can then start any alternative sequence with
      "lock" and have it UP/SMP patched.
      
      To make this work, the UP/SMP alternative code is changed to do the
      lock/DS prefix switching only if the byte actually contains a lock or
      DS prefix.
      
      Thus, if an alternative without the "lock" is selected, it will now do
      nothing instead of clobbering the code.
      
      Changes in v2:
      - Naming change
      - Change label to not conflict with alternatives
      Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
      LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
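The prefix check described in b3ac891b can be sketched in plain C as below. This is a minimal userspace simulation, not the kernel's code: the `text` buffer stands in for kernel text, and `smp_lock()`, `smp_unlock()`, and the `locs` array of recorded prefix locations are hypothetical names. It assumes the usual x86 encodings, 0xf0 for the lock prefix and 0x3e for the DS segment override.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFIX_LOCK 0xf0 /* x86 LOCK prefix */
#define PREFIX_DS   0x3e /* x86 DS segment override, used as the UP stand-in */

/* Switch every recorded location to its SMP form (DS -> LOCK).
 * A byte that is not a DS prefix is left alone. */
static void smp_lock(uint8_t *text, const size_t *locs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (text[locs[i]] == PREFIX_DS)
			text[locs[i]] = PREFIX_LOCK;
}

/* Switch every recorded location to its UP form (LOCK -> DS).
 * A byte that is not a LOCK prefix is left alone, so an alternative
 * that does not start with "lock" is not clobbered. */
static void smp_unlock(uint8_t *text, const size_t *locs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (text[locs[i]] == PREFIX_LOCK)
			text[locs[i]] = PREFIX_DS;
}
```

With a recorded location that holds some other opcode (for example, because an alternative without "lock" was selected), both functions do nothing at that location.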
  2. 16 Feb, 2010 1 commit
  3. 13 Jan, 2010 3 commits
    • x86: Merge show_regs() · 3bef4447
      Brian Gerst authored
      Using kernel_stack_pointer() allows 32-bit and 64-bit versions to
      be merged.  This is more correct for 64-bit, since the old %rsp is
      always saved on the stack.
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      LKML-Reference: <1263397555-27695-1-git-send-email-brgerst@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: Macroise x86 cache descriptors · 2ca49b2f
      Dave Jones authored
      Use a macro to define the cache sizes when cachesize > 1 MB.
      
      This is less typing, and less prone to introducing bugs like we
      saw in e02e0e1a, and means we
      don't have to do maths when adding new non-power-of-2 updates
      like those seen recently.
      Signed-off-by: Dave Jones <davej@redhat.com>
      LKML-Reference: <20100104144735.GA18390@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
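The idea in 2ca49b2f can be illustrated roughly as follows, assuming the table stores cache sizes in KB. The `MB()` macro mirrors the approach the commit describes; the `cache_desc` struct, the `cache_size_kb()` helper, and every descriptor byte and size in the table are made up for illustration and are not real CPUID values.

```c
#include <stddef.h>

/* Table sizes are kept in KB, so 1 MB is 1024 table units. */
#define MB(x) ((x) * 1024)

struct cache_desc {
	unsigned char descriptor; /* hypothetical descriptor byte */
	unsigned int size_kb;     /* cache size in KB */
};

/* Illustrative entries only. Non-power-of-2 sizes such as 12 MB
 * need no hand-computed KB value when written with MB(). */
static const struct cache_desc cache_table[] = {
	{ 0x01, 512 },
	{ 0x02, MB(2) },
	{ 0x03, MB(12) },
	{ 0x04, MB(24) },
};

/* Return the size in KB for a descriptor, or 0 if it is unknown. */
static unsigned int cache_size_kb(unsigned char descriptor)
{
	for (size_t i = 0; i < sizeof(cache_table) / sizeof(cache_table[0]); i++)
		if (cache_table[i].descriptor == descriptor)
			return cache_table[i].size_kb;
	return 0;
}
```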
    • x86-32: clean up rwsem inline asm statements · 59c33fa7
      Linus Torvalds authored
      This makes gcc use the right register names and instruction operand sizes
      automatically for the rwsem inline asm statements.
      
      So instead of using "(%%eax)" to specify the memory address that is the
      semaphore, we use "(%1)" or similar. And instead of forcing the operation
      to always be 32-bit, we use "%z0", taking the size from the actual
      semaphore data structure itself.
      
      This doesn't actually matter on x86-32, but if we want to use the same
      inline asm for x86-64, we'll need to have the compiler generate the proper
      64-bit names for the registers (%rax instead of %eax), and if we want to
      use a 64-bit counter too (in order to avoid the 15-bit limit on the
      write counter that limits concurrent users to 32767 threads), we'll need
      to be able to generate instructions with "q" accesses rather than "l".
      
      Since this header currently isn't enabled on x86-64, none of that matters,
      but we do want to use the xadd version of the semaphores rather than have
      to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
      when you have lots of threads all taking page faults, and the fallback
      rwsem code that uses a spinlock performs abysmally badly in that case.
      
      [ hpa: modified the patch to skip size suffixes entirely when they are
        redundant due to register operands. ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  4. 07 Jan, 2010 3 commits
  5. 30 Dec, 2009 3 commits
    • x86-64: Modify memcpy()/memset() alternatives mechanism · 7269e881
      Jan Beulich authored
      In order to avoid unnecessary chains of branches, rather than
      implementing memcpy()/memset()'s access to their alternative
      implementations via a jump, patch the (larger) original function
      directly.
      
      The memcpy() part of this is slightly subtle: while alternative
      instruction patching does itself use memcpy(), with the
      replacement block being less than 64-bytes in size the main loop
      of the original function doesn't get used for copying memcpy_c()
      over memcpy(), and hence we can safely write over its beginning.
      
      Also note that the CFI annotations are fine for both variants of
      each of the functions.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB8D30200007800026AF2@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86-64: Modify copy_user_generic() alternatives mechanism · 1b1d9258
      Jan Beulich authored
      In order to avoid unnecessary chains of branches, rather than
      implementing copy_user_generic() as a function consisting of
      just a single (possibly patched) branch, instead properly deal
      with patching call instructions in the alternative instructions
      framework, and move the patching into the callers.
      
      As a follow-on, one could also introduce something like
      __EXPORT_SYMBOL_ALT() to avoid patching call sites in modules.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB8180200007800026AE7@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
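One detail behind patching call instructions, as 1b1d9258 does, is that an x86 near call (opcode 0xe8) encodes its target relative to the end of the instruction, so copying the bytes to a different address requires adjusting the displacement. The following is a sketch of that adjustment on a plain byte buffer, assuming a 5-byte `call rel32`. The helpers `encode_call()`, `call_target()`, and `relocate_call()` are illustrative names, not kernel functions, and offsets into `image` stand in for addresses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPCODE_CALL_REL32 0xe8
#define CALL_LEN 5

/* Write "call target" at offset `at`; rel32 is relative to the next insn. */
static void encode_call(uint8_t *image, size_t at, size_t target)
{
	int32_t rel = (int32_t)target - (int32_t)(at + CALL_LEN);

	image[at] = OPCODE_CALL_REL32;
	memcpy(image + at + 1, &rel, sizeof(rel));
}

/* Decode the absolute target offset of the call stored at offset `at`. */
static size_t call_target(const uint8_t *image, size_t at)
{
	int32_t rel;

	memcpy(&rel, image + at + 1, sizeof(rel));
	return (size_t)((long)at + CALL_LEN + rel);
}

/* Copy the 5-byte instruction from `from` to `to`. If it is a call,
 * add (from - to) to its displacement so it still reaches the same target. */
static void relocate_call(uint8_t *image, size_t from, size_t to)
{
	int32_t rel;

	memmove(image + to, image + from, CALL_LEN);
	if (image[to] != OPCODE_CALL_REL32)
		return;
	memcpy(&rel, image + to + 1, sizeof(rel));
	rel += (int32_t)from - (int32_t)to;
	memcpy(image + to + 1, &rel, sizeof(rel));
}
```

Without the adjustment, the copied call would land at a target shifted by the distance between the two locations.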
    • x86: Lift restriction on the location of FIX_BTMAP_* · 499a5f1e
      Jan Beulich authored
      The early ioremap fixmap entries cover half (or for 32-bit
      non-PAE, a quarter) of a page table, yet they got
      unconditionally aligned so far to a 256-entry boundary. This is
      not necessary if the range of page table entries anyway falls
      into a single page table.
      
      This buys back, for (theoretically) 50% of all configurations
      (25% of all non-PAE ones), at least some of the lowmem
      necessarily lost with commit e621bd18.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B2BB66F0200007800026AD6@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
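The placement rule in 499a5f1e can be approximated as: keep the range where it is if all of its entries already fall into one page table, and align it only otherwise. The sketch below is a simplified model, not the kernel's fixmap code. `place_range()` is a hypothetical helper, indices simply count upward, and `n` is assumed to be a power of two that divides `ptrs_per_pte`.

```c
/* Return the first index at or after `start` where an n-entry range
 * lies entirely within a single page table of `ptrs_per_pte` entries.
 * Assumes n is a power of two and divides ptrs_per_pte. */
static unsigned long place_range(unsigned long start, unsigned long n,
				 unsigned long ptrs_per_pte)
{
	unsigned long last = start + n - 1;

	/* First and last entry share a page table: no alignment needed. */
	if (start / ptrs_per_pte == last / ptrs_per_pte)
		return start;

	/* Otherwise round up to the next n-entry boundary. */
	return (start + n - 1) & ~(n - 1);
}
```

For example, with 256 entries and 512-entry page tables, a range starting at index 10 stays at 10, while one starting at 300 would straddle two tables and moves to 512.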
  6. 28 Dec, 2009 1 commit
    • x86, core: Optimize hweight32() · 39d997b5
      Akinobu Mita authored
      Optimize hweight32 by using the same technique as hweight64.
      
      The proof of this technique can be found in the commit log for
      f9b41929 ("bitops: hweight()
      speedup").
      
      The userspace benchmark on x86_32 showed a 20% speedup with
      bitmap_weight() which uses hweight32 to count bits for each
      unsigned long on 32bit architectures.
      
       int main(void)
       {
      	#define SZ (1024 * 1024 * 512)
      
      	static DECLARE_BITMAP(bitmap, SZ) = {
      	        [0 ... 100] = 1,
      	};
      
      	return bitmap_weight(bitmap, SZ);
       }
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1258603932-4590-1-git-send-email-akinobu.mita@gmail.com>
      [ only x86 sets ARCH_HAS_FAST_MULTIPLIER so we do this via the x86 tree ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
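For readers unfamiliar with the technique, a multiply-based 32-bit population count of the kind 39d997b5 refers to can be sketched as below. This is a standalone userspace version for illustration. The names `hweight32_mul()` and `hweight32_naive()` are made up here, and the naive loop exists only as a reference to compare against.

```c
#include <stdint.h>

/* Multiply-based population count: sum bits in 2-bit, then 4-bit,
 * then 8-bit fields, and let one multiply add the four byte sums
 * into the top byte. Pays off when the CPU has a fast multiplier. */
static unsigned int hweight32_mul(uint32_t w)
{
	w -= (w >> 1) & 0x55555555;                     /* 2-bit sums */
	w = (w & 0x33333333) + ((w >> 2) & 0x33333333); /* 4-bit sums */
	w = (w + (w >> 4)) & 0x0f0f0f0f;                /* 8-bit sums */
	return (w * 0x01010101) >> 24;                  /* add the 4 bytes */
}

/* Reference implementation: test each bit in turn. */
static unsigned int hweight32_naive(uint32_t w)
{
	unsigned int count = 0;

	while (w) {
		count += w & 1;
		w >>= 1;
	}
	return count;
}
```

The multiply replaces the final shift-and-add steps of the classic version.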
  7. 24 Dec, 2009 27 commits