1. 02 May, 2007 40 commits
    • [PATCH] x86-64: Avoid overflows during apic timer calibration · 4637a74c
      David P. Reed authored
      - Use 64bit TSC calculations to avoid handling overflow
      - Use 32bit unsigned arithmetic for the APIC timer. This
      way overflows are handled correctly.
      - Fix exit check of loop to account for apic timer counting down
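      The three points above can be sketched in user space as follows. This is
      only an illustration, not the patch itself: the helper names are
      hypothetical, and the sketch assumes nothing beyond a 32-bit timer
      register that counts down and a 64-bit TSC that counts up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ticks elapsed on a 32-bit counter that counts DOWN.
 * Unsigned subtraction is modulo 2^32, so the result is still right when
 * the counter wrapped from 0 to 0xffffffff between the two reads. */
static uint32_t apic_ticks_elapsed(uint32_t start, uint32_t now)
{
    return start - now;
}

/* Hypothetical helper: TSC deltas are kept in 64 bits. The TSC counts up,
 * and a 64-bit value does not wrap within a calibration window. */
static uint64_t tsc_cycles_elapsed(uint64_t start, uint64_t now)
{
    assert(now >= start);
    return now - start;
}

/* Loop exit test for a down-counting timer: compare the elapsed ticks,
 * not the raw counter value, against the number of ticks to wait for. */
static int apic_wait_done(uint32_t start, uint32_t now, uint32_t wanted)
{
    return apic_ticks_elapsed(start, now) >= wanted;
}
```

      Comparing elapsed ticks rather than raw register values is what keeps the
      exit check independent of the counting direction and of wraparound.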
      
      Signed-off-by: dpreed@reed.com
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Shut up 32bit emulation for SIOCGIFCOUNT · 9d016dd4
      Andi Kleen authored
      The kernel doesn't implement it, but some programs like java use it
      anyways. Shut the code up.
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Define IGNORE_IOCTL() macro for compat_ioctls · 421f0281
      Andi Kleen authored
      Define a new IGNORE_IOCTL() to let a compat ioctl not be warned about even when
      it is not implemented.
      
      This is the same as COMPATIBLE_IOCTL internally, but better self-documenting.
      
      Valid reasons to use this:
      - It is implemented with ->compat_ioctl on some device, but programs
        call it on others too.
      - The ioctl is not implemented in the native kernel, but programs
        call it commonly anyways.
      Most other reasons are not valid.
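      A minimal sketch of the idea, assuming a simplified, hypothetical table
      layout (struct ioctl_entry and compat_ioctl_known are made up for
      illustration; the real kernel table looks different). The two command
      values are copied in as constants so the example builds anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified translation table entry. */
struct ioctl_entry {
    unsigned int cmd;
    int passthrough;  /* 1: hand the call on unchanged */
};

#define COMPATIBLE_IOCTL(cmd) { (cmd), 1 },
/* Same expansion; the different name only documents the intent. */
#define IGNORE_IOCTL(cmd) COMPATIBLE_IOCTL(cmd)

#define SKETCH_TCGETS       0x5401u  /* TCGETS on x86 Linux */
#define SKETCH_SIOCGIFCOUNT 0x8938u  /* SIOCGIFCOUNT on Linux */

static const struct ioctl_entry table[] = {
    COMPATIBLE_IOCTL(SKETCH_TCGETS)
    IGNORE_IOCTL(SKETCH_SIOCGIFCOUNT)
};

/* Returns 1 if the command is listed, so no warning would be printed. */
static int compat_ioctl_known(unsigned int cmd)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (table[i].cmd == cmd)
            return 1;
    return 0;
}
```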
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: Use the 32bit wd_ops for 64bit too. · 05cb007d
      Andi Kleen authored
      This mainly removes a lot of code, replacing it with calls into the new 32bit
      perfctr-watchdog.c
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: Clean up NMI watchdog code · 09198e68
      Andi Kleen authored
      - Introduce a wd_ops structure
      - Convert the various nmi watchdogs over to it
      - This allows the perfctr reservation to be split cleanly from the
      watchdog setup.
      - Do perfctr reservation globally as it should have always been
      - Remove dead code referenced only by unused EXPORT_SYMBOLs
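      The shape of such an ops structure can be sketched as below. Everything
      here is hypothetical (struct wd_ops_sketch, its fields, and the fake
      backend are not the definitions from the patch); the point is only that
      reservation and setup become separate, swappable steps.

```c
#include <assert.h>

/* Hypothetical ops table; field names are illustrative only. */
struct wd_ops_sketch {
    int  (*reserve)(void);           /* claim the perfctr, 1 on success */
    void (*unreserve)(void);
    int  (*setup)(unsigned int hz);  /* program the counter, 1 on success */
    void (*stop)(void);
};

/* Fake backend that only records state. */
static int fake_reserved, fake_running;

static int fake_reserve(void)
{
    if (fake_reserved)
        return 0;
    fake_reserved = 1;
    return 1;
}
static void fake_unreserve(void) { fake_reserved = 0; }
static int fake_setup(unsigned int hz) { fake_running = (hz > 0); return fake_running; }
static void fake_stop(void) { fake_running = 0; }

static const struct wd_ops_sketch fake_ops = {
    fake_reserve, fake_unreserve, fake_setup, fake_stop
};

/* Generic code: reservation is a separate step from watchdog setup. */
static int watchdog_start(const struct wd_ops_sketch *ops, unsigned int hz)
{
    assert(ops != 0);
    if (!ops->reserve())
        return 0;
    if (!ops->setup(hz)) {
        ops->unreserve();
        return 0;
    }
    return 1;
}

static void watchdog_stop(const struct wd_ops_sketch *ops)
{
    ops->stop();
    ops->unreserve();
}
```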
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: set node_possible_map at runtime - try 2 · e3f1caee
      Suresh Siddha authored
      Set the node_possible_map at runtime on x86_64.  On a non NUMA system,
      num_possible_nodes() will now say '1'.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
    • [PATCH] x86-64: Dynamically adjust machine check interval · 8a336b0a
      Tim Hockin authored
      Background:
       We've found that MCEs (specifically DRAM SBEs) tend to come in bunches,
       especially when we are trying really hard to stress the system out.  The
       current MCE poller uses a static interval which does not care whether it
       has or has not found MCEs recently.
      
      Description:
       This patch makes the MCE poller adjust the polling interval dynamically.
       If we find an MCE, poll 2x faster (down to 10 ms).  When we stop finding
       MCEs, poll 2x slower (up to check_interval seconds).  The check_interval
       tunable becomes the max polling interval.  The "Machine check events
       logged" printk() is rate limited to the check_interval, which should be
       identical behavior to the old functionality.
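      The interval adjustment described here amounts to a clamped halve/double
      step. A minimal sketch, assuming millisecond units and a hypothetical
      helper name (next_poll_interval is not a function from the patch):

```c
#include <assert.h>

#define MIN_INTERVAL_MS 10UL  /* the 10 ms floor named in the description */

/* Hypothetical helper: returns the next polling interval in milliseconds.
 * max_ms stands in for check_interval converted to milliseconds. */
static unsigned long next_poll_interval(unsigned long cur_ms,
                                        unsigned long max_ms,
                                        int found_mce)
{
    assert(max_ms >= MIN_INTERVAL_MS);
    if (found_mce) {
        /* Errors seen: poll twice as fast, but not below the floor. */
        cur_ms /= 2;
        if (cur_ms < MIN_INTERVAL_MS)
            cur_ms = MIN_INTERVAL_MS;
    } else if (cur_ms > max_ms / 2) {
        /* Doubling would exceed the cap (and could overflow): clamp. */
        cur_ms = max_ms;
    } else {
        /* Quiet period: poll twice as slowly. */
        cur_ms *= 2;
    }
    return cur_ms;
}
```

      With the default check_interval of 5 minutes (300000 ms), it takes 15
      consecutive quiet polls to climb from 10 ms back to the maximum.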
      
      Result:
       If you start to take a lot of correctable errors (not exceptions), you
       log them faster and more accurately (less chance of overflowing the MCA
       registers).  If you don't take a lot of errors, you will see no change.
      
      Alternatives:
       I considered simply reducing the polling interval to 10 ms immediately
       and keeping it there as long as we continue to find errors.  This felt a
       bit heavy handed, but does perform significantly better for the default
       check_interval of 5 minutes (we're using a few seconds when testing for
       DRAM errors).  I could be convinced to go with this, if anyone felt it
       was not too aggressive.
      
      Testing:
       I used an error-injecting DIMM to create lots of correctable DRAM errors
       and verified that the polling interval accelerates.  The printk() only
       happens once per check_interval seconds.
      
      Patch:
       This patch is against 2.6.21-rc7.
      Signed-Off-By: Tim Hockin <thockin@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: ignore vgacon if hardware not present · f82af20e
      Gerd Hoffmann authored
      Avoid trying to set up vgacon if there's no vga hardware present.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Alan <alan@lxorguk.ukuu.org.uk>
      Acked-by: Ingo Molnar <mingo@elte.hu>
    • [PATCH] i386: fix wrong comment for syscall stack layout · 889f21ce
      Andi Kleen authored
      The `ret_from_sys_call' label no longer exists; the `syscall_exit' label
      was introduced instead.
      Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: unexport cpu_llc_id · 425001fe
      Andrew Morton authored
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.data:cpu_llc_id from __ksymtab between '__ksymtab_cpu_llc_id' (at offset 0x4a0) and '__ksymtab_smp_num_siblings'
      
      It is strange to export a __cpuinitdata symbol to modules, and no module
      appears to use it anyway.
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: convert to the kthread API · f26d6a2b
      Eric W. Biederman authored
      This patch just trivially converts from calling kernel_thread and daemonize
      to just calling kthread_run.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: pte simplify ops · 9e5e3162
      Zachary Amsden authored
      Add comment and condense code to make use of native_local_ptep_get_and_clear
      function.  Also, it turns out the 2-level and 3-level paging definitions were
      identical, so move the common definition into pgtable.h
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: pte xchg optimization · 142dd975
      Zachary Amsden authored
      In situations where page table updates need only be made locally, and there is
      no cross-processor A/D bit races involved, we need not use the heavyweight
      xchg instruction to atomically fetch and clear page table entries.  Instead,
      we can just read and clear them directly.
      
      This introduces a neat optimization for non-SMP kernels; drop the atomic xchg
      operations from page table updates.
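      The difference between the two approaches can be sketched in portable C11.
      This is a user-space illustration with hypothetical names, using
      atomic_exchange as a stand-in for the xchg instruction; it is not the
      kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t pte_val;  /* stand-in for a 2-level i386 page table entry */

/* Atomic fetch-and-clear: needed when another CPU (or the hardware walker
 * on another CPU) may set accessed/dirty bits concurrently. */
static pte_val get_and_clear_atomic(_Atomic pte_val *ptep)
{
    return atomic_exchange(ptep, 0);
}

/* Local fetch-and-clear: a plain read followed by a plain write. Only
 * valid when nothing else can touch the entry in between. */
static pte_val get_and_clear_local(pte_val *ptep)
{
    pte_val old = *ptep;
    *ptep = 0;
    return old;
}
```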
      
      Thanks to Michel Lespinasse for noting this potential optimization.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: pte clear optimization · c2c1accd
      Zachary Amsden authored
      When exiting from an address space, no special hypervisor notification of page
      table updates needs to occur; direct page table hypervisors, such as Xen,
      switch to another address space first (init_mm) and unprotect the page tables
      to avoid the cost of trapping to the hypervisor for each pte_clear.  Shadow
      mode hypervisors, such as VMI and lhype, don't need to do the extra work of
      calling through paravirt-ops, and can just directly clear the page table
      entries without notifying the hypervisor, since all the page tables are about
      to be freed.
      
      So introduce native_pte_clear functions which bypass any paravirt-ops
      notification.  This results in a significant performance win for VMI and
      removes some indirect calls from zap_pte_range.
      
      Note the 3-level paging already had a native_pte_clear function, thus
      demanding argument conformance and extra args for the 2-level definition.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: remove xtime_lock'ing around cpufreq notifier · df3624aa
      Daniel Walker authored
      The locking of the xtime_lock around the cpu notifier is unnecessary now.
      At one time the tsc was used after a frequency change for timekeeping, but
      the re-write of timekeeping no longer uses the TSC unless the frequency is
      constant.

      The variables that are changed in this section of code had also once been
      used for timekeeping, but are not any longer.
      Signed-off-by: Daniel Walker <dwalker@mvista.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: skip cache_free_alien() on non NUMA · 62918a03
      Siddha, Suresh B authored
      Set use_alien_caches to 0 on non NUMA platforms.  And avoid calling the
      cache_free_alien() when use_alien_caches is not set.  This will avoid the
      cache miss that happens while dereferencing slabp to get nodeid.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: Auto compute __NR_syscall_max at compile time · 57a4f91a
      Andi Kleen authored
      No need to maintain it anymore
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: check capability · 2f3c30e6
      Joachim Deguara authored
      Currently the i386 architecture checks the CPU family for MCE capability; this
      patch removes that and uses the CPUID information instead.  Tested on a K8
      revE and a family10h processor.

      This eliminates the check against a fixed set of AMD processor families to
      decide whether MCE is allowed, and relies on the information being in CPUID.
      Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: clean up flush_tlb_others fn · 1bdae458
      Keshavamurthy, Anil S authored
      Cleanup flush_tlb_others(), no functional change.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: replace spin_lock_irqsave with spin_lock · 62dbc210
      Hisashi Hifumi authored
      IRQ is already disabled through local_irq_disable().  So
      spin_lock_irqsave() can be replaced with spin_lock().
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not defined · e8a72ffa
      Keshavamurthy, Anil S authored
      Avoid checking for cpu gone in mm hot path when CONFIG_HOTPLUG_CPU is not
      defined.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: move __vgetcpu_mode & __jiffies to the vsyscall_2 zone · 141a892f
      Eric Dumazet authored
      We apparently hit the 1024 limit of vsyscall_0 zone when some debugging
      options are set, or if __vsyscall_gtod_data is 64 bytes larger.
      
      In order to save 128 bytes from the vsyscall_0 zone, we move __vgetcpu_mode
      & __jiffies to vsyscall_2 zone where they really belong, since they are
      used only from vgetcpu() (which is in this vsyscall_2 area).
      
      After the patch is applied, the new layout is:
      
      ffffffffff600000 T vgettimeofday
      ffffffffff60004e t vsysc2
      ffffffffff600140 t vread_hpet
      ffffffffff600150 t vread_tsc
      ffffffffff600180 D __vsyscall_gtod_data
      ffffffffff600400 T vtime
      ffffffffff600413 t vsysc1
      ffffffffff600800 T vgetcpu
      ffffffffff600870 D __vgetcpu_mode
      ffffffffff600880 D __jiffies
      ffffffffff600c00 T venosys_1
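      The zone arithmetic behind this layout can be checked with a few lines of
      C. The base address and the 1024-byte zone size are read off the listing
      and the description above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VSYSCALL_BASE      0xffffffffff600000ULL  /* address of vgettimeofday */
#define VSYSCALL_ZONE_SIZE 1024ULL                /* the 1024-byte zone limit */

/* Hypothetical helper: which vsyscall zone an address falls into. */
static unsigned int vsyscall_zone(uint64_t addr)
{
    assert(addr >= VSYSCALL_BASE);
    return (unsigned int)((addr - VSYSCALL_BASE) / VSYSCALL_ZONE_SIZE);
}

/* Hypothetical helper: bytes from addr to the end of its zone. */
static uint64_t bytes_to_zone_end(uint64_t addr)
{
    assert(addr >= VSYSCALL_BASE);
    return VSYSCALL_ZONE_SIZE - ((addr - VSYSCALL_BASE) % VSYSCALL_ZONE_SIZE);
}
```

      For example, __vgetcpu_mode and __jiffies both land in zone 2 next to
      vgetcpu, and __vsyscall_gtod_data at offset 0x180 has 640 bytes available
      before zone 1 begins.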
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] i386: PARAVIRT: fix startup_ipi_hook config dependency · 0260c196
      Jeremy Fitzhardinge authored
      startup_ipi_hook depends on CONFIG_X86_LOCAL_APIC, so move it to the
      right part of the paravirt_ops initialization.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: fix mtrr sections · 25c16b99
      Randy Dunlap authored
      Fix section mismatch warnings in mtrr code.
      Fix line length on one source line.
      
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.data: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x103)
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x180)
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x199)
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x1c1)
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • [PATCH] x86-64: Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64 · 70ae77f4
      Fernando Luis Vazquez Cao authored
      Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
      NMI_VECTOR to avoid potential hangups in the event of crash when kdump
      tries to stop the other CPUs.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386 · f5efb41e
      Fernando Luis Vazquez Cao authored
      Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
      NMI_VECTOR to avoid potential hangups in the event of crash when kdump
      tries to stop the other CPUs.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: __send_IPI_dest_field - x86_64 · 9062d888
      Fernando Luis Vazquez Cao authored
      Implement __send_IPI_dest_field which can be used to send IPIs when the
      "destination shorthand" field of the ICR is set to 00 (destination
      field). Use it whenever possible.
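      How an ICR value with destination shorthand 00 is put together can be
      sketched as follows. The bit positions follow the xAPIC ICR layout
      (shorthand in bits 18-19, delivery mode in bits 8-10, destination APIC ID
      in bits 24-31 of the upper word); the helper names are hypothetical and
      the sketch is not the function added by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings of the xAPIC Interrupt Command Register (ICR). */
#define ICR_SHORTHAND_MASK 0x000C0000u  /* bits 18-19: destination shorthand */
#define ICR_SHORTHAND_NONE 0x00000000u  /* 00: use the destination field */
#define ICR_DM_FIXED       0x00000000u  /* delivery mode: fixed */
#define ICR_DM_NMI         0x00000400u  /* delivery mode: NMI */

/* Hypothetical helper: upper ICR word, APIC ID in bits 24-31. */
static uint32_t icr_high_for(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* Hypothetical helper: lower ICR word with shorthand 00. */
static uint32_t icr_low_for(uint8_t vector, int nmi)
{
    uint32_t icr = ICR_SHORTHAND_NONE;

    if (nmi)
        icr |= ICR_DM_NMI;             /* vector field is ignored for NMI */
    else
        icr |= ICR_DM_FIXED | vector;  /* fixed delivery carries the vector */

    assert((icr & ICR_SHORTHAND_MASK) == ICR_SHORTHAND_NONE);
    return icr;
}
```

      The upper word must be written before the lower one, since writing the
      lower word is what triggers the send.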
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: __send_IPI_dest_field - i386 · 45ae5e96
      Fernando Luis Vazquez Cao authored
      Implement __send_IPI_dest_field which can be used to send IPIs when the
      "destination shorthand" field of the ICR is set to 00 (destination
      field). Use it whenever possible.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: use safe_apic_wait_icr_idle in smpboot.c - x86_64 · 3144c332
      Fernando Luis Vazquez Cao authored
      inquire_remote_apic is used for APIC debugging, so use
      safe_apic_wait_icr_idle instead of apic_wait_icr_idle to avoid possible
      lockups when APIC delivery fails.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: use safe_apic_wait_icr_idle in smpboot.c · 4312fa81
      Fernando Luis Vazquez Cao authored
      __inquire_remote_apic is used for APIC debugging, so use
      safe_apic_wait_icr_idle instead of apic_wait_icr_idle to avoid possible
      lockups when APIC delivery fails.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: use safe_apic_wait_icr_idle in smpboot.c - x86_64 · ea8c733b
      Fernando Luis Vazquez Cao authored
      The functionality provided by the new safe_apic_wait_icr_idle is being
      open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
      instead to consolidate code and ease maintenance.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: use safe_apic_wait_icr_idle - i386 · ae08e43e
      Fernando Luis Vazquez Cao authored
      The functionality provided by the new safe_apic_wait_icr_idle is being
      open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
      instead to consolidate code and ease maintenance.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: safe_apic_wait_icr_idle - x86_64 · 8339e9fb
      Fernando Luis Vazquez Cao authored
      apic_wait_icr_idle looks like this:
      
      static __inline__ void apic_wait_icr_idle(void)
      {
        while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
          cpu_relax();
      }
      
      The busy loop in this function would not be problematic if the
      corresponding status bit in the ICR were always updated, but that does
      not seem to be the case under certain crash scenarios. Kdump uses an IPI
      to stop the other CPUs in the event of a crash, but when any of the
      other CPUs are locked-up inside the NMI handler the CPU that sends the
      IPI will end up looping forever in the ICR check, effectively
      hard-locking the whole system.
      
      Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
      
      "A local APIC unit indicates successful dispatch of an IPI by
      resetting the Delivery Status bit in the Interrupt Command
      Register (ICR). The operating system polls the delivery status
      bit after sending an INIT or STARTUP IPI until the command has
      been dispatched.
      
      A period of 20 microseconds should be sufficient for IPI dispatch
      to complete under normal operating conditions. If the IPI is not
      successfully dispatched, the operating system can abort the
      command. Alternatively, the operating system can retry the IPI by
      writing the lower 32-bit double word of the ICR. This “time-out”
      mechanism can be implemented through an external interrupt, if
      interrupts are enabled on the processor, or through execution of
      an instruction or time-stamp counter spin loop."
      
      Intel's documentation suggests the implementation of a time-out
      mechanism, which, by the way, is already being open-coded in some parts
      of the kernel that tinker with ICR.
      
      Create an apic_wait_icr_idle replacement that implements the time-out
      mechanism and that can be used to solve the aforementioned problem.
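      A bounded version of the wait loop might look like the sketch below. It
      is a user-space model, not the function from the patch: the register read
      is passed in as a callback so the loop can be exercised without hardware,
      the names are hypothetical, and the delay between polls is left out.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_BUSY 0x1000u  /* Delivery Status bit (bit 12) of the ICR */

typedef uint32_t (*icr_read_fn)(void *ctx);

/* Polls the ICR at most max_polls times. Returns 0 once the busy bit is
 * clear, or the (nonzero) busy bit if the time-out expired first. */
static uint32_t wait_icr_idle_bounded(icr_read_fn read_icr, void *ctx,
                                      unsigned int max_polls)
{
    uint32_t busy;
    unsigned int polls = 0;

    assert(read_icr != 0);
    do {
        busy = read_icr(ctx) & ICR_BUSY;
        if (!busy)
            break;
        /* A real implementation would delay briefly here. */
    } while (++polls < max_polls);

    return busy;
}

/* Test double: an ICR whose busy bit never clears. */
static uint32_t icr_stuck_busy(void *ctx)
{
    (void)ctx;
    return ICR_BUSY;
}

/* Test double: busy for *ctx more reads, then idle. */
static uint32_t icr_clears_after(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining) {
        (*remaining)--;
        return ICR_BUSY;
    }
    return 0;
}
```

      Unlike the unbounded loop, the caller gets a status back and can decide
      whether to retry or abort the IPI.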
      
      AK: moved both functions out of line
      AK: Added improved loop from Keith Owens
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: safe_apic_wait_icr_idle - i386 · f2b218dd
      Fernando Luis Vazquez Cao authored
      apic_wait_icr_idle looks like this:
      
      static __inline__ void apic_wait_icr_idle(void)
      {
        while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
          cpu_relax();
      }
      
      The busy loop in this function would not be problematic if the
      corresponding status bit in the ICR were always updated, but that does
      not seem to be the case under certain crash scenarios. Kdump uses an IPI
      to stop the other CPUs in the event of a crash, but when any of the
      other CPUs are locked-up inside the NMI handler the CPU that sends the
      IPI will end up looping forever in the ICR check, effectively
      hard-locking the whole system.
      
      Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
      
      "A local APIC unit indicates successful dispatch of an IPI by
      resetting the Delivery Status bit in the Interrupt Command
      Register (ICR). The operating system polls the delivery status
      bit after sending an INIT or STARTUP IPI until the command has
      been dispatched.
      
      A period of 20 microseconds should be sufficient for IPI dispatch
      to complete under normal operating conditions. If the IPI is not
      successfully dispatched, the operating system can abort the
      command. Alternatively, the operating system can retry the IPI by
      writing the lower 32-bit double word of the ICR. This “time-out”
      mechanism can be implemented through an external interrupt, if
      interrupts are enabled on the processor, or through execution of
      an instruction or time-stamp counter spin loop."
      
      Intel's documentation suggests the implementation of a time-out
      mechanism, which, by the way, is already being open-coded in some parts
      of the kernel that tinker with ICR.
      
      Create an apic_wait_icr_idle replacement that implements the time-out
      mechanism and that can be used to solve the aforementioned problem.
      
      AK: moved both functions out of line
      AK: added improved loop from Keith Owens
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync · de938c51
      Bernhard Kaindl authored
      If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
      we are running on an AMD64/K8 system, the boot CPU must have had
      MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
      have copied these bits cleared), so we set them on this CPU as well.
      
      This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
      across the CPUs of SMP systems in order to fulfill the duty of
      system software to "initialize and maintain MTRR consistency
      across all processors." as written in the AMD and Intel manuals.
      
      If a WRMSR instruction fails because MtrrFixDramModEn is not
      set, I expect that also the Intel-style MTRR bits are not updated.
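      The decision described above can be sketched as a pure bit test. The bit
      positions are assumptions taken from my reading of the AMD64 documentation
      (WrMem and RdMem as bits 3 and 4 of each byte of a fixed-range MTRR,
      MtrrFixDramEn and MtrrFixDramModEn as bits 18 and 19 of SYSCFG), and the
      helper names are hypothetical; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings, see the note above. */
#define FIXED_MTRR_RDMEM_WRMEM_MASK 0x1818181818181818ULL
#define SYSCFG_MTRR_FIX_DRAM_EN     (1u << 18)
#define SYSCFG_MTRR_FIX_DRAM_MOD_EN (1u << 19)

/* Does any byte of this saved fixed-range MTRR have RdMem or WrMem set? */
static int fixed_mtrr_uses_iorr_bits(uint64_t fixed_mtrr)
{
    return (fixed_mtrr & FIXED_MTRR_RDMEM_WRMEM_MASK) != 0;
}

/* Hypothetical helper: SYSCFG value to use on this CPU, given the saved
 * copy of the boot CPU's fixed-range MTRRs. */
static uint32_t syscfg_for_cpu(uint32_t syscfg, const uint64_t *saved,
                               unsigned int count)
{
    assert(saved != 0);
    for (unsigned int i = 0; i < count; i++)
        if (fixed_mtrr_uses_iorr_bits(saved[i]))
            return syscfg | SYSCFG_MTRR_FIX_DRAM_EN
                          | SYSCFG_MTRR_FIX_DRAM_MOD_EN;
    return syscfg;
}
```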
      
      AK: minor cleanup, moved MSR defines around
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
    • [PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending · 3ebad590
      Bernhard Kaindl authored
      Note: This patch didn't need an update since its initial post.
      
      Some BIOSes may modify fixed-range MTRRs in SMM, e.g. when they
      transition the system into ACPI mode, which is entered thru an SMI,
      triggered by Linux in acpi_enable().
      
      SMIs, which interrupt Linux and execute BIOS code in SMM (where
      e.g. fixed-range MTRRs may be changed), may also be raised on
      other occasions by an embedded system controller, which is often
      found in notebooks.
      
      If we did not update our copy of the fixed-range MTRRs before
      suspending to RAM or to disk, restore_processor_state() would
      set the fixed-range MTRRs of the BSP using old backup values
      which may be outdated and this could cause the system to fail
      later during resume.
      
      This patch ensures that our copy of the fixed-range MTRRs
      is updated when saving the boot processor state on suspend
      to disk and suspend to RAM.
      
      In combination with other patches, this makes it possible to fix s2ram
      and s2disk on the Acer Ferrari 1000 notebook and at least
      s2disk on the Acer Ferrari 5000 notebook.
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
    • [PATCH] x86: Save the MTRRs of the BSP before booting an AP · 2b1f6278
      Bernhard Kaindl authored
      Applied fix by Andrew Morton:
      http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.
      
      AMD and Intel x86 CPU manuals state that it is the responsibility of
      system software to initialize and maintain MTRR consistency across
      all processors in Multi-Processing Environments.
      
      Quote from page 188 of the AMD64 System Programming manual (Volume 2):
      
      7.6.5 MTRRs in Multi-Processing Environments
      
      "In multi-processing environments, the MTRRs located in all processors must
      characterize memory in the same way. Generally, this means that identical
      values are written to the MTRRs used by the processors." (short omission here)
      "Failure to do so may result in coherency violations or loss of atomicity.
      Processor implementations do not check the MTRR settings in other processors
      to ensure consistency. It is the responsibility of system software to
      initialize and maintain MTRR consistency across all processors."
      
      Current Linux MTRR code already implements the above in the case that the
      BIOS does not properly initialize MTRRs on the secondary processors,
      but the case where the fixed-range MTRRs of the boot processor are changed
      after Linux started to boot, before the initialisation of a secondary
      processor, is not handled yet.
      
      In this case, secondary processors are currently initialized by Linux
      with MTRRs which the boot processor had very early, when mtrr_bp_init()
      did run, but not with the MTRRs which the boot processor uses at the
      time when that secondary processor is actually booted,
      causing differing MTRR contents on the secondary processors.
      
      Such a situation occurs on Acer Ferrari 1000 and 5000 notebooks where the
      BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
      of the boot processor when it transitions the system into ACPI mode.
      The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
      code runs acpi_enable().
      
      Other occasions where the SMI handler of the BIOS may change bits in
      the MTRRs could occur as well. To initialize newly booted secondary
      processors with the fixed-range MTRRs which the boot processor uses
      at that time, this patch saves the fixed-range MTRRs of the boot
      processor before new secondary processors are started. When the
      secondary processors run their Linux initialisation code, their
      fixed-range MTRRs will be updated with the saved fixed-range MTRRs.
      
      If CONFIG_MTRR is not set, we define mtrr_save_state
      as an empty statement because there is nothing to do.
      
      Possible TODOs:
      
      *) CPU-hotplugging outside of SMP suspend/resume is not yet tested
         with this patch.
      
      *) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
         then the calls to mtrr_save_state() could be replaced by calls to
         mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be
         needed.
      
         That would need either verification of the CPU-hotplug code or
         at least a test on a >2 CPU machine.
      
      *) The MTRRs of other running processors are not yet checked at this
         time but it might be interesting to synchronize the MTRRs of all
         processors before booting. That would be an incremental patch,
         but of rather low priority since there is no machine known so
         far which would require this.
      
      AK: moved prototypes on x86-64 around to fix warnings
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
    • [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches. · 2b3b4835
      Bernhard Kaindl authored
      In the current implementation, mtrr_save_fixed_ranges() accepts a
      dummy void pointer because one of the later patches may call it
      via smp_call_function_single(), which requires the called function
      to take a void pointer argument.
      
      This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
      which is the element of the static struct which stores our current
      backup of the fixed-range MTRR values which all CPUs shall be
      using.
      
      Because mtrr_save_fixed_ranges calls get_fixed_ranges after
      kernel initialisation time, __init needs to be removed from
      the declaration of get_fixed_ranges().
      
      If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
      as an empty statement because there is nothing to do.
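      The two design points (a void pointer argument so the function fits a
      cross-CPU callback signature, and an empty-statement stub when the
      feature is configured out) can be modelled in user space as below. The
      MSR storage, the callback type, and call_on_cpu are stand-ins invented
      for this sketch; only the overall shape follows the description.

```c
#include <assert.h>
#include <string.h>

#define CONFIG_MTRR 1        /* remove to see the stub variant */
#define NUM_FIXED_RANGES 88  /* 11 fixed-range MSRs, 8 bytes each */

/* Stand-ins: "hardware" MSR contents and the saved backup copy. */
static unsigned char hw_fixed[NUM_FIXED_RANGES];
static struct { unsigned char fixed_ranges[NUM_FIXED_RANGES]; } saved_state;

/* Shape of a callback that a cross-CPU call helper would accept. */
typedef void (*cpu_callback)(void *info);

#ifdef CONFIG_MTRR
/* The argument is unused; it exists only to match cpu_callback. */
static void save_fixed_ranges(void *info)
{
    (void)info;
    memcpy(saved_state.fixed_ranges, hw_fixed, sizeof(hw_fixed));
}
#else
/* Nothing to do without MTRR support: expand to an empty statement. */
#define save_fixed_ranges(arg) do { } while (0)
#endif

/* Stand-in for a helper that runs fn(info) on some CPU. */
static void call_on_cpu(cpu_callback fn, void *info)
{
    assert(fn != 0);
    fn(info);
}
```

      Note that the macro stub only works for direct calls; passing the name as
      a callback, as call_on_cpu does, requires the real function.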
      
      AK: Moved prototypes for x86-64 around to fix warnings
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
    • Andi Kleen's avatar
      856f44ff
    • [PATCH] i386: Clean up ELF note generation · 03df4f6e
      Jeremy Fitzhardinge authored
      Three cleanups:
      
      1: ELF notes are never mapped, so there's no need to have any access
      flags in their phdr.
      
      2: When generating them from asm, tell the assembler to use a SHT_NOTE
      section type.  There doesn't seem to be a way to do this from C.
      
      3: Use ANSI rather than traditional cpp behaviour to stringify the
      macro argument.
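      For the third point, ANSI stringification uses the # operator, usually
      through a two-level macro so that the argument is expanded first
      (traditional cpp instead substituted macro arguments inside string
      literals). A minimal sketch with hypothetical macro names and a made-up
      note type value:

```c
#include <assert.h>
#include <string.h>

#define STRINGIFY_1(x) #x              /* stringifies the argument as written */
#define STRINGIFY(x)   STRINGIFY_1(x)  /* expands the argument first */

#define NOTE_TYPE 18  /* hypothetical value, for illustration only */

/* Returns the expanded note type as a string literal. */
static const char *note_type_string(void)
{
    return STRINGIFY(NOTE_TYPE);
}
```

      STRINGIFY(NOTE_TYPE) yields "18", whereas STRINGIFY_1(NOTE_TYPE) yields
      "NOTE_TYPE", which is why the extra level of indirection is needed.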
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Eric W. Biederman <ebiederm@xmission.com>