Commits · 8a6fdd3e912e0ce6f723431d66baf704bf8a1d26 · Kirill Smelkov / linux

12 Jan, 2006 40 commits

[PATCH] x86_64: Reject SRAT tables that don't cover all memory · 8a6fdd3e

Andi Kleen authored Jan 11, 2006

Broken BIOS on Iwill 8way systems reports these and it causes the bootmem
allocator to crash. Add a sanity check if all the PXMs in the
SRAT table cover all memory as reported by e820. If the sanity
check fails the SRAT is rejected and the code will fall back
to discover the NUMA topology using the K8 northbridge registers
when applicable.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

8a6fdd3e

[PATCH] x86_64: Add idle notifiers · 95833c83

Andi Kleen authored Jan 11, 2006

This adds a new notifier chain that is called with IDLE_START
when a CPU goes idle and IDLE_END when it goes out of idle.
The context can be idle thread or interrupt context.

Since we cannot rely on MONITOR/MWAIT existing the idle
end check currently has to be done in all interrupt
handlers.

They were originally inspired by the similar s390 implementation.

They have a variety of applications:
- They will be needed for CONFIG_NO_IDLE_HZ
- They can be used for oprofile to fix up the missing time
in idle when performance counters don't tick.
- They can be used for better C state management in ACPI
- They could be used for microstate accounting.

This is just infrastructure so far, no users.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

95833c83

[PATCH] x86_64: Clean up some printks in NUMA code · 6b050f80
Andi Kleen authored Jan 11, 2006
```
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
```
6b050f80

[PATCH] x86_64: Fix up coding style in numa.c · d18ff470

Andi Kleen authored Jan 11, 2006

No functional changes
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

d18ff470

[PATCH] x86_64: Fix off by one in IOMMU check · ca8642f6

Andi Kleen authored Jan 11, 2006

Fix off by one when checking if the machine has enougn memory to need IOMMU
This caused the IOMMUs to be needlessly enabled for mem=4G

Based on a patch from Jon Mason

Signed-off-by: jdmason@us.ibm.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ca8642f6

[PATCH] x86_64: Handle missing local APIC timer interrupts on C3 state · d25bf7e5

Venkatesh Pallipadi authored Jan 11, 2006

Whenever we see that a CPU is capable of C3 (during ACPI cstate init), we
disable local APIC timer and switch to using a broadcast from external timer
interrupt (IRQ 0).

Patch below adds the code for x86_64.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

d25bf7e5

[PATCH] i386: Handle missing local APIC timer interrupts on C3 state · 6eb0a0fd

Venkatesh Pallipadi authored Jan 11, 2006

Whenever we see that a CPU is capable of C3 (during ACPI cstate init), we
disable local APIC timer and switch to using a broadcast from external timer
interrupt (IRQ 0). This is needed because Intel CPUs stop the local
APIC timer in C3. This is currently only enabled for Intel CPUs.

Patch below adds the code for i386 and also the ACPI hunk.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6eb0a0fd

[PATCH] i386/x86-64: Remove sub jiffy profile timer support · 5a07a30c

Venkatesh Pallipadi authored Jan 11, 2006

Remove the finer control of local APIC timer. We cannot provide a sub-jiffy
control like this when we use broadcast from external timer in place of
local APIC. Instead of removing this only on systems that may end up using
broadcast from external timer (due to C3), I am going the
"I'm feeling lucky" way to remove this fully. Basically, I am not sure about
usefulness of this code today. Few other architectures also don't seem to
support this today.

If you are using profiling and fine grained control and don't like this going
away in normal case, yell at me right now.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

5a07a30c

[PATCH] x86_64: Report hardware breakpoints in user space when triggered by the kernel · 01b8faae

John Blackwood authored Jan 11, 2006

I would like to throw out a suggestion for a possible change in the way that
the debug register traps are handled in do_debug() when the trap occurs
in kernel-mode.

In the x86_64 version of do_debug(), the code will skip around sending
a SIGTRAP to the current task if the trap occurred while in kernel mode.

On the i386-side of things, if the access happens to occur in kernel mode
(say during a read(2) of user's buffer that matches the address of a
debug register trap), then the do_debug() routine for i386 will go ahead
and call send_sigtrap() and send the SIGTRAP signal. The send_sigtrap()
code will also set the info.si_addr to NULL in this case (even though I
don't understand why, since the SIGTRAP siginfo processing doesn't use
the si_addr field...).

So I would like to suggest that the x86_64 do_debug() routine also
follow this type of behavior and have it go ahead and send the
SIGTRAP signal to the current task, even if the debug register trap
happens to have occurred in kernel mode. I have taken a stab at
a patch for this change below. (It includes the i386-ish change
for setting si_addr to NULL when the trap occurred in kernel mode.)

It seems like a useful feature to be able to 'watch' a user location that
might also be modified in the kernel via a system service call, and have the
debugger report that information back to the user, rather than to just
silently ignore the trap.

Additionally, I realize that users that pull in a kernel debugger such as
KGDB into their kernel might want to remove this change below when they add
in KGDB support. However, they could alternatively look at the current
task's thread.debugreg[] values to see if the trap occurred due to KGDB
or instead because of a user-space debugger trap, and still honor the
user SIGTRAP processing (instead of the KGDB breakpoint processing)
if the trap matches up with the thread.debugreg[] registers.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

01b8faae

[PATCH] x86_64: "extern inline" -> "static inline" in pgtable.h · 4839057c

Adrian Bunk authored Jan 11, 2006

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4839057c

[PATCH] x86_64: Convert page fault error codes to symbolic constants. · 66c58156

Andi Kleen authored Jan 11, 2006

Much better to deal with these than with the magic numbers.

And remove the comment describing the bits - kernel source
is no replacement for an architecture manual.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

66c58156

[PATCH] x86_64: Implement is_compat_task the right way · bf2fcc6f

Andi Kleen authored Jan 11, 2006

By setting a flag during a 32bit system call only
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bf2fcc6f

[PATCH] x86_64: Implement compat code for sg driver SG_GET_REQUEST_TABLE ioctl · 2966387b

Andi Kleen authored Jan 11, 2006

Apparently helps with some non SANE scanner drivers.

Cc: axboe@suse.de
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2966387b

[PATCH] x86_64: Remove unnecessary case from the page fault handler · f95190b2

Andi Kleen authored Jan 11, 2006

Don't need to do the vmalloc check for the module range because its
PML4 is shared with the kernel text.

Also removed an unnecessary TLB flush.

Pointed out by Jan Beulich

Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f95190b2

[PATCH] x86_64: Align and pad x86_64 GDT on page boundary · c11efdf9

Ravikiran G Thirumalai authored Jan 11, 2006

This patch is on the same lines as Zachary Amsden's i386 GDT page alignemnt
patch in -mm, but for x86_64.

Patch to align and pad x86_64 GDT on page boundries.

[AK: some minor cleanups and fixed incorrect TLS initialization
in CPU init.]
Signed-off-by: Nippun Goel <nippung@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c11efdf9

[PATCH] x86_64: Allow compilation on a 32bit biarch toolchain · bb33421d

Andi Kleen authored Jan 11, 2006

This might help on distributions that use a 32bit biarch compiler.

First pass -m64 by default.

Secondly add some more .code32s because at least the Ubuntu biarch
32bit as called by gcc doesn't seem to handle -m64 -m32 as generated
by the Makefile without such assistance.

And finally make sure the linker script can be preprocessed
with a 32bit cpp.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bb33421d

[PATCH] x86_64: Make udelay more accurate · 79c62cf1

Ross Biro authored Jan 11, 2006

The attempt to avoid overflow in __delay caused varying precision
on different CPUs depending on differences in the CPU speed.

We should be able to do this multiplication with out overflowing
provided the
cpu is running at less than about 128 GHz.  xloops < 20000 * 0x10c6.
loops_per_jiffy * HZ <= cpu_clock_speed.  So if the cpu clock speed
< 2^64/(20000 * 0x10c6) = 2^64/ 51E6CC0 < 2^64/2^27 = 2^37 = 128G we
will not overflow the calculation.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

79c62cf1

[PATCH] x86_64: Return -1 for unknown PCI bus affinity · e4e94072

Andi Kleen authored Jan 11, 2006

When we don't know the node a PCI bus is connected to return -1.
This matches the generic code.

Noticed by Ravikiran G Thirumalai <kiran@scalex86.org>

Cc: Ravikiran G Thirumalai <kiran@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e4e94072

[PATCH] x86_64: Handle unknown node (-1) in alloc_pages_node · 819a6928

Andi Kleen authored Jan 11, 2006

Following kmalloc_node.

Needed for another patch to return -1 for unknown nodes in x86-64.

Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: kiran@scalex86.org
Signed-off-by: Andi Kleen <ak@suse.de>
[ Changed 0 to numa_node_id() on suggestion by Christoph Lameter ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

819a6928

[PATCH] x86_64: Validate SLIT table · 1584b89c

Andi Kleen authored Jan 11, 2006

A lot of Opteron BIOS just pass 10 in all SLIT entries (10 is the
normalized unit). This is actually worse than the default heuristic
because it leads to pci_distance not knowing the difference between
local and remote nodes anymore. This messes up some NUMA
heuristics in generic code.

In this case it's better to fall back to the default heuristic
which just does nodea == nodeb ? 10 : 20.

This patch does some basic sanity checking on the SLIT and only accepts
the SLIT when it passes.

Invariants enforced are:
- Node to itself shall be 10
- Any other distance shouldn't be 10
- Distances smaller than 10 are illegal
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

1584b89c

[PATCH] x86_64: Fix off by one in acpi table mapping · 7a4a76cc

Andi Kleen authored Jan 11, 2006

And fix the test to include the size

Noticed by Vivek Goyal
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7a4a76cc

[PATCH] x86_64: Fix 64bit FXSAVE encoding · 7180d4fb

Jan Beulich authored Jan 11, 2006

The separation of the rex64 prefix (on fxsave/fxrstor) by way of using
a semicolon resulted in the prefix not always taking effect (because
when extended registers are needed for addressing, another rex prefix
would have been generated by the compiler), thus (depending on the
build) resulting in eventually getting 32-bit saves and/or restores.
Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7180d4fb

[PATCH] x86_64: Generalize DMI and enable for x86-64 · e9928674

Andi Kleen authored Jan 11, 2006

Some people need it now on 64bit so reuse the i386 code for
x86-64. This will be also useful for future bug workarounds.

It is a bit simplified there because there is no need
to do it very early on x86-64. This means it doesn't need
early ioremap et.al. We run it as a core initcall right now.

I hope it's not needed for early setup.

I added a general CONFIG_DMI symbol in case IA64 or someone
else wants to reuse the code later too.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e9928674

[PATCH] x86_64: Remove bogus file in arch/x86_64/pci · b347d25f

Andi Kleen authored Jan 11, 2006

This was a backup file that somehow made it into the official
tree. Never used for anything. Remove.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b347d25f

[PATCH] x86_64: Add missing newline in IOMMU error message · f46ace69
Andi Kleen authored Jan 11, 2006
```
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
```
f46ace69

[PATCH] x86_64: fls in asm for x86_64 · 636dd2b7

Stephen Hemminger authored Jan 11, 2006

Use single instruction for find largest set bit on x86_64.

[Updated by Jan Beulich to fix wrong asm constraints in original
patch -AK]

Cc: jbeulich@novell.com
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

636dd2b7

[PATCH] x86_64: fix page fault from show_trace() · bd9cb64d

Jan Beulich authored Jan 11, 2006

The introduction of call_softirq switching to the interrupt stack several
releases earlier resulted in a problem with the code in show_trace, which
assumes that it can pick the previous stack pointer from the end of the
interrupt stack.

Cc: Andi Kleen <ak@muc.de>
Cc: Arjan van de Ven <arjanv@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bd9cb64d

[PATCH] x86_64: fix single step handling for 32bit processes · 4724e3e8

Peter Beutner authored Jan 11, 2006

Be more careful with TF handling to fix some copy protection codes in wine

patch originally for i386 by Linus, then ported to x86_64 by Andi Kleen
see: [PATCH] x86_64: Some fixes for single step handling
commit: be61bff7

But it was never applied to the ia32 emulation code which breaks some
copy-protection schemes under wine when running on x86_64.
Signed-off-by: Peter Beutner <p.beutner@gmx.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

4724e3e8

[PATCH] x86_64: don't save eflags in x86-64 switch_to() · 60917a38

Benjamin LaHaise authored Jan 11, 2006

As discussed, the flags register on x86-64 is saved and restored by the
assembly code which sets up struct pt_regs, so we do not need to save
and restore it in the inline assembler which already informs gcc that
we're clobbering the flags.  This patch has been sanity booted and works
okay here.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

60917a38

[PATCH] i386/x86-64: Don't IPI to offline cpus on shutdown · 6e3fbee5

Eric W. Biederman authored Jan 11, 2006

So why are we calling smp_send_stop from machine_halt?

We don't.

Looking more closely at the bug report the problem here
is that halt -p is called which triggers not a halt but
an attempt to power off.

machine_power_off calls machine_shutdown which calls smp_send_stop.

If pm_power_off is set we should never make it out machine_power_off
to the call of do_exit.  So pm_power_off must not be set in this case.
When pm_power_off is not set we expect machine_power_off to devolve
into machine_halt.

So how do we fix this?

Playing too much with smp_send_stop is dangerous because it
must also be safe to be called from panic.

It looks like the obviously correct fix is to only call
machine_shutdown when pm_power_off is defined.  Doing
that will make Andi's assumption about not scheduling
true and generally simplify what must be supported.

This turns machine_power_off into a noop like machine_halt
when pm_power_off is not defined.

If the expected behavior is that sys_reboot(LINUX_REBOOT_CMD_POWER_OFF)
becomes sys_reboot(LINUX_REBOOT_CMD_HALT) if pm_power_off is NULL
this is not quite a comprehensive fix as we pass a different parameter
to the reboot notifier and we set system_state to a different value
before calling device_shutdown().

Unfortunately any fix more comprehensive I can think of is not
obviously correct.  The core problem is that there is no architecture
independent way to detect if machine_power will become a noop, without
calling it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6e3fbee5

[PATCH] x86_64/i386: Remove preempt disable calls in lowlevel IPI · 329d400f

Zwane Mwaikambo authored Jan 11, 2006

I noticed that some lowlevel send_IPI_mask helpers had a hotplug/preempt
race whereupon the cpu_online_map was read before disabling preemption;

...
cpumask_t mask = cpu_online_map;
int cpu = get_cpu();
cpu_clear(cpu, mask);
...

But then i realised that there is no need for these lowlevel functions to
be going through all this trouble when all the callers are already made
hotplug/preempt safe.
Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

329d400f

[PATCH] x86_64: increase MCE bank counts · 73ca5358

Shaohua Li authored Jan 11, 2006

There is one CPU here whose MCE bank count is 6. This patch increases
x86_64's MCE bank count.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

73ca5358

[PATCH] x86_64: another mb() for smpboot.c · f2ecfab9

Benjamin LaHaise authored Jan 11, 2006

The following is probably a good idea given that the atomic_set() isn't
a barrier here either.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f2ecfab9

[PATCH] x86_64: Move int 3 handler to debug stack and allow to increase it. · b556b35e

Jan Beulich authored Jan 11, 2006

This
- switches the INT3 handler to run on an IST stack (to cope with
  breakpoints set by a kernel debugger on places where the kernel's
  %gs base hasn't been set up, yet); the IST stack used is shared with
  the INT1 handler's
[AK: this also allows setting a kprobe on the interrupt/exception entry
points]
- allows nesting of INT1/INT3 handlers so that one can, with a kernel
  debugger, debug (at least) the user-mode portions of the INT1/INT3
  handling; the nesting isn't actively enabled here since a kernel-
  debugger-free kernel doesn't need it
Signed-Off-By: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

b556b35e

[PATCH] x86_64: Don't confuse apic=... command line option with apic · ed8388a5

Andi Kleen authored Jan 11, 2006

Previously apic was foced with apic=logopt was specified.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

ed8388a5

[PATCH] x86_64: Dont't disable early PCI scan with apic · 7c0ac555

Andi Kleen authored Jan 11, 2006

It might be still needed for non APIC related issues.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7c0ac555

[PATCH] i386/x86-64: Update AMD CPUID flags · 3f98bc49

Andi Kleen authored Jan 11, 2006

Print bits for RDTSCP, SVM, CR8-LEGACY.

Also now print power flags on i386 like x86-64 always did.
This will add a new line in the 386 cpuinfo, but that shouldn't
be an issue - did that in the past too and I haven't heard
of any breakage.

I shrunk some of the fields in the i386 cpuinfo_x86 to chars
to make up for the new int "x86_power" field. Overall it's
smaller than before.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3f98bc49

[PATCH] x86_64: Use X86_FEATURE_CONSTANT_TSC now to clean up Intel speedstep drivers · 152bf8c5

Andi Kleen authored Jan 11, 2006

They previously tried to figure this out on their own.

Suggested by Venkatesh.

Cc: venkatesh.pallipadi@intel.com
Cc: davej@redhat.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

152bf8c5

[PATCH] i386/x86-64: Generalize X86_FEATURE_CONSTANT_TSC flag · 39b3a791

Andi Kleen authored Jan 11, 2006

Define it for i386 too.

This is a synthetic flag that signifies that the CPU's TSC runs
at a constant P state invariant frequency.

Fix up the logic on x86-64/i386 to set it on all known CPUs.
Use the AMD defined bit to set it on future AMD CPUs.

Cc: venkatesh.pallipadi@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

39b3a791

[PATCH] x86_64: Remove enable/disable_hlt · 2d52ede9

Andi Kleen authored Jan 11, 2006

Was only used by the floppy driver to work around some ancient
hardware bug that should never occur on any 64bit system.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

2d52ede9