Commits · a55702bb86b58773e7e880956f00ee55cb939355 · Kirill Smelkov / linux

23 Aug, 2004 1 commit

[PATCH] context-switching overhead in X, ioport() · a55702bb

Ingo Molnar authored Aug 22, 2004

while debugging/improving scheduling latencies i got the following
strange latency report from Lee Revell:

  http://krustophenia.net/testresults.php?dataset=2.6.8.1-P6#/var/www/2.6.8.1-P6

this trace shows a 120 usec latency caused by XFree86, on a 600 MHz x86
system. Looking closer reveals:

  00000002 0.006ms (+0.003ms): __switch_to (schedule)
  00000002 0.088ms (+0.082ms): finish_task_switch (schedule)

it took more than 80 usecs for XFree86 to do a context-switch!

it turns out that the reason for this (massive) context-switching
overhead is the following change in 2.6.8:

      [PATCH] larger IO bitmaps

To demonstrate the effect of this change i've written ioperm-latency.c
(attached), which gives the following on vanilla 2.6.8.1:

  # ./ioperm-latency
  default no ioperm:             scheduling latency: 2528 cycles
  turning on port 80 ioperm:     scheduling latency: 10563 cycles
  turning on port 65535 ioperm:  scheduling latency: 10517 cycles

the ChangeSet says:

        Now, with the lazy bitmap allocation and per-CPU TSS, this
        will really not drain any resources I think.

this is plain wrong. An increase in the IO bitmap size introduces
per-context-switch overhead as well: we now have to copy an 8K bitmap
every time XFree86 context-switches - even though XFree86 never uses
ports higher than 1024! I've straced XFree86 on a number of x86 systems
and in every instance ioperm() was used - so i'd say the majority of x86
Linux systems running 2.6.8.1 are affected by this problem.

This not only causes lots of overhead, it also trashes ~16K out of the
L1 and L2 caches, on every context-switch. It's as if XFree86 did an L1
cache flush on every context-switch ...

the simple solution would be to revert IO_BITMAP_BITS back to 1024 and
release 2.6.8.2?

I've implemented another solution as well, which tracks the
highest-enabled port # for every task and does the copying of the bitmap
intelligently. (patch attached) The patched kernel gives:

  # ./ioperm-latency
  default no ioperm:             scheduling latency: 2423 cycles
  turning on port 80 ioperm:     scheduling latency: 2503 cycles
  turning on port 65535 ioperm:  scheduling latency: 10607 cycles

this is much more acceptable - the full overhead only occurs in the very
unlikely event of a task using the high ioport range. X doesn't suffer
any significant overhead.

(tracking the maximum allowed port # also allows a simplification of
io_bitmap handling: e.g. we don't do the invalid-offset trick anymore -
the IO bitmap in the TSS is always valid and secure.)

I tested the patch on x86 SMP and UP, it works fine for me. I tested
boundary conditions as well, it all seems secure.

	Ingo

#include <errno.h>
#include <stdio.h>
#include <sched.h>
#include <signal.h>
#include <sys/io.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/unistd.h>

#define CYCLES(x) asm volatile ("rdtsc" :"=a" (x)::"edx")

#define __NR_sched_set_affinity 241
_syscall3 (int, sched_set_affinity, pid_t, pid, unsigned int, mask_len, unsigned long *, mask)

/*
 * Use a pair of RT processes bound to the same CPU to measure
 * context-switch overhead:
 */
static void measure(void)
{
	unsigned long i, min = ~0UL, pid, mask = 1, t1, t2;

	sched_set_affinity(0, sizeof(mask), &mask);

	pid = fork();
	if (!pid)
		for (;;) {
			asm volatile ("sti; nop; cli");
			sched_yield();
		}

	sched_yield();
	for (i = 0; i < 100; i++) {
		asm volatile ("sti; nop; cli");
		CYCLES(t1);
		sched_yield();
		CYCLES(t2);
		if (i > 10) {
			if (t2 - t1 < min)
				min = t2 - t1;
		}
	}
	asm volatile ("sti");

	kill(pid, 9);
	printf("scheduling latency: %lu cycles\n", min);
	sched_yield();
}

int main(void)
{
	struct sched_param p = { sched_priority: 2 };
	unsigned long mask = 1;

	if (iopl(3)) {
		printf("need to run as root!\n");
		exit(-1);
	}
	sched_setscheduler(0, SCHED_FIFO, &p);
	sched_set_affinity(0, sizeof(mask), &mask);

	printf("default no ioperm:             ");
	measure();

	printf("turning on port 80 ioperm:     ");
	ioperm(0x80,1,1);
	measure();

	printf("turning on port 65535 ioperm:  ");
	if (ioperm(0xffff,1,1))
		printf("FAILED - older kernel.\n");
	else
		measure();

	return 0;
}

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

22 Aug, 2004 20 commits

Make some single-bit bitfields unsigned. · 14729dbe

Linus Torvalds authored Aug 22, 2004

Signed single-bit bitfields really are a pretty strange
thing to have. They work, but it wasn't really intentional.

Merge http://linux-watchdog.bkbits.net/linux-2.6-watchdog · 672a4830
Linus Torvalds authored Aug 22, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
Merge http://xfs.org:8090/xfs-linux-2.6 · 786501bb
Linus Torvalds authored Aug 22, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```

[PATCH] fix /proc/net/netstat output · b1dcc3d1

Cal Peake authored Aug 22, 2004

net/ipv4/proc.c was updated to use a new mechanism for outputting
/proc/net/snmp and /proc/net/netstat.

However, a superfluous '\n' snuck in, breaking `netstat -s`

Merge bk://linux-dj.bkbits.net/cpufreq · 26a9b9cf
Linus Torvalds authored Aug 22, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
Add another Intel cache descriptor entry. · f9ee7122
Linus Torvalds authored Aug 22, 2004
```
This one from Dave Jones, who read the Intel docs even
more.
```
Merge bk://bk.arm.linux.org.uk/linux-2.6-fb · 801c2000
Linus Torvalds authored Aug 22, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```

[PATCH] ppc32: Fix booting on some OldWorld Macs · a971c4c2

Benjamin Herrenschmidt authored Aug 22, 2004

It seems that on some OldWorld Macs, we don't get the OF stdout device,
thus the new set_preferred_console() dies at boot trying to dereference
a NULL pointer.

Trivial fix.

Merge bk://gkernel.bkbits.net/netdev-2.6 · a72f691a
Linus Torvalds authored Aug 22, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```
[PATCH] update gianfar ethernet driver · f3c1e4cf
Andy Fleming authored Aug 22, 2004

Merge pobox.com:/spare/repo/linux-2.6 · 633980ae
Jeff Garzik authored Aug 22, 2004
```
into pobox.com:/spare/repo/netdev-2.6/ALL
```

[PATCH] minix block usage counting fix · 0fcd426d

Andries E. Brouwer authored Aug 22, 2004

In 2.5.18 some minix-specific stuff was moved to the minix subdirectory
where it belonged.  However, a typo crept in, causing inode disk usage
to be incorrectly reported.  A few people have complained, but so far
not sufficiently loudly.

Signed-off-by: Andries Brouwer <Andries.Brouwer@cwi.nl>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] missing CPU descriptors · 1039575e

Alan Cox authored Aug 22, 2004

There are a couple of cache descriptors in the current Intel manuals
missing from our tables, at least one of which appears in an actual
processor in the real world.

[PATCH] fix OProfile events with zero event values · 2b74f0cd
John Levon authored Aug 21, 2004
```
A silly bug prevented certain events from being used.
```
Merge bk://linux-acpi.bkbits.net/linux-acpi-release-2.6.8 · 752b4f67
Linus Torvalds authored Aug 21, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```

[PATCH] Fix HPT374 merge problem · 6a8be6b5

Alan Cox authored Aug 21, 2004

This got accidentally reverted in merging HPT372N support. The following
patch restores 50MHz on the HPT374 using the 370a clocking tables.

Merge bk://bk.arm.linux.org.uk/linux-2.6-rmk · 94c4cad9
Linus Torvalds authored Aug 21, 2004
```
into ppc970.osdl.org:/home/torvalds/v2.6/linux
```

[PATCH] Kconfig updates for PA-RISC · 7fd9f756

Matthew Wilcox authored Aug 21, 2004

Fix some Kconfig dependencies on PA-RISC (Grant Grundler, Martin Schulze,
Helge Deller, Matthew Wilcox)

[PATCH] ACPI for 2.6 · 59ee499d

Jesse Barnes authored Aug 21, 2004

Define acpi_noirq on ia64 since it's used now in pci_link.c.  All ia64
machines use ACPI, so we can just define it to 0 like we do for acpi_disabled
and acpi_pci_disabled.

Signed-off-by: Jesse Barnes <jbarnes@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] PA-RISC sound updates · 5ce15909

Matthew Wilcox authored Aug 21, 2004

PA-RISC sound updates:

 - Do a DAC/ADC reset for sampling rate changes in ad1889 (Randolph Chung)
 - Set the ad1889 interrupt configuration properly (Randolph Chung)
 - Fix dependency for the OSS Harmony driver (Thibaut Varene)
 - Forward port Stuart Brady's 2.4 Harmony driver patches (Thibaut Varene)
   - Fix sample skipping (Stuart Brady)
   - Prevent harmony_silence being called wrongly (Stuart Brady)
   - Fix crash caused by buf_to_fill becoming -1 (Stuart Brady)
   - Improve naming of mixer channels (Stuart Brady)
   - Implement SNDCTL_DSP_CHANNELS ioctl (Stuart Brady)
   - Improve toggling the recording source (Stuart Brady)
   - Sanity check MIXER_WRITE volume levels (Stuart Brady)
   - Fix MIXER_READ right_level return (Stuart Brady)
   - Reject AFMT_S16_LE format (Stuart Brady)
 - Fail OSS Harmony initialisation if no irq (Helge Deller)
 - Fix typos in ALSA Harmony (Andy Walker, Grant Grundler, Stuart Brady)

20 Aug, 2004 8 commits

[CPUFREQ] Recognise another Dothan variant in speedstep driver. · a0dea52b
Dave Jones authored Aug 20, 2004
```
From: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
```

[WATCHDOG] v2.6.8.1 watchdog-llseek-patch · 1f315b72

Wim Van Sebroeck authored Aug 20, 2004

The watchdog drivers use a VFS implementation and thus should not be
lseek'able, so we put a '.llseek = no_llseek' in the file_operations
structure.

[WATCHDOG] v2.6.8.1 cpu5wdt.c-nonseekable_open-patch · 0a691a2c
Wim Van Sebroeck authored Aug 20, 2004
```
cpu5wdt also contains a VFS and thus should be "nonseekable_open"
```
[WATCHDOG] pcwd-watchdog.txt-patch · 79d6f094
Friedrich Lobenstock authored Aug 20, 2004
```
Fix example program in pcwd-watchdog.txt document.
```

[WATCHDOG] v2.6.8.1 compat_ioctl-patch · 24bf8954

Arnd Bergmann authored Aug 20, 2004

The watchdog ioctl interface is defined correctly for 32 bit emulation,
although WDIOC_GETSUPPORT was not marked as such, for an unclear reason.
WDIOC_SETTIMEOUT and WDIOC_GETTIMEOUT were added in May 2002 to the
code but never to the ioctl list. This adds all three definitions.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>

[XFS] Fix up handling of SB versionnum when filesystem on disk has · 4ee6c244

Timothy Shimmin authored Aug 20, 2004

newer bit features than the kernel.

SGI Modid: xfs-linux:xfs-kern:177392a
Signed-off-by: Nathan Scott <nathans@sgi.com>

[XFS] avoid using pid_t in ioctl ABI · d18add1a
Christoph Hellwig authored Aug 20, 2004
```
SGI Modid: xfs-linux:xfs-kern:177165a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```

[XFS] Add 32bit ioctl translation · 20dafae2

Nathan Scott authored Aug 20, 2004

SGI Modid: xfs-linux:xfs-kern:177164a
Signed-off-by: Nathan Scott <nathans@sgi.com>

19 Aug, 2004 11 commits
- [XFS] Add a realtime inheritance bit for directory inodes so new · aa80167f
  Nathan Scott authored Aug 20, 2004
```
files can be automatically created as realtime files.

SGI Modid: xfs-linux:xfs-kern:177129a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Use sparse whitespace approach that Al took to be more consistent. Couple more sparse fixes. · e5bfd05f
  Nathan Scott authored Aug 20, 2004
```
SGI Modid: xfs-linux:xfs-kern:177030a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Remove several macros which are no longer used anywhere. · 6090afc2
  Nathan Scott authored Aug 20, 2004
```
SGI Modid: xfs-linux:xfs-kern:177029a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Add support for unsetting realtime flag on realtime file which · c58fec8d
  Herry Wiputra authored Aug 20, 2004
```
has no extents allocated.

SGI Modid: xfs-linux:xfs-kern:18776a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Fix lock leak in xfs_free_file_space · 3ccebd8b
  Dean Roehrich authored Aug 20, 2004
```
SGI Modid: xfs-linux:xfs-kern:176905a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Fix a blocksize-smaller-than-pagesize hang when writing buffers · cb8fb432
  Nathan Scott authored Aug 20, 2004
```
with a shared page.

SGI Modid: xfs-linux:xfs-kern:176412a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Fix accidental reverting of sync write preallocations. · 1a1a99df
  Nathan Scott authored Aug 20, 2004
```
SGI Modid: xfs-linux:xfs-kern:176195a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Code checks to trap access to fsb zero. · 7db48a7c
  Eric Sandeen authored Aug 20, 2004
```
SGI Modid: xfs-linux:xfs-kern:176159a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [CPUFREQ] Fix up deprecation notices. · 701e8ea4
  Dave Jones authored Aug 19, 2004
```
From: Pavel Machek <pavel@ucw.cz>
- Add missing newlines
- 80-column goodness.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
```
- [XFS] Add filesystem size limit even when XFS_BIG_BLKNOS is · 1804dc87
  Eric Sandeen authored Aug 20, 2004
```
in effect; limited by page cache index size (16T on ia32)

SGI Modid: xfs-linux:xfs-kern:175103a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```
- [XFS] Fix signed/unsigned issues in xfs_reserve_blocks routine. · e216aeea
  Nathan Scott authored Aug 20, 2004
```
SGI Modid: xfs-linux:xfs-kern:174873a
Signed-off-by: Nathan Scott <nathans@sgi.com>
```