Commits · 7dd294f7c333e77c15a402a4fa798173f286e598 · Kirill Smelkov / linux

15 Aug, 2002 31 commits

[PATCH] uninitialised local in generic_file_write · 7dd294f7

Andrew Morton authored Aug 14, 2002

generic_file_write_nolock() is initialising the pagevec too late,
so if we take an early `goto out' the kernel oopses.  O_DIRECT writes
take that path.
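The failure mode is a general C pattern: a cleanup label uses a local that is only initialised partway down the function. The sketch below shows the safe ordering. It assumes a hypothetical `struct batch` in place of the pagevec and a hypothetical `write_nolock()` in place of generic_file_write_nolock(). It is not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pagevec: a count plus a small array. */
struct batch {
	unsigned nr;
	void *items[16];
};

/* Hypothetical cleanup helper: drops every entry the batch holds.
 * If b->nr were uninitialised garbage, this loop would write far past
 * the end of items[]; the assert catches that in debug builds. */
static unsigned batch_release(struct batch *b)
{
	unsigned released = b->nr;

	assert(b->nr <= 16);
	for (unsigned i = 0; i < b->nr; i++)
		b->items[i] = NULL;
	b->nr = 0;
	return released;
}

/* Sketch of the fixed shape: the batch is initialised before the first
 * "goto out", so the early-exit path always sees a valid, empty batch. */
static long write_nolock(int direct_io, long count)
{
	struct batch b;
	long written = 0;

	b.nr = 0;		/* initialise before any early exit */

	if (direct_io)
		goto out;	/* early exit, like the O_DIRECT path */

	/* A buffered write would add pages to the batch here. */
	written = count;
out:
	batch_release(&b);	/* safe on both paths */
	return written;
}
```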

7dd294f7

[PATCH] PCI ID's for 2.5.31 · 75754eb4

Martin Mares authored Aug 14, 2002

I've filtered all submissions to the ID database, merged new IDs from
both 2.4.x and 2.5.x kernels and here is the result -- patch to 2.5.31
pci.ids with all the new stuff. Could you please send it to Linus?
(I would do it myself, but it seems I'll have a lot of work with the
floods in Prague very soon.)

75754eb4

[PATCH] for i386 SETUP CODE · 9cbec887

Keith Mannthey authored Aug 14, 2002

The following is a simple fix for an array overrun problem in
mpparse.c.  I am working on a multiquad box which has an EISA bus in it
for its service processor.  Its local bus number is 18, which is > 3
(see quad_local_to_mp_bus_id).  When NR_CPUS is close to the real
number of CPUs, adding EISA bus #18 to the array stomps all over
various things in memory.  The EISA bus does not need to be mapped
anywhere in the kernel for anything.  This patch will not affect
kernels built without clustered APIC (multiquad) support.

9cbec887
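The sketch below shows the kind of bounds check that avoids the overrun. It assumes a hypothetical four-slot `local_to_mp_bus[]` table, since the text only says that local bus numbers above 3 overflow it. `map_local_bus()` and `lookup_mp_bus()` are invented for illustration and are not the mpparse.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bound: the text says local bus numbers above 3 overrun
 * the table, so it is modelled here as four slots (0..3). */
#define MAX_LOCAL_BUSES 4

/* Hypothetical table mapping a quad's local bus number to an MP bus id. */
static int local_to_mp_bus[MAX_LOCAL_BUSES];

/* Record a mapping only if the local bus number fits in the table.
 * An out-of-range bus (such as an EISA bus numbered 18) is skipped
 * instead of being written past the end of the array. */
static bool map_local_bus(int local_bus, int mp_bus)
{
	if (local_bus < 0 || local_bus >= MAX_LOCAL_BUSES)
		return false;
	local_to_mp_bus[local_bus] = mp_bus;
	return true;
}

/* Lookups are only valid for buses that passed the check above. */
static int lookup_mp_bus(int local_bus)
{
	assert(local_bus >= 0 && local_bus < MAX_LOCAL_BUSES);
	return local_to_mp_bus[local_bus];
}
```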

[PATCH] Clean up the RPC socket slot allocation code [2/2] · fb9100d0

Trond Myklebust authored Aug 14, 2002

Patch by Chuck Lever. Remove the timeout logic from call_reserve.
This improves the overall RPC call ordering, and ensures that soft
tasks don't time out and give up before they have attempted to send
their message down the socket.

fb9100d0

[PATCH] Clean up the RPC socket slot allocation code [1/2] · 7a72fa16

Trond Myklebust authored Aug 14, 2002

Another patch by Chuck Lever. Fixes up some nasty logic in
call_reserveresult().

7a72fa16

[PATCH] cleanup RPC accounting · be6dd3ef

Trond Myklebust authored Aug 14, 2002

The following patch is by Chuck Lever, and fixes an accounting
error in the 'rpc' field in /proc/net/rpc/nfs.

be6dd3ef

[PATCH] Fix typo in the RPC reconnect code... · 0e6a8740

Trond Myklebust authored Aug 14, 2002

The following patch fixes a typo that appears in both kernel 2.4.19
and 2.5.31.

0e6a8740

[PATCH] 2.5.31 reverse spin_lock_irq for i2c-elektor.c · 2e2fa887

Albert Cranford authored Aug 14, 2002

Please revert the deadlocking change to i2c-elektor.c.

2e2fa887

[PATCH] deferred and batched addition of pages to the LRU · 44260240

Andrew Morton authored Aug 14, 2002

The remaining source of page-at-a-time activity against
pagemap_lru_lock is the anonymous pagefault path, which cannot be
changed to operate against multiple pages at a time.

But what we can do is to batch up just its adding of pages to the LRU,
via buffering and deferral.

This patch is based on work from Bill Irwin.

The patch changes lru_cache_add to put the pages into a per-CPU
pagevec.  They are added to the LRU 16-at-a-time.

And in the page reclaim code, purge the local CPU's buffer before
starting.  This is mainly to decrease the chances of pages staying off
the LRU for very long periods: if the machine is under memory pressure,
CPUs will spill their pages onto the LRU promptly.

A consequence of this change is that we can have up to 15*num_cpus
pages which are not on the LRU.  Which could have a slight effect on VM
accuracy, but I find that doubtful.  If the system is under memory
pressure the pages will be added to the LRU promptly, and these pages
are the most-recently-touched ones - the VM isn't very interested in
them anyway.
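The deferral scheme above can be modelled in userspace. This is a single-threaded sketch that uses hypothetical names (`deferred_lru_add()`, `drain_cpu()`, and a `bufs[]` array indexed by CPU number) in place of the kernel's lru_cache_add() and per-CPU pagevec code. A buffer spills to the LRU as soon as it reaches 16 pages, so at most 15 pages per CPU are ever pending. That is where the 15*num_cpus bound comes from.

```c
#include <assert.h>

#define NR_CPUS_SIM 4	/* hypothetical CPU count for the model */
#define BATCH 16	/* pages per per-CPU buffer, as in the text */
#define LRU_CAP 4096	/* capacity of the model LRU */

/* Hypothetical per-CPU buffer of pages waiting to join the LRU. */
struct cpu_buf {
	unsigned nr;
	int pages[BATCH];
};

static struct cpu_buf bufs[NR_CPUS_SIM];
static int lru[LRU_CAP];	/* stand-in for the global LRU list */
static unsigned lru_len;
static unsigned lock_taken;	/* counts simulated lock acquisitions */

/* Move one CPU's buffered pages onto the LRU under a single acquisition. */
static void drain_cpu(int cpu)
{
	struct cpu_buf *b = &bufs[cpu];

	if (b->nr == 0)
		return;
	assert(lru_len + b->nr <= LRU_CAP);
	lock_taken++;
	for (unsigned i = 0; i < b->nr; i++)
		lru[lru_len++] = b->pages[i];
	b->nr = 0;
}

/* Deferred add: buffer the page and spill only when the buffer fills,
 * so a buffer never holds more than BATCH - 1 = 15 pages between calls. */
static void deferred_lru_add(int cpu, int page)
{
	struct cpu_buf *b = &bufs[cpu];

	assert(b->nr < BATCH);
	b->pages[b->nr++] = page;
	if (b->nr == BATCH)
		drain_cpu(cpu);
}
```

Calling `drain_cpu()` for the local CPU before scanning corresponds to the reclaim-time purge described above.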

This optimisation could be made SMP-specific, but I felt it best to
turn it on for UP as well for consistency and better testing coverage.

44260240

[PATCH] pagemap_lru_lock wrapup · eed29d66

Andrew Morton authored Aug 14, 2002

Some fallout from the pagemap_lru_lock changes:

- lru_cache_del() is no longer used.  Kill it.

- page_cache_release() almost never actually frees pages.  So inline
  page_cache_release() and move its rarely-called slow path into (the
  misnamed) mm/swap.c

- update the locking comment in filemap.c.  pagemap_lru_lock used to
  be one of the outermost locks in the VM locking hierarchy.  Now, we
  never take any other locks while holding pagemap_lru_lock.  So it
  doesn't have any relationship with anything.

- put_page() now removes pages from the LRU on the final put.  The
  lock is interrupt safe.
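The sketch below shows the split between an inline fast path and an out-of-line slow path for page_cache_release() and put_page(). It assumes a hypothetical `struct page_sim` with an explicit reference count. The real functions operate on struct page and take the LRU lock, which this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: a reference count plus LRU and freed flags. */
struct page_sim {
	int count;
	bool on_lru;
	bool freed;
};

/* Rarely-taken slow path, kept out of line (the text says the real one
 * moved into mm/swap.c): the last reference is gone, so take the page
 * off the LRU and free it. */
static void release_final(struct page_sim *p)
{
	p->on_lru = false;
	p->freed = true;
}

/* Small enough to inline: almost always this only drops a reference. */
static inline void page_release(struct page_sim *p)
{
	assert(p->count > 0);	/* a put without a matching get is a bug */
	if (--p->count == 0)
		release_final(p);
}
```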

eed29d66

[PATCH] make pagemap_lru_lock irq-safe · aaba9265

Andrew Morton authored Aug 14, 2002

It is expensive for a CPU to take an interrupt while holding the page
LRU lock, because other CPUs will pile up on the lock while the
interrupt runs.

Disabling interrupts while holding the lock reduces contention by an
additional 30% on 4-way.  This is when the only source of interrupts is
disk completion.  The improvement will be higher with more CPUs and it
will be higher if there is networking happening.

The maximum hold time of this lock is 17 microseconds on 500 MHz PIII,
which is well inside the kernel's maximum interrupt latency (which was
100 usecs when I last looked, a year ago).

This optimisation is not needed on uniprocessor, but the patch disables
IRQs while holding pagemap_lru_lock anyway, so it becomes an irq-safe
spinlock, and pages can be moved from the LRU in interrupt context.

pagemap_lru_lock has been renamed to _pagemap_lru_lock to pick up any
missed uses, and to reliably break any out-of-tree patches which may be
using the old semantics.

aaba9265

[PATCH] batched removal of pages from the LRU · 008f707c

Andrew Morton authored Aug 14, 2002

Convert all the bulk callers of lru_cache_del() to use the batched
pagevec_lru_del() function.

Change truncate_complete_page() to not delete the page from the LRU.
Do it in page_cache_release() instead.  (This reintroduces the problem
with final-release-from-interrupt.  That gets fixed further on).

This patch changes the truncate locking somewhat.  The removal from the
LRU now happens _after_ the page has been removed from the
address_space and has been unlocked.  So there is now a window where
the shrink_cache code can discover the to-be-freed page via the LRU
list.  But that's OK - the page is clean, its buffers (if any) are
clean.  It's not attached to any mapping.

008f707c

[PATCH] batched addition of pages to the LRU · 9eb76ee2

Andrew Morton authored Aug 14, 2002

The patch goes through the various places which were calling
lru_cache_add() against bulk pages and batches them up.

Also.  This whole patch series improves the behaviour of the system
under heavy writeback load.  There is a reduction in page allocation
failures, some reduction in loss of interactivity due to page
allocators getting stuck on writeback from the VM.  (This is still bad
though).

I think it's due to the change here in mpage_writepages().  That
function was originally unconditionally refiling written-back pages to
the head of the inactive list.  The theory being that they should be
moved out of the way of page allocators, who would end up waiting on
them.

It appears that this simply had the effect of pushing dirty, unwritten
data closer to the tail of the inactive list, making things worse.

So instead, if the caller is (typically) balance_dirty_pages() then
leave the pages where they are on the LRU.

If the caller is PF_MEMALLOC then the pages *have* to be refiled.  This
is because VM writeback is clustered along mapping->dirty_pages, and
it's almost certain that the pages which are being written are near the
tail of the LRU.  If they were left there, page allocators would block
on them too soon.  It would effectively become a synchronous write.

9eb76ee2

[PATCH] batched movement of lru pages in writeback · 823e0df8

Andrew Morton authored Aug 14, 2002

Makes mpage_writepages() move pages around on the LRU sixteen-at-a-time
rather than one-at-a-time.

823e0df8

[PATCH] multithread page reclaim · 3aa1dc77

Andrew Morton authored Aug 14, 2002

This patch multithreads the main page reclaim function, shrink_cache().

This function used to run under pagemap_lru_lock.  Instead, we grab
that lock, put 32 pages from the LRU into a private list, drop the
pagemap_lru_lock and then proceed to attempt to free those pages.

Any pages which were successfully reclaimed are batch-freed.  Pages
which were not reclaimed are re-added to the LRU.
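The isolate-then-reclaim pattern can be sketched with a POSIX mutex standing in for pagemap_lru_lock. This is a hypothetical userspace model (`shrink_batch()`, `try_to_free()` and the array-based LRU are invented here), not shrink_cache() itself. It shows only that the lock is held while pages move to and from a private list, never while they are being reclaimed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define SCAN_BATCH 32	/* pages isolated per lock hold, as in the text */
#define LRU_MAX 1024

/* Hypothetical model: the LRU is an array of page ids, tail at the end. */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static int lru[LRU_MAX];
static int lru_len;

/* Hypothetical reclaim attempt: pretend even-numbered pages are freeable. */
static bool try_to_free(int page)
{
	return page % 2 == 0;
}

/* One pass: isolate a batch under the lock, try to reclaim it with the
 * lock dropped, then put the survivors back under the lock.
 * Returns the number of pages freed. */
static int shrink_batch(void)
{
	int isolated[SCAN_BATCH], keep[SCAN_BATCH];
	int n = 0, nkeep = 0, freed = 0;

	pthread_mutex_lock(&lru_lock);
	while (n < SCAN_BATCH && lru_len > 0)
		isolated[n++] = lru[--lru_len];	/* take from the tail */
	pthread_mutex_unlock(&lru_lock);

	/* The expensive work runs without holding lru_lock. */
	for (int i = 0; i < n; i++) {
		if (try_to_free(isolated[i]))
			freed++;
		else
			keep[nkeep++] = isolated[i];
	}

	pthread_mutex_lock(&lru_lock);
	assert(lru_len + nkeep <= LRU_MAX);
	for (int i = 0; i < nkeep; i++)
		lru[lru_len++] = keep[i];	/* re-add at the tail, for simplicity */
	pthread_mutex_unlock(&lru_lock);

	return freed;
}
```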

This patch reduces pagemap_lru_lock contention on the 4-way by a factor
of thirty.

The shrink_cache() code has been simplified somewhat.

refill_inactive() was being called too often - often just to process
two or three pages.  Fiddled with that so it processes pages at the
same rate, but works on 32 pages at a time.
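One way to keep the same overall rate while working 32 pages at a time is to accumulate small requests and process them in fixed-size chunks. The text does not say exactly how refill_inactive() was changed, so this is a hypothetical sketch. `request_refill()` and `refill_batch()` are invented names.

```c
#include <assert.h>

#define REFILL_BATCH 32	/* pages handled per call, as in the text */

static unsigned long pending;	/* requested but not yet processed */
static unsigned long calls;	/* number of batched calls made */
static unsigned long processed;	/* total pages handed to the worker */

/* Hypothetical worker standing in for refill_inactive(). */
static void refill_batch(unsigned long nr)
{
	calls++;
	processed += nr;
}

/* Accumulate small requests and do the work in 32-page chunks, so the
 * overall rate is unchanged but the per-call overhead is amortised. */
static void request_refill(unsigned long nr)
{
	pending += nr;
	while (pending >= REFILL_BATCH) {
		refill_batch(REFILL_BATCH);
		pending -= REFILL_BATCH;
	}
	assert(pending < REFILL_BATCH);	/* leftover is always under one batch */
}
```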

Added a couple of mark_page_accessed() calls into mm/memory.c from 2.4.
They seem appropriate.

Change the shrink_caches() logic so that it will still trickle through
the active list (via refill_inactive) even if the inactive list is much
larger than the active list.

3aa1dc77

[PATCH] pagevec infrastructure · 6a952840

Andrew Morton authored Aug 14, 2002

This is the first patch in a series of eight which address
pagemap_lru_lock contention, and which simplify the VM locking
hierarchy.

Most testing has been done with all eight patches applied, so it would
be best not to cherry-pick, please.

The workload which was optimised was: 4x500MHz PIII CPUs, mem=512m, six
disks, six filesystems, six processes each flat-out writing a large
file onto one of the disks.  ie: heavy page replacement load.

The frequency with which pagemap_lru_lock is taken is reduced by 90%.

Lockmeter claims that pagemap_lru_lock contention on the 4-way has been
reduced by 98%.  Total amount of system time lost to lock spinning went
from 2.5% to 0.85%.

Anton ran a similar test on 8-way PPC; the reduction in system time was
around 25%, and the reduction in time spent playing with
pagemap_lru_lock was 80%.

	http://samba.org/~anton/linux/2.5.30/standard/
versus
	http://samba.org/~anton/linux/2.5.30/akpm/

Throughput changes on uniprocessor are modest: a 1% speedup with this
workload due to shortened code paths and improved cache locality.

The patches do two main things:

1: In almost all places where the kernel was doing something with
   lots of pages one-at-a-time, convert the code to do the same thing
   sixteen-pages-at-a-time.  Take the lock once rather than sixteen
   times.  Take the lock for the minimum possible time.

2: Multithread the pagecache reclaim function: don't hold
   pagemap_lru_lock while reclaiming pagecache pages.  That function
   was massively expensive.

One fallout from this work is that we never take any other locks while
holding pagemap_lru_lock.  So this lock conceptually disappears from
the VM locking hierarchy.


So.  This is all basically a code tweak to improve kernel scalability.
It does it by optimising the existing design, rather than by redesign.
There is little conceptual change to how the VM works.

This is as far as I can tweak it.  It seems that the results are now
acceptable on SMP.  But things are still bad on NUMA.  It is expected
that the per-zone LRU and per-zone LRU lock patches will fix NUMA as
well, but that has yet to be tested.


This first patch introduces `struct pagevec', which is the basic unit
of batched work.  It is simply:

struct pagevec {
	unsigned nr;
	struct page *pages[16];
};

pagevecs are used in the following patches to get the VM away from
page-at-a-time operations.

This patch includes all the pagevec library functions which are used in
later patches.
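The sketch below shows the basic pagevec operations and the lock-once-per-batch pattern they enable. The structure follows the one above, with its size named `PAGEVEC_SIZE`. `pagevec_init()`, `pagevec_count()` and `pagevec_add()` follow the pagevec naming, but their bodies here are a simplified model. `pagevec_flush_sim()`, `lock_count` and the one-field `struct page` are hypothetical.

```c
#include <assert.h>

#define PAGEVEC_SIZE 16

/* Minimal hypothetical page type so the example can create pages. */
struct page {
	unsigned long flags;
};

/* The structure from the text, with the size given a name. */
struct pagevec {
	unsigned nr;
	struct page *pages[PAGEVEC_SIZE];
};

static void pagevec_init(struct pagevec *pvec)
{
	pvec->nr = 0;
}

static unsigned pagevec_count(const struct pagevec *pvec)
{
	return pvec->nr;
}

/* Add a page; returns the number of free slots left, so a return of
 * zero tells the caller to flush the vector now. */
static unsigned pagevec_add(struct pagevec *pvec, struct page *page)
{
	assert(pvec->nr < PAGEVEC_SIZE);
	pvec->pages[pvec->nr++] = page;
	return PAGEVEC_SIZE - pvec->nr;
}

static unsigned lock_count;	/* hypothetical: counts lock acquisitions */

/* Hypothetical flush: take the lock once for the whole batch,
 * process every page, then empty the vector. */
static void pagevec_flush_sim(struct pagevec *pvec)
{
	if (pagevec_count(pvec) == 0)
		return;
	lock_count++;
	for (unsigned i = 0; i < pvec->nr; i++)
		pvec->pages[i]->flags |= 1;	/* stand-in for per-page work */
	pagevec_init(pvec);
}
```

Feeding 40 pages through one pagevec takes the lock three times (after 16 pages, after 32, and for the final 8) instead of 40 times.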

6a952840

[PATCH] lockd shouldn't call posix_unblock_lock here · ecc9d325

Matthew Wilcox authored Aug 14, 2002

nlmsvc_notify_blocked() is only called via the fl_notify() pointer which
is only called immediately after we already did a locks_delete_block(),
so calling posix_unblock_lock() here is always a NOP.

ecc9d325

[PATCH] Modular x86 MTRR driver. · 6a85ced0

Dave Jones authored Aug 14, 2002

This patch from Pat Mochel cleans up the hell that was mtrr.c
into something a lot more modular and easy to understand, by
doing the implementation-per-file as has been done to various
other things by Pat and myself over the last months.

It's functionally identical from a kernel internal point of view,
and a userspace point of view, and is basically just a very large
code clean up.

6a85ced0

[PATCH] stale thread detach debugging removal · 3b307fd5

Ingo Molnar authored Aug 14, 2002

one of the debugging tests triggered a false-positive BUG() when a
detached thread was straced.

3b307fd5

[PATCH] thread release infrastructure · d2b7244f

Ingo Molnar authored Aug 14, 2002

It is much cleaner to pass in the address of the user-space VM lock as
the fifth clone() parameter.  This will also enable arbitrary
implementations of the stack-unlock.

d2b7244f

[PATCH] init_tasks is not defined anywhere. · 86ae817e

Rusty Russell authored Aug 14, 2002

It's referenced by mips and mips64 (both far out of date), but never
actually defined anywhere.

86ae817e

Merge http://linuxusb.bkbits.net/linus-2.5 · edf3d92b

Linus Torvalds authored Aug 14, 2002

into home.transmeta.com:/home/torvalds/v2.5/linux

edf3d92b

[PATCH] es1371 synchronize_irq · 17454310

Petr Vandrovec authored Aug 14, 2002

Update ES1371 to new synchronize_irq() API.

17454310

[PATCH] broken cfb* support in the 2.5.31-bk · 9299c003

Petr Vandrovec authored Aug 14, 2002

line_length, type and visual moved from the display struct to fb_info's fix
structure during the last fbdev updates. Unfortunately the generic code was not
updated at the same time, so now every fbdev driver is broken.

9299c003

[PATCH] Unicode characters 0x80-0x9F are valid ISO* characters · 26036678

Petr Vandrovec authored Aug 14, 2002

Characters 0x80-0x9F from ISO encodings are U+0080-U+009F, so map
them both ways. Otherwise you cannot use chars 0x80-0x9F in filenames
on filesystems using NLS.
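The mapping rule can be written as a pair of conversion functions for an ISO 8859-style charset. This sketch assumes a hypothetical `high_table` that holds the charset-specific code points for bytes 0xA0-0xFF. These are not the kernel's NLS table routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte -> Unicode for an ISO 8859-style charset. Bytes 0x00-0x9F map to
 * the identical code points U+0000-U+009F (0x80-0x9F are the C1 control
 * range); only 0xA0-0xFF are charset-specific and come from high_table. */
static int iso_char2uni(unsigned char c, const uint16_t *high_table)
{
	assert(high_table != NULL);
	if (c < 0xA0)
		return c;
	return high_table[c - 0xA0];
}

/* Unicode -> byte: the inverse mapping, or -1 if the code point has no
 * byte in this charset. */
static int iso_uni2char(uint16_t u, const uint16_t *high_table)
{
	assert(high_table != NULL);
	if (u < 0xA0)
		return u;
	for (int i = 0; i < 0x60; i++)
		if (high_table[i] == u)
			return 0xA0 + i;
	return -1;
}
```

With a Latin-1 table (`high_table[i] = 0xA0 + i`), byte 0x85 converts to U+0085 and back. That is the round trip the patch describes.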

26036678

Merge http://linux-scsi.bkbits.net/scsi-for-linus-2.5 · f9969cbe

Linus Torvalds authored Aug 14, 2002

into home.transmeta.com:/home/torvalds/v2.5/linux

f9969cbe

Merge bk://ldm.bkbits.net/linux-2.5 · ad2d842b

Linus Torvalds authored Aug 14, 2002

into home.transmeta.com:/home/torvalds/v2.5/linux

ad2d842b

[PATCH] Trivial: remove sti from aic7xxx_old · 0352f6f5

Matthew Wilcox authored Aug 14, 2002

We don't need to reenable interrupts before calling panic.

0352f6f5

[PATCH] umem per-disk gendisks · 49ae70c0

Alexander Viro authored Aug 14, 2002

49ae70c0

[PATCH] dasd per-disk gendisks · 664aa7b2

Alexander Viro authored Aug 14, 2002

664aa7b2

[PATCH] acsi per-disk gendisks · bedbeab4

Alexander Viro authored Aug 14, 2002

bedbeab4

14 Aug, 2002 9 commits

Merge mulgrave.(none):/home/jejb/BK/53c700-2.5 · 909a019a

James Bottomley authored Aug 14, 2002

into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5

909a019a

USB: changed usb_match_id to not need the usb_device pointer. · f601a8a6

Greg Kroah-Hartman authored Aug 14, 2002

f601a8a6

Merge ssh://linux-scsi@linux-scsi.bkbits.net/scsi-for-linus-2.5 · 130fbeeb

James Bottomley authored Aug 14, 2002

into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5

130fbeeb

[PATCH] USB core cleanups · 16dc2073

David Brownell authored Aug 14, 2002

Moves some functions that are only used by usbfs to be private, and
documents some of the interface issues that need to be cleaned up.

16dc2073

USB: fixed DEVICE_ATTR usage in the ehci driver · 97a75be6

Greg Kroah-Hartman authored Aug 14, 2002

97a75be6

[SCSI debug driver] change DRIVER_ATTR usage · 8403fb48

James Bottomley authored Aug 14, 2002

8403fb48

Merge by hand · c6efcb49

James Bottomley authored Aug 14, 2002

c6efcb49

Merge mulgrave.(none):/home/jejb/BK/scsi-cpqfc-2.5 · 12c262b2

James Bottomley authored Aug 14, 2002

into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5

12c262b2

Merge by hand · 378a8995

James Bottomley authored Aug 13, 2002

378a8995