Commits · 0fc3a8cde0a53af12f9f9566b0304439b83e4d97 · Kirill Smelkov / linux

25 Nov, 2002 2 commits
- Small fixes to sync up the 2.4 and 2.5 pools. · 0fc3a8cd
  Jeff Dike authored Nov 25, 2002
```
Also fixed a stupid signal handling bug.
```
- A whole lot of small changes to sync up the 2.4 and 2.5 pools · 64d40969
  Jeff Dike authored Nov 25, 2002
```
somewhat.  Mostly whitespace changes, plus some code movement.
Also added checksum.S to the repository, which I had somehow
missed before.
```
23 Nov, 2002 7 commits
- Merge jdike.stearns.org:linux/skas-2.5 · ac7ea38e
  Jeff Dike authored Nov 23, 2002
```
into uml.karaya.com:/home/jdike/linux/2.5/skas-2.5
```
- Merge · c42f0141
  Jeff Dike authored Nov 23, 2002
- Merge uml.karaya.com:/home/jdike/linux/2.5/linus-2.5 · 6d1d7b0e
  Jeff Dike authored Nov 23, 2002
```
into uml.karaya.com:/home/jdike/linux/2.5/skas-2.5
```
- Merge jdike.wstearns.org:/home/jdike/linux/linus-2.5 · 87554cc9
  Jeff Dike authored Nov 23, 2002
```
into jdike.wstearns.org:/home/jdike/linux/skas-2.5
```
- Updated to 2.5.49, which involved fixing the calls to do_fork. · 433553bd
  Jeff Dike authored Nov 23, 2002
- Merge uml.karaya.com:/home/jdike/linux/2.5/linus-2.5 · f698b940
  Jeff Dike authored Nov 23, 2002
```
into uml.karaya.com:/home/jdike/linux/2.5/updates-2.5
```
- Finished the skas merge by eliminating a syntax error, fixing the · 16c80381
  Jeff Dike authored Nov 23, 2002
```
new compilation warnings, and fixing a call to handle_page_fault.
```
22 Nov, 2002 31 commits

Merged the rest of the skas changes. · 41fb3bb0
Jeff Dike authored Nov 22, 2002

Fixed various build problems with the tlb.c merge. · 3fc0447a
Jeff Dike authored Nov 22, 2002

Merged the tlb.c changes from the skas patch. · f575fea5
Jeff Dike authored Nov 22, 2002

Minor build fixes to the last batch of skas merges. · 30768623
Jeff Dike authored Nov 22, 2002

Merged a number of small skas changes. · 863468e5
Jeff Dike authored Nov 22, 2002

Linux v2.5.49 · cebce9d8
Linus Torvalds authored Nov 21, 2002

Merge bk://bk.arm.linux.org.uk · 1ca4ebb9
Linus Torvalds authored Nov 21, 2002
```
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
```
[ARM] Fixups for 2.5.48-bkcur · 7ad26fa6
Russell King authored Nov 22, 2002
```
Fix compilation errors for do_fork() and print_symbol()
```
Merge bk://cifs.bkbits.net/linux-2.5cifs · 32ff6d01
Linus Torvalds authored Nov 21, 2002
```
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
```

[PATCH] no-buffer-head ext2 option · b1ad1f4e

Andrew Morton authored Nov 21, 2002

Implements a new set of block address_space_operations which will never
attach buffer_heads to file pagecache.  These can be turned on for ext2
with the `nobh' mount option.

During write-intensive testing on a 7G machine, total buffer_head
storage remained below 0.3 megabytes.  And those buffer_heads are
against ZONE_NORMAL pagecache and will be reclaimed by ZONE_NORMAL
memory pressure.

This work is, of course, aimed specifically at the huge highmem machines.
Possibly it obsoletes the buffer_heads_over_limit stuff (which doesn't
work terribly well), but that code is simple, and will provide relief
for other filesystems.


It should be noted that the nobh_prepare_write() function and the
PageMappedToDisk() infrastructure are what is needed to solve the
problem of user data corruption when the filesystem which backs a
sparse MAP_SHARED mapping runs out of space.  We can use this code in
filemap_nopage() to ensure that all mapped pages have space allocated
on-disk.  Deliver SIGBUS on ENOSPC.

This will require a new address_space op, I expect.

[PATCH] handle zones which are full of unreclaimable pages · 36fb7f84

Andrew Morton authored Nov 21, 2002

This patch is a general solution to the situation where a zone is full
of pinned pages.

This can come about if:

a) Someone has allocated all of ZONE_DMA for IO buffers

b) Some application is mlocking some memory and a zone ends up full
   of mlocked pages (can happen on a 1G ia32 system)

c) All of ZONE_HIGHMEM is pinned in hugetlb pages (can happen on 1G
   machines)

We'll currently burn 10% of CPU in kswapd when this happens, although
it is quite hard to trigger.

The algorithm is:

- If page reclaim has scanned 2 * the total number of pages in the
  zone and there have been no pages freed in that zone then mark the
  zone as "all unreclaimable".

- When a zone is "all unreclaimable" page reclaim almost ignores it.
  We will perform a "light" scan at DEF_PRIORITY (typically 1/4096th of
  the zone, or 64 pages) and then forget about the zone.

- When a batch of pages are freed into the zone, clear its "all
  unreclaimable" state and start full scanning again.  The assumption
  being that some state change has come about which will make reclaim
  successful again.

  So if a "light scan" actually frees some pages, the zone will revert to
  normal state immediately.

So we're effectively putting the zone into "low power" mode, and lightly
polling it to see if something has changed.
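The three rules can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not the patch itself. `struct zone_sim` and its helpers are hypothetical. `DEF_PRIORITY` is assumed to be 12, so a light scan covers 1/4096th of the zone, which is 64 pages for a 262144-page zone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a light scan covers total_pages >> DEF_PRIORITY (1/4096th). */
#define DEF_PRIORITY 12

/* Hypothetical, heavily simplified zone: not the kernel's struct zone. */
struct zone_sim {
    size_t total_pages;
    size_t pages_scanned;      /* pages scanned since something was last freed */
    bool all_unreclaimable;
};

/* Pages were freed into the zone: assume something changed, scan fully again. */
static void zone_free_batch(struct zone_sim *z)
{
    z->pages_scanned = 0;
    z->all_unreclaimable = false;
}

/* Account for one reclaim pass that scanned `scanned` pages and freed `freed`. */
static void zone_note_scan(struct zone_sim *z, size_t scanned, size_t freed)
{
    if (freed > 0) {
        zone_free_batch(z);
        return;
    }
    z->pages_scanned += scanned;
    if (z->pages_scanned >= 2 * z->total_pages)
        z->all_unreclaimable = true;   /* 2x the zone scanned, nothing freed */
}

/* Number of pages the next pass should scan at the given priority. */
static size_t zone_scan_target(const struct zone_sim *z, int priority)
{
    assert(priority >= 0 && priority <= DEF_PRIORITY);
    if (z->all_unreclaimable)
        return z->total_pages >> DEF_PRIORITY;   /* light "polling" scan */
    return z->total_pages >> priority;
}
```

With 262144 pages, the zone flips to "all unreclaimable" after two full, fruitless passes. The first light scan that frees anything puts it straight back into the normal state.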

The code works OK, but is quite hard to test - I mainly tested it by
pinning all highmem in hugetlb pages.

[PATCH] strengthen the `incremental min' logic in the page · fee2b68d

Andrew Morton authored Nov 21, 2002

Strengthen the `incremental min' logic in the page allocator.

Currently it is allowing the allocation to succeed if the zone has
free_pages >= pages_high.

This was to avoid a lockup corner case in which all the zones were at
pages_high so reclaim wasn't doing anything, but the incremental min
refused to take pages from those zones anyway.

But we want the incremental min zone protection to work.  So:

- Only allow the allocator to dip below the incremental min if he
  cannot run direct reclaim.

- Change the page reclaim code so that on the direct reclaim path,
  the caller can free pages beyond ->pages_high.  So if the incremental
  min test fails, the caller will go and free some more memory.

  Eventually, the caller will have freed enough memory for the
  incremental min test to pass against one of the zones.
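Below is a hypothetical sketch of the strengthened check. For illustration only, it assumes that the incremental min starts at the request size and grows by each zone's `pages_low` while the allocator walks the zonelist. `struct zone_wm` and `pick_zone()` are invented names, not the allocator's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-zone summary; field names loosely follow the kernel's. */
struct zone_wm {
    unsigned long free_pages;
    unsigned long pages_low;    /* this zone's contribution to the incremental min */
    unsigned long pages_high;
};

/*
 * Walk the zonelist and return the index of the first zone that may satisfy
 * an allocation of 2^order pages, or -1 if the caller must reclaim first.
 * Assumption for illustration: the incremental min starts at the request
 * size and grows by each zone's pages_low as the walk falls back.
 */
static int pick_zone(const struct zone_wm *zones, size_t nzones,
                     unsigned int order, bool can_reclaim)
{
    unsigned long min;

    assert(order < 8 * sizeof(unsigned long));
    min = 1UL << order;
    for (size_t i = 0; i < nzones; i++) {
        min += zones[i].pages_low;
        if (zones[i].free_pages >= min)
            return (int)i;
        /* Only a caller that cannot run direct reclaim may dip below the min. */
        if (!can_reclaim && zones[i].free_pages >= zones[i].pages_high)
            return (int)i;
    }
    return -1;
}
```

A return of -1 marks the point where a caller that can reclaim goes off, frees more memory and retries.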

[PATCH] Remove mapping->vm_writeback · 53bf7bef

Andrew Morton authored Nov 21, 2002

The vm_writeback address_space operation was designed to provide the VM
with a "clustered writeout" capability.  It allowed the filesystem to
perform more intelligent writearound decisions when the VM was trying
to clean a particular page.

I can't say I ever saw any real benefit from this - not much writeout
actually happens on that path - quite a lot of work has gone into
minimising it actually.

The default ->vm_writeback a_op which I provided wrote back the pages
in ->dirty_pages order.  But there is one scenario in which this causes
problems - writing a single 4G file with mem=4G.  We end up with all of
ZONE_NORMAL full of dirty pages, but all writeback effort is against
highmem pages.  (Because there is about 1.5G of dirty memory total).

Net effect: the machine stalls ZONE_NORMAL allocation attempts until
the ->dirty_pages writeback advances onto ZONE_NORMAL pages.

This can be fixed most sweetly with additional radix-tree
infrastructure which will be quite complex.  Later.


So this patch dumps it all, and goes back to using writepage
against individual pages as they come off the LRU.

[PATCH] Fix busy-wait with writeback to large queues · 5fa9d488

Andrew Morton authored Nov 21, 2002

blk_congestion_wait() is a utility function which various callers use
to throttle themselves to the rate at which the IO system can retire
writes.

The current implementation refuses to wait if no queues are "congested"
(>75% of requests are in flight).

That doesn't work if the queue is so huge that it can hold more than
40% (dirty_ratio) of memory.  The queue simply cannot enter congestion
because the VM refuses to allow more than 40% of memory to be dirtied.
(This spin could happen with a lot of normal-sized queues too.)

So this patch simply changes blk_congestion_wait() to throttle even if
there are no congested queues.  It will cause the caller to sleep until
someone puts back a write request against any queue.  (Nobody uses
blk_congestion_wait for read congestion).

The patch adds new state to backing_dev_info->state: a couple of flags
which indicate whether there are _any_ reads or writes in flight
against that queue.  This was added to prevent blk_congestion_wait()
from taking a nap when there are no writes at all in flight.
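The new behaviour amounts to an unconditional, bounded sleep until some write completes, skipped when no writes are in flight at all. Here is a user-space sketch of it, using a pthread condition variable in place of the kernel's wait queues. `struct queue_sim`, `queue_write_completed()` and `congestion_wait_sim()` are hypothetical, and the millisecond timeout is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical queue state, not the kernel's backing_dev_info. */
struct queue_sim {
    pthread_mutex_t lock;
    pthread_cond_t write_done;
    int writes_in_flight;        /* plays the role of the new "writes" flag */
    unsigned long completions;   /* bumped each time a write is put back */
};

/* A write request has been put back: wake every throttled caller. */
static void queue_write_completed(struct queue_sim *q)
{
    pthread_mutex_lock(&q->lock);
    assert(q->writes_in_flight > 0);
    q->writes_in_flight--;
    q->completions++;
    pthread_cond_broadcast(&q->write_done);
    pthread_mutex_unlock(&q->lock);
}

/*
 * Throttle whether or not the queue counts as "congested": sleep until any
 * write completes or timeout_ms elapses.  Returns false, without sleeping,
 * when no writes are in flight (nothing could ever wake us).
 */
static bool congestion_wait_sim(struct queue_sim *q, long timeout_ms)
{
    struct timespec deadline;
    bool slept = false;

    clock_gettime(CLOCK_REALTIME, &deadline);   /* condvars default to CLOCK_REALTIME */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&q->lock);
    if (q->writes_in_flight > 0) {
        unsigned long seen = q->completions;
        slept = true;
        /* Stop on a new completion or on timeout; loop on spurious wakeups. */
        while (q->completions == seen &&
               pthread_cond_timedwait(&q->write_done, &q->lock, &deadline) != ETIMEDOUT)
            ;
    }
    pthread_mutex_unlock(&q->lock);
    return slept;
}
```

Waiting on a completion counter rather than on the in-flight count means that a newly submitted write cannot mask one that has just finished.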

But the "are there any reads" info could be used to defer background
writeout from pdflush, to reduce read-vs-write competition.  We'll see.

This matters because the large request queues have made a fundamental change:
blocking in get_request_wait() has been the main form of VM throttling
for years.  But with large queues it doesn't work any more - all
throttling happens in blk_congestion_wait().

Also, change io_schedule_timeout() to propagate the schedule_timeout()
return value.  I was using that in some debug code, but it should have
been like that from day one.

[PATCH] bootmem crash fix · 40a7fe2f

Andrew Morton authored Nov 21, 2002

From Roman Zippel.  Don't assume that physical memory starts at
physical address zero.
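The general remedy is to index boot-time bookkeeping relative to the first physical page frame rather than to frame zero. Below is a minimal sketch of that idea, not the patch itself. `struct bootmem_sim`, `bootmem_index()` and `bootmem_reserve()` are hypothetical names, and the start frame 0x80000 used in the usage example is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64

/* Hypothetical boot allocator covering pfns [start_pfn, start_pfn + npages). */
struct bootmem_sim {
    unsigned long start_pfn;   /* need not be zero */
    unsigned long npages;
    bool reserved[MAX_PAGES];
};

/* Map a pfn to its slot relative to start_pfn; -1 if it is outside this node. */
static long bootmem_index(const struct bootmem_sim *b, unsigned long pfn)
{
    assert(b->npages <= MAX_PAGES);
    if (pfn < b->start_pfn || pfn - b->start_pfn >= b->npages)
        return -1;
    return (long)(pfn - b->start_pfn);
}

/* Reserve one page; fails if it is out of range or already reserved. */
static bool bootmem_reserve(struct bootmem_sim *b, unsigned long pfn)
{
    long idx = bootmem_index(b, pfn);

    if (idx < 0 || b->reserved[idx])
        return false;
    b->reserved[idx] = true;
    return true;
}
```

Indexing by `pfn` directly would need a bitmap sized for every frame below `start_pfn` and would overrun a bitmap sized only for the node.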

[PATCH] ext2/ext3 Orlov directory accounting fix · 52ab8b6a

Andrew Morton authored Nov 21, 2002

Patch from Stephen Tweedie

"In looking at the fix for the ext3 Orlov double-accounting bug, I
 noticed a change to the sb->s_dir_count accounting, restoring a
 missing s_dir_count++ when we allocate a new directory.

 However, I can't find anywhere in the code where we decrement this
 again on directory deletion, neither in ext2 nor in ext3, in 2.4 nor
 in 2.5."

Locking is via lock_super().
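Here is a sketch of the now-balanced accounting. A pthread mutex stands in for lock_super(), and a hypothetical `struct sb_sim` takes the place of the real superblock info.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-superblock directory counter. */
struct sb_sim {
    pthread_mutex_t lock;      /* plays the role of lock_super() */
    unsigned long s_dir_count;
};

/* mkdir path: count the new directory under the superblock lock. */
static void sb_dir_created(struct sb_sim *sb)
{
    pthread_mutex_lock(&sb->lock);
    sb->s_dir_count++;
    pthread_mutex_unlock(&sb->lock);
}

/* rmdir path: the previously missing decrement. */
static void sb_dir_deleted(struct sb_sim *sb)
{
    pthread_mutex_lock(&sb->lock);
    assert(sb->s_dir_count > 0);
    sb->s_dir_count--;
    pthread_mutex_unlock(&sb->lock);
}
```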

[PATCH] remove a warning from __block_write_full_page() · 793b840b

Andrew Morton authored Nov 21, 2002

There is a warning in there to detect when block_write_full_page()
attaches buffers to a blockdev page.  This is a bad thing because that
page's blocks may then overlap blocks from a different address_space.
So I disallowed it.

But the message can be triggered when an application is mmapping a
blockdev MAP_SHARED.  Apparently INND likes to do this.

So remove the warning.

[PATCH] fix endian problem in ext3 htree code · 086b4866

Andrew Morton authored Nov 21, 2002

Patch from Christopher Li <chrisl@vmware.com>

This little patch fixes two places in the htree code which
forget the cpu_to_le16() conversion.  This bug causes
incorrect record lengths on PPC.

Thanks to Franz for reporting the problem.
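The on-disk fields are little-endian, so every store must go through a conversion such as cpu_to_le16(). The user-space sketch below uses hypothetical helpers `put_le16()` and `get_le16()` in place of the kernel macros. It also includes a native-order store that reproduces the bug on big-endian CPUs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for cpu_to_le16() plus a store: low byte first, on any CPU. */
static void put_le16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)(v >> 8);
}

/* Stand-in for le16_to_cpu() on a loaded field. */
static uint16_t get_le16(const uint8_t *src)
{
    return (uint16_t)(src[0] | (src[1] << 8));
}

/* The bug: storing the CPU's native byte order with no conversion. */
static void put_native16(uint8_t *dst, uint16_t v)
{
    memcpy(dst, &v, sizeof v);
}
```

On a little-endian CPU both stores write the same bytes. On a big-endian CPU such as PPC, a record length of 12 read back through `get_le16()` becomes 3072.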

[PATCH] Make inode_ops->setxattr value parameter const · 443bfb9d

Andrew Morton authored Nov 21, 2002

Patch from Andreas Gruenbacher <agruen@suse.de>

The setxattr inode operation is defined like this in 2.4 and 2.5:

```c
int (*setxattr) (struct dentry *dentry, const char *name,
                 void *value, size_t size, int flags);
```

The original type of the value parameter was `const void *'; the const
has obviously been lost at some point. The definition should be:

```c
int (*setxattr) (struct dentry *dentry, const char *name,
                 const void *value, size_t size, int flags);
```

[PATCH] Expanded bad page handling · 67d9df19

Andrew Morton authored Nov 21, 2002

The page allocator has traditionally just gone BUG when it sees a page
in a bad state.  This is usually due to hardware errors, sometimes
software errors.

I'm proposing that we not go BUG() any more, but print lots (and lots)
of diagnostic info and try to continue.

Might be a bit controversial.

[PATCH] reduce CPU cost in loop · 75177faa

Andrew Morton authored Nov 21, 2002

balance_dirty_pages() is too expensive to call once-per-page.  Use the
ratelimited version.

[PATCH] Add SMP barrier to ipc's grow_ary() · 5e1341d1

Andrew Morton authored Nov 21, 2002

From Dipankar Sarma.

Before setting ids->entries to the new array, there must be a wmb()
to make sure that the memcpyed contents of the new array are visible
before the new array becomes visible.
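This is the classic publish pattern: fill the new array completely, then make the pointer visible. The sketch below swaps in C11 release/acquire atomics for the kernel's wmb(), with the acquire load supplying the reader-side ordering. `struct ids_sim` and its functions are hypothetical. The old array is handed back to the caller because freeing it while a reader may still hold it would need a grace period, which the sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ID table whose readers index `entries` without a lock. */
struct ids_sim {
    _Atomic(int *) entries;
    size_t size;
};

/*
 * Grow the table to newsize entries.  The new array is fully initialised
 * before the release store publishes it, so a reader that sees the new
 * pointer also sees the copied contents (the job wmb() does in grow_ary()).
 * The previous array is returned through *retired for deferred freeing.
 */
static int grow_ary_sim(struct ids_sim *ids, size_t newsize, int **retired)
{
    int *old = atomic_load_explicit(&ids->entries, memory_order_relaxed);
    int *fresh;

    assert(newsize > ids->size);
    fresh = calloc(newsize, sizeof *fresh);
    if (fresh == NULL)
        return -1;
    if (old != NULL)
        memcpy(fresh, old, ids->size * sizeof *old);
    atomic_store_explicit(&ids->entries, fresh, memory_order_release);
    ids->size = newsize;
    *retired = old;
    return 0;
}

/* Reader: the acquire load pairs with the release store in grow_ary_sim(). */
static int read_entry(struct ids_sim *ids, size_t i)
{
    int *arr = atomic_load_explicit(&ids->entries, memory_order_acquire);

    assert(arr != NULL);
    return arr[i];
}
```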

[PATCH] radix-tree reinitialisation fix · ca22d5dd

Andrew Morton authored Nov 21, 2002

This patch fixes a problem which was discovered by Vladimir Saveliev
<vs@namesys.com>

Radix trees have a `height' field, which defines how far the pages are
from the root of the tree.  It starts out at zero and increases as the
tree's depth is grown.

But it is never decreased.  It cannot be decreased without a full tree
traversal.

Because radix_tree_delete() does not decrease `height', we end up
returning inodes to their filesystem's inode slab cache with a non-zero
height.

And when that inode is reused from slab for a new file, it still has a
non-zero height.  So we're breaking the slab rules by not putting
objects back in a fully reinitialised state.

So the new file starts out life with whatever height the previous owner
of the inode had.  Which is space- and speed-inefficient.

The most efficient place to fix this would be in destroy_inode().  But
that only fixes the problem for inodes - there are other users of radix
trees.

So fix it in radix_tree_delete(): if the tree was emptied, reset
`height' to zero.
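Below is a toy model of just the bookkeeping involved. It assumes 64 slots per node (`MAP_SHIFT` of 6) and uses hypothetical names throughout. It tracks only the height and an item count, not real nodes or slots.

```c
#include <assert.h>

#define MAP_SHIFT 6   /* assumption: 64 slots per node */

/* Hypothetical root: only the fields the fix is about. */
struct radix_root_sim {
    unsigned int height;      /* levels between the root and the items */
    unsigned long nr_items;   /* stand-in for "is the tree empty?" */
};

/* Largest index a tree of this height can hold (height 0 holds index 0). */
static unsigned long max_index(unsigned int height)
{
    unsigned int shift = height * MAP_SHIFT;

    if (shift >= 8 * sizeof(unsigned long))
        return ~0UL;
    return (1UL << shift) - 1;
}

/* Insert grows the tree upwards until the index fits; it never shrinks it. */
static void radix_insert_sim(struct radix_root_sim *root, unsigned long index)
{
    while (index > max_index(root->height))
        root->height++;
    root->nr_items++;
}

/* Delete one item; if that emptied the tree, reset height as the fix does. */
static void radix_delete_sim(struct radix_root_sim *root)
{
    assert(root->nr_items > 0);
    if (--root->nr_items == 0)
        root->height = 0;
}
```

Without the reset in `radix_delete_sim()`, an emptied root would keep height 3 after inserting index 5000, which is the stale state the commit describes.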

[PATCH] shmdt bugfix · 33487c87

Andrew Morton authored Nov 21, 2002

Patch from Hugh Dickins <hugh@veritas.com>

Fixes the Oracle startup problem reported by Alessandro Suardi.

Reverts a "simplification" to shmdt() which was wrong if subsequent
mprotects broke up the original VMA, or if parts of it were munmapped.

[PATCH] misc · a838ea3b

Andrew Morton authored Nov 21, 2002

- I hit a BUG in end_swap_bio_read() under heavy load.  The page
  wasn't locked.  No idea how this can happen :(

  Add a BUG at submission time to catch a caller reading into an
  unlocked swapcache page.

- Remove a debug check from destroy_inode() - it was in the wrong leg
  of the `if' statement anyway.

[PATCH] kNFSd - 2 of 2 - Change NFSv4 reply encoding to cope with multiple pages. · dcffe12e

Neil Brown authored Nov 21, 2002

This allows NFSv4 responses to cover more than one page. There are
still limits, though: there can be at most one 'data' response (READ,
READLINK or READDIR). For these responses, the interesting data goes
in a separate page or, for READ, a list of pages.

All responses before the 'data' response must fit in one page, and all
responses after it must also fit in one (separate) page.

[PATCH] kNFSd - 1 of 2 - Change NFSv4 xdr decoding to cope with separate pages. · 89fc0a31

Neil Brown authored Nov 21, 2002

Now that nfsd uses a list of pages for requests instead of
one large buffer, NFSv4 needs to know about this.

The most interesting part of this is that it is possible
for a section of a request, like a path name, to span
two pages, so we need to be able to kmalloc a little bit
of space to copy it into, and make sure it gets
freed later.
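Here is a user-space sketch of the copy-if-split approach. It uses tiny 16-byte pages so the boundary case is easy to exercise. `decode_bytes()` and `PAGE_SIZE_SIM` are hypothetical, and malloc()/free() stand in for kmalloc()/kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_SIM 16   /* tiny pages so the split case is easy to see */

/*
 * Return a pointer to `len` bytes starting at byte `off` of a request held
 * in an array of fixed-size pages.  If the bytes lie in one page, point
 * straight into it; if they cross a page boundary, copy them into a fresh
 * buffer and set *needs_free so the caller frees it later.
 */
static const char *decode_bytes(char *const *pages, size_t npages, size_t off,
                                size_t len, int *needs_free)
{
    size_t page = off / PAGE_SIZE_SIM, in_page = off % PAGE_SIZE_SIM;
    size_t copied = 0;
    char *buf;

    assert(needs_free != NULL);
    *needs_free = 0;
    if (off + len > npages * PAGE_SIZE_SIM)
        return NULL;                              /* request is truncated */
    if (in_page + len <= PAGE_SIZE_SIM)
        return pages[page] + in_page;             /* common case: no copy */

    buf = malloc(len);
    if (buf == NULL)
        return NULL;
    while (copied < len) {                        /* gather piece by piece */
        size_t chunk = PAGE_SIZE_SIM - in_page;

        if (chunk > len - copied)
            chunk = len - copied;
        memcpy(buf + copied, pages[page] + in_page, chunk);
        copied += chunk;
        page++;
        in_page = 0;
    }
    *needs_free = 1;
    return buf;
}
```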

[PATCH] disable old stat on ppc64 · 586a5a35

Anton Blanchard authored Nov 21, 2002

We don't implement the ancient stat syscalls on ppc64 since early libcs
won't run on ppc64 (they hardcode the incorrect cacheline size).

Merge bk://are.twiddle.net/axp-2.5 · f3a7e4f1
Linus Torvalds authored Nov 21, 2002
```
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
```
Merge http://linux-acpi.bkbits.net/linux-acpi · 63a84559
Linus Torvalds authored Nov 21, 2002
```
into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
```

[PATCH] Split buffer overflow checking out of struct nfs4_compound · dccb90df

Trond Myklebust authored Nov 21, 2002

Here is a pre-patch in an attempt to get rid of 'struct
nfs4_compound', and the associated horrible union in 'struct
nfs4_op'.

It splits out the fields that are meant to do buffer overflow checking
and iovec adjusting on the XDR received/sent data. It moves support
for that into the dedicated structure 'xdr_stream', and the associated
functions 'xdr_reserve_space()', 'xdr_inline_decode()'.

The patch also expands out all the macros ENCODE_HEAD, ENCODE_TAIL,
ADJUST_ARGS and DECODE_HEAD, as well as most of the DECODE_TAILs.
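Below is a simplified sketch of a bounds-checked reservation that lives in a dedicated stream structure. The names carry a `_sim` suffix because this is not the kernel's `xdr_reserve_space()`, whose exact signature is not given here. It only illustrates the idea of checking once and returning NULL instead of overrunning the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified encoder state: a cursor and a hard end. */
struct xdr_stream_sim {
    uint32_t *p;     /* next free 32-bit word */
    uint32_t *end;   /* one past the last usable word */
};

/*
 * Reserve nbytes (rounded up to XDR's 4-byte units) and advance the cursor.
 * Returns the start of the reserved space, or NULL if it would not fit,
 * leaving the cursor unchanged.
 */
static uint32_t *xdr_reserve_space_sim(struct xdr_stream_sim *xdr, size_t nbytes)
{
    size_t nwords = (nbytes + 3) / 4;
    uint32_t *start = xdr->p;

    assert(xdr->p <= xdr->end);
    if (nwords > (size_t)(xdr->end - xdr->p))
        return NULL;
    xdr->p += nwords;
    return start;
}
```

Each encoder then makes one check per operation instead of repeating the bounds test inside per-call macros.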
