Commits · 026ae7875bd6529f96224bd1fe570ef8ec973f25 · Kirill Smelkov / linux


23 Nov, 2007 40 commits

Import 2.3.30pre1 · 026ae787
Linus Torvalds authored 17 years ago

026ae787
Import 2.3.29 · 2cc90c98
Linus Torvalds authored 17 years ago

2cc90c98
Import 2.3.20pre1 · a3c57c1b
Linus Torvalds authored 17 years ago

a3c57c1b
Import 2.3.16 · 0d447745
Linus Torvalds authored 17 years ago

0d447745
Import 2.3.14pre1 · 6bbf087e
Linus Torvalds authored 17 years ago

6bbf087e
Import 2.3.9pre5 · 697b7103
Linus Torvalds authored 17 years ago

697b7103
Import 2.3.8pre3 · afbf60cd
Linus Torvalds authored 17 years ago

afbf60cd
Import 2.2.7pre4 · 830c685b
Linus Torvalds authored 17 years ago

830c685b
Import 2.2.0pre8 · 3a282a06
Linus Torvalds authored 17 years ago

3a282a06

Linus Torvalds authored 17 years ago

Ok, I think I now know why pre-6 looks so unbalanced. It's two issues.
Basically, trying to swap out a large number of pages from one process
context is just doomed. It basically sucks, because

- it has bad latency. This is further exacerbated by the per-process
  "thrashing_memory" flag, which means that if we were unlucky enough to
  be selected to be the process that frees up memory, we'll probably be
  stuck with it for a long time. That can make it extremely unfair under
  some circumstances - other processes may allocate the pages we free'd
  up, so that we keep on being counted as a memory thrasher even if we
  really aren't.

  Note that this shows most under "moderate" load - the problem doesn't
  tend to show itself if you have some process that is _really_
  allocating a lot of pages, because then that process will be correctly
  found by the thrashing logic. But if you have lots of "normal load"
  processes, some of those can get really badly hurt by this.

  In particular, in the worst case you have a number of processes that
  all allocate memory, but not very quickly - certainly not more quickly
  than we can page things out. What happens is that under these
  circumstances one of them gets marked as a "scapegoat", and once that
  happens all the others will just live off the pages that the scapegoat
  frees up, while the scapegoat itself doesn't make much progress at all
  because it is always just freeing memory for others.

  The really bad behaviour tends to go away reasonably quickly, but while
  it happens it's _really_ unfair.

- try_to_free_pages() just goes overboard, and starts paging stuff out
  without getting back to the nice balanced behaviour. This is what
  Andrea noticed.

  Essentially, once it starts failing the shrink_mmap() tests, it will
  just page things out crazily. Normally this is avoided by just always
  starting from shrink_mmap(), but if you ask try_to_free_pages() to try
  to free up a ton of pages, the balancing that it does is basically
  bypassed.

So basically pre-6 works _really_ well for the kind of stress-me stuff
that it was designed for: a few processes that are extremely memory
hungry. It gets close to perfect swap-out behaviour, simply because it is
optimized for getting into a paging rut.

That makes for nice benchmarks, but it also explains why it (a) is
sometimes just not very nice for interactive behaviour and (b) can easily
swap much too eagerly under normal load.

Anyway, the first problem is fixed by making "thrashing" be a global flag
rather than a per-process flag. Being per-process is really nice when it
finds the right process, but it's really unfair under a lot of other
circumstances. I'd rather be fair than get the best possible page-out
speed.

Note that even a global flag helps: it still clusters the write-outs, and
means that processes that allocate more pages tend to be more likely to be
hit by it, so it still does a large part of what the per-process flag did
- without the unfairness (but admittedly being unfair sometimes gets you
better performance - you just have to be _very_ careful whom you target
with the unfairness, and that's the hard part).

The second problem actually goes away simply by not asking
try_to_free_pages() to free too many pages - and having the global
thrashing flag makes it unnecessary to do so anyway because the flag will
essentially cluster the page-outs even without asking for them to be all
done in one large chunk (and now it's not just one process that gets hit
any more).

There's a "pre-7.gz" on ftp.kernel.org in testing, anybody interested?
It's not the real thing, as I haven't done the write semaphore deadlock
thing yet, but that one will not affect normal users anyway so for
performance testing this should be equivalent.

Linus

c68677ac

Linux 2.1.131pre2 · b468356b

Linus Torvalds authored 17 years ago

There's a pre-131-2 patch there on ftp.kernel.org in the testing
directory. This should have the NFS locking issues worked out (please
test), and also has a rather subtle but potentially very nasty deadlock
due to incorrect semaphore ordering with rmdir() hopefully fixed for good.
Alan, the regparm patches are also there.

                Linus

nfs: write back everything whenever some lock is changed (not just for
     unlock), and always invalidate the caches.

b468356b

The Basted Turkey Release (aka 2.1.130) · 2a86df06

Linus Torvalds authored 17 years ago

Following hot on the heels of the greased weasel, the basted turkey rears
its handsome head.

The basted turkey release fixes some problems that our dear weasel had,
namely:
 - NFS reference counting was wrong. It had been wrong for a long time,
   but apparently the more aggressively asynchronous code was more easily
   able to show the resultant random memory corruption. That should be
   gone.
 - The UP flu fixed officially (this has been in most of the 2.1.129
   patches)
 - kernel_thread() used to be able to cause bad things in init-routines at
   bootup. Fixed.
 - itimers could lead to bad things in SMP under heavy itimer load.
 - various mm tweaks to make it behave better under load. Things for dirty
   buffers still under consideration.
 - IP masquerading check fixes.
 - acenic gigabit ethernet driver
 - some drunken revelers fixed some MCA issues.
 - alpha PCI setup updates and video drivers
 - hfs and minix filesystem fixes.

On the whole, an excellent thing to do this evening, and goes together
remarkably well with some good red wine. Amaze your friends and relatives
by completely ignoring them, sitting in a corner with your own basted
turkey, and getting wasted on red wine. Much more fun than your average
thanksgiving dinner,

		Linus

2a86df06

Import 2.1.126pre1 · 79e1fe75
Linus Torvalds authored 17 years ago

79e1fe75
Import 2.1.121pre1 · 716454f0
Linus Torvalds authored 17 years ago

716454f0
Import 2.1.116pre1 · 7d32756b
Linus Torvalds authored 17 years ago

7d32756b
Import 2.1.115pre1 · b5d6c0fe
Linus Torvalds authored 17 years ago

b5d6c0fe

pre-2.1.109-2.. · e994d3ce

Linus Torvalds authored 17 years ago

To get people away from their normally scheduled copyright discussions, I
made a pre-2.1.109 to try out. I would have made a real 2.1.109, but my
computer room has been taken over by visiting relatives, and they want to
go to sleep. Ye Gods!

Get it from ftp.kernel.org, /pub/linux/kernel/testing as usual. It has
 - CPU detection in C code (and thus much easier to expand upon,
   especially as it's all thrown away after booting now that it is
   "initfunc()").  This should finally get the Cyrix case right, for
   example. Please test.
 - too many people convinced me that sendfile() really wants to act like
   writep().
 - sound driver updates from Alan.
 - console updates, so now we have the full old functionality again as far
   as I'm concerned (but I'm sure people will tell me something is still
   missing)
 - task switch and user space return cleanly handles bad segment
   descriptors etc, so people should no longer be able to cause kernel
   messages by misusing the LDT.
 - wine should work again thanks to Bill Hawes (other LDT fixes)
 - de4x5 driver update
 - token ring driver update
 - ppp driver update
 - coda-fs update
 - "shared writable" bug fixed (thanks to a lot of people for testing and
   working on this - the actual fix was trivial once the problem was
   understood)
Check it out,

                Linus

e994d3ce

Linux 2.1.108 · 7eaba1c7

Linus Torvalds authored 17 years ago

I just made a pre-2.1.108 and put it on ftp.kernel.org - it fixes a
problem where my sendfile() forgot to get the kernel lock (blush), so it
randomly didn't work correctly on SMP.

I've also done some more testing of sendfile(), and the nice thing is that
when I compared a file copy done with sendfile against a plain "cp",
the sendfile implementation was about twice as fast (at least my version
of "cp" will just do read+write pairs over and over again). When I copied
a 38MB file the "cp" took 1:58 seconds while sendfile took 1:08 seconds
according to "time" (I have 512MB of RAM, so this was all cached,
obviously)..

I haven't done any network tests, because I don't think I'd be able to see
any difference, and it does need the "SO_CONSTIPATED" thing and a way to
push the end of data for best performance.

Some final words on sendfile():
 - it does report errors correctly. That doesn't mean that you necessarily
   can know _which_ fd produced the error; that you have to find out on
   your own. A file read access can generally result in EIO and EACCES
   (the latter with NFS and other "protection-at-read-time" non-UNIX
   filesystems), while the output write() can result in a number of errors
   as the output fd can be any kind of socket/tty/file. Depending on the
   mode of the output file, the output errors can include EINTR, EAGAIN
   etc, and you can mix sendfile() with select() on the output socket, for
   example.
 - you can give it a length of MAX_ULONG, and it will write as much as it
   can. This is entirely consistent with the notion that it is equivalent
   with write(out, tmpbuf, read(in, tmpbuf, size)) where "tmpbuf" is
   essentially infinite - the read() will read all of the file and return
   the file length in the process. Thus you don't even need to know the
   size of the file beforehand.
   The file copy test was essentially done with a single
        error = sendfile(out, in, ~0);
   and I'm appending my current test-program.

This is going to be in 2.2, btw. The changes are so small and so obviously
have to work that it would be ridiculous not to have this - the only
question is whether I'll try to make it a "copyfd()" system call instead,
falling back on read+write when I can't use the page cache directly. I
suspect I won't.
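
The test program mentioned above is not reproduced in this log. As a rough
illustration of the "sendfile, else fall back on read+write" idea, here is a
minimal sketch written against the present-day Linux sendfile(2) interface,
which takes an extra offset argument and a size_t count, unlike the
three-argument call quoted above. The helper names copy_fd() and
copy_read_write() are hypothetical and are not taken from any kernel or
libc source.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback path: the plain read()+write() pairs that "cp" is described
 * as doing. Returns the number of bytes copied, or -1 on error. */
static ssize_t copy_read_write(int out, int in)
{
    char buf[65536];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted, retry the write */
                return -1;
            }
            off += w;                   /* handle short writes */
        }
        total += n;
    }
}

/* Hypothetical helper: copy everything from `in` (starting at its current
 * file offset) to `out`. Tries sendfile() first and switches to
 * read()+write() if the kernel rejects this fd pair. */
ssize_t copy_fd(int out, int in)
{
    ssize_t total = 0;

    for (;;) {
        /* Ask for a large chunk; a return of 0 means end of file, so the
         * caller never needs to know the file size beforehand. */
        ssize_t n = sendfile(out, in, NULL, (size_t)1 << 30);
        if (n == 0)
            return total;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EINVAL || errno == ENOSYS) {
                /* sendfile() is not usable here; finish with read+write. */
                ssize_t rest = copy_read_write(out, in);
                return rest < 0 ? -1 : total + rest;
            }
            return -1;                  /* e.g. EIO, EAGAIN: caller decides */
        }
        total += n;
    }
}
```

With a NULL offset, sendfile() advances the input descriptor's own file
position, which is why the fallback can simply continue from wherever the
earlier calls stopped. Errors such as EAGAIN on a non-blocking output are
passed back unchanged, so a caller could still combine this with select()
on the output descriptor.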

                        Linus

7eaba1c7

Import 2.1.107 · d4f630d9
Linus Torvalds authored 17 years ago

d4f630d9
Import 2.1.94 · ad1b31ae
Linus Torvalds authored 17 years ago

ad1b31ae
Import 2.1.92pre2 · 5e71242d
Linus Torvalds authored 17 years ago

5e71242d

Stephen Tweedie: · 717def95

Linus Torvalds authored 17 years ago

* 2.1.88, adds a bunch of new functionality to
  the swapper.  The main changes are:

* All swapping goes through the swap cache (aka. page cache) now.

* There is no longer a swap lock map.  Because we need to atomically
  test and create a new swap-cache page in order to do swap IO, it is
  sufficient just to lock the struct page itself.  Having only one
  layer of locking to deal with removes a number of races concerning
  swapping shared pages.

* We can swap shared pages, and still keep them shared when they are
  swapped back in!!!  Currently, only private shared pages (as in pages
  shared after a fork()) benefit from this, but the basic mechanism will
  be appropriate for MAP_ANONYMOUS | MAP_SHARED pages too
  (implementation to follow).  Pages will remain shared after a swapoff.

* The page cache is now quite happy dealing with swap-cache pages too.
  In particular, write-ahead and read-ahead of swap through the page
  cache will work fine (and in fact, write-ahead does get done already
  under certain circumstances with this patch --- that's essentially how
  the swapping of shared pages gets done).  Support code to perform
  asynchronous readahead of swap is included, but is not actually used
  anywhere yet.

  I've tested with a number of forked processes running with a shared
  working set larger than physical memory, and with SysV shared memory.
  I haven't found any problems with it so far.

[ Linus: I've also changed the way we consider us to need more memory in kswapd,
  but that was entirely orthogonal and did not impact these patches. ]

[Changelog pieced together by davej]

717def95

Import 2.1.79 · ae04feb3
Linus Torvalds authored 17 years ago

ae04feb3
Import 2.1.37pre3 · 6258f70d
Linus Torvalds authored 17 years ago

6258f70d
Import 2.1.35 · a2d6205f
Linus Torvalds authored 17 years ago

a2d6205f
Import 2.1.6 · d86fb96f
Linus Torvalds authored 17 years ago

d86fb96f
Import 2.1.3 · 121c2c4c
Linus Torvalds authored 17 years ago

121c2c4c
Import 2.0.14 · 8cf64f0c
Linus Torvalds authored 17 years ago

8cf64f0c
Import pre2.0.12 · 43c4e96e
Linus Torvalds authored 17 years ago

43c4e96e
Import 1.3.87 · f806c6db
Linus Torvalds authored 17 years ago

f806c6db
Import 1.3.82 · 2ab298ef
Linus Torvalds authored 17 years ago

2ab298ef
Import 1.3.75 · e2c56c88
Linus Torvalds authored 17 years ago

e2c56c88
Import 1.3.70 · d26708ba
Linus Torvalds authored 17 years ago

d26708ba
Import 1.3.65 · a525572b
Linus Torvalds authored 17 years ago

a525572b
Import 1.3.50 · 22accfc2
Linus Torvalds authored 17 years ago

22accfc2
Import 1.3.48 · 97d32f33
Linus Torvalds authored 17 years ago

97d32f33
Import 1.3.41 · a89a2558
Linus Torvalds authored 17 years ago

a89a2558
Import 1.3.38 · 32784d95
Linus Torvalds authored 17 years ago

32784d95
Import 1.3.28 · 175e8c6e
Linus Torvalds authored 17 years ago

175e8c6e
Import 1.3.22 · 8f0ec1f9
Linus Torvalds authored 17 years ago

8f0ec1f9