- 23 Nov, 2007 40 commits
-
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
Ok, I think I now know why pre-6 looks so unbalanced. It's two issues. Basically, trying to swap out a large number of pages from one process context is just doomed. It basically sucks, because

- it has bad latency. This is further exacerbated by the per-process "thrashing_memory" flag, which means that if we were unlucky enough to be selected to be the process that frees up memory, we'll probably be stuck with it for a long time. That can make it extremely unfair under some circumstances - other processes may allocate the pages we freed up, so that we keep on being counted as a memory trasher even if we really aren't. Note that this shows most under "moderate" load - the problem doesn't tend to show itself if you have some process that is _really_ allocating a lot of pages, because then that process will be correctly found by the trashing logic. But if you have lots of "normal load" processes, some of those can get really badly hurt by this. In particular, in the worst case you have a number of processes that all allocate memory, but not very quickly - certainly not more quickly than we can page things out. What happens is that under these circumstances one of them gets marked as a "scapegoat", and once that happens all the others will just live off the pages that the scapegoat frees up, while the scapegoat itself doesn't make much progress at all because it is always just freeing memory for others. The really bad behaviour tends to go away reasonably quickly, but while it happens it's _really_ unfair.

- try_to_free_pages() just goes overboard, and starts paging stuff out without getting back to the nice balanced behaviour. This is what Andrea noticed. Essentially, once it starts failing the shrink_mmap() tests, it will just page things out crazily. Normally this is avoided by just always starting from shrink_mmap(), but if you ask try_to_free_pages() to try to free up a ton of pages, the balancing that it does is basically bypassed.
So basically pre-6 works _really_ well for the kind of stress-me stuff that it was designed for: a few processes that are extremely memory hungry. It gets close to perfect swap-out behaviour, simply because it is optimized for getting into a paging rut. That makes for nice benchmarks, but it also explains (a) why sometimes it's just not very nice for interactive behaviour and (b) why under normal load it can easily swap much too eagerly.

Anyway, the first problem is fixed by making "trashing" be a global flag rather than a per-process flag. Being per-process is really nice when it finds the right process, but it's really unfair under a lot of other circumstances. I'd rather be fair than get the best possible page-out speed. Note that even a global flag helps: it still clusters the write-outs, and means that processes that allocate more pages tend to be more likely to be hit by it, so it still does a large part of what the per-process flag did - without the unfairness (but admittedly being unfair sometimes gets you better performance - you just have to be _very_ careful whom you target with the unfairness, and that's the hard part).

The second problem actually goes away by simply just not asking try_to_free_pages() to free too many pages - and having the global trashing flag makes it unnecessary to do so anyway because the flag will essentially cluster the page-outs even without asking for them to be all done in one large chunk (and now it's not just one process that gets hit any more).

There's a "pre-7.gz" on ftp.kernel.org in testing, anybody interested? It's not the real thing, as I haven't done the write semaphore deadlock thing yet, but that one will not affect normal users anyway so for performance testing this should be equivalent.

Linus
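The fairness argument above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the pre-7 code: the watermarks `FREE_MIN` and `FREE_LOW`, the batch size `RECLAIM_BATCH`, and the names `struct vm`, `struct task`, `alloc_page()` and `reclaim_batch()` are all hypothetical. An allocation that finds memory low, or finds the flag already set while free memory is still below `FREE_LOW`, frees one small batch, so no caller is ever asked to free a huge number of pages at once. The `per_task` switch chooses between a per-process flag and a single global one.

```c
#include <stdbool.h>

/* Hypothetical watermarks (in pages) and per-call reclaim batch. */
#define FREE_MIN       64
#define FREE_LOW      128
#define RECLAIM_BATCH   8   /* small on purpose: never ask for one huge chunk */

struct vm {
    long nr_free;          /* free pages in the (toy) system */
    bool per_task;         /* true: per-task flag; false: one global flag */
    bool global_trashing;  /* the global flag, used when per_task is false */
};

struct task {
    bool trashing;         /* per-task flag, used when per_task is true */
    long reclaimed;        /* pages this task freed, possibly for others */
};

/* Stand-in for a small try_to_free_pages() call made by task t. */
static void reclaim_batch(struct vm *vm, struct task *t)
{
    vm->nr_free += RECLAIM_BATCH;
    t->reclaimed += RECLAIM_BATCH;
}

/* Allocate one page on behalf of task t. */
void alloc_page(struct vm *vm, struct task *t)
{
    bool *flag = vm->per_task ? &t->trashing : &vm->global_trashing;

    /* Reclaim if memory is low, or if we are already in "trashing" mode
     * and have not yet climbed back above FREE_LOW (hysteresis). */
    bool must_reclaim = vm->nr_free <= FREE_MIN ||
                        (*flag && vm->nr_free <= FREE_LOW);
    if (must_reclaim) {
        *flag = true;
        reclaim_batch(vm, t);
    } else if (*flag) {
        *flag = false;     /* back above FREE_LOW: leave trashing mode */
    }
    vm->nr_free--;
}
```

In this toy, with two tasks allocating alternately at the same rate and starting from 66 free pages, the per-task variant leaves one task doing all of the reclaim while the other lives off the pages it frees (the "scapegoat" case); with the global flag the reclaim work is split evenly between them, and the write-outs still come in clusters because the flag stays set until free memory passes `FREE_LOW`.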
-
Linus Torvalds authored
There's a pre-131-2 patch there on ftp.kernel.org in the testing directory. This should have the NFS locking issues worked out (please test), and also has a rather subtle but potentially very nasty deadlock due to incorrect semaphore ordering with rmdir() hopefully fixed for good. Alan, the regparm patches are also there.

Linus

nfs: write back everything whenever some lock is changed (not just for unlock), and always invalidate the caches.
-
Linus Torvalds authored
Following hot on the heels of the greased weasel, the basted turkey rears its handsome head. The basted turkey release fixes some problems that our dear weasel had, namely:

- NFS reference counting was wrong. It had been wrong for a long time, but apparently the more aggressively asynchronous code was more easily able to show the resultant random memory corruption. That should be gone.
- The UP flu fixed officially (this has been in most of the 2.1.129 patches)
- kernel_thread() used to be able to cause bad things in init-routines at bootup. Fixed.
- itimers could lead to bad things in SMP under heavy itimer load.
- various mm tweaks to make it behave better under load. Things for dirty buffers still under consideration.
- IP masquerading check fixes.
- acenic gigabit ethernet driver
- some drunken revelers fixed some MCA issues.
- alpha PCI setup updates and video drivers
- hfs and minix filesystem fixes.

On the whole, an excellent thing to do this evening, and goes together remarkably well with some good red wine. Amaze your friends and relatives by completely ignoring them, sitting in a corner with your own basted turkey, and getting wasted on red wine. Much more fun than your average Thanksgiving dinner,

Linus
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
To get people away from their normally scheduled copyright discussions, I made a pre-2.1.109 to try out. I would have made a real 2.1.109, but my computer room has been taken over by visiting relatives, and they want to go to sleep. Ye Gods!

Get it from ftp.kernel.org, /pub/linux/kernel/testing as usual. It has

- CPU detection in C code (and thus much easier to expand upon, especially as it's all thrown away after booting now that it is "initfunc()"). This should finally get the Cyrix case right, for example. Please test.
- too many people convinced me that sendfile() really wants to act like writep().
- sound driver updates from Alan.
- console updates, so now we have the full old functionality again as far as I'm concerned (but I'm sure people will tell me something is still missing)
- task switch and user space return cleanly handle bad segment descriptors etc, so people should no longer be able to cause kernel messages by misusing the LDT.
- wine should work again thanks to Bill Hawes (other LDT fixes)
- de4x5 driver update
- token ring driver update
- ppp driver update
- coda-fs update
- "shared writable" bug fixed (thanks to a lot of people for testing and working on this - the actual fix was trivial once the problem was understood)

Check it out,

Linus
-
Linus Torvalds authored
I just made a pre-2.1.108 and put it on ftp.kernel.org - it fixes a problem where my sendfile() forgot to get the kernel lock (blush), so it randomly didn't work correctly on SMP.

I've also done some more testing of sendfile(), and the nice thing is that when I compared doing a file copy with sendfile to a plain "cp", the sendfile implementation was about twice as fast (at least my version of "cp" will just do read+write pairs over and over again). When I copied a 38MB file the "cp" took 1:58 seconds while sendfile took 1:08 seconds according to "time" (I have 512MB of RAM, so this was all cached, obviously).

I haven't done any network tests, because I don't think I'd be able to see any difference, and it does need the "SO_CONSTIPATED" thing and a way to push the end of data for best performance.

Some final words on sendfile():

- it does report errors correctly. That doesn't mean that you necessarily can know _which_ fd produced the error; that you have to find out on your own. A file read access can generally result in EIO and EACCES (the latter with NFS and other "protection-at-read-time" non-UNIX filesystems), while the output write() can result in a number of errors as the output fd can be any kind of socket/tty/file. Depending on the mode of the output file, the output errors can include EINTR, EAGAIN etc, and you can mix sendfile() with select() on the output socket, for example.
- you can give it a length of MAX_ULONG, and it will write as much as it can. This is entirely consistent with the notion that it is equivalent to write(out, tmpbuf, read(in, tmpbuf, size)) where "tmpbuf" is essentially infinite - the read() will read all of the file and return the file length in the process. Thus you don't even need to know the size of the file beforehand.

The file copy test was essentially done with a single

    error = sendfile(out, in, ~0);

and I'm appending my current test-program.

This is going to be in 2.2, btw.
The changes are so small and so obviously have to work that it would be ridiculous not to have this - the only question is whether I'll try to make it a "copyfd()" system call instead, falling back on read+write when I can't use the page cache directly. I suspect I won't. Linus
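For comparison, here is a sketch of the two copy strategies discussed above, written against today's Linux `sendfile(2)` rather than the pre-2.1.108 interface. The modern call takes four arguments, `sendfile(out_fd, in_fd, off_t *offset, size_t count)`, instead of the three-argument form in the test line above, and copying to a regular file (not only a socket) needs Linux 2.6.33 or later. The helper names `copy_rw()` and `copy_sendfile()` are illustrative. `copy_rw()` is the "cp"-style loop of read+write pairs through a user-space buffer; `copy_sendfile()` lets the kernel move the data. In the spirit of passing `~0`, it asks for as much as one call allows (Linux caps a single transfer at 0x7ffff000 bytes) and loops until `sendfile()` returns 0 at end of input, so the file size never has to be known in advance.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* "cp"-style copy: read+write pairs through a user-space buffer.
 * Returns 0 on success, -1 with errno set on failure. */
int copy_rw(int out, int in)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
    }
}

/* sendfile copy: no user-space buffer; data moves from in to out inside
 * the kernel. With a NULL offset, in's file position is used and advanced.
 * Returns 0 on success, -1 with errno set (e.g. EAGAIN on a non-blocking
 * output socket, where the caller can wait with select() and retry). */
int copy_sendfile(int out, int in)
{
    for (;;) {
        ssize_t n = sendfile(out, in, NULL, 0x7ffff000);
        if (n == 0)
            return 0;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
    }
}
```

As the text notes, a -1 return does not say which descriptor failed; errno only tells you what went wrong.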
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
* 2.1.88, adds a bunch of new functionality to the swapper. The main changes are:

* All swapping goes through the swap cache (aka. page cache) now.
* There is no longer a swap lock map. Because we need to atomically test and create a new swap-cache page in order to do swap IO, it is sufficient just to lock the struct page itself. Having only one layer of locking to deal with removes a number of races concerning swapping shared pages.
* We can swap shared pages, and still keep them shared when they are swapped back in!!! Currently, only private shared pages (as in pages shared after a fork()) benefit from this, but the basic mechanism will be appropriate for MAP_ANONYMOUS | MAP_SHARED pages too (implementation to follow). Pages will remain shared after a swapoff.
* The page cache is now quite happy dealing with swap-cache pages too. In particular, write-ahead and read-ahead of swap through the page cache will work fine (and in fact, write-ahead does get done already under certain circumstances with this patch --- that's essentially how the swapping of shared pages gets done). Support code to perform asynchronous readahead of swap is included, but is not actually used anywhere yet.

I've tested with a number of forked processes running with a shared working set larger than physical memory, and with SysV shared memory. I haven't found any problems with it so far.

Linus: I've also changed the way we consider us to need more memory in kswapd, but that was entirely orthogonal and did not impact these patches. ]

[Changelog pieced together by davej]
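The core idea of the swap-cache change, a single find-or-create lookup keyed by swap entry plus a lock bit in the page itself instead of a separate swap lock map, can be sketched in user-space C. This is a hypothetical illustration, not the 2.1.88 code: `struct page` here is a cut-down stand-in, and `swapin()`, `read_swap_page()`, `lock_page()`, `MAX_CACHED` and the flat-array cache are invented for the example. Because every fault on the same swap entry finds the same cached page, a page that was shared before being swapped out comes back shared.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_CACHED 128

struct page {
    unsigned long entry;   /* swap slot backing this page (hypothetical encoding) */
    int count;             /* number of users sharing this page */
    atomic_flag locked;    /* the only lock: a bit in the page itself */
    bool uptodate;         /* contents have been read in from swap */
};

static struct page *swap_cache[MAX_CACHED];  /* toy cache: a flat array */
static int nr_cached;
static int swap_reads;                       /* simulated device reads */

static void lock_page(struct page *p)
{
    while (atomic_flag_test_and_set(&p->locked))
        ;                  /* spin; a kernel would sleep on the page instead */
}

static void unlock_page(struct page *p)
{
    atomic_flag_clear(&p->locked);
}

/* Hypothetical stand-in for reading one page from the swap device. */
static void read_swap_page(struct page *p)
{
    swap_reads++;
    p->uptodate = true;
}

/* Look up the page for a swap entry, creating and reading it in if needed.
 * All users of the same entry get the same page back, so sharing survives
 * swap-out and swap-in. This sketch is single-threaded; a real version must
 * make lookup+insert atomic (e.g. under a cache-wide lock) so two faults
 * cannot create two pages for one entry. */
struct page *swapin(unsigned long entry)
{
    for (int i = 0; i < nr_cached; i++) {
        struct page *p = swap_cache[i];
        if (p->entry == entry) {
            p->count++;
            lock_page(p);      /* wait for any I/O still in flight */
            unlock_page(p);
            return p;
        }
    }
    if (nr_cached == MAX_CACHED)
        return NULL;
    struct page *p = calloc(1, sizeof *p);
    if (p == NULL)
        return NULL;
    p->entry = entry;
    p->count = 1;
    atomic_flag_clear(&p->locked);
    lock_page(p);              /* locked before it becomes visible in the cache */
    swap_cache[nr_cached++] = p;
    read_swap_page(p);         /* I/O is serialized by the page lock alone */
    unlock_page(p);
    return p;
}
```

In this sketch, faulting on the same entry twice returns one page with a reference count of 2 and only one simulated device read.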
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-
Linus Torvalds authored
-