- 01 Nov, 2002 4 commits
-
-
Alexander Viro authored
OK, that's my f*ckup in rd.c (not on initrd path, actually) + couple of f*ckups from Pat (mine: forgot to bump ->bd_count in rd_open(), Pat's: dropped reference to gendisk on del_gendisk(), resulting in use of kfree'd object + tried to remove a symlink that didn't exist). This fixes these. It also changes order of blkdev_put()/del_gendisk() in initrd_release() - better safe than sorry. It got initrd working on my boxen...
-
Linus Torvalds authored
http://jfs.bkbits.net/linux-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Dave Kleikamp authored
The posix acls are implemented as extended attributes and are compatible with ext2/ext3 posix acls.
-
Dave Kleikamp authored
into shaggy.austin.ibm.com:/shaggy/bk/jfs-2.5
-
- 31 Oct, 2002 36 commits
-
-
Linus Torvalds authored
http://lia64.bkbits.net/to-linus-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Linus Torvalds authored
bk://linux-bt.bkbits.net/bt-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
David Mosberger authored
-
Erich Focht authored
Dear David,

please find attached two patches for the latest 2.5.44-ia64. They fix some problems and simplify things a bit.

remove_nodeid-2.5.44.patch: This comes from Kimi. In 2.5.44 we suddenly had two definitions for numa_node_id(), one was IA64 specific (local_cpu_data->nodeid) while the other one is now platform independent: __cpu_to_node(smp_processor_id()). After some discussions we decided to remove the nodeid from the local_cpu_data and keep the definition of all other platforms. With using the cpu_to_node_map[] we are also faster when doing multiple lookups, as all node ids come in a single cache line (which is not bounced around, as its content is only read).

ia64_topology_fixup-2.5.44.patch: I'm following here the latest fixup for i386 from Matthew Dobson. The __node_to_cpu_mask() macro now accesses an array which is initialized after the ACPI CPU discovery. It also simplifies __node_to_first_cpu(). A compiler warning has been fixed, too.

Please apply these to your kernel tree.
-
David Mosberger authored
-
Robert Love authored
The hyper-threading in /proc/cpuinfo patch introduced a compile warning under UP. Fixed thus.
-
Luca Barbieri authored
This trivial patch causes the TLS to be cleared on execve (code is in flush_thread). This is necessary to avoid ESRCH errors when set_thread_area is asked to choose a free TLS entry after several nested execve's. The LDT also has a similar problem, but it is less serious because the LDT code doesn't scan for free entries. I'll probably send a patch to fix this too, unless there is something important relying on this behavior.
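The failure mode described above can be illustrated with a minimal user-space sketch. It is not the kernel code: the slot count `TLS_SLOTS`, the struct `thread_tls`, and both function names are hypothetical and chosen only for illustration. It shows why a "choose a free entry" request eventually fails with ESRCH if slots survive exec, and how clearing them on the exec path avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TLS_SLOTS 3   /* assumed slot count, for illustration only */

/* Hypothetical per-thread state: one "in use" flag per TLS slot. */
struct thread_tls {
    int used[TLS_SLOTS];
};

/* Pick the first free slot, the way a "choose a free entry" request would.
 * Returns the slot index, or -ESRCH when every slot is already taken. */
int tls_alloc_free_slot(struct thread_tls *t)
{
    assert(t != NULL);
    for (int i = 0; i < TLS_SLOTS; i++) {
        if (!t->used[i]) {
            t->used[i] = 1;
            return i;
        }
    }
    return -ESRCH;
}

/* The idea of the patch, in spirit: forget all slots when a new
 * program image is executed, so nothing leaks across exec. */
void tls_flush_on_exec(struct thread_tls *t)
{
    assert(t != NULL);
    memset(t->used, 0, sizeof(t->used));
}
```

If each program in a chain of nested execs allocates one slot and nothing is cleared, the fourth allocation in this sketch returns `-ESRCH`; calling `tls_flush_on_exec()` between programs makes slot 0 available again.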
-
Linus Torvalds authored
bk://cifs.bkbits.net/linux-2.5cifs into home.transmeta.com:/home/torvalds/v2.5/linux
-
John Levon authored
As per comment: restoring APIC_LVTPC can trigger an APIC error because the delivery mode and vector nr combination can be illegal. That's by design: on power-on the APIC LVT entries contain a zero vector nr, which is legal only for NMI delivery mode. So inhibit APIC errors before restoring LVTPC.
-
John Levon authored
We need to use u64 because the future 64-bit ports can theoretically return the same value for two different dentries, as pointed out by Ulrich Weigand. The patch also changes return value of the syscall to give length of data copied, needed for valgrind support (this bit is by Philippe Elie). Note this is not a complete fix for mixed 32/64: userspace needs to figure out the kernel pointer size when reading from the buffer. But that's another fix... NOTE! any oprofile users will need to upgrade after this goes in, and the user-space equivalent is checked into CVS. Sorry for the inconvenience
-
Andrew Morton authored
Companion to the previous patch: all the support needed for non-ia32 architectures.
-
Andrew Morton authored
Patch from Ravikiran G Thirumalai <kiran@in.ibm.com>

1. Break out disk stats from kernel_stat and move disk stat to blkdev.h
2. Group cpu stat in kernel_stat and make them "per_cpu" instead of the NR_CPUS array
3. Remove EXPORT_SYMBOL(kstat) from ksyms.c (as I noticed that no module is using kstat)
-
Andrew Morton authored
Uninlines some large functions in the ipc code.

Before:

   text    data     bss     dec     hex filename
  30226     224     192   30642    77b2 ipc/built-in.o

After:

   text    data     bss     dec     hex filename
  20274     224     192   20690    50d2 ipc/built-in.o
-
Andrew Morton authored
Patch from Mingming, Rusty, Hugh, Dipankar, me:

- It greatly reduces the lock contention by having one lock per id. The global spinlock is removed and a spinlock is added in kern_ipc_perm structure.
- Uses ReadCopyUpdate in grow_ary() for locking-free resizing.
- In the places where ipc_rmid() is called, delay calling ipc_free() to RCU callbacks. This is to prevent ipc_lock() returning an invalid pointer after ipc_rmid(). In addition, use the workqueue to enable RCU freeing vmalloced entries.

Also some other changes:

- Remove redundant ipc_lockall/ipc_unlockall
- Now ipc_unlock() directly takes IPC ID pointer as argument, avoiding an extra lookup in the array.

The changes are made based on the input from Hugh Dickins, Manfred Spraul and Dipankar Sarma.

In addition, Cliff White has run OSDL's dbt1 test on a 2 way against the earlier version of this patch. Results show about 2-6% improvement on the average number of transactions per second. Here is the summary of his tests:

                        2.5.42-mm2    2.5.42-mm2-ipclock
                        ----------    ------------------
Average over 5 runs     85.0 BT       89.8 BT
Std Deviation 5 runs    7.4 BT        1.0 BT
Average over 4 best     88.15 BT      90.2 BT
Std Deviation 4 best    2.8 BT        0.5 BT

Also, another test today from Bill Hartner:

I tested Mingming's RCU ipc lock patch using a *new* microbenchmark - semopbench. semopbench was written to test the performance of Mingming's patch. I also ran a 3 hour stress and it completed successfully. Explanation of the microbenchmark is below the results. Here is a link to the microbenchmark source.

http://www-124.ibm.com/developerworks/opensource/linuxperf/semopbench/semopbench.c

SUT : 8-way 700 Mhz PIII

I tested 2.5.44-mm2 and 2.5.44-mm2 + RCU ipc patch

>semopbench -g 64 -s 16 -n 16384 -r > sem.results.out
>readprofile -m /boot/System.map | sort -n +0 -r > sem.profile.out

The metric is seconds per repetition. Lower is better.

kernel                run 1      run 2
                      seconds    seconds
==================    =======    =======
2.5.44-mm2            515.1      515.4
2.5.44-mm2+rcu-ipc    46.7       46.7

With Mingming's patch, the test completes 10X faster.
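The per-id locking idea can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `ipc_perm_sketch`, `ipc_table`, `IPC_MAX_IDS` and the `_sketch` functions are hypothetical names, a C11 `atomic_flag` stands in for the kernel spinlock, and the RCU-based resizing and deferred freeing described in the message are left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define IPC_MAX_IDS 4            /* hypothetical table size */

/* Hypothetical stand-in for kern_ipc_perm: every entry has its own lock. */
struct ipc_perm_sketch {
    atomic_flag lock;            /* per-id spinlock */
    int         valid;           /* cleared once the id has been removed */
    int         payload;
};

struct ipc_perm_sketch ipc_table[IPC_MAX_IDS];

void ipc_table_init(void)
{
    for (int i = 0; i < IPC_MAX_IDS; i++) {
        atomic_flag_clear(&ipc_table[i].lock);
        ipc_table[i].valid = 1;
        ipc_table[i].payload = 0;
    }
}

/* Look up an id and return its entry locked, or NULL if the id is
 * out of range or has been removed. */
struct ipc_perm_sketch *ipc_lock_sketch(int id)
{
    if (id < 0 || id >= IPC_MAX_IDS)
        return NULL;
    struct ipc_perm_sketch *p = &ipc_table[id];
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;                        /* spin: contends only with users of this id */
    if (!p->valid) {
        atomic_flag_clear_explicit(&p->lock, memory_order_release);
        return NULL;
    }
    return p;
}

/* Unlock takes the entry pointer directly, so no second lookup is needed. */
void ipc_unlock_sketch(struct ipc_perm_sketch *p)
{
    assert(p != NULL);
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}
```

With one lock per entry, holding id 0 does not block a caller working on id 1, which is the contention reduction the message describes. Passing the pointer to the unlock function mirrors the "avoid extra lookup" change listed above.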
-
Andrew Morton authored
From Hugh

Instate Ingo's shmem_populate on top of the previous patches, now using shmem_getpage(,,,SGP_QUICK) for the nonblocking case (its find_lock_page may block, but rarely for long). Note install_page will need redefining if PAGE_CACHE_SIZE departs from PAGE_SIZE; note pgoff to populate must be in terms of PAGE_SIZE; note page_cache_release if install_page fails. filemap_populate similarly needs page_cache_release when install_page fails, but filemap.c is not included in this patch since we started out from 2.5.43 rather than 2.5.43-mm2: whereas patches 1-8 could go directly to 2.5.43, this 9/9 belongs with Ingo's population work.
-
Andrew Morton authored
Ingo's remap_file_pages patch. Supported on ia32, x86-64, sparc and sparc64. Others will need to update mman.h and the syscall tables.
-
Andrew Morton authored
With large highmem machines and many small cached files it is possible to encounter ZONE_NORMAL allocation failures. This can be demonstrated with a large number of one-byte files on a 7G machine. All lowmem is filled with icache and all those inodes have a small amount of highmem pagecache which makes them unfreeable. The patch strips the pagecache from inodes as they come off the tail of the inode_unused list. I play tricks in there peeking at the head of the inode_unused list to pick up the inode again after running iput(). The alternatives seemed to involve more widespread changes, or running invalidate_inode_pages() under inode_lock, which would be a bad thing from a scheduling latency and lock contention point of view.
-
Andrew Morton authored
The kernel will presently reclaim swapcache pages as they come off the tail of the inactive list even if they are referenced. That's the "use-once" pagecache path and shouldn't be applied to swapcache pages. This affects very few pages in practice because all those pages tend to be mapped into pagetables anyway.
-
Andrew Morton authored
If we're about to return to userspace after performing some swap readahead, the pages in the deferred-addition LRU queues could stay there for some time. So drain them after performing readahead.
-
Andrew Morton authored
Use lru_cache_add_active() to ensure that pages which are, or will be, mapped into pagetables are started out on the active list.
-
Andrew Morton authored
This is the first in a series of patches which tune up the 2.5 performance under heavy swap loads. Throughput on stupid swapstormy tests is increased by 1.5x to 3x. Still about 20% behind 2.4 with multithreaded tests. That is not easily fixable - the virtual scan tends to apply a form of load control: particular processes are heavily swapped out so the others can get ahead. With 2.5 all processes make very even progress and much more swapping is needed. It's on par with 2.4 for single-process swapstorms.

In this patch: The code which tries to start mapped pages out on the active list doesn't work very well. It uses an "is it mapped into pagetables" test. Which doesn't work for, say, swap readahead pages. They are not mapped into pagetables when they are spilled onto the LRU. So create a new `lru_cache_add_active()' function for deferred addition of pages to their active list.

Also move mark_page_accessed() from filemap.c to swap.c where all similar functions live. And teach it to not try to move pages which are in the deferred-addition list onto the active list. That won't work, and it's bogusly clearing PageReferenced in that case.

The deferred-addition lists are a pest. But lru_cache_add used to be really expensive in some workloads on some machines. Must persist.
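The notion of "deferred addition" to the active list can be sketched as a small batching queue. This is a user-space illustration only: `lru_sketch`, `BATCH_SIZE`, `LIST_MAX` and both function names are hypothetical, page numbers stand in for page structures, and no locking is modelled. The point is that queued pages are invisible on the list until a drain happens, either when the batch fills or when a caller drains explicitly.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 4             /* hypothetical batch capacity */
#define LIST_MAX   64            /* hypothetical list capacity */

struct lru_sketch {
    int    active[LIST_MAX];     /* page numbers stand in for struct page */
    size_t nr_active;
    int    pending[BATCH_SIZE];  /* deferred-addition queue */
    size_t nr_pending;
};

/* Move every queued page onto the active list. */
void lru_drain_active(struct lru_sketch *lru)
{
    for (size_t i = 0; i < lru->nr_pending; i++) {
        assert(lru->nr_active < LIST_MAX);
        lru->active[lru->nr_active++] = lru->pending[i];
    }
    lru->nr_pending = 0;
}

/* Queue a page for the active list; drain when the batch is full. */
void lru_add_active_deferred(struct lru_sketch *lru, int page)
{
    lru->pending[lru->nr_pending++] = page;
    if (lru->nr_pending == BATCH_SIZE)
        lru_drain_active(lru);
}
```

Batching amortizes the cost of touching the shared list, at the price that a few pages may sit in the queue for a while, which is why an explicit drain at a suitable point can be useful.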
-
Andrew Morton authored
Davem said:

"Ho hum, it is tricky :-)))

At bio_map_user() you need to see the user's most recent write to the page if you are going "user --> device". So if "user --> device" bio_map_user() must flush_dcache_page(). I find the write_to_vm condition confusing which is probably why I am sitting here spelling this out :-)

At bio_unmap_user(), if we are going "device --> user" you have to flush_dcache_page(). And actually, this flush could just as legitimately occur at bio_map_user() time. Therefore, the easiest thing to do is always flush_dcache_page() at bio_map_user().

All the other cases are going to be like this, so we might as well cut to the chase and flush_dcache_page() for all the pages inside of get_user_pages()."
-
Andrew Morton authored
Tuned for gcc-2.95.3:

filemap.c:     10815 -> 10046
highmem.c:      3392 -> 3104
mmap.c:         5998 -> 5854
mremap.c:       3058 -> 2802
msync.c:        1521 -> 1489
page_alloc.c:   8487 -> 8167
-
Andrew Morton authored
[I was going to send shared pagetables today, but it failed in my testing under X :( ]

the first one is an mmap inefficiency that was reported by Saurabh Desai. The test_str02 NPTL test-utility does the following: it tests the maximum number of threads by creating a new thread, which thread creates a new thread itself, etc. It basically creates thousands of parallel threads, which means thousands of thread stacks.

NPTL uses mmap() to allocate new default thread stacks - and POSIX requires us to install a 'guard page' as well, which is done via mprotect(PROT_NONE) on the first page of the stack. This means that tons of NPTL threads means 2* tons of vmas per MM, all allocated in a forward fashion starting at the virtual address of 1 GB (TASK_UNMAPPED_BASE).

Saurabh reported a slowdown after the first couple of thousands of threads, which i can reproduce as well. The reason for this slowdown is the get_unmapped_area() implementation, which tries to achieve the most compact virtual memory allocation, by searching for the vma at TASK_UNMAPPED_BASE, and then linearly searching for a hole. With thousands of linearly allocated vmas this is an increasingly painful thing to do ...

obviously, high-performance threaded applications will create stacks without the guard page, which triggers the anon-vma merging code so we end up with one large vma, not tons of small vmas.

it's also possible for userspace to be smarter by setting aside a stack space and keeping a bitmap of allocated stacks and using MAP_FIXED (this also enables it to do the guard page not via mprotect() but by keeping the stacks apart by 1 page - ie. half the number of vmas) - but this also decreases flexibility.

So i think that the default behavior nevertheless makes sense as well, so IMO we should optimize it in the kernel.

there are various solutions to this problem, none of which solve the problem in a 100% sufficient way, so i went for the simplest approach: i added code to cache the 'last known hole' address in mm->free_area_cache, which is used as a hint to get_unmapped_area().

this fixed the test_str02 testcase wonderfully, thread creation performance for this testcase is O(1) again, but this simpler solution obviously has a number of weak spots, and the (unlikely but possible) worst-case is quite close to the current situation. In any case, this approach does not sacrifice the perfect VM compactness our mmap() implementation achieves, so it's a performance optimization with no externally visible consequences.

The most generic and still perfectly-compact VM allocation solution would be to have a vma tree for the 'inverse virtual memory space', ie. a tree of free virtual memory ranges, which could be searched and iterated like the space of allocated vmas. I think we could do this by extending vmas, but the drawback is larger vmas. This does not save us from having to scan vmas linearly still, because the size constraint is still present, but at least most of the anon-mmap activities are constant sized. (both malloc() and the thread-stack allocator use mostly fixed sizes.)

This patch contains some fixes from Dave Miller - on some architectures it is not possible to evaluate TASK_UNMAPPED_BASE at compile-time.
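The effect of caching the 'last known hole' can be sketched with a sorted array of address ranges. This is a simplified model and not the kernel implementation: `mm_sketch`, `vma_sketch`, `map_area()`, the `steps` counter and `MAX_VMAS` are all hypothetical, a binary search stands in for the vma lookup, and unmapping is not modelled (a real implementation would presumably have to lower the hint when a range below it is freed, to keep allocation compact).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define UNMAPPED_BASE 0x40000000UL   /* 1 GB, as in the text */
#define MAX_VMAS      1024           /* hypothetical capacity */

struct vma_sketch { unsigned long start, end; };   /* covers [start, end) */

struct mm_sketch {
    struct vma_sketch vmas[MAX_VMAS]; /* sorted by start, non-overlapping */
    size_t        nr_vmas;
    unsigned long free_area_cache;    /* hint: end of the last mapping made */
    unsigned long steps;              /* vmas walked linearly, for comparison */
};

void mm_init(struct mm_sketch *mm)
{
    mm->nr_vmas = 0;
    mm->free_area_cache = UNMAPPED_BASE;
    mm->steps = 0;
}

/* Index of the first vma whose end lies above addr (binary search). */
size_t first_vma_above(const struct mm_sketch *mm, unsigned long addr)
{
    size_t lo = 0, hi = mm->nr_vmas;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (mm->vmas[mid].end > addr)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Map len bytes into the lowest hole at or above start. */
unsigned long map_area(struct mm_sketch *mm, unsigned long start,
                       unsigned long len)
{
    unsigned long addr = start;
    size_t i = first_vma_above(mm, addr);

    /* Linear walk: skip each vma until the gap before the next one fits. */
    while (i < mm->nr_vmas && addr + len > mm->vmas[i].start) {
        addr = mm->vmas[i].end;
        i++;
        mm->steps++;
    }
    assert(mm->nr_vmas < MAX_VMAS);
    memmove(&mm->vmas[i + 1], &mm->vmas[i],
            (mm->nr_vmas - i) * sizeof(mm->vmas[0]));
    mm->vmas[i].start = addr;
    mm->vmas[i].end = addr + len;
    mm->nr_vmas++;
    mm->free_area_cache = addr + len;  /* remember where the search ended */
    return addr;
}

/* Old behaviour in this model: always start searching at the base. */
unsigned long map_uncached(struct mm_sketch *mm, unsigned long len)
{
    return map_area(mm, UNMAPPED_BASE, len);
}

/* Hinted behaviour: start searching at the cached address. */
unsigned long map_cached(struct mm_sketch *mm, unsigned long len)
{
    return map_area(mm, mm->free_area_cache, len);
}
```

For a run of same-sized mappings with no unmapping in between, both variants return identical addresses in this model, but `map_uncached()` walks every existing range on each call while `map_cached()` walks none, which matches the quadratic versus constant-time behaviour described above.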
-
Andrew Morton authored
This is Al's implementation of the Orlov block allocator for ext2. At least doubles the throughput for the traverse-a-kernel-tree test and is well tested. I still need to do the ext3 version. No effort has been put into tuning it at this time, so more gains are probably possible.
-
Linus Torvalds authored
bk://ldm.bkbits.net/linux-2.5-kobject into home.transmeta.com:/home/torvalds/v2.5/linux
-
Patrick Mochel authored
This allows subsystems to exist in the hierarchy, but not be exported via the filesystem. This fixes a minor flaw with partitions, as partition objects are children of block devices, though they register with the partition subsystem. Really, the partition subsystem shouldn't have presence in the tree at all, yet still exist.
-
Linus Torvalds authored
http://gkernel.bkbits.net/alpha-2.5 into home.transmeta.com:/home/torvalds/v2.5/linux
-
Linus Torvalds authored
bk://ldm.bkbits.net/linux-2.5-kobject into home.transmeta.com:/home/torvalds/v2.5/linux
-
Steve French authored
-
Maksim Krasnyanskiy authored
- Convert /proc/bluetooth/l2cap to seq_file
- Convert /proc/bluetooth/rfcomm to seq_file
- Convert /proc/bluetooth/sco to seq_file
- Export HCI device info via /proc/bluetooth/hci/N
-
Jeff Garzik authored
-
Patrick Mochel authored
-
Patrick Mochel authored
-
Patrick Mochel authored
-
Patrick Mochel authored
-