1. 31 Oct, 2002 40 commits
    • Andrew Morton's avatar
      [PATCH] strip pagecache from to-be-reaped inodes · f9a316fa
      Andrew Morton authored
      With large highmem machines and many small cached files it is possible
      to encounter ZONE_NORMAL allocation failures.  This can be demonstrated
      with a large number of one-byte files on a 7G machine.
      
      All lowmem is filled with icache and all those inodes have a small
      amount of highmem pagecache which makes them unfreeable.
      
      The patch strips the pagecache from inodes as they come off the tail of
      the inode_unused list.
      
      I play tricks in there, peeking at the head of the inode_unused list to
      pick up the inode again after running iput().  The alternatives seemed
      to involve more widespread changes, or running invalidate_inode_pages()
      under inode_lock, which would be a bad thing from a scheduling latency
      and lock contention point of view.
      f9a316fa
    • Andrew Morton's avatar
      [PATCH] exempt swapcache pages from "use once" handling · 1bbb1949
      Andrew Morton authored
      The kernel will presently reclaim swapcache pages as they come off the
      tail of the inactive list even if they are referenced.  That's the
      "use-once" pagecache path and shouldn't be applied to swapcache pages.
      
      This affects very few pages in practice because all those pages tend to
      be mapped into pagetables anyway.
      1bbb1949
    • Andrew Morton's avatar
      [PATCH] empty the deferred lru-addition buffers in swapin_readahead · e550cf78
      Andrew Morton authored
      If we're about to return to userspace after performing some swap
      readahead, the pages in the deferred-addition LRU queues could stay
      there for some time.  So drain them after performing readahead.
      e550cf78
    • Andrew Morton's avatar
      [PATCH] start anon pages on the active list (properly this time) · 33709b5c
      Andrew Morton authored
      Use lru_cache_add_active() to ensure that pages which are, or will be,
      mapped into pagetables are started out on the active list.
      33709b5c
    • Andrew Morton's avatar
      [PATCH] lru_add_active(): for starting pages on the active list · 228c3d15
      Andrew Morton authored
      This is the first in a series of patches which tune up the 2.5
      performance under heavy swap loads.
      
      Throughput on stupid swapstormy tests is increased by 1.5x to 3x.
      Still about 20% behind 2.4 with multithreaded tests.  That is not
      easily fixable - the virtual scan tends to apply a form of load
      control: particular processes are heavily swapped out so the others can
      get ahead.  With 2.5 all processes make very even progress and much
      more swapping is needed.  It's on par with 2.4 for single-process
      swapstorms.
      
      
      In this patch:
      
      The code which tries to start mapped pages out on the active list
      doesn't work very well.  It uses an "is it mapped into pagetables"
      test.  Which doesn't work for, say, swap readahead pages.  They are not
      mapped into pagetables when they are spilled onto the LRU.
      
      So create a new `lru_cache_add_active()' function for deferred addition
      of pages to their active list.
      
      Also move mark_page_accessed() from filemap.c to swap.c where all
      similar functions live.  And teach it to not try to move pages which
      are in the deferred-addition list onto the active list.  That won't
      work, and it's bogusly clearing PageReferenced in that case.
      
      The deferred-addition lists are a pest.  But lru_cache_add used to be
      really expensive in some workloads on some machines.  Must persist.
      228c3d15
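The deferred-addition idea in the message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `BATCH_SIZE`, the struct layouts, `drain_active_batch()` and `sketch_lru_cache_add_active()` are hypothetical names, and the "lock" is modelled as a counter. It shows why batching is cheap (one locked section per full buffer) and why a page still sitting in the buffer cannot be moved between lists: it is not on any LRU yet.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 16                 /* assumed batch size, illustration only */

struct page {
    int on_lru;                       /* set once the page is on an LRU list */
    int active;                       /* 1 if it was placed on the active list */
    struct page *next;
};

struct lru {
    struct page *active_head;         /* stand-in for the active list */
    unsigned long nr_active;
    unsigned long lock_acquisitions;  /* how often the "LRU lock" was taken */
};

struct batch {
    size_t nr;
    struct page *pages[BATCH_SIZE];   /* pages waiting to be added */
};

/* Move every buffered page onto the active list in one locked section. */
static void drain_active_batch(struct lru *lru, struct batch *b)
{
    if (b->nr == 0)
        return;
    lru->lock_acquisitions++;         /* models taking the lock once per drain */
    for (size_t i = 0; i < b->nr; i++) {
        struct page *p = b->pages[i];
        assert(!p->on_lru);           /* a buffered page is not on any list yet */
        p->on_lru = 1;
        p->active = 1;
        p->next = lru->active_head;
        lru->active_head = p;
        lru->nr_active++;
    }
    b->nr = 0;
}

/* Deferred addition: buffer the page, drain only when the buffer fills. */
static void sketch_lru_cache_add_active(struct lru *lru, struct batch *b,
                                        struct page *p)
{
    assert(b->nr < BATCH_SIZE);
    b->pages[b->nr++] = p;
    if (b->nr == BATCH_SIZE)
        drain_active_batch(lru, b);
}
```

A caller that is about to leave the hot path (for example after a burst of readahead) would call `drain_active_batch()` explicitly so that buffered pages do not linger off the LRU.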
    • Andrew Morton's avatar
      [PATCH] flush_dcache_page in get_user_pages() · e735f278
      Andrew Morton authored
      Davem said:
      
      "Ho hum, it is tricky :-)))
      
       At bio_map_user() you need to see the user's most recent write to the
       page if you are going "user --> device".  So if "user --> device"
       bio_map_user() must flush_dcache_page().
      
       I find the write_to_vm condition confusing which is probably why I am
       sitting here spelling this out :-)
      
       At bio_unmap_user(), if we are going "device --> user" you have to
       flush_dcache_page().  And actually, this flush could just as
       legitimately occur at bio_map_user() time.
      
       Therefore, the easiest thing to do is always flush_dcache_page() at
       bio_map_user().
      
       All the other cases are going to be like this, so we might as well
       cut to the chase and flush_dcache_page() for all the pages inside of
       get_user_pages()."
      e735f278
    • Andrew Morton's avatar
      [PATCH] uninline some things in mm/*.c · 79425084
      Andrew Morton authored
      Tuned for gcc-2.95.3:
      
      	filemap.c:	10815 -> 10046
      	highmem.c:	3392 -> 3104
      	mmap.c:		5998 -> 5854
      	mremap.c:	3058 -> 2802
      	msync.c:	1521 -> 1489
      	page_alloc.c:	8487 -> 8167
      79425084
    • Andrew Morton's avatar
      [PATCH] speedup heuristic for get_unmapped_area · 631709da
      Andrew Morton authored
      [I was going to send shared pagetables today, but it failed in
       my testing under X :( ]
      
      the first one is an mmap inefficiency that was reported by Saurabh Desai.
      The test_str02 NPTL test-utility does the following: it tests the maximum
      number of threads by creating a new thread, which thread creates a new
      thread itself, etc. It basically creates thousands of parallel threads,
      which means thousands of thread stacks.
      
      NPTL uses mmap() to allocate new default thread stacks - and POSIX
      requires us to install a 'guard page' as well, which is done via
      mprotect(PROT_NONE) on the first page of the stack. This means that tons
      of NPTL threads means 2* tons of vmas per MM, all allocated in a forward
      fashion starting at the virtual address of 1 GB (TASK_UNMAPPED_BASE).
      
      Saurabh reported a slowdown after the first couple of thousands of
      threads, which i can reproduce as well. The reason for this slowdown is
      the get_unmapped_area() implementation, which tries to achieve the most
      compact virtual memory allocation, by searching for the vma at
      TASK_UNMAPPED_BASE, and then linearly searching for a hole. With thousands
      of linearly allocated vmas this is an increasingly painful thing to do ...
      
      obviously, high-performance threaded applications will create stacks
      without the guard page, which triggers the anon-vma merging code so we end
      up with one large vma, not tons of small vmas.
      
      it's also possible for userspace to be smarter by setting aside a stack
      space and keeping a bitmap of allocated stacks and using MAP_FIXED (this
      also enables it to do the guard page not via mprotect() but by keeping the
      stacks apart by 1 page - ie. half the number of vmas) - but this also
      decreases flexibility.
      
      So i think that the default behavior nevertheless makes sense as well, so
      IMO we should optimize it in the kernel.
      
      there are various solutions to this problem, none of which solve the
      problem in a 100% sufficient way, so i went for the simplest approach: i
      added code to cache the 'last known hole' address in mm->free_area_cache,
      which is used as a hint to get_unmapped_area().
      
      this fixed the test_str02 testcase wonderfully, thread creation
      performance for this testcase is O(1) again, but this simpler solution
      obviously has a number of weak spots, and the (unlikely but possible)
      worst-case is quite close to the current situation. In any case, this
      approach does not sacrifice the perfect VM compactness our mmap()
      implementation achieves, so it's a performance optimization with no
      externally visible consequences.
      
      The most generic and still perfectly-compact VM allocation solution would
      be to have a vma tree for the 'inverse virtual memory space', ie. a tree
      of free virtual memory ranges, which could be searched and iterated like
      the space of allocated vmas. I think we could do this by extending vmas,
      but the drawback is larger vmas. This does not save us from having to scan
      vmas linearly still, because the size constraint is still present, but at
      least most of the anon-mmap activities are constant sized. (both malloc()
      and the thread-stack allocator use mostly fixed sizes.)
      
      This patch contains some fixes from Dave Miller - on some architectures
      it is not possible to evaluate TASK_UNMAPPED_BASE at compile-time.
      631709da
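The "last known hole" hint described in the message above can be modelled in user space. This is a minimal sketch, not the kernel code: the sorted array stands in for the vma list, `SKETCH_BASE` and `SKETCH_LIMIT` are assumed constants, and all function names are hypothetical. Lowering the hint when a region below it is unmapped is an assumption about how first-fit compactness is kept. The `steps` counter only counts vmas visited while walking for a hole, since the initial positioning is a tree lookup in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BASE  0x40000000UL   /* 1 GB, as in the message */
#define SKETCH_LIMIT 0xC0000000UL   /* assumed top of the user address space */
#define MAX_VMAS     64

struct vma { unsigned long start, end; };     /* covers [start, end) */

struct mm {
    struct vma vmas[MAX_VMAS];      /* kept sorted by start address */
    size_t nr;
    unsigned long free_area_cache;  /* 0 means "no hint yet" */
    unsigned long steps;            /* vmas walked while searching for a hole */
};

static void insert_vma(struct mm *mm, unsigned long start, unsigned long end)
{
    size_t i = mm->nr;
    assert(mm->nr < MAX_VMAS && start < end);
    while (i > 0 && mm->vmas[i - 1].start > start) {
        mm->vmas[i] = mm->vmas[i - 1];
        i--;
    }
    mm->vmas[i].start = start;
    mm->vmas[i].end = end;
    mm->nr++;
}

/* First-fit search upward from the hint. Returns 0 if nothing fits. */
static unsigned long sketch_get_unmapped_area(struct mm *mm, unsigned long len)
{
    unsigned long addr = mm->free_area_cache ? mm->free_area_cache : SKETCH_BASE;
    size_t i = 0;

    /* Position on the first vma ending above addr (a tree lookup in the kernel). */
    while (i < mm->nr && mm->vmas[i].end <= addr)
        i++;

    for (;; i++) {
        if (len > SKETCH_LIMIT - addr)
            return 0;
        if (i == mm->nr || addr + len <= mm->vmas[i].start) {
            insert_vma(mm, addr, addr + len);
            mm->free_area_cache = addr + len;   /* remember where to resume */
            return addr;
        }
        mm->steps++;
        addr = mm->vmas[i].end;
    }
}

/* Remove the vma starting at 'start'; pull the hint down if a lower hole opens. */
static void sketch_unmap(struct mm *mm, unsigned long start)
{
    for (size_t i = 0; i < mm->nr; i++) {
        if (mm->vmas[i].start != start)
            continue;
        for (size_t j = i + 1; j < mm->nr; j++)
            mm->vmas[j - 1] = mm->vmas[j];
        mm->nr--;
        if (start < mm->free_area_cache)
            mm->free_area_cache = start;
        return;
    }
}
```

With the hint in place, back-to-back allocations of equal size never walk existing vmas; clearing the hint makes the same allocation walk every vma from the base, which is the slowdown the message describes.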
    • Andrew Morton's avatar
      [PATCH] Orlov block allocator for ext2 · b2205dc0
      Andrew Morton authored
      This is Al's implementation of the Orlov block allocator for ext2.
      
      At least doubles the throughput for the traverse-a-kernel-tree
      test and is well tested.
      
      I still need to do the ext3 version.
      
      No effort has been put into tuning it at this time, so more gains
      are probably possible.
      b2205dc0
    • Linus Torvalds's avatar
      Merge bk://ldm.bkbits.net/linux-2.5-kobject · 4856e09e
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      4856e09e
    • Patrick Mochel's avatar
      kobject: don't create directory for kobject/subsystem if name is NULL. · b053262f
      Patrick Mochel authored
      This allows subsystems to exist in the hierarchy, but not be exported via
      the filesystem. This fixes a minor flaw with partitions, as partition
      objects are children of block devices, though they register with the
      partition subsystem. Really, the partition subsystem shouldn't have a
      presence in the tree at all, yet should still exist.
      b053262f
    • Linus Torvalds's avatar
      Merge http://gkernel.bkbits.net/alpha-2.5 · 1baa95c5
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      1baa95c5
    • Linus Torvalds's avatar
      Merge bk://ldm.bkbits.net/linux-2.5-kobject · 6dc1ec37
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
      6dc1ec37
    • Jeff Garzik's avatar
      Fix alpha build. · f32abcc0
      Jeff Garzik authored
      f32abcc0
    • Patrick Mochel's avatar
      turn off kobject debugging by default. · 24555ac2
      Patrick Mochel authored
      24555ac2
    • Patrick Mochel's avatar
      driverfs: die die die · 808897cf
      Patrick Mochel authored
      808897cf
    • Patrick Mochel's avatar
      convert edd to use kobjects and sysfs. · 60211581
      Patrick Mochel authored
      60211581
    • Patrick Mochel's avatar
      acpi: convert to use kobjects and sysfs. · 8bebafe7
      Patrick Mochel authored
      - replace driver_dir_entry in acpi_device with struct kobject.
      - register acpi with firmware subsystem on startup.
      - register sub-subsystem.
      - put namespace hierarchy under that.
      8bebafe7
    • Patrick Mochel's avatar
      c408284c
    • Linus Torvalds's avatar
      Merge bk://extfs.bkbits.net/extfs-2.5-update · 6e6e099b
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      6e6e099b
    • Gerd Knorr's avatar
      [PATCH] new v4l2 driver: saa7134 · e12a2bd0
      Gerd Knorr authored
      This patch adds a new device driver to the linux kernel.  It is for TV
      cards based on the Philips SAA7134 chip.  It supports the v4l2 API and
      thus depends on the v4l2 patches of the previous mails.
      e12a2bd0
    • Gerd Knorr's avatar
      [PATCH] bttv update · 8354eb8a
      Gerd Knorr authored
      This updates the bttv driver.  Major changes are (a) adaptions to the
      final v4l2 API and (b) lots of updates in the card-specific code.  There
      are also various other small changes.
      8354eb8a
    • Gerd Knorr's avatar
      [PATCH] bttv documentation update · 3344276f
      Gerd Knorr authored
      3344276f
    • Gerd Knorr's avatar
      [PATCH] tv tuner driver update · 055c508e
      Gerd Knorr authored
      This is an update for the tv tuner module.  It makes the descriptions
      more verbose and also has some minor bugfixes + cleanups.
      055c508e
    • Gerd Knorr's avatar
      [PATCH] add v4l2 api · e028b61b
      Gerd Knorr authored
      This adds the v4l2 API to the linux kernel.
      
      The first, original video4linux API has a number of design bugs.  They
      are fixed in this new API revision.  It has already existed for quite
      some time.  Over the last few weeks it got a number of cleanups based on
      the experience of the last few years (dropping stuff nobody uses, fixing
      some inconsistencies).  We consider it to be in pretty good shape now
      and would like to see it in 2.6.
      
      This patch is basically the header file with all the structs and ioctls
      in there.  A small module with some helper functions for v4l2 drivers is
      included too.  Related updates (bttv, ...) will follow as separate
      patches.
      e028b61b
    • Gerd Knorr's avatar
      [PATCH] videobuf update · b7649ef7
      Gerd Knorr authored
      This updates the video-buf.c module (helper module for video buffer
      management).  Some memory management fixes, also some adaptions to the
      final v4l2 api.
      b7649ef7
    • Robert Love's avatar
      [PATCH] decoded wchan in /proc · 2f61876e
      Robert Love authored
      This implements a pre-decoded wchan in /proc using kallsyms.  I.e.:
      
              [21:23:17]rml@phantasy:~$ cat /proc/1228/wchan
              wait4
      
      Which, aside from being cool, means procps will not have to parse
      System.map for each process.  In fact, procps will no longer require
      System.map.
      
      If CONFIG_KALLSYMS is not enabled, /proc/#/wchan does not exist to
      conserve memory.  Regardless of CONFIG_KALLSYMS's value, the old wchan
      field in /proc/#/stat still exists.
      
      I have a procps patch I will merge once this is in your tree.
      2f61876e
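Decoding a wchan address into a name amounts to finding the symbol with the greatest start address not above the given address. The following is a hedged sketch of that lookup only: the table contents and addresses are invented, `sketch_lookup()` is a hypothetical name, and the real kallsyms table format is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

struct sym {
    unsigned long addr;     /* start address of the function */
    const char *name;
};

/* Hypothetical symbol table, sorted by address; the values are invented. */
static const struct sym sketch_syms[] = {
    { 0xc0110000UL, "do_fork"   },
    { 0xc0118000UL, "wait4"     },
    { 0xc0140000UL, "pipe_wait" },
};
#define SKETCH_NSYMS (sizeof sketch_syms / sizeof sketch_syms[0])

/*
 * Return the name of the last symbol whose start address is <= addr,
 * or NULL if addr lies below the first symbol. Any address above the
 * last symbol maps to that symbol, since this sketch stores no end bounds.
 */
static const char *sketch_lookup(const struct sym *tab, size_t n,
                                 unsigned long addr)
{
    size_t lo = 0, hi = n;

    if (n == 0 || addr < tab[0].addr)
        return NULL;
    /* Invariant: tab[lo].addr <= addr, and hi == n or tab[hi].addr > addr. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }
    return tab[lo].name;
}
```

Doing this lookup in the kernel and exporting only the resulting name is what lets a tool read the name directly instead of loading a symbol map itself.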
    • Robert Love's avatar
      [PATCH] hyper-threading info in /proc/cpuinfo · 3ccd5369
      Robert Love authored
      This adds hyper-threading information to /proc/cpuinfo, if relevant: the
      physical processor id and the number of sibling units in this core.
      
      The naming of the fields was debated a bit on lkml and the names below
      offend the least number of people, do not break glibc, and are the same
      as those in 2.4-ac.
      
      This is in 2.4-ac, 2.5-mm, and vendor kernels from RedHat, SuSE, etc.
      3ccd5369
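A consumer of these fields might parse them as sketched below. The field names are not shown in the message above, so the names `physical id` and `siblings` used here are an assumption, as are the struct and function names and the sample text in the usage note. The parser works on a text buffer rather than opening `/proc/cpuinfo`, and only reads the first processor block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct ht_info {
    int physical_id;   /* -1 if the field was not present */
    int siblings;      /* -1 if the field was not present */
};

/*
 * Parse the first processor block of a cpuinfo-style text.
 * Assumed field names: "physical id" and "siblings".
 * Returns the number of those fields found (0 to 2).
 */
static int sketch_parse_ht(const char *text, struct ht_info *out)
{
    int found = 0;

    assert(text != NULL && out != NULL);
    out->physical_id = -1;
    out->siblings = -1;

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[128];

        if (len == 0)
            break;                      /* blank line ends the first block */
        if (len < sizeof line) {        /* over-long lines are skipped */
            memcpy(line, text, len);
            line[len] = '\0';
            /* Whitespace in the format matches the tab before the colon. */
            if (sscanf(line, "physical id : %d", &out->physical_id) == 1)
                found++;
            else if (sscanf(line, "siblings : %d", &out->siblings) == 1)
                found++;
        }
        text += len + (eol ? 1 : 0);
    }
    return found;
}
```

For instance, a buffer containing the lines `physical id	: 3` and `siblings	: 2` in its first block would fill the struct with 3 and 2; a kernel without the fields leaves both at -1.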
    • John Levon's avatar
      [PATCH] oprofile: tiny makefile tidy · 14029c7b
      John Levon authored
      14029c7b
    • John Levon's avatar
      [PATCH] fix timer_pit.c warning · 213afbef
      John Levon authored
      make x86_do_profile available when UP=y,LOCAL_APIC=n
      213afbef
    • Andrew Morton's avatar
      [PATCH] hugetlbfs backing for SYSV shared memory · bba2dd58
      Andrew Morton authored
      From Bill Irwin
      
      Optionally back privileged processes' shm with hugetlbfs.
      
      One of the more common requests for and/or users of hugetlb interfaces
      in general are databases using shm.  This patch exports functionality
      mostly equivalent to tmpfs, adds the calling sequence to ipc/shm.c, and
      hashes out a small support function in fs/hugetlbfs/inode.c so that shm
      segments may be hugetlbpage-backed if userspace passes a flag to
      shmget().
      
      Access to this resource requires CAP_IPC_LOCK.
      bba2dd58
    • Andrew Morton's avatar
      [PATCH] hugetlbfs file system · 9f3336ab
      Andrew Morton authored
      From Bill Irwin
      
      Tiny hugetlbpage ram-backed filesystem.
      
      Some way to export hugetlbfs through more standard system call
      interfaces was needed, and hugetlbfs already had inodes with ratnodes
      etc.  used to track offset -> page translations, so adding the rest of
      a filesystem around it was easy and natural.  Most of it is identical
      to ramfs, except ->f_op->mmap() is now just a wrapper around the
      hugetlb_prefault() to fill in the VMA, and to simplify it,
      ->readpage(), ->prepare_write(), and ->commit_write() are omitted.
      
      Permissions:
      
      (1) check capable(CAP_IPC_LOCK) in ->f_op->mmap()
              This may be redundant but it errors out with less state to
              clean up and at least clarifies the fact that checks are
              being performed at the relevant entry points.
      
      (2) check capable(CAP_IPC_LOCK) in hugetlbfs_zero_setup()
              This is called at shmget() time and is an actual potential
              security hole. hugetlb_prefault() does not perform this
              check itself, so it must be done here.
      9f3336ab
    • Andrew Morton's avatar
      [PATCH] fix hugetlb thinko · 1541c38b
      Andrew Morton authored
      It's setting the page count on the wrong page.
      1541c38b
    • Andrew Morton's avatar
      [PATCH] hugetlb fixes and cleanups · b2229e8d
      Andrew Morton authored
      huge_page_release()             -- hugepage refcounting
      free_huge_page()                -- separates freeing from inode refcounting
      unmap_hugepage_range()          -- unmapping refcounting hook when locked
      zap_hugepage_range()            -- unmapping refcounting hook when unlocked
      export setattr_mask()           -- hugetlbfs wants to call it
      export destroy_inode()          -- hugetlbfs wants to use it
      export unmap_vma()              -- hugetlbpage.c wants to use it
      unlock_page() in hugetlbpage.c  -- fixes deadlock in hugetlbfs_truncate()
      b2229e8d
    • Andrew Morton's avatar
      [PATCH] Move hugetlb declarations into their own header · 5c7eb9d8
      Andrew Morton authored
      From Bill Irwin
      
      Move hugetlb and hugetlbfs declarations into a dedicated header file.
      
      Hugetlb's big #ifdeffed block in mm.h got a lot bigger with hugetlbfs.
      This patch basically attempts to remove the noise from mm.h by simply
      rearranging it into a new header, and fixing all users of hugetlb.
      5c7eb9d8
    • Andrew Morton's avatar
      [PATCH] hugetlbpages: factor out some code for hugetlbfs · d38c229c
      Andrew Morton authored
      In order for hugetlbfs to operate, prefaulting the vma at mmap()-time
      while simultaneously instantiating and performing lookups on its
      ratcache entries is needed as an isolated operation.  This is
      implemented as part of a different function within hugetlbpage.c that
      ties it to inode and key lookup and allocation.
      
      The following patch simply moves the code already present into its own
      function, calls it, and makes it available for hugetlbfs to use.
      d38c229c
    • Roman Zippel's avatar
      [PATCH] check QT only if needed · e66c772c
      Roman Zippel authored
      On Wed, 30 Oct 2002, Aaron Lehmann wrote:
      >
      > Now running 'make oldconfig' or 'make menuconfig' requires a Qt
      > installation. I believe that this is a bug because these still work
      > fine without Qt when the -k flag is passed to make.
      
      Yes, it's a bug. This fixes it without breaking xconfig.
      e66c772c
    • Patrick Mochel's avatar
      convert do_mounts.c to use sysfs instead of driverfs. · d48a8e6e
      Patrick Mochel authored
      Also, update path to look for devices in to reflect placement of block
      subsystem at top level.
      d48a8e6e