1. 03 Jan, 2005 20 commits
    • [PATCH] mark_page_accessed() for read()s on non-page boundaries · 21adf7ac
      Miquel van Smoorenburg authored
      When reading a (partial) page from disk using read(), the kernel only marks
      the page as "accessed" if the read started at a page boundary.  This means
      that files that are accessed randomly at non-page boundaries (usually
      database style files) will not be cached properly.
      
      The patch below uses the readahead state instead.  If a page is read(), it
      is marked as "accessed" if the previous read() was for a different page,
      whatever the offset in the page.
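
      The difference between the two rules can be sketched in user space.
      This is a minimal model, not the kernel code: struct ra_sketch,
      old_rule_marks() and new_rule_marks() are hypothetical names, and a
      4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the per-file readahead state. */
struct ra_sketch {
	unsigned long prev_page; /* page index touched by the previous read() */
	bool has_prev;           /* false until the first read() */
};

/* Old rule: a page is marked only when the read starts on a page boundary. */
static bool old_rule_marks(unsigned long long pos)
{
	return (pos & ((1ULL << SKETCH_PAGE_SHIFT) - 1)) == 0;
}

/*
 * New rule: walk the pages covered by [pos, pos + len) and count each one
 * whose index differs from the previously read page, whatever the offset.
 */
static unsigned int new_rule_marks(struct ra_sketch *ra,
				   unsigned long long pos, size_t len)
{
	unsigned int marked = 0;
	unsigned long idx, first, last;

	if (len == 0)
		return 0;
	first = (unsigned long)(pos >> SKETCH_PAGE_SHIFT);
	last = (unsigned long)((pos + len - 1) >> SKETCH_PAGE_SHIFT);
	for (idx = first; idx <= last; idx++) {
		if (!ra->has_prev || ra->prev_page != idx)
			marked++;
		ra->prev_page = idx;
		ra->has_prev = true;
	}
	return marked;
}
```

      In this model a 128-byte read at offset 100 is never marked under the
      old rule, but is marked under the new one unless the previous read hit
      the same page.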
      
      Testing results:
      
      
      - Boot kernel with mem=128M
      
      - create a testfile of size 8 MB on a partition. Unmount/mount.
      
      - then generate about 10 MB/sec streaming writes
      
      	for i in `seq 1 1000`
      	do
      		dd if=/dev/zero of=junkfile.$i bs=1M count=10
      		sync
      		cat junkfile.$i > /dev/null
      		sleep 1
      	done
      
      - use an application that reads 128 bytes 64000 times from a
        random offset in the 8 MB testfile.
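
      The source of the "rr" test program is not part of this message.  A
      reader of that kind could look roughly like the sketch below; the
      function name random_reads() and its seed parameter are hypothetical,
      and rand() with a modulo is used only for brevity.

```c
#define _XOPEN_SOURCE 700 /* for pread() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read `chunk` bytes `count` times from pseudo-random offsets in `path`.
 * Returns the number of complete reads, or -1 on error.
 */
static long random_reads(const char *path, size_t chunk, long count,
			 unsigned int seed)
{
	struct stat st;
	char *buf;
	long done = 0, i;
	off_t span;
	int fd;

	if (chunk == 0)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size < (off_t)chunk) {
		close(fd);
		return -1;
	}
	buf = malloc(chunk);
	if (!buf) {
		close(fd);
		return -1;
	}
	/* Number of valid starting offsets for a full-sized read. */
	span = st.st_size - (off_t)chunk + 1;
	srand(seed);
	for (i = 0; i < count; i++) {
		off_t off = (off_t)rand() % span;

		/* Plain positional read; most offsets are not page aligned. */
		if (pread(fd, buf, chunk, off) == (ssize_t)chunk)
			done++;
	}
	free(buf);
	close(fd);
	return done;
}
```

      Calling random_reads("testfile", 128, 64000, 1) would approximate the
      access pattern described in this step.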
      
      1. Linux 2.6.10-rc3 vanilla, no streaming writes:
      
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.03s user 0.22s system 5% cpu 4.456 total
      
      2. Linux 2.6.10-rc3 vanilla, streaming writes:
      
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.03s user 0.16s system 2% cpu 7.667 total
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.03s user 0.37s system 1% cpu 23.294 total
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.02s user 0.99s system 1% cpu 1:11.52 total
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.03s user 0.21s system 2% cpu 10.273 total
      
      3. Linux 2.6.10-rc3 with read-page-access.patch , streaming writes:
      
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.02s user 0.21s system 3% cpu 7.634 total
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.04s user 0.22s system 2% cpu 9.588 total
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.02s user 0.12s system 24% cpu 0.563 total
      # time ~/rr testfile
      Read 128 bytes 64000 times
      ~/rr testfile  0.03s user 0.13s system 98% cpu 0.163 total
      
      As expected, with the read-page-access.patch the kernel keeps the 8 MB
      testfile cached, while without it, it doesn't.
      
      So this is useful for workloads where one smallish (wrt RAM) file is read
      randomly over and over again (like heavily used database indexes), while
      other I/O is going on.  Plain 2.6 caches those files poorly, if the app
      uses plain read().
      Signed-Off-By: Miquel van Smoorenburg <miquels@cistron.nl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] make sure ioremap only tests valid addresses · bbd4c45d
      Dave Hansen authored
      When CONFIG_HIGHMEM=y, but ZONE_NORMAL isn't quite full, there is, of
      course, no actual memory at *high_memory.  This isn't a problem with normal
      virt<->phys translations because it's never dereferenced, but
      CONFIG_NONLINEAR is a bit more finicky.  So, don't do virt_to_phys() to
      non-existent addresses.
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kill off highmem_start_page · 422e43d4
      Dave Hansen authored
      People love to do comparisons with highmem_start_page.  However, where
      CONFIG_HIGHMEM=y and there is no actual highmem, there's no real page at
      *highmem_start_page.
      
      That's usually not a problem, but CONFIG_NONLINEAR is a bit more strict and
      catches the bogus address translations.
      
      There are about a gillion different ways to find out if a 'struct page' is
      highmem or not.  Why not just check page_flags?  Just use PageHighMem()
      wherever there used to be a highmem_start_page comparison.  Then, kill off
      highmem_start_page.
      
      This removes more code than it adds, and gets rid of some nasty
      #ifdefs in .c files.
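
      The idea of a flag-based check can be illustrated with a toy model.
      struct page_sketch, SKETCH_PG_HIGHMEM and page_is_highmem() are
      hypothetical stand-ins, not the kernel's definitions; the point is only
      that the test needs nothing but the page's own flags word, so no
      possibly non-existent boundary page is ever consulted.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's types. */
#define SKETCH_PG_HIGHMEM 0UL /* bit number chosen for illustration only */

struct page_sketch {
	unsigned long flags;
};

/*
 * Flag-based test: looks only at the page itself.  A pointer comparison
 * against a "first highmem page" would instead need that page to exist.
 */
static bool page_is_highmem(const struct page_sketch *page)
{
	return (page->flags >> SKETCH_PG_HIGHMEM) & 1UL;
}
```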
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: overcommit updates · ea86630e
      Andries E. Brouwer authored
      Alan made overcommit mode 2 and it doesn't work at all.  A process passing
      the limit often does so at a moment of stack extension, and is killed by a
      segfault, which is no better than being OOM-killed.
      
      Another problem is that close to the edge no other processes can be
      started, so that a sysadmin has problems logging in and investigating.
      
      Below is a patch that does 3 things:
      
      (1) It reserves a reasonable amount of virtual stack space (amount
          randomly chosen, no guarantees given) when the process is started, so
          that the common utilities will not be killed by segfault on stack
          extension.
      
      (2) It reserves a reasonable amount of virtual memory for root, so that
          root can do things when the system is out-of-memory.
      
      (3) It limits a single process to 97% of what is left, so that an
          ordinary user is also able to use getty, login, bash, ps, kill and
          similar things when one of her processes gets out of control.
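
      Points (2) and (3) can be read as a simple admission check.  The sketch
      below is one possible interpretation, not the patch itself:
      commit_allowed() is a hypothetical name, the 3% root reserve is an
      assumed figure (the message gives none), and the stack reservation of
      point (1) is not modelled.

```c
#include <stdbool.h>

#define ROOT_RESERVE_PERCENT 3   /* assumption: the message gives no figure */
#define PROCESS_SHARE_PERCENT 97 /* from point (3) of the message */

/*
 * Decide whether a request for `request_pages` may be committed when
 * `committed_pages` out of `limit_pages` are already accounted for.
 */
static bool commit_allowed(unsigned long long limit_pages,
			   unsigned long long committed_pages,
			   unsigned long long request_pages, bool is_root)
{
	unsigned long long limit = limit_pages;
	unsigned long long left;

	/* Ordinary users cannot touch the slice kept back for root. */
	if (!is_root)
		limit -= limit * ROOT_RESERVE_PERCENT / 100;
	if (committed_pages >= limit)
		return false;
	left = limit - committed_pages;
	/* A single request may take at most 97% of what is left. */
	return request_pages <= left * PROCESS_SHARE_PERCENT / 100;
}
```

      With a limit of 10000 pages and nothing committed, this model admits
      an ordinary user's request of up to 9409 pages and a root request of
      up to 9700 pages.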
      
      Since the current overcommit mode 2 is not really useful, I did not give
      this a new number.
      
      The patch is just for playing, not to be applied by Linus.  But, Andrew, I
      hope that you would be willing to put this in -mm so that people can
      experiment.  Of course it only does something if one sets overcommit mode
      to 2.
      
      Over the past month I have pressed people for feedback, and now have
      about a dozen reports, mostly positive, one very positive.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mempolicy optimisation · 182e0eba
      Andrea Arcangeli authored
      Some optimizations in mempolicy.c: avoid rebalancing the tree while
      destroying it, break out of loops early, and skip checks for invariant
      conditions in the replace operation.
      Signed-off-by: Andrea Arcangeli <andrea@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Simplified readahead congestion control · 250c01d0
      Ram Pai authored
      Reinstate the feature wherein readahead will be bypassed if the underlying
      queue is read-congested.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Simplified readahead · 6f734a1a
      Steven Pratt authored
      With Ram Pai <linuxram@us.ibm.com>
      
      - request size is now passed into page_cache_readahead.  This allows the
        removal of the size averaging code in the current readahead logic.
      
      - readahead rampup is now faster  (especially for larger request sizes)
      
      - There is no longer a "slow read path".  Readahead is turned off at the
        first random access and turned back on at the first sequential access.
      
      - Code now handles thrashing, slowly reducing readahead window until
        thrashing stops, or min size reached.
      
      - Returned to old behavior where first access is assumed sequential only if
        at offset 0.
      
      - designed to handle larger (1M or above) window sizes efficiently
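
      The behaviour listed above can be modelled as a small state machine.
      This is an illustrative sketch only: struct ra_model, ra_access() and
      ra_thrash() are hypothetical names, and the minimum, maximum, doubling
      ramp-up and shrink step are assumed values, not the ones used by the
      patch.

```c
#include <stdbool.h>

#define RA_MIN_PAGES 4UL   /* assumption */
#define RA_MAX_PAGES 32UL  /* assumption */
#define RA_SHRINK_STEP 2UL /* assumption */

struct ra_model {
	unsigned long next_index; /* page that would continue the stream */
	unsigned long window;     /* readahead window in pages, 0 = off */
	bool started;             /* false until the first access */
};

static unsigned long clamp_window(unsigned long pages)
{
	if (pages < RA_MIN_PAGES)
		return RA_MIN_PAGES;
	if (pages > RA_MAX_PAGES)
		return RA_MAX_PAGES;
	return pages;
}

/* Record an access of req_pages at page `index`; returns the new window. */
static unsigned long ra_access(struct ra_model *ra, unsigned long index,
			       unsigned long req_pages)
{
	/* The very first access counts as sequential only at offset 0. */
	bool sequential = ra->started ? index == ra->next_index : index == 0;

	ra->started = true;
	ra->next_index = index + req_pages;
	if (!sequential)
		ra->window = 0;                            /* random: off */
	else if (ra->window == 0)
		ra->window = clamp_window(2 * req_pages);  /* back on */
	else
		ra->window = clamp_window(2 * ra->window); /* ramp up */
	return ra->window;
}

/* Thrashing detected: shrink the window gradually, never below the minimum. */
static unsigned long ra_thrash(struct ra_model *ra)
{
	if (ra->window > RA_MIN_PAGES + RA_SHRINK_STEP)
		ra->window -= RA_SHRINK_STEP;
	else if (ra->window > RA_MIN_PAGES)
		ra->window = RA_MIN_PAGES;
	return ra->window;
}
```

      Because the request size feeds directly into the initial window, a
      larger request starts with a larger window in this model, which is one
      way to picture the faster ramp-up.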
      
      
      Benchmark results:
      
      machine 1: 8-way Pentium 4, 1 GB memory, tests run to a 36 GB SCSI disk
      (Similar results were seen on a 1-way 866 MHz box with an IDE disk.)
      
      TioBench:
      
      tiobench.pl --dir /mnt/tmp --block 4096 --size 4000 --numruns 2 --threads 1(4,16,64)
      
      4k request size sequential read results in MB/sec
      
        Threads         2.6.9    w/patches    %diff         diff
    • [PATCH] mm: teach kswapd about higher order areas · d4cf1012
      Nick Piggin authored
      Teach kswapd to free memory on behalf of higher order allocators.  This
      could be important for higher order atomic allocations because they
      otherwise have no means to free the memory themselves.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: higher order watermarks · 206ca74e
      Nick Piggin authored
      Move the watermark checking code into a single function.  Extend it to
      account for the order of the allocation and the number of free pages that
      could satisfy such a request.
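
      An order-aware watermark check might look like the sketch below.  It
      is a simplified user-space model under stated assumptions: struct
      zone_model, zone_free_pages() and watermark_ok() are hypothetical
      names, per-order free block counts are assumed to be available, and
      halving the threshold at each order is one plausible policy rather
      than a statement of what the patch does.

```c
#include <stdbool.h>

#define SKETCH_MAX_ORDER 11 /* assumption for illustration */

struct zone_model {
	unsigned long nr_free[SKETCH_MAX_ORDER]; /* free blocks per order */
};

/* Total free pages: a block of order o holds 2^o pages. */
static unsigned long zone_free_pages(const struct zone_model *z)
{
	unsigned long pages = 0;
	unsigned int o;

	for (o = 0; o < SKETCH_MAX_ORDER; o++)
		pages += z->nr_free[o] << o;
	return pages;
}

/* Would an allocation of the given order leave the zone above `min`? */
static bool watermark_ok(const struct zone_model *z, unsigned int order,
			 unsigned long min)
{
	unsigned long free_pages = zone_free_pages(z);
	unsigned int o;

	if (order >= SKETCH_MAX_ORDER || free_pages <= min)
		return false;
	for (o = 0; o < order; o++) {
		/* Blocks smaller than the request cannot satisfy it. */
		free_pages -= z->nr_free[o] << o;
		/* Require fewer free pages at each higher order. */
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}
```

      A zone with 100 free order-0 pages and nothing else passes an order-0
      check against a watermark of 50 but fails an order-1 check, since none
      of its free memory can satisfy the larger request.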
      
      From: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
      
      Fix typo in Nick's kswapd-high-order awareness patch
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: keep count of free areas · f86789bc
      Nick Piggin authored
      Keep track of the number of free pages of each order in the buddy allocator.
      Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] CS461x gameport code isn't being included in build · 7aee2fc8
      Ron Murray authored
      With Cal Peake <cp@absolutedigital.net>
      
      I've found a typo in drivers/input/gameport/Makefile in kernel 2.6.9 which
      effectively prevents the CS461x gameport code from being included.
      Signed-off-by: Ron Murray <rjmx@rjmx.net>
      Signed-off-by: Cal Peake <cp@absolutedigital.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] vmscan: total_scanned fix · aa0baf35
      Andrew Morton authored
      We haven't been incrementing local variable total_scanned since the
      scan_control stuff went in.  That broke kswapd throttling.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Allow disabling quota messages to console · cdd39d34
      Jan Kara authored
      Allow disabling of quota messages to console (they can disturb other
      output).
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix of quota deadlock on pagelock: reiserfs · 04a6c897
      Jan Kara authored
      Implement quota journaling and quota reading and writing functions for
      reiserfs.  Also solves several other deadlocks possible for reiserfs due to
      the lock inversion on journal_begin and quota locks.
      
      From: Vladimir Saveliev <vs@namesys.com>
      
      When CONFIG_QUOTA is defined reiserfs's finish_unfinished sets and clears
      MS_ACTIVE bit in s_flags field of super block.  If that bit was set already
      it should not be set.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix of quota deadlock on pagelock: ext3 · 98887122
      Jan Kara authored
      Implementation of quota reading and writing functions for ext3.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix of quota deadlock on pagelock: ext2 · 6b394613
      Jan Kara authored
      Implementation of quota reading and writing functions for ext2.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] quota umount race fix · 84f308c2
      Jan Kara authored
      Fix possible races between umount and quota on/off.
      
      Finally I decided to take a reference to vfsmount during vfs_quota_on() and
      to drop it after the final cleanup in vfs_quota_off().  This way we
      should be guarded against umount at all times.  The old code, which used
      filp_open() to open quota files, was protected the same way.  I also
      considered other ways of protection, but there would always be a window
      (provided I don't want to play much with namespace locks) where
      vfs_quota_on() could be called while umount() is in progress, resulting in
      the "Busy inodes after unmount" messages...
      
      Get a reference to vfsmount during quotaon() so that we are guarded against
      umount (as was the old code using filp_open()).
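
      The effect of holding a reference can be shown with a toy model.  All
      names here (struct mount_model, model_quota_on(), model_quota_off(),
      model_umount()) are hypothetical, and the model deliberately ignores
      everything about real mounts except the reference count.

```c
#include <errno.h>
#include <stdbool.h>

struct mount_model {
	int refs;      /* 1 while mounted, plus one per extra holder */
	bool quota_on;
};

/* Turning quota on pins the mount with an extra reference. */
static int model_quota_on(struct mount_model *m)
{
	if (m->quota_on)
		return -EBUSY;
	m->refs++;
	m->quota_on = true;
	return 0;
}

/* The reference is dropped only after quota has been turned off. */
static void model_quota_off(struct mount_model *m)
{
	if (m->quota_on) {
		m->quota_on = false;
		m->refs--;
	}
}

/* Unmount succeeds only when nobody else holds a reference. */
static int model_umount(struct mount_model *m)
{
	if (m->refs > 1)
		return -EBUSY;
	m->refs = 0;
	return 0;
}
```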
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix of quota deadlock on pagelock: quota core · cf684334
      Jan Kara authored
      The four patches in this series fix quota deadlocks on pagelock.  The
      problem was lock inversion on PageLock and transaction start: quota code
      needed to first start a transaction and then write the data, which
      subsequently needed acquisition of PageLock, while the standard ordering
      (PageLock first and transaction start later) was used e.g. by pdflush.  The
      patches implement a new way of quota access to disk: every filesystem that
      would like to implement quotas now has to provide quota_read() and
      quota_write() functions.  These functions must obey quota lock ordering (in
      particular they should not take PageLock inside a transaction).
      
      The first patch implements the changes in the quota core, the other three
      patches implement needed functions in ext2, ext3 and reiserfs.  The patch for
      reiserfs also fixes several other lock inversion problems (similar to those
      ext3 had) and implements the journaled quota functionality (which comes
      almost for free after the locking fixes...).
      
      The quota core patch makes quota support in other filesystems (except XFS,
      which implements everything on its own ;)) nonfunctional (quotaon() will
      refuse to turn on quotas on them).  Once the patches get reasonably wide
      testing and it seems that no major changes will be needed, I can also make
      fixes for the other filesystems (JFS, UDF, UFS).
      
      This patch:
      
      The patch implements the new way of quota io in the quota core.  Every
      filesystem wanting to support quotas has to provide functions quota_read()
      and quota_write() obeying quota locking rules.  As the writes and reads
      bypass the pagecache there is some ugly stuff ensuring that userspace can
      see all the data after quotaoff() (or Q_SYNC quotactl).  In future I plan
      to make quota files inaccessible from userspace (with the exception of
      quotacheck(8), which will take care of the cache flushing and such stuff
      itself) so that this synchronization stuff can be removed...
      
      The rewrite of the quota core.  Quota no longer uses the filesystem read()
      and write() functions, to avoid possible deadlocks on PageLock.  From now on
      every filesystem supporting quotas must provide functions quota_read() and
      quota_write() which obey the quota locking rules (e.g. they cannot acquire
      the PageLock).
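
      The contract described above (each filesystem supplies its own
      quota_read() and quota_write(), and quota cannot be turned on without
      them) can be sketched with a table of function pointers.  The struct
      layout, the argument lists, the -EINVAL and -ENOSPC return values and
      the in-memory "memfs" are all assumptions made for illustration; they
      are not the kernel's actual interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-filesystem quota I/O hooks. */
struct quota_io_ops {
	long (*quota_read)(void *fs, int type, char *buf,
			   size_t len, size_t off);
	long (*quota_write)(void *fs, int type, const char *buf,
			    size_t len, size_t off);
};

/* Refuse to turn quota on for filesystems lacking either hook. */
static int sketch_quota_on(const struct quota_io_ops *ops)
{
	if (!ops || !ops->quota_read || !ops->quota_write)
		return -EINVAL; /* assumption: exact error code */
	return 0;
}

/* Toy filesystem: a single in-memory quota file, no page cache at all. */
struct memfs {
	char data[256];
};

static long memfs_quota_read(void *fs, int type, char *buf,
			     size_t len, size_t off)
{
	struct memfs *m = fs;

	(void)type;
	if (off >= sizeof(m->data))
		return 0; /* read past the end yields nothing */
	if (len > sizeof(m->data) - off)
		len = sizeof(m->data) - off;
	memcpy(buf, m->data + off, len);
	return (long)len;
}

static long memfs_quota_write(void *fs, int type, const char *buf,
			      size_t len, size_t off)
{
	struct memfs *m = fs;

	(void)type;
	if (off >= sizeof(m->data))
		return -ENOSPC;
	if (len > sizeof(m->data) - off)
		len = sizeof(m->data) - off;
	memcpy(m->data + off, buf, len);
	return (long)len;
}
```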
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix reiserfs quota debug messages · 6ffc2881
      Jan Kara authored
      Attached patch fixes debug messages of quota code in reiserfs so that they
      compile.  Chris Mason agreed to the patch.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Expose reiserfs_sync_fs() · 3bc5bf4e
      Jan Kara authored
      Attached patch exposes reiserfs_sync_fs().  This call is needed by the new
      quota code to write data to disk on quotaoff so that userspace can see it
      afterwards.  Chris Mason agrees with the patch.
      
      Make reiserfs provide the sync_fs() function so that the quota code
      has a way to reliably force a transaction to disk.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 02 Jan, 2005 20 commits