1. 05 Oct, 2002 29 commits
    • kbuild: Fix arch/i386/boot clean targets · 08e57ddd
      Kai Germaschewski authored
      We removed some files which are long since dead, but on the other
      hand forgot some of the current ones.
      
      Also, add a missing ) in a warning (introduced and fixed by Sam Ravnborg ;)
    • kbuild: fix make -jN warnings · 16aadefd
      Kai Germaschewski authored
      If you hide the sub-make in a function, 'make' needs a little help...
    • kbuild: Put .bss back to the end of vmlinux · 345af2c9
      Kai Germaschewski authored
      The kallsyms patches added __kallsyms as the last section in vmlinux,
      behind .bss.
      
      This was done to save two additional kallsyms passes, since as the
      added section was last, it did not change the symbols before it.
      
      With the new infrastructure in the top-level Makefile, we do not need
      to do full relinks for these passes, so they are cheaper. We now
      use one additional link/kallsyms run to be able to place the __kallsyms
      section before .bss. The other pass is saved by adding an empty but 
      allocated __kallsyms section in kernel/kallsyms.c, so the first kallsyms
      pass already generates a section of the final size.
    • kbuild: Generalize adding of additional sections to vmlinux · 8cc7a297
      Kai Germaschewski authored
      kallsyms needs to actually have a final vmlinux to extract the symbols,
      and then add this information as a new section to the final vmlinux.
      
      Currently, we basically just do the vmlinux link twice, adding
      .tmp_kallsyms.o the second time. However, it's actually possible to just
      link together the temporary vmlinux generated the first time and the
      new object file directly without going back to all the single parts
      that the temporary vmlinux was linked from.
      
      This mechanism should be useful for sparc as well, where the btfix
      mechanism needs an already linked vmlinux, too.
      
      IMPORTANT: This only works as desired if the link script can be
      used recursively, i.e.
      
      ld <flags> -T arch/$(ARCH)/vmlinux.lds.s -o vmlinux.test vmlinux
      
      generates a vmlinux.test which is identical to vmlinux.
      arch/i386/vmlinux.lds.S needed a little tweaking, so probably the
      other archs do as well.
    • Merge tp1.ruhr-uni-bochum.de:/home/kai/src/kernel/v2.5/linux-2.5 · 91990be9
      Kai Germaschewski authored
      into tp1.ruhr-uni-bochum.de:/home/kai/src/kernel/v2.5/linux-2.5.make
    • kbuild: Don't descend into arch/i386/boot · abcdaf4b
      Kai Germaschewski authored
      We don't descend anymore when building vmlinux, so don't do so for
      the i386 specific boot targets, either.
      
      Plus, more cleanup in arch/i386/Makefile
    • kbuild: Nicer warnings · 56a8f5d4
      Kai Germaschewski authored
      Improve the warning messages when using obsolete features, kill one
      remaining user of $(list-multi)
      
      (by Sam Ravnborg)
      
      I also made O_TARGET != built-in.o an error, since compatibility code for
      that case has already been dropped.
    • [PATCH] clean up ll_rw_block() · 61c4b8fb
      Andrew Morton authored
      Hardly anything uses this function, so the debug checks in there are
      not of much value.
      
      The check for bdev_readonly() should be done in submit_bio().
      
      Local variable `major' was altogether unused.
    • [PATCH] stricter dirty memory clamping · 3669e824
      Andrew Morton authored
      The ratelimiting logic in balance_dirty_pages_ratelimited() is designed
      to prevent excessive calls to the expensive get_page_state(): On a big
      machine we only check to see if we're over dirty memory limits once per
      1024 dirtyings per cpu.
      
      This works OK normally, but it has the effect of allowing each process
      to go 1024 pages over the dirty limit before it gets throttled.
      
      So if someone runs 16000 tiobench threads, they can go 16G over the
      dirty memory threshold and die the death of buffer_head consumption,
      because page dirtiness pins the page's buffer_heads, defeating the
      special buffer_head reclaim logic.
      
      I'd left this overshoot artifact in place because it provides a degree
      of adaptivity - if someone is running hundreds of dirtying processes
      (dbench!) then they do want to overshoot the dirty memory limit.
      
      But it's hard to balance, and is really not worth the futzing around.
      So change the logic to only perform the get_page_state() call rate
      limiting if we're known to be under the dirty memory threshold.
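
      The changed check can be sketched as a small user-space model. This
      is not kernel code: the class DirtyThrottle, its fields and the
      constant RATELIMIT_PAGES are hypothetical names for illustration, and
      only the figure of 1024 dirtyings per CPU comes from the text above.
      The expensive state read is skipped only while the last check found
      us under the limit.

```python
# User-space model of the rate-limiting idea; not kernel code.
RATELIMIT_PAGES = 1024  # dirtyings between checks, per the commit message


class DirtyThrottle:
    def __init__(self, dirty_limit):
        self.dirty_limit = dirty_limit
        self.nr_dirty = 0          # global count of dirty pages
        self.expensive_checks = 0  # how often the costly state read ran
        self.over_limit = False    # result of the last expensive check
        self.ratelimit = {}        # per-CPU dirtyings since the last check

    def get_page_state(self):
        """Stand-in for the expensive global accounting read."""
        self.expensive_checks += 1
        return self.nr_dirty

    def balance_dirty_pages(self):
        """Run the expensive check; True means the caller is throttled."""
        self.over_limit = self.get_page_state() > self.dirty_limit
        return self.over_limit

    def dirty_one_page(self, cpu):
        self.nr_dirty += 1
        count = self.ratelimit.get(cpu, 0) + 1
        # Rate-limit the check only while known to be under the threshold.
        if not self.over_limit and count < RATELIMIT_PAGES:
            self.ratelimit[cpu] = count
            return False
        self.ratelimit[cpu] = 0
        return self.balance_dirty_pages()
```

      Once a check finds the system over the limit, every later dirtying
      runs the check, so a process can no longer run a further 1024 pages
      past the threshold before being throttled.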
    • [PATCH] remove page->virtual · a27efcaf
      Andrew Morton authored
      The patch removes page->virtual for all architectures which do not
      define WANT_PAGE_VIRTUAL.  Hash for it instead.
      
      Possibly we could define WANT_PAGE_VIRTUAL for CONFIG_HIGHMEM4G, but it
      seems unlikely.
      
      A lot of the pressure went off kmap() and page_address() as a result of
      the move to kmap_atomic().  That should be the preferred way to address
      CPU load in the set_page_address() and page_address() hashing and
      locking.
      
      If kmap_atomic is not usable then the next best approach is for users
      to cache the result of kmap() in a local rather than calling
      page_address() repeatedly.
      
      One heavy user of kmap() and page_address() is the ext2 directory code.
      
      On a 7G Quad PIII, running four concurrent instances of
      
      	while true
      	do
      		find /usr/src/linux > /dev/null
      	done
      
      on ext2 with everything cached, profiling shows that the new hashed
      set_page_address() and page_address() implementations consume 0.4% and
      1.3% of CPU time respectively.   I think that's OK.
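
      As a rough illustration of "hash for it instead", the following
      user-space model keeps page-to-address mappings in a bucketed table
      rather than in a per-page field. The class PageAddressMap, the
      bucket count and the integer page ids are assumptions made for this
      sketch; the real table layout and locking are not described in the
      message above.

```python
# User-space model of a hashed page -> virtual address table.
PA_HASH_BUCKETS = 128  # assumed table size, not taken from the patch


class PageAddressMap:
    def __init__(self):
        self.buckets = [[] for _ in range(PA_HASH_BUCKETS)]

    def _bucket(self, page_id):
        return self.buckets[hash(page_id) % PA_HASH_BUCKETS]

    def set_page_address(self, page_id, virtual):
        """Record a mapping, or remove it when virtual is None."""
        bucket = self._bucket(page_id)
        # Drop any previous entry for this page before adding the new one.
        bucket[:] = [(p, v) for (p, v) in bucket if p != page_id]
        if virtual is not None:
            bucket.append((page_id, virtual))

    def page_address(self, page_id):
        """Walk the page's bucket; None means the page is not mapped."""
        for p, v in self._bucket(page_id):
            if p == page_id:
                return v
        return None
```

      Every lookup walks a bucket, which is why the message recommends
      caching the result of kmap() in a local instead of calling
      page_address() repeatedly.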
    • [PATCH] use buffer_boundary() for writeback scheduling hints · 343893e6
      Andrew Morton authored
      This is the replacement for write_mapping_buffers().
      
      Whenever the mpage code sees that it has just written a block which had
      buffer_boundary() set, it assumes that the next block is dirty
      filesystem metadata.  (This is a good assumption - that's what
      buffer_boundary is for).
      
      So we do a lookup in the blockdev mapping for the next block and if it
      is present and dirty, then schedule it for IO.
      
      So the indirect blocks in the blockdev mapping get merged with the data
      blocks in the file mapping.
      
      This is a bit more general than the write_mapping_buffers() approach.
      write_mapping_buffers() required that the fs carefully maintain the
      correct buffers on the mapping->private_list, and that the fs call
      write_mapping_buffers(), and the implementation was generally rather
      yuk.
      
      This version will "just work" for filesystems which implement
      buffer_boundary correctly.  Currently this is ext2, ext3 and some
      not-yet-merged reiserfs patches.  JFS implements buffer_boundary() but
      does not use ext2-like layouts - so there will be no change there.
      
      Works nicely.
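
      The scheduling hint can be modelled in a few lines. This is a sketch
      under assumptions, not the mpage code: the function name
      schedule_writeback, the (block number, boundary flag) tuples and the
      set of dirty blockdev blocks are all hypothetical stand-ins.

```python
def schedule_writeback(file_blocks, blockdev_dirty):
    """Return the order in which block numbers get submitted for IO.

    file_blocks: list of (block_nr, boundary) pairs in file order.
    blockdev_dirty: set of dirty block numbers in the blockdev mapping;
    entries are removed from it as they are scheduled.
    """
    submitted = []
    for block_nr, boundary in file_blocks:
        submitted.append(block_nr)
        if boundary:
            # A boundary block hints that the next block is metadata.
            next_block = block_nr + 1
            if next_block in blockdev_dirty:
                blockdev_dirty.discard(next_block)
                submitted.append(next_block)
    return submitted
```

      With data blocks 100 to 103, a boundary on 103 and a dirty indirect
      block at 104, the indirect block is submitted between 103 and the
      data that follows it, so it merges with its neighbours.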
    • [PATCH] remove write_mapping_buffers() · 4ac833da
      Andrew Morton authored
      When the global buffer LRU was present, dirty ext2 indirect blocks were
      automatically scheduled for writeback alongside their data.
      
      I added write_mapping_buffers() to replace this - the idea was to
      schedule the indirects close in time to the scheduling of their data.
      
      It works OK for small-to-medium sized files but for large, linear writes
      it doesn't work: the request queue is completely full of file data and
      when we later come to scheduling the indirects, their neighbouring data
      has already been written.
      
      So writeback of really huge files tends to be a bit seeky.
      
      So.  Kill it.  Will fix this problem by other means.
    • [PATCH] use bio_get_nr_vecs() for sizing direct-io BIOs · e3b12fc1
      Andrew Morton authored
      From Badari Pulavarty.
      
      Rather than allocating maximum-sized BIOs, use the new
      bio_get_nr_vecs() hint when sizing the BIOs.
      
      Also keep track of the approximate upper-bound on the number of pages
      remaining to do, so we can again avoid allocating excessively-sized
      BIOs.
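
      A minimal sketch of the sizing rule, assuming the hint is a plain
      integer and the remaining work is known in pages. The helper names
      bio_vecs_to_allocate and plan_bios are hypothetical and this is a
      user-space model, not the direct-io code.

```python
def bio_vecs_to_allocate(nr_vecs_hint, pages_remaining):
    """Never allocate more than the device hint or more than is needed."""
    return max(1, min(nr_vecs_hint, pages_remaining))


def plan_bios(nr_vecs_hint, total_pages):
    """Return the list of BIO sizes used to cover total_pages."""
    sizes = []
    remaining = total_pages
    while remaining > 0:
        n = bio_vecs_to_allocate(nr_vecs_hint, remaining)
        sizes.append(n)
        remaining -= n
    return sizes
```

      For a 40-page request on a device whose hint is 16, the last BIO is
      sized for the 8 pages that remain rather than for the maximum.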
    • [PATCH] Documentation/filesystems/ext3.txt · 6fb75ca4
      Andrew Morton authored
      By Vincent Hanquez <tab@tuxfamily.org>
    • [PATCH] use bio_get_nr_vecs() hint for pagecache writeback · f2b01f8b
      Andrew Morton authored
      Use the bio_get_nr_vecs() hint for sizing the BIOs which writeback
      allocates.
    • [PATCH] fix reclaim for higher-order allocations · 3209a954
      Andrew Morton authored
      The page reclaim logic will bail out if all zones are at pages_high.
      But if the caller is requesting a higher-order allocation we need to go
      on and free more memory anyway.  That's the only way we have of
      addressing buddy fragmentation.
    • [PATCH] separation of direct-reclaim and kswapd functions · bf3f607a
      Andrew Morton authored
      There is some lack of clarity in what kswapd does and what
      direct-reclaim tasks do; try_to_free_pages() tries to service both
      functions, and they are different.
      
      - kswapd's role is to keep all zones on its node at
      
      	zone->free_pages >= zone->pages_high.
      
        and to never stop as long as any zones do not meet that condition.
      
      - A direct reclaimer's role is to try to free some pages from the
        zones which are suitable for this particular allocation request, and
        to return when that has been achieved, or when all the relevant zones
        are at
      
      	zone->free_pages >= zone->pages_high.
      
      The patch explicitly separates these two code paths; kswapd does not
      run try_to_free_pages() any more.  kswapd should not be aware of zone
      fallbacks.
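
      The two roles can be contrasted with a toy model. It assumes reclaim
      always succeeds and frees a fixed batch per step, which is a
      simplification; ZoneState, kswapd_balance, direct_reclaim and
      reclaim_batch are hypothetical names, and only the condition
      free_pages >= pages_high comes from the text above.

```python
class ZoneState:
    def __init__(self, name, free_pages, pages_high):
        self.name = name
        self.free_pages = free_pages
        self.pages_high = pages_high

    def balanced(self):
        return self.free_pages >= self.pages_high


def kswapd_balance(node_zones, reclaim_batch=32):
    """Keep going until every zone on the node is at pages_high."""
    while any(not z.balanced() for z in node_zones):
        for z in node_zones:
            if not z.balanced():
                z.free_pages += reclaim_batch


def direct_reclaim(suitable_zones, nr_wanted, reclaim_batch=32):
    """Free pages only from zones usable for this allocation.

    Stops once enough pages were freed or all of those zones are at
    pages_high, and returns the number of pages freed.
    """
    freed = 0
    while freed < nr_wanted and any(not z.balanced() for z in suitable_zones):
        for z in suitable_zones:
            if not z.balanced() and freed < nr_wanted:
                z.free_pages += reclaim_batch
                freed += reclaim_batch
    return freed
```

      The direct reclaimer touches only the zones passed to it and returns
      early, while the kswapd loop has no allocation request in mind and
      works on every zone of its node.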
    • [PATCH] mempool wakeup fix · fe66ad33
      Andrew Morton authored
      When the mempool is empty, tasks wait on the waitqueue in "exclusive
      mode".  So one task is woken for each returned element.
      
      But if the number of tasks which are waiting exceeds the mempool's
      specified size (min_nr), mempool_free() ends up deciding that as the
      pool is fully replenished, there cannot possibly be anyone waiting for
      more elements.
      
      But with 16384 threads running tiobench, it happens.
      
      We could fix this with a waitqueue_active() test in mempool_free().
      But rather than adding that test to this fastpath I changed the wait to
      be non-exclusive, and used the prepare_to_wait/finish_wait API, which
      will be quite beneficial in this case.
      
      Also, convert the schedule() in mempool_alloc() to an io_schedule(), so
      this sleep time is accounted as "IO wait".  Which is a bit approximate
      - we don't _know_ that the caller is really waiting for IO completion.
      But for most current users of mempools, io_schedule() is more accurate
      than schedule() here.
    • [PATCH] O_DIRECT invalidation fix · a7634cff
      Andrew Morton authored
      If the alignment checks in generic_direct_IO() fail, we end up not
      forcing writeback of dirty pagecache pages, but we still run
      invalidate_inode_pages2().  The net result is that dirty pagecache gets
      incorrectly removed.  I guess this will expose unwritten disk blocks.
      
      So move the sync up into generic_file_direct_IO(), where we perform the
      invalidation.  So we know that pagecache and disk are in sync before we
      do anything else.
    • [PATCH] truncate fixes · 911ceab5
      Andrew Morton authored
      The new truncate code needs to check page->mapping after acquiring the
      page lock.  Because the page could have been unmapped by page reclaim
      or by invalidate_inode_pages() while we waited for the page lock.
      
      Also, the page may have been moved between a tmpfs inode and
      swapper_space.  Because we don't hold the mapping->page_lock across the
      entire truncate operation any more.
      
      Also, change the initial truncate scan (the non-blocking one which is
      there to stop as much writeout as possible) so that it is immune to
      other CPUs decreasing page->index.
      
      Also fix negated test in invalidate_inode_pages2().  Not sure how that
      got in there.
    • [PATCH] distinguish between address span of a zone and the number · d3975580
      Andrew Morton authored
      From David Mosberger
      
      The patch below fixes a bug in nr_free_zone_pages() which shows up when
      a zone has a hole.  The problem is due to the fact that "struct zone"
      didn't keep track of the amount of real memory in a zone.  Because of
      this, nr_free_zone_pages() simply assumed that a zone consists entirely
      of real memory.  On machines with large holes, this has catastrophic
      effects on VM performance, because the VM system ends up thinking that
      there is plenty of memory left over in a zone, when in fact it may be
      completely full.
      
      The patch below fixes the problem by replacing the "size" member in
      "struct zone" with "spanned_pages" and "present_pages" and updating
      page_alloc.c.
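
      The effect of the split can be shown with a small model. The field
      names spanned_pages and present_pages come from the text above;
      ZoneModel, the pages_high field and the size-minus-watermark formula
      are assumptions made for illustration, since the message does not
      show what nr_free_zone_pages() computes.

```python
from dataclasses import dataclass


@dataclass
class ZoneModel:
    spanned_pages: int  # address span of the zone, holes included
    present_pages: int  # pages actually backed by memory
    pages_high: int     # assumed watermark kept in reserve


def free_estimate_by_span(zones):
    """Old behaviour in this model: treat the whole span as real memory."""
    return sum(max(0, z.spanned_pages - z.pages_high) for z in zones)


def free_estimate_by_present(zones):
    """Fixed behaviour in this model: count only pages that exist."""
    return sum(max(0, z.present_pages - z.pages_high) for z in zones)
```

      For a zone spanning 1,000,000 pages of which 200,000 are present,
      the span-based estimate is five times too large, which matches the
      "plenty of memory left" symptom described above.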
    • [PATCH] remove debug code from list_del() · 9d66d9e9
      Andrew Morton authored
      It hasn't caught any bugs, and it is causing confusion over whether
      this is a permanent part of list_del() behaviour.
    • [PATCH] hugetlb kmap fix · db12b88f
      Andrew Morton authored
      From Bill Irwin
      
      This patch makes alloc_hugetlb_page() kmap() the memory it's zeroing,
      and cleans up a tiny bit of list handling on the side.  Without this
      fix, it oopses every time it's called.
    • [PATCH] fix /proc/vmstat:pgpgout/pgpgin · 908325dc
      Andrew Morton authored
      These numbers are being sent to userspace as number-of-sectors, whereas
      they should be number-of-k.
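
      The unit conversion amounts to the following arithmetic, assuming
      512-byte sectors. The sector size and the helper name
      sectors_to_kbytes are assumptions; the message itself only says the
      counters should be reported in kilobytes.

```python
SECTOR_BYTES = 512  # assumed sector size behind the counters


def sectors_to_kbytes(nr_sectors):
    """Convert a sector count to whole kilobytes."""
    return nr_sectors * SECTOR_BYTES // 1024
```

      Under that assumption the reported values were twice what they
      should have been.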
    • [PATCH] struct super_block cleanup - ext3 · 5868a499
      Brian Gerst authored
      Removes the last member of the union, ext3.
    • [PATCH] struct super_block cleanup - hpfs · 40f51070
      Brian Gerst authored
      Remove hpfs_sb from struct super_block.
    • [PATCH] SCSI tape devfs & driverfs fix · 9709ae9f
      Kai Mäkisara authored
      fix device numbering in driverfs and devfs broken by previous patch
      (bug found by Bjoern A. Zeeb (bz@zabbadoz.net))
    • [PATCH] Updated NatSemi SCx200 patches for Linux-2.5 · 3900abd5
      Christer Weinigel authored
      This patch adds support for the National Semiconductor SCx200
      processor family to Linux 2.5.
      
      The patch consists of the following drivers:
      
        arch/i386/kernel/scx200.c -- give kernel access to the GPIO pins
      
        drivers/char/scx200_gpio.c -- give userspace access to the GPIO pins
        drivers/char/scx200_wdt.c -- watchdog timer driver
      
        drivers/i2c/scx200_i2c.c -- use any two GPIO pins as an I2C bus
        drivers/i2c/scx200_acb.c -- driver for the Access.BUS hardware
      
        drivers/mtd/maps/scx200_docflash.c -- driver for a CFI flash connected
                                            to the DOCCS pin
    • [PATCH] FAT/VFAT memory corruption during mount() · 10d033f7
      Petr Vandrovec authored
      This patch fixes memory corruption during vfat mount: one byte
      before mount options is overwritten by ',' since strtok->strsep
      conversion happened.
      
      This patch also fixes another problem introduced by the strtok->strsep
      conversion: VFAT requires that FAT does not modify the passed options,
      but the FAT driver fails to preserve the options string if there is
      more than one consecutive comma in it.
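
      The consecutive-comma case can be reproduced with a user-space model
      of strsep() on a NUL-terminated buffer. This is not the FAT driver's
      code: the functions strsep and parse_options below are written for
      illustration, and recording the exact byte that was overwritten is
      just one assumed way to restore the string safely.

```python
def strsep(buf, pos, delim=b","):
    """Model of C strsep() on a NUL-terminated bytearray.

    Overwrites the next delimiter with NUL and returns
    (token_start, next_pos); next_pos is None after the last token.
    """
    end = pos
    while buf[end] not in (0, delim[0]):
        end += 1
    if buf[end] == 0:
        return pos, None
    buf[end] = 0
    return pos, end + 1


def parse_options(options):
    """Split on commas, then restore the buffer to its original contents."""
    buf = bytearray(options.encode() + b"\0")
    tokens, cut_positions = [], []
    pos = 0
    while pos is not None:
        start, nxt = strsep(buf, pos)
        token = buf[start:buf.index(0, start)].decode()
        if token:  # unlike strtok, strsep yields empty tokens for ",,"
            tokens.append(token)
        if nxt is not None:
            cut_positions.append(nxt - 1)  # the byte strsep overwrote
        pos = nxt
    for i in cut_positions:
        buf[i] = ord(",")  # restore only bytes that were really replaced
    return tokens, buf[:-1].decode()
```

      Restoring by remembered position never writes in front of the buffer
      and puts back both commas of a doubled separator, which are the two
      failure modes described above.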
  2. 04 Oct, 2002 11 commits