1. 10 Sep, 2002 20 commits
    • Celso González's avatar
      [PATCH] drivers_net_pcmcia_fmvj18x_cs.c save_flags unsigned check · 530df4b1
      Celso González authored
        The function save_flags must use an unsigned long parameter instead of a
        long (signed) one
      
        This trivial patch solves the problem
      530df4b1
    • Linus Torvalds's avatar
      The scheduler should complain not just about interrupts, · 231991f4
      Linus Torvalds authored
      but also about being called whenever we're holding any
      other preemption locks.
      231991f4
    • Linus Torvalds's avatar
      atari_rootsec.h moved to fs/partitions/atari.h, but somehow the · 34fc0f54
      Linus Torvalds authored
      version in include/linux didn't get deleted.
      34fc0f54
    • Andrew Morton's avatar
      [PATCH] resurrect CONFIG_HIGHPTE · 81e0a1a6
      Andrew Morton authored
      Bill Irwin's patch to fix up pte's in highmem.
      
      With CONFIG_HIGHPTE, the direct pte pointer in struct page becomes the
      64-bit physical address of the single pte which is mapping this page.
      
      If the page is not PageDirect then page->pte.chain points at a list of
      pte_chains, which each now contain an array of 64-bit physical
      addresses of the pte's which are mapping the page.
      
      The functions rmap_ptep_map() and rmap_ptep_unmap() are used for
      mapping and unmapping the page which backs the target pte.
      
      The patch touches all architectures (adding do-nothing compatibility
      macros and inlines).  It generally mangles lots of header files and may
      break non-ia32 compiles.  I've had it in testing since 2.5.31.
      81e0a1a6
    • Andrew Morton's avatar
      [PATCH] rmap pte_chain speedup and space saving · 9dc8af80
      Andrew Morton authored
      The pte_chains presently consist of a pte pointer and a `next' link.
      So there's a 50% memory wastage here as well as potential for a lot of
      misses during walks of the singly-linked per-page list.
      
      This patch increases the pte_chain structure to occupy a full
      cacheline.  There are 7, 15 or 31 pte pointers per structure rather
      than just one.  So the wastage falls to a few percent and the number of
      misses during the walk is reduced.
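
      As a rough illustration of where "7, 15 or 31" comes from: with 4-byte
      pointers, a 32-, 64- or 128-byte cacheline holds 8, 16 or 32 words, and
      one word is spent on the `next' link.  The sketch below is not the
      kernel's definition; CACHELINE_BYTES, pte_slots_per_chain() and
      struct pte_chain_sketch are hypothetical names chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size for the example layout below. */
#define CACHELINE_BYTES 64

/*
 * Number of pte slots that fit in one cacheline after reserving
 * link_bytes for the `next' pointer.
 */
static size_t pte_slots_per_chain(size_t cacheline_bytes, size_t link_bytes,
                                  size_t slot_bytes)
{
    return (cacheline_bytes - link_bytes) / slot_bytes;
}

/* Slots per block on the machine compiling this sketch. */
#define NRPTE_SKETCH ((CACHELINE_BYTES - sizeof(void *)) / sizeof(uintptr_t))

/* One chain block: a link plus as many pte slots as fill the cacheline. */
struct pte_chain_sketch {
    struct pte_chain_sketch *next;      /* next block in the per-page list */
    uintptr_t ptes[NRPTE_SKETCH];       /* addresses of the mapping ptes */
};
```

      With one link word per block instead of one per pte, the overhead drops
      from half of the structure to a single word per cacheline.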
      
      The patch doesn't make much difference in simple testing, because in
      those tests the pte_chain list from the previous page has good cache
      locality with the next page's list.
      
      The patch sped up Anton's "10,000 concurrently exiting shells" test by
      3x or 4x.  It gives a 10% reduction in system time for a kernel build
      on 16p NUMAQ.
      
      It saves memory and reduces the amount of work performed in the slab
      allocator.
      
      Pages which are mapped by only a single process continue to not have a
      pte_chain.  The pointer in struct page points directly at the mapping
      pte (a "PageDirect" pte pointer).  Once the page is shared a pte_chain
      is allocated and both the new and old pte pointers are moved into it.
      
      We used to collapse the pte_chain back to a PageDirect representation
      in page_remove_rmap().  That has been changed.  That collapse is now
      performed inside page reclaim, via page_referenced().  The thinking
      here is that if a page was previously shared then it may become shared
      again, so leave the pte_chain structure in place.  But if the system is
      under memory pressure then start reaping them anyway.
      9dc8af80
    • Andrew Morton's avatar
      [PATCH] buffer_head takedown for bighighmem machines · e182d612
      Andrew Morton authored
      This patch addresses the excessive consumption of ZONE_NORMAL by
      buffer_heads on highmem machines.  The algorithms which decide which
      buffers to shoot down are fairly dumb, but they only cut in on machines
      with large highmem:lowmem ratios and the code footprint is tiny.
      
      The buffer.c change implements the buffer_head accounting - it sets the
      upper limit on buffer_head memory occupancy to 10% of ZONE_NORMAL.
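
      The accounting can be pictured roughly as follows.  This is a sketch
      under stated assumptions, not the buffer.c code: the function names are
      hypothetical, and the page size and buffer_head size are passed in as
      parameters rather than taken from the kernel.

```c
/*
 * Cap on the number of buffer_heads: as many as fit in 10% of the
 * pages in ZONE_NORMAL.
 */
static unsigned long max_buffer_heads_sketch(unsigned long zone_normal_pages,
                                             unsigned long page_size,
                                             unsigned long bh_size)
{
    unsigned long budget_pages = zone_normal_pages / 10;

    return budget_pages * (page_size / bh_size);
}

/* Non-zero when the current buffer_head count exceeds the cap. */
static int buffer_heads_over_limit_sketch(unsigned long nr_buffer_heads,
                                          unsigned long limit)
{
    return nr_buffer_heads > limit;
}
```

      Callers such as the writeback and reclaim paths described below would
      consult the over-limit test before deciding to strip buffers.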
      
      A possible side-effect of this change is that the kernel will perform
      more calls to get_block() to map pages to disk.  This will only be
      observed when a file is being repeatedly overwritten - this is the only
      case in which the "cached get_block result" in the buffers is useful.
      
      I did quite some testing of this back in the delalloc ext2 days, and
      was not able to come up with a test in which the cached get_block
      result was measurably useful.  That's for ext2, which has a fast
      get_block().
      
      A desirable side effect of this patch is that the kernel will be able
      to cache much more blockdev pagecache in ZONE_NORMAL, so there are more
      ext2/3 indirect blocks in cache, so with some workloads, less I/O will
      be performed.
      
      In mpage_writepage(): if the number of buffer_heads is excessive then
      buffers are stripped from pages as they are submitted for writeback.
      This change is only useful for filesystems which are using the mpage
      code.  That's ext2 and ext3-writeback and JFS.  An mpage patch for
      reiserfs was floating about but seems to have got lost.
      
      There is no need to strip buffers for reads because the mpage code does
      not attach buffers for reads.
      
      These are perhaps not the most appropriate buffer_heads to toss away.
      Perhaps something smarter should be done to detect file overwriting, or
      to toss the 'oldest' buffer_heads first.
      
      In refill_inactive(): if the number of buffer_heads is excessive then
      strip buffers from pages as they move onto the inactive list.  This
      change is useful for all filesystems.  This approach is good because
      pages which are being repeatedly overwritten will remain on the active
      list and will retain their buffers, whereas pages which are not being
      overwritten will be stripped.
      e182d612
    • Andrew Morton's avatar
      [PATCH] reduce the default dirty memory thresholds · ce92adf3
      Andrew Morton authored
      Writeback parameter tuning.  Somewhat experimental, but heading in the
      right direction, I hope.
      
      - Allowing 40% of physical memory to be dirtied on massive ia32 boxes
        is unreasonable.  It pins too many buffer_heads and contributes to
        page reclaim latency.
      
        The patch changes the initial value of
        /proc/sys/vm/dirty_background_ratio, dirty_async_ratio and (the
        presently non-functional) dirty_sync_ratio so that they are reduced
        when the highmem:lowmem ratio exceeds 4:1.
      
        These ratios are scaled so that as the highmem:lowmem ratio goes
        beyond 4:1, the maximum amount of allowed dirty memory ceases to
        increase.  It is clamped at the amount of memory which a 4:1 machine
        is allowed to use.
      
      - Aggressive reduction in the dirty memory threshold at which
        background writeback cuts in.  2.4 uses 30% of ZONE_NORMAL.  2.5 uses
        40% of total memory.  This patch changes it to 10% of total memory
        (if total memory <= 4G.  Even less otherwise - see above).
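
      One way to read the clamping described in the first item: once total
      memory exceeds what a 4:1 highmem:lowmem machine has (five times
      lowmem), the percentage shrinks so that the absolute dirty limit stays
      flat.  The sketch below follows that reading; it is not the patch's
      code, and both function names are made up for the example.

```c
/*
 * Scale a dirty ratio (percent of total memory) so that the absolute
 * limit stops growing beyond a 4:1 highmem:lowmem machine.
 */
static unsigned long scaled_dirty_ratio(unsigned long ratio_percent,
                                        unsigned long lowmem_pages,
                                        unsigned long highmem_pages)
{
    unsigned long total = lowmem_pages + highmem_pages;
    unsigned long total_at_4_to_1 = 5 * lowmem_pages; /* lowmem + 4 * lowmem */

    if (total <= total_at_4_to_1)
        return ratio_percent;           /* small highmem: ratio unchanged */
    return ratio_percent * total_at_4_to_1 / total;
}

/* Absolute dirty limit in pages after scaling. */
static unsigned long dirty_limit_pages(unsigned long ratio_percent,
                                       unsigned long lowmem_pages,
                                       unsigned long highmem_pages)
{
    unsigned long total = lowmem_pages + highmem_pages;

    return scaled_dirty_ratio(ratio_percent, lowmem_pages, highmem_pages)
           * total / 100;
}
```

      For example, with 1000 lowmem pages a 40% ratio allows 2000 dirty pages
      at 4000 highmem pages, and still 2000 at 9000 highmem pages.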
      
      This means that:
      
      - Much more writeback is performed by pdflush.
      
      - When the application is generating dirty data at a moderate
        rate, background writeback cuts in much earlier, so memory is
        cleaned more promptly.
      
      - Reduces the risk of user applications getting stalled by writeback.
      
      - Will damage dbench numbers.  It turns out that the damage is
        fairly small, and dbench isn't a worthwhile workload for
        optimisation.
      
      - Moderate reduction in the dirty level at which the write(2) caller
        is forced to perform writeback (throttling).  Was 40% of total
        memory.  Is now 30% of total memory (if total memory <= 4G, less
        otherwise).
      
      This is to reduce page reclaim latency, and generally because
      allowing processes to flood the machine with dirty data is a bad
      thing in mixed workloads.
      ce92adf3
    • Andrew Morton's avatar
      [PATCH] discontigmem code cleanup #2 · e2f5e334
      Andrew Morton authored
      Patch from Martin Bligh
      
      "This mainly just rips out some magic extra structures in the boot time
       code to determine node sizes, and counts in pages instead of bytes.
       Oh, and I put the code that allocates pgdat into allocate_pgdat,
       instead of find_max_pfn_node, which seems like an incongruous home for
       it.
      
       No functionality changes, nothing touched outside i386 discontigmem ...
       just makes code cleaner and more readable.  Tested on 16-way NUMA-Q."
      e2f5e334
    • Andrew Morton's avatar
      [PATCH] discontigmem code cleanup #1 · 79a96230
      Andrew Morton authored
      Patch from Martin Bligh.
      
      "This mainly changes the PLAT_MY_MACRO_IS_ALL_CAPS() stuff to be
       normal_macro(), and takes out some unnecessary redirection of function
       names.  No functionality changes, nothing touched outside i386
       discontigmem ...  just makes code readable.  Rumour has it that the
       PLAT_* stuff came from IRIX - I don't see that as a good reason to make
       the Linux code unreadable.  Tested on 16-way NUMA-Q."
      79a96230
    • Andrew Morton's avatar
      [PATCH] exact dirty state accounting · 1f90eedd
      Andrew Morton authored
      Some adjustments to global dirty page accounting.
      
      Previously, dirty page accounting counted all dirty pages.  Even dirty
      anonymous pages.  This has potential to upset the throttling logic in
      balance_dirty_pages().  Particularly as I suspect we should decrease
      the dirty memory writeback thresholds by a lot.
      
      So this patch changes it so that we only account for dirty pagecache
      pages which have backing store.  Not anonymous pages, not swapcache,
      not in-memory filesystem pages.
      
      To support this, the `memory_backed' boolean has been added to struct
      backing_dev_info.  When an address space's backing device is marked as
      memory-backed, the core kernel knows to not include that mapping's
      pages in the dirty memory accounting.
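
      In outline, the check might look like the sketch below.  Only the
      `memory_backed' field name comes from the description above; the
      structure layouts, the counter and the function name are simplified
      stand-ins, not the kernel's definitions.

```c
/* Simplified stand-in for struct backing_dev_info. */
struct backing_dev_info_sketch {
    int memory_backed;      /* non-zero: pages have no backing store */
};

/* Simplified stand-in for struct address_space. */
struct address_space_sketch {
    struct backing_dev_info_sketch *backing_dev_info;
};

/* Global count of dirty pagecache pages that can be written back. */
static unsigned long nr_dirty_sketch;

/* Account a newly dirtied page, skipping memory-backed mappings. */
static void account_page_dirtied_sketch(struct address_space_sketch *mapping)
{
    if (!mapping->backing_dev_info->memory_backed)
        nr_dirty_sketch++;
}
```

      A tmpfs-style mapping would set memory_backed, so dirtying its pages
      leaves the counter untouched.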
      
      For memory-backed mappings, dirtiness is a way of pinning the page, and
      there's nothing the kernel can do to clean the page to make it freeable.
      
      driverfs, tmpfs, and ramfs have been converted to mark their mappings as
      memory-backed.
      
      The ramdisk driver hasn't been converted.  I have a separate patch for
      ramdisk, which fails to fix the longstanding problems in there :(
      
      With this patch, /bin/sync now sends /proc/meminfo:Dirty to zero, which
      is rather comforting.
      1f90eedd
    • Andrew Morton's avatar
      [PATCH] pass the correct flags to aops->releasepage() · 6a0fb424
      Andrew Morton authored
      Restore the gfp_mask in the VM's call to a_ops->releasepage().  We can
      block in there again, and XFS (at least) can use that.
      6a0fb424
    • Andrew Morton's avatar
      [PATCH] writer throttling fix · 95b88300
      Andrew Morton authored
      The patch fixes a few problems in the writer throttling code.  Mainly
      in the situation where a single large file is being written out.
      
      That file could be parked on sb->locked_inodes due to pdflush
      writeback, and the writer throttling path coming out of
      balance_dirty_pages() forgot to look for inodes on ->locked_inodes.
      
      The net effect was that the amount of dirty memory was exceeding the
      limit set in /proc/sys/vm/dirty_async_ratio, possibly to the point
      where the system gets seriously choked.
      
      The patch removes sb->locked_inodes altogether and teaches the
      throttling code to look for inodes on sb->s_io as well as sb->s_dirty.
      
      Also, just leave unwritten dirty pages on mapping->io_pages, and
      unwritten dirty inodes on sb->s_io.  Putting them back onto
      ->dirty_pages and ->dirty_inodes was fairly pointless, given that both
      lists need to be looked at.
      95b88300
    • Ingo Molnar's avatar
      [PATCH] Re: do_syslog/__down_trylock lockup in current BK · 0d8b3b44
      Ingo Molnar authored
      This fixes the lockup.
      
      The bug happened because reparenting in the CLONE_THREAD case was done in
      a fundamentally non-atomic way, which was asking for various races to
      happen: eg. the target parent gets reparented to the currently exiting
      thread ...
      
      (the non-CLONE_THREAD case is safe because nothing reparents init.)
      
      the solution is to make all of reparenting atomic (including the
      forget_original_parent() bit) - this is possible with some reorganization
      done in signal.c and exit.c. This also made some of the loops simpler.
      0d8b3b44
    • Alexander Viro's avatar
      [PATCH] Missing IDE partition 3 of 3 on 2.5.34 · 8fb345bd
      Alexander Viro authored
      devfs side fixed thus:
      8fb345bd
    • Jens Axboe's avatar
      [PATCH] hdreg command updates etc · f1c84a2e
      Jens Axboe authored
      Update hdreg to match 2.4 levels.
      
      o Use consistent SRV_STAT instead of SERVICE_STAT
      o Add sector count status bits for tcq
      o Add various missing commands
      o hd_driveid update
      f1c84a2e
    • Jens Axboe's avatar
      [PATCH] IDE pci ids · 8930eafc
      Jens Axboe authored
      Update IDE pci ids to match 2.4.20-pre5-ac4 levels.
      8930eafc
    • Jens Axboe's avatar
      [PATCH] blk_fs_request() · 4372b607
      Jens Axboe authored
      Add blk_fs_request(rq) to avoid testing rq->flags & REQ_CMD directly.
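
      A helper of this kind is typically a one-line macro.  The sketch below
      shows the shape only: the request structure is reduced to its flags
      word and the bit value given to REQ_CMD_SKETCH is an assumption, not
      the value used in blkdev.h.

```c
/* Hypothetical flag bit marking an ordinary filesystem request. */
#define REQ_CMD_SKETCH (1UL << 0)

/* Request structure reduced to the one field the helper needs. */
struct request_sketch {
    unsigned long flags;
};

/* Non-zero if rq is a regular filesystem request. */
#define blk_fs_request(rq) ((rq)->flags & REQ_CMD_SKETCH)
```

      Drivers can then write `if (blk_fs_request(rq))' and stay insulated
      from later changes to how the request type is encoded.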
      4372b607
    • Jens Axboe's avatar
      [PATCH] PCI individual resource handling · e47901f9
      Jens Axboe authored
      This merges the changes from 2.4-ac that allow drivers to enable (and
      mark as used) only a subset of PCI resources, for those drivers that
      need it (at this point apparently only the i845 IDE controller).
      e47901f9
    • Mikael Pettersson's avatar
      [PATCH] undo 2.5.34 ftape damage · ac9c060c
      Mikael Pettersson authored
      In the 2.5.33->2.5.34 step someone removed "export-objs" from
      drivers/char/ftape/lowlevel/Makefile, which makes it impossible to build
      ftape as a module since it _does_ have a number of EXPORT_SYMBOL's.
      
      This reverts that change.
      ac9c060c
    • Mikael Pettersson's avatar
      [PATCH] 2.5.34 floppy driver init/exit fixes · 9d1f9419
      Mikael Pettersson authored
      The 2.5 floppy driver has for a long time had two init/exit bugs:
      1. It calls register_sys_device() on init, but fails to call
         unregister_sys_device() in exit. This leads to data structure
         corruption if floppy is a module and it gets unloaded.
      2. It calls register_sys_device() early on init, but fails to call
         unregister_sys_device() if init fails. Again, this leads to
         data structure corruption.
      
      The patch below fixes both these problems.
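
      Both bugs come down to an unbalanced register/unregister pair.  The
      usual shape of the fix is sketched below with stub functions; the
      *_sketch names and the counter are hypothetical and stand in for the
      real driver-model calls, which are not reproduced here.

```c
/* Stands in for the driver-model state touched by registration. */
static int sys_device_registered;

static int register_sys_device_sketch(void)
{
    sys_device_registered++;
    return 0;
}

static void unregister_sys_device_sketch(void)
{
    sys_device_registered--;
}

/* Init: undo the early registration if any later step fails. */
static int floppy_init_sketch(int later_step_fails)
{
    int err = register_sys_device_sketch();

    if (err)
        return err;
    if (later_step_fails) {
        err = -1;
        goto out_unregister;
    }
    return 0;

out_unregister:
    unregister_sys_device_sketch();
    return err;
}

/* Exit: mirror the registration done by a successful init. */
static void floppy_exit_sketch(void)
{
    unregister_sys_device_sketch();
}
```

      After a failed init or a module unload the registration count is back
      where it started, so nothing is left pointing into freed module memory.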
      9d1f9419
  2. 09 Sep, 2002 20 commits
    • Stephen Rothwell's avatar
      [PATCH] cdrom.c is the only file to include asm/fcntl.h · ed245b59
      Stephen Rothwell authored
      drivers/cdrom/cdrom.c is the only file (apart from include/linux/fcntl.h)
      that includes asm/fcntl.h.  This changes that and should have no effect.
      
      I need to do this before I consolidate the asm/fcntl.h files into
      linux/fcntl.h (coming next - again).
      ed245b59
    • Skip Ford's avatar
      [PATCH] 2.5.34 ufs/super.c · 2ecc1c29
      Skip Ford authored
      This is needed since 2.5.32 to successfully mount a UFS partition.
      2ecc1c29
    • Rolf Fokkens's avatar
      [PATCH] USER_HZ & NTP problems · 3843e047
      Rolf Fokkens authored
      I've been playing with different HZ values in the 2.4 kernel for a while
      now, and apparently Linus also has decided to introduce a USER_HZ
      constant (I used CLOCKS_PER_SEC) while raising the HZ value on x86 to
      1000.
      
      On x86 timekeeping has been shown to be relatively fragile when raising HZ (OK,
      I tried HZ=2048 which is quite high) because of the way the interrupt
      timer is configured to fire HZ times each second.  This is done by
      configuring a divisor in the timer chip (LATCH) which divides a certain
      clock (1193180) and makes the chip fire interrupts at the resulting
      frequency.
      
      Now comes the catch: NTP requires a clock accuracy of 500 ppm.  For some
      HZ values the clock is not accurate enough to meet this requirement,
      hence NTP won't work well.
      
      An example HZ value is 1020 which exceeds the 500 ppm requirement.  In
      this case the best approximation is 1019.8 Hz.  The xtime.tv_usec value
      is raised by 980 each tick, which means that after one second the
      tv_usec value has increased by 999404 (it should be 1000000), which is
      an accuracy of 596 ppm.
      
      Some more examples:
      	  HZ Accuracy (ppm)
      	---- --------------
      	 100             17
      	1000            151
      	1024            632
      	2000            687
      	2008            343
      	2011             18
      	2048           1249
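
      The figures in the table can be approximated with a few lines of
      arithmetic.  This is a sketch, not code from the patch: the function
      name is made up, and rounding both the divisor and the per-tick
      microsecond increment to the nearest integer is an assumption, chosen
      because it lands within 1 ppm of each table entry.

```c
/* Input clock of the PC timer chip, in Hz, as quoted above. */
#define CLOCK_TICK_RATE_SKETCH 1193180UL

/*
 * Timekeeping error in ppm for a requested HZ: compare the microseconds
 * credited to tv_usec during one real second with 1000000.
 */
static double hz_error_ppm(unsigned long hz)
{
    /* Integer divisor programmed into the timer chip (LATCH). */
    unsigned long latch = (CLOCK_TICK_RATE_SKETCH + hz / 2) / hz;
    /* Integer number of microseconds added to tv_usec per tick. */
    unsigned long tick_usec = (1000000UL + hz / 2) / hz;
    /* Interrupt frequency the chip actually delivers. */
    double actual_hz = (double)CLOCK_TICK_RATE_SKETCH / (double)latch;
    double err = actual_hz * (double)tick_usec - 1000000.0;

    /* The reference is 1e6 usec, so the usec error is already in ppm. */
    return err < 0 ? -err : err;
}
```

      For HZ=1020 the unrounded frequency gives roughly 584 ppm rather than
      the 596 quoted above, which uses the rounded 1019.8 Hz; either way the
      500 ppm limit is exceeded.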
      
      What I've been doing is replacing tv_usec by tv_nsec, meaning xtime is now
      a timespec instead of a timeval.  This allows the accuracy to be
      improved by a factor of 1000 for any (well ...  any?) HZ value. 
      
      Of course all kinds of calculations had to be improved as well.  The
      ACTHZ constant is introduced to approximate the actual HZ value; it's
      used to do some approximations of other related values.
      3843e047
    • Linus Torvalds's avatar
      Never _ever_ BUG() if you don't have to · ba815d85
      Linus Torvalds authored
      Cset exclude: greg@kroah.com|ChangeSet|20020905153320|19047
      ba815d85
    • Linus Torvalds's avatar
      Merge http://linux-acpi.bkbits.net/linux-acpi · 38908d74
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      38908d74
    • Linus Torvalds's avatar
      Merge bk://linuxusb.bkbits.net/linus-2.5 · 8a0f08e2
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      8a0f08e2
    • Linus Torvalds's avatar
      Merge bk://linuxusb.bkbits.net/pci_hp-2.5 · 159b0104
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      159b0104
    • Andy Grover's avatar
      Merge groveronline.com:/root/bk/linux-2.5 · a340bf30
      Andy Grover authored
      into groveronline.com:/root/bk/linux-acpi
      a340bf30
    • Patrick Mochel's avatar
      Reorganize the mtrr init sequence a bit. All mtrr init now happens · b6a3d01f
      Patrick Mochel authored
      during the initcall sequence, after all CPUs have been brought up. 
      mtrr_init() calls a static init_other_cpus(), which fires off a function 
      on all other cpus to replicate the state across all of them. 
      
      arch/i386/kernel/smpboot.c::smp_callin() had the following: 
      
      #ifdef CONFIG_MTRR
             /*
              * Must be done before calibration delay is computed
              */
             mtrr_init_secondary_cpu ();
      #endif
      
      
      I couldn't figure this one out. The P4 manual says nothing about this, nor
      could I find any other documentation about it. The P4 manual says only that state
      must be synchronized across all CPUs, which it is. And, it happens before
      anything else is executed on the other CPUs, and before any devices or
      drivers have been brought up.
      
      The cyrix mtrr code was also updated to handle this style of SMP initialization.
      b6a3d01f
    • Linus Torvalds's avatar
      Merge home:v2.5/linux · 2b5d7502
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
      2b5d7502
    • Linus Torvalds's avatar
      Get Intel model name from the CPU · ad6b7f70
      Linus Torvalds authored
      ad6b7f70
    • Patrick Mochel's avatar
      [PATCH] Re: Performance issue in 2.5.32+ · ac7349b6
      Patrick Mochel authored
      - The early startup code was changed so smp_prepare_cpus() is now called
        before do_basic_setup().  do_basic_setup() is where mtrr_init() is
        called, and mtrr_init_secondary_cpu() depends on mtrr_init() having run.
      
      - mtrr_init_boot_cpu() was removed from the AP startup code. This was a
        SMP-only hack that made sure mtrr_init() happened when SMP was
        enabled.  That's right - two different code paths to do the same
        thing, obscured by compile-time defines.
      
      The appended patch makes sure mtrr_init() is called before
      smp_prepare_cpus(). It's ugly, and I'll work on a cleaner solution, but
      James: could you try it and see if it fixes your performance issues?
      ac7349b6
    • Juan Quintela's avatar
      [PATCH] : Grammatical fixes · 4b84bbe0
      Juan Quintela authored
        Documentation/porting: s/are/and/
        Documentation/directory-locking: s/that means// was repeated
      4b84bbe0
    • Petr Vandrovec's avatar
      [PATCH] 2.5.34: recalc_sigpending missing for modules · 69be6c8e
      Petr Vandrovec authored
      When recalc_sigpending was converted from inline to real function,
      appropriate EXPORT_SYMBOL() was not created.  Needed at least for ncpfs
      and lockd.
      69be6c8e
    • Chris Wright's avatar
      [PATCH] 2.5.34 kernel-api DocBook fix · 2f5d3153
      Chris Wright authored
      Update kernel-api.tmpl to reflect mtrr changes so that the docs will build.
      2f5d3153
    • Greg Kroah-Hartman's avatar
      PCI Hotplug: remove pci_*_nodev() prototypes as the functions are gone. · 3d1a6602
      Greg Kroah-Hartman authored
      The pci_bus_* functions should be used instead.
      3d1a6602
    • Greg Kroah-Hartman's avatar