  1. 25 Sep, 2002 17 commits
    • [PATCH] slab reclaim balancing · b65bbded
      Andrew Morton authored
      A patch from Ed Tomlinson which improves the way in which the kernel
      reclaims slab objects.
      
      The theory is: a cached object's usefulness is measured in terms of the
      number of disk seeks which it saves.  Furthermore, we assume that one
      dentry or inode saves as many seeks as one pagecache page.
      
      So we reap slab objects at the same rate as we reclaim pages.  For each
      1% of reclaimed pagecache we reclaim 1% of slab.  (Actually, we _scan_
      1% of slab for each 1% of scanned pages).
      
      Furthermore we assume that one swapout costs twice as many seeks as one
      pagecache page, and twice as many seeks as one slab object.  So we
      double the pressure on slab when anonymous pages are being considered
      for eviction.
      
      The code works nicely, and smoothly.  Possibly it does not shrink slab
      hard enough, but that is now very easy to tune up and down.  It is just:
      
      	ratio *= 3;
      
      in shrink_caches().
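
The balancing rule described above can be sketched in userspace C. This is only an illustration of the arithmetic, not code from the patch: the function `slab_objects_to_scan()` and all of its parameters are hypothetical, and the `ratio *= 3` tuning factor is left out.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): pick a slab scan count so that
 * the scanned fraction of slab equals the scanned fraction of pages.
 * The pressure is doubled when anonymous pages are being considered for
 * eviction, following the "one swapout costs twice as many seeks" assumption.
 */
unsigned long slab_objects_to_scan(unsigned long pages_scanned,
				   unsigned long pages_total,
				   unsigned long slab_objects,
				   int anon_pressure)
{
	unsigned long long to_scan;

	assert(pages_total > 0);

	/* Same fraction of slab as of pages: objects * scanned / total. */
	to_scan = (unsigned long long)slab_objects * pages_scanned / pages_total;

	if (anon_pressure)
		to_scan *= 2;

	/* Never ask for more objects than exist. */
	if (to_scan > slab_objects)
		to_scan = slab_objects;

	return (unsigned long)to_scan;
}
```

Scanning 1% of the pages (10 of 1000) with 5000 slab objects asks for 1% of slab, and twice that when anonymous pages are under consideration.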
      
      Slab caches no longer hold onto completely empty pages.  Instead, pages
      are freed as soon as they have zero objects.  This is possibly a
      performance hit for slabs which have constructors, but it's doubtful.
      Most allocations after a batch of frees are satisfied from inside
      internally-fragmented pages and by the time slab gets back onto using
      the wholly-empty pages they'll be cache-cold.  slab would be better off
      going and requesting a new, cache-warm page and reconstructing the
      objects therein.  (Once we have the per-cpu hot-page allocator in
      place.  It's happening).
      
      As a consequence of the above, kmem_cache_shrink() is now unused.  No
      great loss there - the serialising effect of kmem_cache_shrink and its
      semaphore in front of page reclaim was measurably bad.
      
      Still todo:
      
      - batch up the shrinking so we don't call into prune_dcache and
        friends at high frequency asking for a tiny number of objects.
      
      - Maybe expose the shrink ratio via a tunable.
      
      - clean up slab.c
      
      - highmem page reclaim in prune_icache: highmem pages can pin
        inodes.
      b65bbded
    • [PATCH] use prepare_to_wait in VM/VFS · dfdacf59
      Andrew Morton authored
      This uses the new wakeup machinery in some hot parts of the VFS and
      block layers.
      
      wait_on_buffer(), wait_on_page(), lock_page(), blk_congestion_wait().
      Also in get_request_wait(), although the benefit for exclusive wakeups
      will be lower.
      dfdacf59
    • [PATCH] prepare_to_wait/finish_wait sleep/wakeup API · 3da08d6c
      Andrew Morton authored
      This is worth a whopping 2% on specweb on an 8-way.  Which is faintly
      surprising because __wake_up and other wait/wakeup functions are not
      apparent in the specweb profiles which I've seen.
      
      
      The main objective of this is to reduce the CPU cost of the wait/wakeup
      operation.  When a task is woken up, its waitqueue is removed from the
      waitqueue_head by the waker (ie: immediately), rather than by the woken
      process.
      
      This means that a subsequent wakeup does not need to revisit the
      just-woken task.  It also means that the just-woken task does not need
      to take the waitqueue_head's lock, which may well reside in another
      CPU's cache.
      
      I have no decent measurements on the effect of this change - possibly a
      20-30% drop in __wake_up cost in Badari's 40-dds-to-40-disks test (it
      was the most expensive function), but it's inconclusive.  And no
      quantitative testing of which I am aware has been performed by
      networking people.
      
      The API is very simple to use (Linus thought it up):
      
      void my_func(wait_queue_head_t *wqh)
      {
      	DEFINE_WAIT(wait);
      
      	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
      	if (!some_test)
      		schedule();
      	finish_wait(wqh, &wait);
      }
      
      or:
      
      	DEFINE_WAIT(wait);
      
      	while (!some_test_1) {
      		prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
      		if (!some_test_2)
      			schedule();
      		...
      	}
      	finish_wait(wqh, &wait);
      
      You need to bear in mind that once prepare_to_wait has been performed,
      your task could be removed from the waitqueue_head and placed into
      TASK_RUNNING at any time.  You don't know whether or not you're still
      on the waitqueue_head.
      
      Running prepare_to_wait() when you're already on the waitqueue_head is
      fine - it will do the right thing.
      
      Running finish_wait() when you're actually not on the waitqueue_head is
      fine.
      
      Running finish_wait() when you've _never_ been on the waitqueue_head is
      fine, as long as the DEFINE_WAIT() macro was used to initialise the
      waitqueue.
      
      You don't need to fiddle with current->state.  prepare_to_wait() and
      finish_wait() will do that.  finish_wait() will always return in state
      TASK_RUNNING.
      
      There are plenty of usage examples in vm-wakeups.patch and
      tcp-wakeups.patch.
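
The rules above (the waker unlinks the entry, prepare_to_wait() may be repeated, finish_wait() is always safe) can be modelled in a few lines of single-threaded userspace C. This is a hedged sketch, not the kernel implementation: every `model_*` name is hypothetical, there is no locking and no real scheduling, and it only shows the list bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

enum model_state { MODEL_RUNNING, MODEL_UNINTERRUPTIBLE };

/* A wait entry; prev/next point back at the entry itself when not queued. */
struct model_wait {
	struct model_wait *prev, *next;
	enum model_state state;
};

/* The queue head is a sentinel node of the same circular list. */
struct model_wait_head {
	struct model_wait list;
};

void model_head_init(struct model_wait_head *h)
{
	h->list.prev = h->list.next = &h->list;
}

/* Plays the role of DEFINE_WAIT(): a self-linked, unqueued entry. */
void model_wait_init(struct model_wait *w)
{
	w->prev = w->next = w;
	w->state = MODEL_RUNNING;
}

static int model_queued(const struct model_wait *w)
{
	return w->next != w;
}

static void model_unlink(struct model_wait *w)
{
	w->prev->next = w->next;
	w->next->prev = w->prev;
	w->prev = w->next = w;
}

/* Queue the entry only if it is not already queued, then set the state. */
void model_prepare_to_wait(struct model_wait_head *h, struct model_wait *w,
			   enum model_state state)
{
	assert(h != NULL && w != NULL);
	if (!model_queued(w)) {
		w->next = h->list.next;
		w->prev = &h->list;
		h->list.next->prev = w;
		h->list.next = w;
	}
	w->state = state;
}

/* The waker removes the entry itself, so the woken side has nothing to undo. */
int model_wake_up_one(struct model_wait_head *h)
{
	struct model_wait *w = h->list.next;

	if (w == &h->list)
		return 0;		/* nobody waiting */
	model_unlink(w);
	w->state = MODEL_RUNNING;
	return 1;
}

/* Safe whether the entry is queued, already removed, or never queued. */
void model_finish_wait(struct model_wait_head *h, struct model_wait *w)
{
	(void)h;
	w->state = MODEL_RUNNING;
	if (model_queued(w))
		model_unlink(w);
}
```

Because an unqueued entry is self-linked, model_finish_wait() can tell "still queued" from "already removed by the waker" without consulting the head, which is why the initialisation step matters.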
      3da08d6c
    • [PATCH] mprotect_fixup fix · 02b1783c
      Andrew Morton authored
      From David M-T.
      
      When this function successfully merges the new range into an existing
      VMA, it forgets to extend the new protection mode into the just-merged
      pages.
      02b1783c
    • [PATCH] hugetlb fix · 5538fdaa
      Andrew Morton authored
      Patch from Rohit Seth
      
      It fixes the problem which Andrea noted in his initial review of the
      hugetlb code:
      
      "In short doing "addr = vma->vm_end" and then checking if vm_end + len
       is below vm_next->vm_start is broken, because there's no guarantee
       that "addr" will be a largepage aligned address.  the LPAGE_ALIGN in
       found_addr should be dropped because moving the addr ahead without
       checking that addr+len doesn't then fall into a vma, will generate
       do_munmaps and in turn userspace mem corruption."
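
The ordering problem in the quote can be shown with a small userspace sketch: align the candidate address first, then check the aligned address plus the length against the start of the next mapping. The names `sketch_lpage_align()` and `sketch_fits_in_gap()` and the 4 MB page size are assumptions for illustration, not the hugetlb code.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_LPAGE_SIZE (4UL << 20)	/* assumed 4 MB large page */

/* Round addr up to the next large-page boundary. */
unsigned long sketch_lpage_align(unsigned long addr)
{
	return (addr + SKETCH_LPAGE_SIZE - 1) & ~(SKETCH_LPAGE_SIZE - 1);
}

/*
 * Return 1 and store the aligned address in *out if a mapping of len bytes
 * fits between prev_end and next_start once the address has been aligned.
 * Checking prev_end + len instead would accept gaps that are too small.
 */
int sketch_fits_in_gap(unsigned long prev_end, unsigned long next_start,
		       unsigned long len, unsigned long *out)
{
	unsigned long addr;

	assert(out != NULL);

	addr = sketch_lpage_align(prev_end);
	if (addr < prev_end)		/* alignment wrapped around */
		return 0;
	if (addr > next_start || next_start - addr < len)
		return 0;

	*out = addr;
	return 1;
}
```

With a previous mapping ending at 5 MB and the next one starting at 10 MB, a 4 MB request passes the unaligned check (5 + 4 <= 10) but is rejected here, because the aligned start is 8 MB.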
      5538fdaa
    • [PATCH] NUMA-Q fixes · bce5aeb5
      Martin J. Bligh authored
       - Remove the const that someone incorrectly stuck in there; it causes a type conflict.
         Alan has a better plan for fixing this long term, but this fixes the compile
         warning for now.
      
       - Move the printk of the xquad_portio setup *after* we put something in the variable
         so it actually prints something useful, not 0 ;-)
      
       - To derive the size of the xquad_portio area, multiply the number of nodes by the
         size of each node, not the size of two nodes (and remove define). Doh!
      bce5aeb5
    • Remove busy-wait for short RT nanosleeps. It's a random special case · 98ae8e2b
      Linus Torvalds authored
      and does the wrong thing for higher HZ values anyway.
      98ae8e2b
    • [PATCH] exit-fix-2.5.38-E3 · 5dd6a6e5
      Ingo Molnar authored
      This fixes a number of bugs in the thread-release code:
      
       - notify parents only if the group leader is a zombie,
         and if it's not a detached thread.
      
       - do not reparent children to zombie tasks.
      
       - introduce the TASK_DEAD state for tasks, to serialize the task-release
         path. (to some it might be confusing that tasks are zombies first, then
         dead :-)
      
       - simplify tasklist_lock usage in release_task().
      
      the effect of the above bugs ranged from unkillable hung zombies to kernel
      crashes. None of those happens with the patch applied.
      5dd6a6e5
    • [PATCH] remove elevator_linus · 2684cd69
      Jens Axboe authored
      Patch killing off elevator_linus for good. Sniffle.
      2684cd69
    • [PATCH] deadline scheduler · 85b2148a
      Jens Axboe authored
      This introduces the deadline-ioscheduler, making it the default.  2nd
      patch coming that deletes elevator_linus in a minute.
      
      This one has read_expire at 500ms, and writes_starved at 2.
      85b2148a
    • [PATCH] PnP BIOS ESCD sanity check · 650e56ee
      Thomas Hood authored
      Sanity check the ESCD size. From 2.4.
      650e56ee
    • [PATCH] ALi and Cypress IDE fixes · 26b90050
      Ivan Kokshaysky authored
      These two chipsets are most common on alpha.
      - cy82c693: allow the generic IDE setup code to work correctly
        with broken PCI registers layout of this chip. This fixes
        quite a few problems with secondary channel, plus some hacks in
        arch code can go away.
      - ALi M5229: enable DMA.
      26b90050
    • [PATCH] 3ware driver update for 2.5.35 · 92f2c52c
      Adam Radford authored
      92f2c52c
    • [PATCH] pidhash-2.5.38-A0 · 5191a147
      Ingo Molnar authored
      This removes the cmpxchg from the PID allocator and replaces it with a
      spinlock.  This spinlock is hit only a couple of times per bootup, so
      it's not a performance issue.
      5191a147
    • [PATCH] thread-flock-2.5.38-A3 · a16435af
      Ingo Molnar authored
      Ulrich found another small detail wrt. POSIX requirements for threads -
      this time it's the recursion features (read-held lock being write-locked
      means an upgrade if the same 'process' is the owner, means a deadlock if a
      different 'process').
      
      this requirement even makes some sense - the group of threads who own a
      lock really own all rights to the lock as well.
      
      These changes fix this, all testcases pass now.  (inter-process
      testcases as well, which are not affected by this patch.)
      
      (SIGURG and SIGIO semantics should also continue to work - there's some
      more stuff we can optimize with the new pidhash in this area, but that's
      for later.)
      a16435af
    • [PATCH] loop device broken in 2.5.38 · 86b18ae3
      Theodore Y. Ts'o authored
      The loop device driver was broken in 2.5.38 when it was converted over
      to use gendisk.  I discovered this while doing final regression testing
      on the ext3 htree code.
      
      The problem is that figure_loop_size() is setting the capacity of the
      loop device in kilobytes (because that's what compute_loop_size()
      returns), but set_capacity() expects the size in 512 byte sectors.
      
      I've enclosed a patch which fixes the problem, as well as simplifying
      the code by eliminating compute_loop_size(), since it is a static
      function that is only used once, by figure_loop_size().
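
The unit mismatch is a factor of two: one kilobyte is two 512-byte sectors. A minimal userspace sketch of the conversion follows; the helper names are hypothetical and this is not the code from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Convert a size in kilobytes to 512-byte sectors (1 KB = 2 sectors). */
unsigned long long sketch_kb_to_sectors(unsigned long long kb)
{
	assert(kb <= ULLONG_MAX / 2);	/* guard against overflow */
	return kb << 1;
}

/* Convert a size in bytes to whole 512-byte sectors (rounds down). */
unsigned long long sketch_bytes_to_sectors(unsigned long long bytes)
{
	return bytes >> 9;
}
```

Passing a kilobyte count where sectors are expected therefore makes the device appear half its real size.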
      86b18ae3
    • Merge jfs@jfs.bkbits.net:linux-2.5 · 0de4d503
      Dave Kleikamp authored
      into kleikamp.austin.ibm.com:/home/shaggy/bk/jfs-2.5
      0de4d503
  2. 24 Sep, 2002 11 commits
    • [PATCH] fix null dereference in sys_mprotect · 0cd9efe3
      Paul Mackerras authored
      As it is at the moment, sys_mprotect will dereference a null pointer
      if you use it on a region that is contained within the first vma.  I
      have a little program that demonstrates this (I'll post it if anyone
      is interested).  What happens then is that the process hangs in
      do_page_fault at the down_read on the mm->mmap_sem, since sys_mprotect
      has done a down_write on mm->mmap_sem.
      
      The problem is that mprotect_fixup isn't updating prev properly.  Thus
      we can finish the main loop in sys_mprotect with prev == NULL.  This
      has been the case since Christoph's cleanups went in.  Prior to that,
      mprotect_fixup always set prev to something non-NULL.  I suspect that
      not updating prev could also cause vmas to get dropped completely if
      the region being mprotected spans more than one vma.
      
      The patch below fixes the problem by making mprotect_fixup set prev to
      a reasonable value in all circumstances.
      0cd9efe3
    • Linus Torvalds's avatar
      efae82c0
    • [PATCH] per-cpu data preempt-safing · c6e70088
      Robert Love authored
      This fixes unsafe access to per-CPU data via reordering of instructions or
      use of "get_cpu()".
      
      Before anyone balks at the brlock.h fix, note this was in the
      alternative version of the code which is not used by default.
      c6e70088
    • [PATCH] remove preempt workaround in slab.c · 7f644d00
      Robert Love authored
      Before the irqs_disabled() check in preempt_schedule(), we worked around
      some locking issues in slab.c.  Now that we will never preempt with
      interrupts disabled, we can remove those and clean things up.
      
      This is courtesy of Manfred Spraul.
      7f644d00
    • [PATCH] s/preempt_count()/in_atomic() in do_exit() · 5d671309
      Robert Love authored
      This converts the debugging check in do_exit from a check on
      preempt_count() to in_atomic().
      
      The main benefit to this is we will stop warning over the BKL and now
      use the standard mechanism for such checks.
      5d671309
    • [PATCH] flock_lock_file livelock fix · 0adfb15a
      Matthew Wilcox authored
      Looks like I dropped a hunk from my patchset, sorry.
      
      We never set FL_SLEEP in the flock case, so if we should block, we'll
      livelock instead.
      0adfb15a
    • Simplify elevator algorithm, make it prefer reads heavily. · a9ee74e7
      Linus Torvalds authored
      This is needed for reasonable read latency with the new VM
      behaviour. 
      
      NOTE! This is way too unfair, Andrew and Jens are working on
      alternatives.
      a9ee74e7
    • [PATCH] another alpha update · 7f012496
      Ivan Kokshaysky authored
       - Makefile cleanups and fixes
       - a bunch of syscalls added
       - removed crap from asm/ide.h (it's not needed anymore)
       - __down_read_trylock fix
      7f012496
    • Merge with DRI CVS tree · 76f92de7
      Linus Torvalds authored
      76f92de7
    • [PATCH] ide io scheduler thing · 60abdcb3
      Jens Axboe authored
      IDE must use blk_queue_empty() and not do a list_empty() on the
      (potentially only) dispatch queue.  This took quite a while to find
      while debugging a new io scheduler...
      60abdcb3
    • [PATCH] pgrp-fix-2.5.38-A2 · 872aa4a8
      Ingo Molnar authored
      This fixes the emacs bug reported by Andries.  It should probably also
      fix other, terminal handling related weirdnesses introduced by the new
      PID handling code in 2.5.38.
      
      The bug was in the session_of_pgrp() function, if no proper session is
      found in the process group then we must take the session ID from the
      process that has pgrp PID (which does not necessarily have to be part of
      the pgrp).  The fallback code is only triggered when no process in the
      process group has a valid session - besides being faster, this also
      matches the old implementation.
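
The lookup order described above can be sketched over a plain array of tasks. This is a userspace illustration only: `struct sketch_task`, `sketch_session_of_pgrp()` and the convention that a session ID greater than zero counts as "proper" are assumptions, not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task. */
struct sketch_task {
	int pid;
	int pgrp;
	int session;	/* assumed: > 0 means a valid session */
};

/*
 * First look for a valid session among the members of the process group.
 * Only if none is found, fall back to the task whose PID equals pgrp,
 * which does not itself have to be a member of that group.
 */
int sketch_session_of_pgrp(const struct sketch_task *tasks, size_t n, int pgrp)
{
	size_t i;

	assert(tasks != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (tasks[i].pgrp == pgrp && tasks[i].session > 0)
			return tasks[i].session;

	/* Fallback path: only reached when no member has a valid session. */
	for (i = 0; i < n; i++)
		if (tasks[i].pid == pgrp)
			return tasks[i].session;

	return -1;	/* nothing found */
}
```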
      
      [ hey, who needs a POSIX conformance testsuite when we have emacs! ;) ]
      872aa4a8
  3. 23 Sep, 2002 12 commits
    • [PATCH] direct-io bandaid · bf72e973
      Andrew Morton authored
      The direct-IO code is currently generating 1 meg BIOs (and
      subsequent BUGs) because it doesn't know about bio_add_page().
      
      Could we please drop it to 16k until we get it sorted out?
      bf72e973
    • Merge kroah.com:/home/greg/linux/BK/bleeding_edge-2.5 · 7fcc2c87
      Greg Kroah-Hartman authored
      into kroah.com:/home/greg/linux/BK/gregkh-2.5
      7fcc2c87
    • [PATCH] USBLCD updates · 9c5e6f5a
      Adams IT Services authored
      -increased timeout value because some people reported problems
      -(important!) Vendor ID has changed from 0x1212 to 0x10D2, my officially
        assigned one.
      -added usblcd driver to configure.help
      9c5e6f5a
    • [PATCH] usb whiteheat driver update · 97338442
      Stuart MacDonald authored
      Update to full working driver status. Latest firmware 4.06 too. Driver
      now officially supported.
      97338442
    • [PATCH] USB: made port_softint global for other usb-serial drivers to use. · e185597b
      Greg Kroah-Hartman authored
      Based off of a patch from Stuart MacDonald <stuartm@connecttech.com>
      e185597b
    • [PATCH] USB: clean up the error logic for open() in the usb-serial driver · 9a50ad7f
      Stuart MacDonald authored
      This cleans up the error path in the open() call to make a bit more
      sense.
      9a50ad7f
    • [PATCH] USB: fix for ezusb firmware download · 20e3be5f
      Greg Kroah-Hartman authored
      This fixes a stupid error in the timeout value when downloading firmware
      to a device.  The WhiteHEAT device now works properly with this patch.
      20e3be5f
    • [PATCH] usb-storage: fix return codes... · 0d80c6f5
      Alan Stern authored
      Like the header says, this patch fixes up the various Transfer- and
      Transport-level return codes.  There were a lot of places in the various
      subdrivers that were not particularly careful about distinguishing the
      two; it would help if the people currently maintaining those drivers could
      take a look at my changes to make sure I haven't screwed anything up.
      
      # Converted US_BULK_TRANSFER_xxx to USB_STOR_XFER_xxx, to make it more
      # easily distinguishable from USB_STOR_TRANSPORT_xxx.  (Also, in the
      # future these codes may apply to control transfers as well as to bulk
      # transfers.)
      #
      # Changed USB_STOR_XFER_FAILED to USB_STOR_XFER_ERROR, since it implies
      # a transport error rather than a transport failure.
      #
      # Added a USB_STOR_XFER_STALLED code, to indicate a transfer that was
      # terminated by an endpoint stall.
      
      This patch is in preparation for one in which usb_stor_transfer_partial()
      and usb_stor_transfer() are replaced by usb_stor_bulk_transfer_buf() and
      usb_stor_bulk_transfer_srb() respectively, with slightly different
      argument lists.  Ultimately the subdrivers will be able to use these
      routines in place of the slightly specialized versions they have now and
      in place of the ones in raw_bulk.c.
      0d80c6f5
    • [PATCH] #include <linux/version.h> missing in drivers/usb/host/ohci-hcd.c · 7e5b54d4
      Luc Van Oostenryck authored
      compile fails with the following message:
      
      	> In file included from ohci-hcd.c:136:
      	> ohci-dbg.c:318: parse error
      	> make[3]: *** [ohci-hcd.o] Error 1
      
      due to a missing #include <linux/version.h>
      
      Here is a trivial patch for this.
      7e5b54d4
    • [PATCH] USB shutdown oopser · 817c0217
      David Brownell authored
      Is it guaranteed that callers have zeroed out the device
      before this is invoked?  Else the following is necessary to
      prevent potential oopses dereferencing interface->dev.driver in
      the generic device layer.
      817c0217
    • [PATCH] ehci-hcd: update · df42f7cf
      David Brownell authored
      Here's an EHCI update, I'll send separate patches to sync 2.4 with
      this version.  Changes in this version include:
      
        - An earlier locking update would give trouble on SPARC, where
          irqsave "flags" aren't flags.  This resolves that issue by
          adding a module parameter to limit work done with irqs off.
          (Some net drivers do the same thing.)
      
        - Optionally (now #ifdef DEBUG) collects some statistics on IRQs
          and URBs.  There are more IAA interrupts than I want to see,
          during extended usb-storage loading.
      
        - Adds a commented-out workaround for a problem I've seen on one
          VT8235.  Seems likely an issue with this specific motherboard;
          another tester hasn't reported such issues.
      
        - Includes the jiffies time_after() patch from Tim Schmielau.
      
        - Minor tweaks to the hcd portability (get rid of another #if).
      
        - Minor doc/diagnostic/... updates
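
For context on the time_after() item in the list above: the point of that style of comparison is that a tick counter such as jiffies wraps around, so a plain `>` gives the wrong answer near the wrap. Below is a userspace sketch of the same idea, not the kernel macro itself; `sketch_time_after()` and `sketch_timed_out()` are hypothetical names, and the cast relies on the usual two's-complement conversion.

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is after b" for a free-running unsigned tick counter. */
int sketch_time_after(unsigned long a, unsigned long b)
{
	/* The unsigned difference is well defined; its sign gives the order. */
	return (long)(b - a) < 0;
}

/* Has more than `timeout` ticks elapsed since `start`? */
int sketch_timed_out(unsigned long now, unsigned long start,
		     unsigned long timeout)
{
	/* Only meaningful for spans below half the counter range. */
	assert(timeout <= (unsigned long)LONG_MAX);
	return sketch_time_after(now, start + timeout);
}
```

If the timer starts three ticks before the counter wraps with a timeout of 10, the deadline lands just past zero; a naive `now > deadline` test would report a timeout immediately, while the signed-difference form does not.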
      df42f7cf
    • [PATCH] ohci-hcd, queue fault recovery + rm DEBUG · 65e2da7e
      David Brownell authored
      This USB patch updates the OHCI driver:
      
        - converts to relying on td_list shadowing the hardware's
          schedule; only collecting the donelist needs dma_to_td(),
          and td list handling works much like EHCI or UHCI.
      
        - leaves faulted endpoint queues (bulk/intr) disabled until
          the relevant drivers had a chance to clean up.
      
        - fixes minor bugs (unreported) in the affected code:
            * byteswap problem when unlinking urbs ... symptom would
              be data toggle confusion (since 2.4.2x) on big-endian cpus
            * latent bug if folk unlinked queue in LIFO order, not FIFO
      
        - removes unnecessary debug code; mostly de-BUG()ged
      
      The interesting fix is the "leave queues halted" one.  As
      discussed on email a while back, this HCD fault handling
      policy (also followed by EHCI) is sufficient to let device
      drivers implement the two key fault handling policies that
      seem to be necessary:
      
          (a) Datagram style, where issues on one I/O won't affect
              the next unless the device halted the endpoint.  The
              device driver can ignore most errors other than -EPIPE.
      
          (b) Stream style, where for example it'd be wrong to ever
              let block N+1 overwrite block N on the disk.  Once
              the first URB fails, the rest would just be unlinked
              in the completion handler.
      
      As a consequence of using the td_list, you can now see urb
      queuing in action in the driverfs 'async' file.  At least, if
      you look at the right time, or use drivers (networking, etc)
      that queue (bulk) reads for a long time.
      65e2da7e