1. 06 Jul, 2003 18 commits
    • Greg Ungerer's avatar
      [PATCH] flat loader v850 specific support abstracted · 40d98fd3
      Greg Ungerer authored
      Architecture specific flat loader code for v850 moved into its
      own v850 flat.h header. This patch also adds support for a number
      of relocation cases that need to be handled at load time.
      
      Most of this code is originally from Miles Bader <miles@gnu.org>.
      40d98fd3
    • Greg Ungerer's avatar
      [PATCH] flat loader m68knommu specific support abstracted · cb04237b
      Greg Ungerer authored
      Architecture specific flat loader code for m68knommu moved into its
      own m68knommu flat.h header. Part of the shared library flat loader
      update.
      cb04237b
    • Greg Ungerer's avatar
      [PATCH] flat loader H8/300 specific support abstracted · 2253b09e
      Greg Ungerer authored
      Architecture specific flat loader code for H8/300 moved into its
      own H8/300 flat.h header.
      2253b09e
    • Greg Ungerer's avatar
      [PATCH] shared library support for MMUless binfmt_flat loader · 3d97dc2d
      Greg Ungerer authored
      This patch adds shared library support to the MMUless application
      loader, binfmt_flat. This is not new; it is a forward port from the
      same support in 2.4.x kernels with MMUless support, and has been
      running for well over a year now. The code support is conditionally
      compiled on CONFIG_BINFMT_FLAT_SHARED. This change also abstracts
      a bit more architecture dependent code into the separate flat.h
      includes.
      
      Basically relocations within an application also carry a tag to
      identify what they refer to (this code or which shared library).
      This is patched as before at load/run-time with an appropriate
      address.
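
      A minimal sketch of the tagging idea, not the code from this patch.
      It assumes a 32-bit relocation word whose top 8 bits hold the tag and
      whose low 24 bits hold an offset; the field widths, the function
      names and the lib_base table are all assumptions for illustration.

```c
#include <stdint.h>

#define MAX_LIBS 4  /* hypothetical limit on loaded images */

/* Split a tagged relocation word (assumed 8-bit id, 24-bit offset). */
static uint32_t reloc_id(uint32_t r)     { return (r >> 24) & 0xffu; }
static uint32_t reloc_offset(uint32_t r) { return r & 0x00ffffffu; }

/*
 * Resolve a tagged relocation to an absolute address.
 * lib_base[0] stands for the application itself, lib_base[n] for
 * shared library n. Returns 0 if the tag names an image that is
 * not loaded.
 */
static uint32_t resolve_reloc(uint32_t r, const uint32_t lib_base[MAX_LIBS])
{
    uint32_t id = reloc_id(r);

    if (id >= MAX_LIBS || lib_base[id] == 0)
        return 0;
    return lib_base[id] + reloc_offset(r);
}
```

      With this layout a relocation tagged 0 is patched against the
      program's own load address, and one tagged n against library n.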
      3d97dc2d
    • Greg Ungerer's avatar
      [PATCH] simplify access_ok() for all m68knommu targets · ca6abe4c
      Greg Ungerer authored
      Unify access_ok for all m68knommu targets. All targets use the
      common linker script and have common end symbols. So now we can
      just use a simple check.
      ca6abe4c
    • Greg Ungerer's avatar
      [PATCH] remove unused register from clobber list in down_trylock() · 5b0a7205
      Greg Ungerer authored
      Remove "%d0" register from clobber list of down_trylock() for
      m68knommu. It is not used by the asm code here at all.
      5b0a7205
    • Greg Ungerer's avatar
      [PATCH] force PAGE_SIZE to be an unsigned long · abba5925
      Greg Ungerer authored
      Force PAGE_SIZE for the m68knommu architecture to be an unsigned long.
      This makes it consistent with all other architectures and cleans up
      a load of compiler warnings.
      abba5925
    • Greg Ungerer's avatar
      [PATCH] conditional ROMfs copy for Motorola M5307C3 board · bb47ba3b
      Greg Ungerer authored
      Conditionally copy the ROMfs filesystem on the Motorola M5307C3
      target board only if using a ROMfs.
      bb47ba3b
    • Greg Ungerer's avatar
      [PATCH] selection of boot parameters at configure time for Motorola 5282 targets · fd83c5ed
      Greg Ungerer authored
      Allow setting boot time parameters at configuration for Motorola
      5282 targets.
      fd83c5ed
    • Linus Torvalds's avatar
      Simplify and speed up mmap read-around handling · 82a333fa
      Linus Torvalds authored
      This improves cold-cache program startup noticeably for me, and
      simplifies the read-ahead logic at the same time. The rules for
      read-ahead are:
      
       - if the vma is marked random, we just do the regular one-page case. 
         Obvious.
      
       - if the vma is marked "linear access", we use the regular readahead
         code. No change in behaviour there (well, we also only consider it a 
         _miss_ if it was marked linear access - the "readahead" and
         "readaround"  things are now totally independent of each other)
      
       - otherwise, we look at how many hits/misses we've had for this 
         particular file open for mmap, and if we've had noticeably more
         misses than hits, we don't bother with read-around.
      
      In particular, this means that the "real" read-ahead logic literally
      only needs to worry about finding sequential accesses, and does not
      have to worry about the common executable mmap access patterns that
      have very different behaviour.
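
      The three rules above can be sketched as a single decision function.
      This is an illustration only: the enum names, the stats structure
      and the MISS_MARGIN threshold are hypothetical, and "noticeably more
      misses than hits" is modelled as a fixed margin.

```c
enum vma_hint     { VMA_NORMAL, VMA_RANDOM, VMA_SEQUENTIAL };
enum fault_action { READ_ONE_PAGE, READ_AHEAD, READ_AROUND };

/* Per-open-file mmap fault statistics (hypothetical layout). */
struct file_mmap_stats {
    unsigned long hits;    /* faults that found the page cached */
    unsigned long misses;  /* faults that had to go to disk */
};

#define MISS_MARGIN 100UL  /* hypothetical tuning constant */

static enum fault_action
choose_fault_action(enum vma_hint hint, const struct file_mmap_stats *st)
{
    if (hint == VMA_RANDOM)
        return READ_ONE_PAGE;      /* rule 1: random access */
    if (hint == VMA_SEQUENTIAL)
        return READ_AHEAD;         /* rule 2: regular readahead */
    if (st->misses > st->hits + MISS_MARGIN)
        return READ_ONE_PAGE;      /* rule 3: read-around is not paying off */
    return READ_AROUND;
}
```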
      
      Some constant tweaking may be a good idea.
      82a333fa
    • Ingo Molnar's avatar
      [PATCH] another timer overflow thing · e939c913
      Ingo Molnar authored
      in add_timer_internal() we simply leave the timer pending forever if the
      expiry is in more than 0xffffffff jiffies. This means more than 48 days on
      eg. ia64 - which is not an unrealistic timeout. IIRC crond is happy to use
      extremely large timeouts.
      
      It's better to time out early (if you can call 48 days "early") than to
      not time out at all.
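
      A sketch of the "time out early" idea, assuming 64-bit jiffies and
      a tick rate of 1024 Hz for the 48-day figure. Both function names
      and the HZ value are assumptions; this is not the patch itself.

```c
#include <stdint.h>

#define MAX_TIMER_DELTA 0xffffffffULL  /* largest delta the timer code handles */

/*
 * Clamp an expiry so it is never more than MAX_TIMER_DELTA ticks
 * in the future. A clamped timer fires early instead of never.
 * (Overflow of now + MAX_TIMER_DELTA is ignored in this sketch.)
 */
static uint64_t clamp_expiry(uint64_t now, uint64_t expires)
{
    if (expires > now && expires - now > MAX_TIMER_DELTA)
        return now + MAX_TIMER_DELTA;
    return expires;
}

/* Whole days covered by MAX_TIMER_DELTA ticks at a given tick rate. */
static unsigned int max_delta_days(unsigned int hz)
{
    return (unsigned int)(MAX_TIMER_DELTA / hz / 86400u);
}
```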
      e939c913
    • Bernardo Innocenti's avatar
      [PATCH] Fix do_div() for all architectures · f0a8aa74
      Bernardo Innocenti authored
      This offers a generic do_div64() that actually does the right thing,
      unlike some architectures that "optimized" the 64-by-32 divide into
      just a 32-bit divide.
      
      Both ppc and sh were already providing an assembly optimized
      __div64_32().  I called my function the same, so that their optimized
      versions will automatically override mine in lib.a.
      
      I've only tested extensively on m68knommu (uClinux) and made
      sure generated code is reasonably short. Should be ok also on
      parisc, since it's the same algorithm they were using before.
      
       - add generic C implementations of the do_div() for 32bit and 64bit
         archs in asm-generic/div64.h;
      
       - add generic library support function __div64_32() to handle the
         full 64/32 case on 32bit archs;
      
       - kill multiple copies of generic do_div() in architecture
         specific subdirs. Most copies were either buggy or not doing
         what they were supposed to do;
      
       - ensure all surviving instances of do_div() have their parameters
         correctly parenthesized to avoid funny side-effects;
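
      For illustration, here is a user-space sketch of a full 64-by-32
      divide built only from 32-bit-friendly steps (shift and subtract),
      next to the kind of truncating shortcut the message criticises.
      The function names are hypothetical and the code is not taken from
      the patch; like do_div(), it stores the quotient in place and
      returns the remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Divide *n by base in place; return the remainder. */
static uint32_t div64_32_sketch(uint64_t *n, uint32_t base)
{
    uint64_t rem = *n;
    uint64_t b = base;
    uint64_t res = 0, d = 1;
    uint32_t high = (uint32_t)(rem >> 32);

    assert(base != 0);

    /* Reduce the upper word first so the loops below stay short. */
    if (high >= base) {
        high /= base;
        res = (uint64_t)high << 32;
        rem -= (uint64_t)(high * base) << 32;
    }

    /* Scale the divisor up until it reaches the remainder. */
    while ((int64_t)b > 0 && b < rem) {
        b += b;
        d += d;
    }

    /* Classic shift-and-subtract long division. */
    do {
        if (rem >= b) {
            rem -= b;
            res += d;
        }
        b >>= 1;
        d >>= 1;
    } while (d);

    *n = res;
    return (uint32_t)rem;
}

/* The buggy shortcut: silently drops the upper 32 bits of the dividend. */
static uint32_t div_truncating(uint64_t n, uint32_t base)
{
    return (uint32_t)n / base;
}
```

      Dividing 0x100000003 by 2 shows the difference: the full version
      yields quotient 0x80000001 with remainder 1, while the truncating
      version divides only the low word (3) and returns 1.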
      f0a8aa74
    • Paul Fulghum's avatar
      [PATCH] synclink_cs.c update · a6a6977c
      Paul Fulghum authored
      Fix arbitration between net open and tty open.
      
      Clean up missed bits of CUA device removal changes.
      a6a6977c
    • Paul Fulghum's avatar
      [PATCH] synclinkmp.c update · d4188a26
      Paul Fulghum authored
      Fix arbitration between net open and tty open.
      
      Clean up unused locals resulting from latest tty changes.
      d4188a26
    • Paul Fulghum's avatar
      [PATCH] synclink.c update · 47dba812
      Paul Fulghum authored
      Fix arbitration between net open and tty open.
      
      Clean up unused local resulting from latest tty changes.
      47dba812
    • Benjamin Herrenschmidt's avatar
      [PATCH] fix IDE init oops on PowerMac · db15ad85
      Benjamin Herrenschmidt authored
      From Mikael Petterson:
      
      
        Booting kernel 2.5.74 on a PowerMac with CONFIG_BLK_DEV_IDE_PMAC=y
        results in an oops during IDE init, and the box then reboots.
      
        The patch below updates drivers/ide/ppc/pmac.c to also set up the
        hwif->ide_dma_queued_off and hwif->ide_dma_queued_on function
        pointers, which fixes the oops. Tested on my ancient PM4400.
      db15ad85
    • Pavel Machek's avatar
      [PATCH] New maintainer for nbd · 16cde048
      Pavel Machek authored
      I no longer have the time/interest in nbd, and Paul agreed to take it
      over.
      16cde048
    • Anton Blanchard's avatar
      [PATCH] enable device mapper in compat layer · e4c67754
      Anton Blanchard authored
      The compat ioctls for device mapper were not being enabled due to an
      incorrect config option.
      e4c67754
  2. 05 Jul, 2003 22 commits
    • Andrew Morton's avatar
      [PATCH] Improve mmap readaround · 99fb27c6
      Andrew Morton authored
      This tweaks the mmap read-ahead behaviour so that the prefaulting
      is largely pointless.
      
       - double the minimum readaround chunksize in page_cache_readaround().
      
       - when a seek is detected, collapse the window more slowly.
      99fb27c6
    • Krzysztof Halasa's avatar
      2d3160cc
    • Andrew Morton's avatar
      [PATCH] i2o_scsi build fix · 71ac7ef2
      Andrew Morton authored
      i2o_scsi.c now needs pci.h.
      71ac7ef2
    • Andrew Morton's avatar
      [PATCH] fix rfcomm oops · 92be328b
      Andrew Morton authored
      From: ilmari@ilmari.org (Dagfinn Ilmari Mannsaker)
      
      It turns out that net/bluetooth/rfcomm/sock.c (and
      net/bluetooth/hci_sock.c) had been left out when net_proto_family gained an
      owner field, here's a patch that fixes them both.
      92be328b
    • Andrew Morton's avatar
      [PATCH] MTD build fix for old gcc's · 090a3c7b
      Andrew Morton authored
      From: junkio@cox.net
      
      Sigh.  Is there a gcc option to tell it to not accept this incompatible C99
      extension?
      090a3c7b
    • Andrew Morton's avatar
      [PATCH] fix current->user->__count leak · 7fc4c64b
      Andrew Morton authored
      From: Arvind Kandhare <arvind.kan@wipro.com>
      
      When switch_uid is called, the reference count of the new user is
      incremented twice.  I think the increment in the switch_uid is done because
      of the reparent_to_init() function which does not increase the __count for
      root user.
      
      But if switch_uid is called from any other function, the reference count is
      already incremented by the caller by calling alloc_uid for the new user.
      Hence the count is incremented twice.  The user struct will not be deleted
      even when there are no processes holding a reference count for it.  This
      does not cause any problem currently because nothing is dependent on timely
      deletion of the user struct.
      7fc4c64b
    • Andrew Morton's avatar
      [PATCH] epoll: microoptimisations · 0d98604b
      Andrew Morton authored
      From: Davide Libenzi <davidel@xmailserver.org>
      
      - Inline eventpoll_release() so that __fput() does not need to call in
        epoll code if the file itself is not registered inside an epoll fd
      
      - Add <linux/types.h> inclusion due to __u32 and __u64 usage
      
      - Fix debug printf that would otherwise panic if enabled with the new
        epoll code
      0d98604b
    • Andrew Morton's avatar
      [PATCH] bootmem.c cleanups · 1c630a8d
      Andrew Morton authored
      From: Davide Libenzi <davidel@xmailserver.org>
      
      - Remove a couple of impossible debug checks (unsigneds cannot be
        negative!)
      
      - If __alloc_bootmem_core() fails with a goal and unaligned node_boot_start
        it'll loop forever.
      1c630a8d
    • Andrew Morton's avatar
      [PATCH] after exec_mmap(), exec cannot fail · 12c1bf07
      Andrew Morton authored
      If de_thread() fails in flush_old_exec() then we try to fail the execve().
      
      That is a bad move, because exec_mmap() has already switched the current
      process over to the new mm.  The new process is not yet sufficiently set up
      to handle the error and the kernel doublefaults and dies.  exec_mmap() is the
      point of no return.
      
      Change flush_old_exec() to call de_thread() before running exec_mmap() so the
      execing program sees the error.  I added fault injection to both de_thread()
      and exec_mmap() - everything now survives OK.
      12c1bf07
    • Andrew Morton's avatar
      [PATCH] block allocation comments · e34b0f53
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      Add some comments to the request allocation code.
      e34b0f53
    • Andrew Morton's avatar
      [PATCH] get_io_context fixes · 07581dd2
      Andrew Morton authored
      - pass gfp_flags to get_io_context(): not all callers are forced to use
        GFP_ATOMIC.
      
      - fix locking in get_io_context(): bump the refcount while in the exclusive
        region.
      
      - don't go oops in get_io_context() if the kmalloc failed.
      
      - in as_get_io_context(): fail the whole thing if we were unable to
        allocate the AS-specific part.
      
      - as_remove_queued_request() cleanup
      07581dd2
    • Andrew Morton's avatar
      [PATCH] block request batching · 930805a2
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      The following patch gets batching working how it should be.
      
      After a process is woken up, it is allowed to allocate up to 32 requests
      for 20ms.  It does not stop other processes submitting requests if it isn't
      submitting though.  This should allow fewer context switches, and allow
      batches of requests from each process to be sent to the io scheduler
      instead of 1 request from each process.
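
      The batching rule (up to 32 requests within 20ms of being woken)
      can be sketched as a small per-process budget. The structure and
      function names are hypothetical; only the two limits come from the
      description above, and the window is assumed to close at exactly
      20ms.

```c
#include <stdbool.h>
#include <stdint.h>

#define BATCH_REQUESTS 32u  /* requests a woken process may allocate */
#define BATCH_TIME_MS  20u  /* how long the batch stays open */

struct batcher {
    uint64_t batch_start_ms;  /* when the process was woken */
    unsigned int nr_left;     /* requests remaining in this batch */
};

/* Called when a sleeping process is woken: open a new batch. */
static void batch_begin(struct batcher *b, uint64_t now_ms)
{
    b->batch_start_ms = now_ms;
    b->nr_left = BATCH_REQUESTS;
}

/* May this process allocate one more request as part of its batch? */
static bool batch_may_allocate(struct batcher *b, uint64_t now_ms)
{
    if (b->nr_left == 0)
        return false;
    if (now_ms - b->batch_start_ms >= BATCH_TIME_MS) {
        b->nr_left = 0;  /* window expired, batch is over */
        return false;
    }
    b->nr_left--;
    return true;
}
```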
      
      tiobench sequential writes are nearly tripled, random writes are nearly
      doubled over mm1.  In earlier tests I generally saw better CPU efficiency
      but it doesn't show here.  There is still debug to be taken out.  It's also
      only on UP.
      
                                      Avg     Maximum     Lat%   Lat%   CPU
       Identifier    Rate  (CPU%)  Latency   Latency     >2s    >10s   Eff
       Sequential Reads
       ------------------- ------ --------- ---------- ------- ------ ----
       -2.5.71-mm1   11.13 3.783%    46.10    24668.01   0.84   0.02   294
       +2.5.71-mm1   13.21 4.489%    37.37     5691.66   0.76   0.00   294
      
       Random Reads
       ------------------- ------ --------- ---------- ------- ------ ----
       -2.5.71-mm1    0.97 0.582%   519.86     6444.66  11.93   0.00   167
       +2.5.71-mm1    1.01 0.604%   484.59     6604.93  10.73   0.00   167
      
       Sequential Writes
       ------------------- ------ --------- ---------- ------- ------ ----
       -2.5.71-mm1    4.85 4.456%    77.80    99359.39   0.18   0.13   109
       +2.5.71-mm1   14.11 14.19%    10.07    22805.47   0.09   0.04    99
      
       Random Writes
       ------------------- ------ --------- ---------- ------- ------ ----
       -2.5.71-mm1    0.46 0.371%    14.48     6173.90   0.23   0.00   125
       +2.5.71-mm1    0.86 0.744%    24.08     8753.66   0.31   0.00   115
      
      It decreases context switch rate on IBM's 8-way on ext2 tiobench 64 threads
      from ~2500/s to ~140/s on their regression tests.
      930805a2
    • Andrew Morton's avatar
      [PATCH] generic io contexts · 16f88dbd
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      Generalise the AS-specific per-process IO context so that other IO schedulers
      could use it.
      16f88dbd
    • Andrew Morton's avatar
      [PATCH] block batching fairness · 80af89ca
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      This patch fixes the request batching fairness/starvation issue.  It's not
      clear what is going on with 2.4, but it seems that it's a problem around this
      area.
      
      Anyway, previously:
      
      	* request queue fills up
      	* process 1 calls get_request, sleeps
      	* a couple of requests are freed
      	* process 2 calls get_request, proceeds
      	* a couple of requests are freed
      	* process 2 calls get_request...
      
      Now as unlikely as it seems, it could be a problem.  It's a fairness problem
      that process 2 can skip ahead of process 1 anyway.
      
      With the patch:
      
      	* request queue fills up
      	* any process calling get_request will sleep
      	* once the queue gets below the batch watermark, processes
      	  start being woken, and may allocate.
      
      
      This patch includes Chris Mason's fix to only clear queue_full when all tasks
      have been woken.  Previously I think starvation and unfairness could still
      occur.
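
      A toy model of the patched behaviour, assuming one "full" flag per
      queue that is cleared only once the last sleeper has been woken.
      All names and the bookkeeping are hypothetical; real wakeups and
      locking are left out.

```c
#include <stdbool.h>

struct rq_queue {
    unsigned int nr_allocated;     /* requests currently in use */
    unsigned int nr_max;           /* queue depth */
    unsigned int batch_watermark;  /* wake sleepers below this level */
    unsigned int nr_sleepers;      /* processes waiting for a request */
    bool full;                     /* set when the queue fills up */
};

/*
 * Try to allocate a request. 'woken' is true for a process that was
 * just woken by put_request(). Returns false if the caller must sleep.
 */
static bool get_request(struct rq_queue *q, bool woken)
{
    if (!woken && q->full) {
        q->nr_sleepers++;   /* newcomers may not skip ahead of sleepers */
        return false;
    }
    if (q->nr_allocated >= q->nr_max) {
        q->full = true;
        q->nr_sleepers++;
        return false;
    }
    q->nr_allocated++;
    return true;
}

/* Free a request. Returns true if one sleeper should be woken. */
static bool put_request(struct rq_queue *q)
{
    q->nr_allocated--;
    if (q->nr_allocated >= q->batch_watermark)
        return false;       /* not drained far enough yet */
    if (q->nr_sleepers > 0) {
        q->nr_sleepers--;
        if (q->nr_sleepers == 0)
            q->full = false;  /* clear only after the last wakeup */
        return true;
    }
    q->full = false;
    return false;
}
```

      In this model a process arriving while the queue is marked full
      sleeps even if a slot has just been freed, so it cannot overtake
      an earlier sleeper.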
      
      With this change to the blk-fair-batches patch, Chris is showing some much
      improved numbers for 2.4 - 170 ms max wait vs 2700ms without blk-fair-batches
      for a dbench 90 run.  He didn't indicate how much difference his patch alone
      made, but it is an important fix I think.
      80af89ca
    • Andrew Morton's avatar
      [PATCH] handle OOM in get_request_wait() · f67198fb
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      If there are no requests in flight against the target device and
      get_request() fails, nothing will wake us up.  Fix.
      f67198fb
    • Andrew Morton's avatar
      [PATCH] allow the IO scheduler to pass an allocation hint to · 08f36413
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      
      This patch implements a hint so that AS can tell the request allocator to
      allocate a request even if there are none left (the accounting is quite
      flexible and easily handles overallocations).
      
      elv_may_queue semantics have changed from "the elevator does _not_ want
      another request allocated" to "the elevator _insists_ that another request is
      allocated".  I couldn't see any harm ;)
      
      Now in practice, AS will only allow _1_ request over the limit, because as
      soon as the request is sent to AS, it stops anticipating.
      08f36413
    • Andrew Morton's avatar
      [PATCH] blk_congestion_wait threshold cleanup · 4e83dc01
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      Now that we are counting requests (not requests free), this patch changes
      the congested & batch watermarks to be more logical.  Also a minor fix to
      the sysfs code.
      4e83dc01
    • Andrew Morton's avatar
      [PATCH] per queue nr_requests · ee66147b
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      This gets rid of the global queue_nr_requests and usage of BLKDEV_MAX_RQ
      (the latter is now only used to set the queues' defaults).
      
      The queue depth becomes per-queue, controlled by a sysfs entry.
      ee66147b
    • Andrew Morton's avatar
      [PATCH] Use kblockd for running request queues · 179b68bb
      Andrew Morton authored
      Using keventd for running request_fns is risky because keventd itself can
      block on disk I/O.  Use the new kblockd kernel threads for the generic
      unplugging.
      179b68bb
    • Andrew Morton's avatar
      [PATCH] anticipatory I/O scheduler · 97ff29c2
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      This is the core anticipatory IO scheduler.  There are nearly 100 changesets
      in this and five months work.  I really cannot describe it fully here.
      
      Major points:
      
      - It works by recognising that reads are dependent: we don't know where the
        next read will occur, but it's probably close-by the previous one.  So once
        a read has completed we leave the disk idle, anticipating that a request
        for a nearby read will come in.
      
      - There is read batching and write batching logic.
      
        - when we're servicing a batch of writes we will refuse to seek away
          for a read for some tens of milliseconds.  Then the write stream is
          preempted.
      
        - when we're servicing a batch of reads (via anticipation) we'll do
          that for some tens of milliseconds, then preempt.
      
      - There are request deadlines, for latency and fairness.
        The oldest outstanding request is examined at regular intervals. If
        this request is older than a specific deadline, it will be the next
        one dispatched. This gives a good fairness heuristic while being simple
        because processes tend to have localised IO.
      
      
      Just about all of the rest of the complexity involves an array of fixups
      which prevent most of the obvious failure modes with anticipation: trying to
      not leave the disk head pointlessly idle.  Some of these algorithms are:
      
      - Process tracking.  If the process whose read we are anticipating submits
        a write, abandon anticipation.
      
      - Process exit tracking.  If the process whose read we are anticipating
        exits, abandon anticipation.
      
      - Process IO history.  We accumulate statistical info on the process's
        recent IO patterns to aid in making decisions about how long to anticipate
        new reads.
      
        Currently thinktime and seek distance are tracked. Thinktime is the
        time between when a process's last request has completed and when it
        submits another one. Seek distance is simply the number of sectors
        between each read request. If either statistic becomes too high, then
        it isn't anticipated that the process will submit another read.
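
      A sketch of how such per-process history might drive the decision.
      The decaying average, the thresholds and every name here are
      assumptions chosen for illustration, not the scheduler's actual
      bookkeeping.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_THINKTIME_US 6000u  /* hypothetical cut-off */
#define MAX_SEEK_SECTORS 8192u  /* hypothetical cut-off */

struct io_history {
    uint32_t mean_thinktime_us;  /* completion of last request -> next submit */
    uint32_t mean_seek_sectors;  /* distance between consecutive reads */
};

/* Decaying average: new samples carry a weight of 1/8. */
static uint32_t decay_avg(uint32_t mean, uint32_t sample)
{
    return (uint32_t)(((uint64_t)mean * 7 + sample) / 8);
}

static void io_history_update(struct io_history *h,
                              uint32_t thinktime_us, uint32_t seek_sectors)
{
    h->mean_thinktime_us = decay_avg(h->mean_thinktime_us, thinktime_us);
    h->mean_seek_sectors = decay_avg(h->mean_seek_sectors, seek_sectors);
}

/* Anticipate only if both statistics are still low. */
static bool should_anticipate(const struct io_history *h)
{
    return h->mean_thinktime_us <= MAX_THINKTIME_US &&
           h->mean_seek_sectors <= MAX_SEEK_SECTORS;
}
```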
      
      The above all means that we need a per-process "io context".  This is a fully
      refcounted structure.  In this patch it is AS-only.  Later we generalise it a
      little so other IO schedulers could use the same framework.
      
      - Requests are grouped as synchronous and asynchronous whereas deadline
        scheduler groups requests as reads and writes. This can provide better
        sync write performance, and may give better responsiveness with journalling
        filesystems (although we haven't done that yet).
      
        We currently detect synchronous writes by nastily setting PF_SYNCWRITE in
        current->flags.  The plan is to remove this later, and to propagate the
        sync hint from writeback_control.sync_mode into bio->bi_flags thence into
        request->flags.  Once that is done, direct-io needs to set the BIO sync
        hint as well.
      
      - There is also quite a bit of complexity gone into bashing TCQ into
        submission. Timing for a read batch is not started until the first read
        request actually completes. A read batch also does not start until all
        outstanding writes have completed.
      
      AS is the default IO scheduler.  deadline may be chosen by booting with
      "elevator=deadline".
      
      There are a few reasons for retaining deadline:
      
      - AS is often slower than deadline in random IO loads with large TCQ
        windows. The usual real world task here is OLTP database loads.
      
      - deadline is presumably more stable.
      
      - deadline is much simpler.
      
      
      
      The tunable per-queue entries under /sys/block/*/iosched/ are all in
      milliseconds:
      
      * read_expire
      
        Controls how long until a request becomes "expired".
      
        It also controls the interval between which expired requests are served,
        so set to 50, a request might take anywhere < 100ms to be serviced _if_ it
        is the next on the expired list.
      
        Obviously it can't make the disk go faster.  Result is basically the
        timeslice a reader gets in the presence of other IO.  100/((seek time /
        read_expire) + 1) is very roughly the % streaming read efficiency your disk
        should get in the presence of multiple readers.
      
      * read_batch_expire
      
        Controls how much time a batch of reads is given before pending writes
        are served.  Higher value is more efficient.  Shouldn't really be below
        read_expire.
      
      * write_ versions of the above
      
      * antic_expire
      
        Controls the maximum amount of time we can anticipate a good read before
        giving up.  Many other factors may cause anticipation to be stopped early,
        or some processes will not be "anticipated" at all.  Should be a bit higher
        for big seek time devices though not a linear correspondence - most
        processes have only a few ms thinktime.
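
      As a worked example of the read_expire rule of thumb, the sketch
      below takes the efficiency to be the share of each timeslice spent
      transferring rather than seeking, that is
      read_expire / (read_expire + seek time). This reading, and the
      function name, are assumptions rather than something the patch
      states.

```c
/*
 * Rough streaming-read efficiency in percent when each reader gets
 * a timeslice of read_expire_ms and switching readers costs one seek.
 */
static unsigned int streaming_efficiency_pct(unsigned int seek_ms,
                                             unsigned int read_expire_ms)
{
    if (seek_ms + read_expire_ms == 0)
        return 0;  /* avoid dividing by zero on degenerate input */
    return 100u * read_expire_ms / (read_expire_ms + seek_ms);
}
```

      With a 10ms seek and read_expire set to 50 this gives about 83%;
      raising read_expire to 100 gives about 90%.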
      97ff29c2
    • Andrew Morton's avatar
      [PATCH] elevator completion API · 104e6fdc
      Andrew Morton authored
      From: Nick Piggin <piggin@cyberone.com.au>
      
      Introduces an elevator_completed_req() callback with which the generic
      queueing layer may tell an IO scheduler that a particular request has
      finished.
      104e6fdc
    • Andrew Morton's avatar
      [PATCH] elv_may_queue() API function · 7d2483a9
      Andrew Morton authored
      Introduces the elv_may_queue() predicate with which the IO scheduler may tell
      the generic request layer that we may add another request to this queue.
      
      It is used by the CFQ elevator.
      7d2483a9