1. 03 Oct, 2002 22 commits
    • [PATCH] shmem accounting fixes · 62fe4120
      Hugh Dickins authored
      If we're going to rely on struct page *s rather than virtual addresses
      for the metadata pages, let's count nr_swapped in the private field:
      these pages are only for storing swp_entry_ts, and need not be examined
      at all when nr_swapped is zero.
    • [PATCH] put shmem metadata in highmem · 2729b9af
      Hugh Dickins authored
      wli suffered OOMs because tmpfs was allocating GFP_USER, for its
      metadata pages.  This patch allocates them GFP_HIGHUSER (default
      mapping->gfp_mask) and uses atomic kmaps to access (KM_USER0 for upper
      levels, KM_USER1 for lowest level).  shmem_unuse_inode and
      shmem_truncate rewritten alike to avoid repeated maps and unmaps of the
      same page: cr's truncate was much more elegant, but I couldn't quite
      see how to convert it.
      
      I do wonder whether this patch is a bloat too far for tmpfs, and even
      non-highmem configs will be penalised by page_address overhead (perhaps
      a further patch could get over that).  There is an attractive
      alternative (keep swp_entry_ts in the existing radix-tree, no metadata
      pages at all), but we haven't worked out an unhacky interface to that.
      For now at least, let's give tmpfs highmem metadata a spin.
    • [PATCH] shmem: avoid metadata leakiness · 03844e4b
      Hugh Dickins authored
      akpm and wli each discovered unfortunate behaviour of dbench on tmpfs:
      after tmpfs has reached its data memory limit, dbench continues to
      lseek and write, and tmpfs carries on allocating unlimited metadata
      blocks to accommodate the data it then refuses.  That particular
      behaviour could be simply fixed by checking earlier; but I think tmpfs
      metablocks should be subject to the memory limit, and included in df
      and du accounting.  Also, manipulate inode->i_blocks under lock, which
      was missed before.
    • [PATCH] consolidate shmem_getpage and shmem_getpage_locked · 7aa8800b
      Hugh Dickins authored
      The distinction between shmem_getpage and shmem_getpage_locked is not
      helpful, particularly now info->sem is gone; and shmem_getpage is
      confusingly tailored to shmem_nopage's expectations.  Put the code of
      shmem_getpage_locked into the frame of shmem_getpage, leaving its
      callers to unlock_page afterwards.
    • [PATCH] shmem: remove info->sem · cd7fef3d
      Hugh Dickins authored
      Between inode->i_sem and info->lock comes info->sem; but it doesn't
      guard thoroughly against the difficult races (truncate during read),
      and serializes reads from tmpfs unlike other filesystems.  I'd prefer
      to work with just i_sem and info->lock, backtracking when necessary
      (when another task allocates block or metablock at the same time).
      
      (I am not satisfied with the locked setting of next_index at the start
      of shmem_getpage_locked: it's one lock hold too many, and it doesn't
      really fix races against truncate better than before: another patch in
      a later batch will resolve that.)
    • [PATCH] shmem truncate race fix · 91abc449
      Hugh Dickins authored
      The earlier partial truncation fix in shmem_truncate admits it is racy,
      and I've now seen that (though perhaps more likely when
      mpage_writepages was writing pages it shouldn't).  A cleaner fix is,
      not to repeat the memclear in shmem_truncate, but to hold the partial
      page in memory throughout truncation, by shmem_holdpage from
      shmem_notify_change.
    • [PATCH] add shmem_vm_writeback() · 3e884b46
      Hugh Dickins authored
      Give tmpfs its own shmem_vm_writeback (and empty shmem_writepages):
      going through the default mpage_writepages is very wrong for tmpfs,
      since that may write nearby pages while still mapped into mms, but
      "writing" converts pages from tmpfs file identity to swap backing
      identity: doing so while mapped breaks assumptions throughout, e.g. the
      shared file is liable to disintegrate into private instances.
    • [PATCH] tmpfs: minor fixes · 83c69b86
      Hugh Dickins authored
      tmpfs contributes to the AltSysRqM swapcache add and delete statistics,
      but not to its find statistics: use lookup_swap_cache wrapper to
      find_get_page, to contribute to those statistics too.  Elsewhere, use
      existing info pointer and NAME_MAX definition.  (I'll be sending 2.4
      version to Marcelo shortly.)
    • [PATCH] tmpfs: fake a non-zero size for directories · a76da73c
      Hugh Dickins authored
      Apparently some applications are confused by tmpfs's practice of
      returning zero for the size of directories.  In 2.4.20-pre6 Peter Anvin
      submitted a change to make tmpfs directories always have a size of "1".
      
      In the same spirit, this patch arranges for tmpfs directories to show
      up with a size of 20 * number_of_entries, counting "." and "..".
      
      Apparently counting up the size of all the entries isn't worth the
      hassle.
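As a rough illustration of the arithmetic described in this commit message (a sketch, not the kernel code): the factor of 20 and the inclusion of "." and ".." come from the text, while the function and constant names are invented for this example.

```python
# Hypothetical names; only the factor 20 and the "." / ".." rule come from the text.
ENTRY_SIZE = 20


def fake_dir_size(child_names):
    """Size a tmpfs directory would report: 20 per entry, plus "." and ".."."""
    return ENTRY_SIZE * (len(child_names) + 2)


print(fake_dir_size([]))               # empty directory: only "." and ".."
print(fake_dir_size(["a", "b", "c"]))  # three children plus "." and ".."
```

Under this rule even an empty directory reports a non-zero size, which is the property the confused applications apparently needed.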
    • [PATCH] shmem_rename() fixes · 39d21233
      Hugh Dickins authored
      shmem_rename still didn't get parent directory link count quite right,
      in the case where you rename a directory in place of an empty directory
      (with rename syscall: doesn't happen like that with mv command); and it
      forgot to update new directory's ctime and mtime.  (I'll be sending 2.4
      version to Marcelo shortly.)
    • [PATCH] cleanup of page->flags manipulations · 6b5dbcf2
      Hugh Dickins authored
      I've had this patch hanging around for a couple of months (you liked an
      earlier version, but I never found time to resubmit it).  It removes some
      unnecessary PageDirty and PageUptodate manipulations.
      
      add_to_page_cache can only receive a dirty page in the add_to_swap
      case, so deal with it there.  add_to_swap is better off using
      add_to_page_cache directly than add_to_swap_cache.  Keep move_to_ and
      _from_swap_cache simple, and don't fiddle with flags without reason.
      It's a little less efficient to correct clean->dirty list as an
      afterthought, but cuts unusual code from slow path.
    • [PATCH] tmpfs swapoff deadlock · a2495207
      Hugh Dickins authored
      tmpfs 1/5 swapoff deadlock: my igrab/iput around the yield in
      shmem_unuse_inode was rubbish, seems my testing never really hit the
      case until last week, when truncation of course deadlocked on the page
      held locked across the iput (at least I had the foresight to say "ugh!"
      there).  Don't yield here, switch over to the simple backoff I'd been
      using for months in the loopable tmpfs patch (yes, it could loop
      indefinitely for memory, that's already an issue to be dealt with
      later).  The return convention from shmem_unuse to try_to_unuse is
      inelegant (commented at both ends), but effective.
    • [PATCH] convert direct-io to use bio_add_page() · c21c3ad0
      Andrew Morton authored
      From Badari Pulavarty.
      
      Use bio_add_page() in direct-io.c.
    • [PATCH] "io wait" process accounting · 7b88e5e0
      Andrew Morton authored
      Patch from Rik adds "I/O wait" statistics to /proc/stat.
      
      This allows us to determine how much system time is being spent
      awaiting IO completion.  This is an important statistic, as it tends to
      directly subtract from job completion time.
      
      procps-2.0.9 is OK with this, but doesn't report it.
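A monitoring tool could turn the new statistic into a percentage roughly as follows. This is a minimal sketch that assumes the I/O wait count appears as the fifth number on the aggregate `cpu` line of /proc/stat (after user, nice, system and idle); the commit message does not spell out the layout, and the sample text and function name are made up.

```python
def iowait_fraction(stat_text):
    """Return iowait ticks / total ticks from the aggregate "cpu" line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Assumed field order: user nice system idle iowait
            user, nice, system, idle, iowait = (int(v) for v in fields[1:6])
            total = user + nice + system + idle + iowait
            return iowait / total if total else 0.0
    raise ValueError("no aggregate cpu line found")


# Made-up sample in the assumed format, not real kernel output.
SAMPLE_STAT = "cpu 700 0 200 1000 100\ncpu0 700 0 200 1000 100\n"
print(iowait_fraction(SAMPLE_STAT))
```

On a live system the same function could be fed the contents of /proc/stat, sampled twice so that the difference between samples gives the rate over an interval.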
    • [PATCH] add kswapd success accounting to /proc/vmstat · 7e96bae1
      Andrew Morton authored
      Tells us how many pages were reclaimed by kswapd.
      
      The `pgsteal' statistic tells us how many pages were reclaimed
      altogether.  So
      
      	pgsteal - kswapd_steal
      
      is the number of pages which were directly reclaimed by page allocating
      processes.
      
      
      Also, the `pgscan' data is currently counting the number of pages
      scanned in shrink_cache() plus the number of pages scanned in
      refill_inactive_zone().  These are rather separate concepts, so I
      created the new `pgrefill' counter for refill_inactive_zone().
      `pgscan' is now just the number of pages scanned in shrink_cache().
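Since `pgsteal` counts every reclaimed page and `kswapd_steal` only those reclaimed by kswapd, the directly reclaimed count is the total minus the kswapd share. The sketch below shows that subtraction on made-up numbers; it assumes /proc/vmstat consists of one `name value` pair per line, and the helper names are hypothetical.

```python
def parse_vmstat(text):
    """Parse "name value" lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def direct_reclaim(counters):
    """Pages reclaimed by allocating processes rather than by kswapd."""
    return counters["pgsteal"] - counters["kswapd_steal"]


# Made-up sample values, not real kernel output.
SAMPLE_VMSTAT = """\
pgsteal 1500
kswapd_steal 1200
pgscan 4000
pgrefill 2500
"""

counters = parse_vmstat(SAMPLE_VMSTAT)
print(direct_reclaim(counters))
```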
    • [PATCH] add /proc/vmstat (start of /proc/stat cleanup) · 15e19695
      Andrew Morton authored
      Moves the VM accounting out of /proc/stat and into /proc/vmstat.
      
      The VM accounting is now per-cpu.
      
      It also moves kstat.pgpgin and kstat.pgpgout into /proc/vmstat.
      Which is a bit of a duplication of /proc/diskstats (SARD), but it's
      easy, super-cheap and makes life a lot easier for all the system
      monitoring applications which we just broke.
      
      We now require procps 2.0.9.
      
      Updated versions of top and vmstat are available at http://surriel.com
      and the Cygnus CVS is up to date for these changes.  (Rik has the CVS
      info at the above site).
      
      This tidies up kernel_stat quite a lot - it now only contains CPU
      things (interrupts and CPU loads) and disk things.  So we now have:
      
      /proc/stat:	CPU things and disk things
      /proc/vmstat:	VM things	(plus pgpgin, pgpgout)
      
      The SARD patch removes the disk things from /proc/stat as well.
    • [PATCH] truncate/invalidate_inode_pages rewrite · 735a2573
      Andrew Morton authored
      Rewrite these functions to use gang lookup.
      
      - This probably has similar performance to the old code in the common case.
      
      - It will be vastly quicker than current code for the worst case
        (single-page truncate).
      
      - invalidate_inode_pages() has been changed.  It used to use
        page_count(page) as the "is it mapped into pagetables" heuristic.  It
        now uses the (page->pte.direct != 0) heuristic.
      
      - Removes the worst cause of scheduling latency in the kernel.
      
      - It's a big code cleanup.
      
      - invalidate_inode_pages() has been changed to take an address_space
        *, not an inode *.
      
      - the maximum hold times for mapping->page_lock are enormously reduced,
        making it quite feasible to turn this into an irq-safe lock.  Which, it
        seems, is a requirement for sane AIO<->direct-io integration, as well
        as possibly other AIO things.
      
      (Thanks Hugh for fixing a bug in this one as well).
      
      (Christoph added some stuff too)
    • [PATCH] radix tree gang lookup · 55b40732
      Andrew Morton authored
      Adds a gang lookup facility to radix trees.  It provides an efficient
      means of locating a bunch of pages starting at a particular offset.
      
      The implementation is a bit dumb, but is efficient enough.  And it is
      amenable to the `tagged lookup' extension which is proving tricky to
      write, but which will allow the dirty pages within a mapping to be
      located in pgoff_t order.
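The idea of a gang lookup can be sketched in a few lines. This is a toy, fixed-height radix tree, not the kernel's implementation: the fan-out, height, class and method names are all assumptions chosen to keep the example small. It returns up to `max_items` stored items whose index is at or beyond `start`, in index order.

```python
RADIX_BITS = 2                  # 4 slots per node (toy value)
RADIX_SIZE = 1 << RADIX_BITS
HEIGHT = 4                      # covers indices 0 .. 255


class RadixTree:
    def __init__(self):
        self.root = [None] * RADIX_SIZE

    def insert(self, index, item):
        """Store item (must not be None) at index, creating nodes as needed."""
        node = self.root
        for level in range(HEIGHT - 1, 0, -1):
            slot = (index >> (level * RADIX_BITS)) & (RADIX_SIZE - 1)
            if node[slot] is None:
                node[slot] = [None] * RADIX_SIZE
            node = node[slot]
        node[index & (RADIX_SIZE - 1)] = item

    def gang_lookup(self, start, max_items):
        """Return up to max_items items with index >= start, in index order."""
        results = []
        self._walk(self.root, HEIGHT - 1, 0, start, max_items, results)
        return results

    def _walk(self, node, level, base, start, max_items, results):
        span = 1 << (level * RADIX_BITS)    # indices covered by one slot
        for slot in range(RADIX_SIZE):
            if len(results) >= max_items:
                return
            lo = base + slot * span
            if lo + span <= start or node[slot] is None:
                continue    # subtree lies entirely before start, or is empty
            if level == 0:
                results.append(node[slot])
            else:
                self._walk(node[slot], level - 1, lo, start, max_items, results)


tree = RadixTree()
for idx in (3, 7, 8, 200):
    tree.insert(idx, "page%d" % idx)
print(tree.gang_lookup(4, 2))
```

A caller such as a truncate loop would call the lookup repeatedly, each time restarting just past the last index it processed, so that it handles pages in batches rather than one at a time.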
      
      Thanks are due to Hugh Dickins for finding and fixing an unpleasant bug
      in here.
    • [PATCH] remove bogus BUG in page_remove_rmap() · 803f57a8
      Andrew Morton authored
      Pages with no reverse mapping can be present in page tables as a result
      of a driver performing remap_page_range().  Don't go BUG over them.
    • [PATCH] mprotect bugfix · 9c96b76d
      Andrew Morton authored
      Patch from Hugh Dickins
      
      Our earlier fix for mprotect_fixup was broken - passing an
      already-freed VMA to change_protection().
    • [PATCH] sys_ioperm atomicity fix · f9a4baef
      Andrew Morton authored
      sys_ioperm() is calling kmalloc(GFP_KERNEL) inside get_cpu().  That's
      wrong, because the memory allocation could schedule away and return on
      a different CPU.
      
      So change it to perform the memory allocation outside the atomic region.
    • [PATCH] misc (mainly documentation) · 3a9ed298
      Andrew Morton authored
      - hugetlb Documentation update
      
      - Add /proc/buddyinfo documentation
      
      - nano-cleanup in __remove_from_page_cache.
  2. 30 Sep, 2002 13 commits
    • Linux v2.5.40 · 7570df54
      Linus Torvalds authored
    • Merge http://linux-scsi.bkbits.net/scsi-for-linus-2.5 · 2b9fa51a
      Linus Torvalds authored
      into home.transmeta.com:/home/torvalds/v2.5/linux
    • Merge mulgrave.(none):/home/jejb/BK/linux-2.5 · fd0a1c61
      James Bottomley authored
      into mulgrave.(none):/home/jejb/BK/scsi-for-linus-2.5
    • Error handler general clean up · 9b46c836
      Mike Anderson authored
    • [PATCH] sg.c and USER_HZ, kernel 2.5.37 · 8885e375
      Rolf Fokkens authored
      Hi!
      
      Since the introduction of USER_HZ the SG_[GS]ET_TIMEOUT ioctls may have
      a serious BUG as userspace uses a different HZ from the HZ in kernelspace.
      
      On x86 HZ=1000 and USER_HZ=100, resulting in confusing timeouts as the
      kernel measures time 10 times as fast as userspace.
      
      This patch is an attempt to fix this by transforming USER_HZ based timing to
      HZ based timing before storing it in timeout. To make sure that SG_GET_TIMEOUT
      and SG_SET_TIMEOUT behave consistently a field timeout_user is added which
      stores the exact value that's passed by SG_SET_TIMEOUT and it's returned on
      SG_GET_TIMEOUT.
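The conversion and the reason for keeping a separate `timeout_user` value can be sketched as follows. This is an illustration only: the class and method names are invented, the integer rounding is an assumption, and the actual conversion in sg.c may differ. Only `timeout`, `timeout_user`, HZ=1000 and USER_HZ=100 come from the message above.

```python
USER_HZ = 100   # tick rate seen by userspace


class SgTimeout:
    """Toy model of the two stored fields: timeout (HZ) and timeout_user (USER_HZ)."""

    def __init__(self, hz=1000):
        self.hz = hz
        self.timeout = 0        # kernel ticks, used for the real timer
        self.timeout_user = 0   # exact value userspace passed in

    def set_timeout(self, user_ticks):
        self.timeout_user = user_ticks
        self.timeout = user_ticks * self.hz // USER_HZ   # assumed rounding

    def get_timeout(self):
        # Return what was set, rather than converting timeout back.
        return self.timeout_user

    def converted_back(self):
        """What a naive reverse conversion would report."""
        return self.timeout * USER_HZ // self.hz


x86 = SgTimeout(hz=1000)
x86.set_timeout(6000)                     # 60 seconds in USER_HZ ticks
print(x86.timeout, x86.get_timeout())

odd = SgTimeout(hz=1024)                  # an HZ that USER_HZ does not divide
odd.set_timeout(7)
print(odd.converted_back(), odd.get_timeout())
```

With HZ=1000 the round trip happens to be exact, but with a tick rate such as 1024 the reverse conversion loses a tick, which is one way to see why storing the user-supplied value keeps SG_GET_TIMEOUT consistent with SG_SET_TIMEOUT.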
      
      Rolf Fokkens
      fokkensr@fokkensr.vertis.nl
      
      P.S. this is the second post of this patch
    • James Bottomley's avatar
      dfa944ae
    • scsi_initialise_merge_fn() will only set highio if ->type == TYPE_DISK. · 2b562242
      Andrew Morton authored
      But it's called from scsi_add_lun()->scsi_alloc_sdev() before the type
      is known.  The type is -1 all the time in scsi_initialise_merge_fn()
      and scsi always bounces.
      
      This patch makes it do the right thing - just enable block-highmem for
      all scsi devices.
      
      Jens had this to say:
      
      "I guess that block-highmem has been around long enough, that I can
       use the term 'historically' at least in the kernel sense :-)
      
       This extra check was added for IDE because each device type driver
       (ide-disk, ide-cd, etc) needed to be updated to not assume virtual
       mappings of request data were valid.  I only did that for ide-disk,
       since this is the only one where bounce buffering really hurt
       performance wise.  So while ide-cd and ide-tape etc could have been
       updated, I deemed it uninteresting and not worthwhile.
      
       Now, this was just carried straight into the scsi counterparts,
       conveniently, because of laziness.  A quick glance at sr shows that it
       too can avoid bouncing easily (no changes needed).  st may need some
       changes, though.  So again, for scsi it was a matter of not impacting
       existing code in 2.4 too much.
      
       So TYPE_DISK check can be killed in 2.5 if someone does the work of
       checking that it is safe.  I'm not so sure it will make eg your SCSI
       CD-ROM that much faster :-)"
      
    • [PATCH] Squash warning in fs/devfs/base.c · 5dd17103
      David Gibson authored
      This removes an unused label in fs/devfs/base.c
    • Merge kroah.com:/home/greg/linux/BK/bleeding_edge-2.5 · 1a008d0e
      Greg Kroah-Hartman authored
      into kroah.com:/home/greg/linux/BK/gregkh-2.5
    • [PATCH] hc_sl811 build and memory leak · 5c1c6931
      Randy Dunlap authored
      It needs s/malloc.h/slab.h/ .
      It also forgets to free some memory on an error exit path.
      Patch for 2.5.39 follows.
    • [PATCH] usb_sg_{init,wait,cancel}() · 1e4fece8
      David Brownell authored
      Here are the scatterlist primitives there's been mail about before.
      Now the code has passed basic sanity testing, and is ready to merge
      into Linus' tree to start getting wider use.  Greg, please merge!
      
      To recap, the routines are a utility layer packaging several usb
      core facilities to improve system performance.  It's synchronous.
      The code uses functionality that drivers could use already, but
      generally haven't:
      
          - Request queueing.  This is a big performance win.  It lets
            device drivers help the hcds avoid wasted i/o bandwidth, by
            eliminating irq and scheduling latencies between requests.  It
            can make a huge difference at high speed, when the latencies
            often exceed the time to handle each i/o request!
      
          - The new usb_map_sg() primitives, leveraging IOMMU hardware
            if it's there (better than entry-at-a-time mapping).
      
          - URB_NO_INTERRUPT transfer flag, a hint to hcds that they
            can avoid a 'success irq' for this urb.  Only the urb for
            the last scatterlist entry really needs an IRQ, the others
            can be eliminated or delayed.  (OHCI uses this today, and
            any HCD can safely ignore it.)
      
      The particular functionality in these APIs seemed to meet Matt's
      requirements for usb-storage, so I'd hope the 2.5 usb-storage
      code will start to use these routines in a while.  (And maybe
      those two scanner drivers: hpusbscsi, microtek.)
      
      Brief summary of testing:  this code seems correct for normal
      reads and writes, but the fault paths (including cancelation)
      haven't been tested yet.  Both EHCI and OHCI seem to be mostly
      OK with these more aggressive queued loads, but may need small
      updates (like the two I sent yesterday).  Unfortunately I have
      to report that UHCI and urb queueing will sometimes lock up my
      hardware (PIIX4), so while we're lots better than 2.4 this is
      still a bit of a trouble spot for now.
      
      I'll be making some testing software available shortly, which
      will help track down remaining HCD level problems by giving the
      queuing APIs (and some others!) a more strenuous workout than
      most drivers will, in their day-to-day usage.
      
      - Dave
    • [PATCH] USB-storage: problem clearing halts · 2eea1938
      Matthew Dharm authored
      Greg, attached is a patch designed for diagnostic purposes.  Please apply
      to the 2.5 tree -- yes, we'll be removing this at some point in the future.
      
      It appears that we have a problem clearing halts.  This patch causes a very
      clear message to be printed whenever a usb_stor_clear_halt() manages to
      work.  So far, I haven't seen such a thing happen.  And I've seen _lots_ of
      STALL conditions.
      
      This problem has likely been around for a while... however, it hasn't been
      noticed before because usb-storage was difficult to use because of other
      bugs.  Heck, the most recent 'bk pull' is the first one for me in _months_
      which let me boot all the way into X11.
      
      I'm going to hold my patch queue until this is resolved.  On my test setup,
      it's easy to see this failing.  I've tried with 4 different devices, with
      both UHCI and EHCI drivers.  I don't want to confuse this problem with
      other patches...
      
      'result' in this function always seems to be -32.  Which is odd, because
      control endpoints shouldn't do that.
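For readers decoding the number: kernel functions conventionally return negated errno values, and on Linux errno 32 is EPIPE, which the USB core uses to report a stalled endpoint. A small helper (hypothetical name, shown only to decode such values) makes that explicit:

```python
import errno


def errno_name(result):
    """Map a kernel-style return value (0 or a negated errno) to a symbolic name."""
    if result >= 0:
        return "OK"
    return errno.errorcode.get(-result, "UNKNOWN")


print(errno_name(-32))
```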
      
      I'm open to suggestions as to where to look for this bug, but my instincts
      are telling me that this is a core or HCD issue, not a usb-storage issue.
      
      On a positive note, this means that the error-recovery system gets a good
      workout.
    • Merge bk://bk.arm.linux.org.uk · 2fbc109c
      Linus Torvalds authored
      into penguin.transmeta.com:/home/penguin/torvalds/repositories/kernel/linux
  3. 01 Oct, 2002 4 commits
  4. 30 Sep, 2002 1 commit
    • [ARM] Fix sa1111 IRQ handling · 99afe913
      Russell King authored
      We must clear down all currently pending IRQs before servicing any
      IRQ on the chip.  This prevents immediate recursion into the
      interrupt handling paths when we service the first IRQ.