An error occurred fetching the project authors.
  1. 07 Feb, 2008 6 commits
  2. 17 Oct, 2007 1 commit
    • Fengguang Wu's avatar
      writeback: remove pages_skipped accounting in __block_write_full_page() · 1f7decf6
      Fengguang Wu authored
      Miklos Szeredi <miklos@szeredi.hu> and me identified a writeback bug:
      
      > The following strange behavior can be observed:
      >
      > 1. large file is written
      > 2. after 30 seconds, nr_dirty goes down by 1024
      > 3. then for some time (< 30 sec) nothing happens (disk idle)
      > 4. then nr_dirty again goes down by 1024
      > 5. repeat from 3. until whole file is written
      >
      > So basically a 4Mbyte chunk of the file is written every 30 seconds.
      > I'm quite sure this is not the intended behavior.
      
      It can be produced by the following test scheme:
      
      # cat bin/test-writeback.sh
      grep nr_dirty /proc/vmstat
      echo 1 > /proc/sys/fs/inode_debug
      dd if=/dev/zero of=/var/x bs=1K count=204800&
      while true; do grep nr_dirty /proc/vmstat; sleep 1; done
      
      # bin/test-writeback.sh
      nr_dirty 19207
      nr_dirty 19207
      nr_dirty 30924
      204800+0 records in
      204800+0 records out
      209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
      nr_dirty 47150
      nr_dirty 47141
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47205
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47214
      nr_dirty 47215
      nr_dirty 47216
      nr_dirty 47216
      nr_dirty 47216
      nr_dirty 47154
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47143
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47142
      nr_dirty 47134
      nr_dirty 47134
      nr_dirty 47135
      nr_dirty 47135
      nr_dirty 47135
      nr_dirty 46097 <== -1038
      nr_dirty 46098
      nr_dirty 46098
      nr_dirty 46098
      [...]
      nr_dirty 46091
      nr_dirty 46092
      nr_dirty 46092
      nr_dirty 45069 <== -1023
      nr_dirty 45056
      nr_dirty 45056
      nr_dirty 45056
      [...]
      nr_dirty 37822
      nr_dirty 36799 <== -1023
      [...]
      nr_dirty 36781
      nr_dirty 35758 <== -1023
      [...]
      nr_dirty 34708
      nr_dirty 33672 <== -1024
      [...]
      nr_dirty 33692
      nr_dirty 32669 <== -1023
      
      % ls -li /var/x
      847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x
      
      % dmesg|grep 847824  # generated by a debug printk
      [  529.263184] redirtied inode 847824 line 548
      [  564.250872] redirtied inode 847824 line 548
      [  594.272797] redirtied inode 847824 line 548
      [  629.231330] redirtied inode 847824 line 548
      [  659.224674] redirtied inode 847824 line 548
      [  689.219890] redirtied inode 847824 line 548
      [  724.226655] redirtied inode 847824 line 548
      [  759.198568] redirtied inode 847824 line 548
      
      # line 548 in fs/fs-writeback.c:
      543                 if (wbc->pages_skipped != pages_skipped) {
      544                         /*
      545                          * writeback is not making progress due to locked
      546                          * buffers.  Skip this inode for now.
      547                          */
      548                         redirty_tail(inode);
      549                 }
      
      More debug efforts show that __block_write_full_page()
      never has the chance to call submit_bh() for that big dirty file:
      the buffer head is *clean*. So basicly no page io is issued by
      __block_write_full_page(), hence pages_skipped goes up.
      
      Also the comment in generic_sync_sb_inodes():
      
      544                         /*
      545                          * writeback is not making progress due to locked
      546                          * buffers.  Skip this inode for now.
      547                          */
      
      and the comment in __block_write_full_page():
      
      1713                 /*
      1714                  * The page was marked dirty, but the buffers were
      1715                  * clean.  Someone wrote them back by hand with
      1716                  * ll_rw_block/submit_bh.  A rare case.
      1717                  */
      
      do not quite agree with each other. The page writeback should be skipped for
      'locked buffer', but here it is 'clean buffer'!
      
      This patch fixes this bug. Though I'm not sure why __block_write_full_page()
      is called only to do nothing and who actually issued the writeback for us.
      
      This is the two possible new behaviors after the patch:
      
      1) pretty nice: wait 30s and write ALL:)
      2) not so good:
      	- during the dd: ~16M
      	- after 30s:      ~4M
      	- after 5s:       ~4M
      	- after 5s:     ~176M
      
      The next patch will fix case (2).
      
      Cc: David Chinner <dgc@sgi.com>
      Cc: Ken Chen <kenchen@google.com>
      Signed-off-by: default avatarFengguang Wu <wfg@mail.ustc.edu.cn>
      Signed-off-by: default avatarDavid Chinner <dgc@sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1f7decf6
  3. 16 Oct, 2007 6 commits
  4. 12 Oct, 2007 1 commit
  5. 10 Oct, 2007 1 commit
  6. 18 Sep, 2007 1 commit
  7. 05 Sep, 2007 1 commit
  8. 14 Jul, 2007 3 commits
  9. 29 May, 2007 1 commit
    • David Chinner's avatar
      [XFS] Write at EOF may not update filesize correctly. · df3c7244
      David Chinner authored
      The recent fix for preventing NULL files from being left around does not
      update the file size corectly in all cases. The missing case is a write
      extending the file that does not need to allocate a block.
      
      In that case we used a read mapping of the extent which forced the use of
      the read I/O completion handler instead of the write I/O completion
      handle. Hence the file size was not updated on I/O completion.
      
      SGI-PV: 965068
      SGI-Modid: xfs-linux-melb:xfs-kern:28657a
      Signed-off-by: default avatarDavid Chinner <dgc@sgi.com>
      Signed-off-by: default avatarNathan Scott <nscott@aconex.com>
      Signed-off-by: default avatarTim Shimmin <tes@sgi.com>
      df3c7244
  10. 08 May, 2007 1 commit
    • Lachlan McIlroy's avatar
      [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. · ba87ea69
      Lachlan McIlroy authored
      The problem that has been addressed is that of synchronising updates of
      the file size with writes that extend a file. Without the fix the update
      of a file's size, as a result of a write beyond eof, is independent of
      when the cached data is flushed to disk. Often the file size update would
      be written to the filesystem log before the data is flushed to disk. When
      a system crashes between these two events and the filesystem log is
      replayed on mount the file's size will be set but since the contents never
      made it to disk the file is full of holes. If some of the cached data was
      flushed to disk then it may just be a section of the file at the end that
      has holes.
      
      There are existing fixes to help alleviate this problem, particularly in
      the case where a file has been truncated, that force cached data to be
      flushed to disk when the file is closed. If the system crashes while the
      file(s) are still open then this flushing will never occur.
      
      The fix that we have implemented is to introduce a second file size,
      called the in-memory file size, that represents the current file size as
      viewed by the user. The existing file size, called the on-disk file size,
      is the one that get's written to the filesystem log and we only update it
      when it is safe to do so. When we write to a file beyond eof we only
      update the in- memory file size in the write operation. Later when the I/O
      operation, that flushes the cached data to disk completes, an I/O
      completion routine will update the on-disk file size. The on-disk file
      size will be updated to the maximum offset of the I/O or to the value of
      the in-memory file size if the I/O includes eof.
      
      SGI-PV: 958522
      SGI-Modid: xfs-linux-melb:xfs-kern:28322a
      Signed-off-by: default avatarLachlan McIlroy <lachlan@sgi.com>
      Signed-off-by: default avatarDavid Chinner <dgc@sgi.com>
      Signed-off-by: default avatarTim Shimmin <tes@sgi.com>
      ba87ea69
  11. 12 Feb, 2007 1 commit
  12. 10 Feb, 2007 2 commits
  13. 21 Dec, 2006 1 commit
    • David Chinner's avatar
      [PATCH] Fix XFS after clear_page_dirty() removal · 92132021
      David Chinner authored
      XFS appears to call clear_page_dirty to get the mapping tree dirty tag
      set correctly at the same time the page dirty flag is cleared.  I note
      that this can be done by set_page_writeback() if we clear the dirty flag
      on the page first when we are writing back the entire page.
      
      Hence it seems to me that the XFS call to clear_page_dirty() could
      easily be substituted by clear_page_dirty_for_io() followed by a call to
      set_page_writeback() to get the mapping tree tags set correctly after
      the page has been marked clean.
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      92132021
  14. 10 Dec, 2006 1 commit
    • Zach Brown's avatar
      [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED · 8459d86a
      Zach Brown authored
      The only time it is safe to call aio_complete() is when the ->ki_retry
      function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
      historically done this by relying on its caller to translate positive return
      codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
      conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
      going to call aio_complete().  It would reverse the test and wait and free the
      dio in the cases it thought that finished_one_bio() wasn't going to.
      
      Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
      errno from the submission path but it failed to communicate this to
      finished_one_bio().  direct_io_worker() would return < 0, it's callers
      wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
      future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
      would be called for a second time which can manifest as an oops.
      
      The previous cleanups have whittled the sync and async completion paths down
      to the point where we can collapse them and clearly reassert the invariant
      that we must only call aio_complete() after returning -EIOCBQUEUED.
      direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
      drop the dio refcount and the aio bio completion path will only call
      aio_complete() when it is the last to drop the dio refcount.
      direct_io_worker() can ensure that it is the last to drop the reference count
      by waiting for bios to drain.  It does this for sync ops, of course, and for
      partial dio writes that must fall back to buffered and for aio ops that saw
      errors during submission.
      
      This means that operations that end up waiting, even if they were issued as
      aio ops, will not call aio_complete() from dio.  Instead we return the return
      code of the operation and let the aio core call aio_complete().  This is
      purposely done to fix a bug where AIO DIO file extensions would call
      aio_complete() before their callers have a chance to update i_size.
      
      Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
      no longer have to translate for it.  XFS needs to be careful not to free
      resources that will be used during AIO completion if -EIOCBQUEUED is returned.
       We maintain the previous behaviour of trying to write fs metadata for O_SYNC
      aio+dio writes.
      Signed-off-by: default avatarZach Brown <zach.brown@oracle.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Suparna Bhattacharya <suparna@in.ibm.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: <xfs-masters@oss.sgi.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      8459d86a
  15. 22 Nov, 2006 1 commit
  16. 28 Sep, 2006 2 commits
  17. 07 Sep, 2006 1 commit
  18. 28 Jun, 2006 1 commit
  19. 20 Jun, 2006 1 commit
  20. 09 Jun, 2006 4 commits
  21. 11 Apr, 2006 1 commit
  22. 29 Mar, 2006 1 commit
  23. 28 Mar, 2006 1 commit