1. 13 Jul, 2009 4 commits
    • dingdinghua's avatar
      jbd2: fix race between write_metadata_buffer and get_write_access · 96577c43
      dingdinghua authored
      The function jbd2_journal_write_metadata_buffer() calls
      jbd_unlock_bh_state(bh_in) too early; this could potentially allow
      another thread to call get_write_access on the buffer head, modify the
      data, and dirty it, and allowing the wrong data to be written into the
      journal.  Fortunately, if we lose this race, the only time this will
      actually cause filesystem corruption is if there is a system crash or
      other unclean shutdown of the system before the next commit can take
      place.
      Signed-off-by: default avatardingdinghua <dingdinghua85@gmail.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      96577c43
    • Theodore Ts'o's avatar
      ext4: Fix ext4_mb_initialize_context() to initialize all fields · 833576b3
      Theodore Ts'o authored
      Pavel Roskin pointed out that kmemcheck indicated that
      ext4_mb_store_history() was accessing uninitialized values of
      ac->ac_tail and ac->ac_buddy leading to garbage in the mballoc
      history.  Fix this by initializing the entire structure to all zeros
      first.
      
      Also, two fields were getting doubly initialized by the caller of
      ext4_mb_initialize_context, so remove them for efficiency's sake.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      833576b3
    • Peng Tao's avatar
      ext4: fix null handler of ioctls in no journal mode · ac046f1d
      Peng Tao authored
      The EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND ioctls should not
      flush the journal in no_journal mode.  Otherwise, running resize2fs on
      a mounted no_journal partition triggers the following error messages:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000014
      IP: [<c039d282>] _spin_lock+0x8/0x19
      *pde = 00000000 
      Oops: 0002 [#1] SMP
      Signed-off-by: default avatarPeng Tao <bergwolf@gmail.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      ac046f1d
    • Curt Wohlgemuth's avatar
      ext4: Fix buffer head reference leak in no-journal mode · e6b5d301
      Curt Wohlgemuth authored
      We found a problem with buffer head reference leaks when using an ext4
      partition without a journal.  In particular, calls to ext4_forget() would
      not to a brelse() on the input buffer head, which will cause pages they
      belong to to not be reclaimable.
      
      Further investigation showed that all places where ext4_journal_forget() and
      ext4_journal_revoke() are called are subject to the same problem.  The patch
      below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
      release of the buffer head when the journal handle isn't valid.
      Signed-off-by: default avatarCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      e6b5d301
  2. 14 Jun, 2009 3 commits
  3. 04 Jun, 2009 1 commit
  4. 06 Jul, 2009 1 commit
  5. 08 Jul, 2009 1 commit
    • Theodore Ts'o's avatar
      ext4: fix no journal corruption with locale-gen · 5adfee9c
      Theodore Ts'o authored
      If there is no journal, ext4_should_writeback_data() should return
      TRUE.  This will fix ext4_set_aops() to set ext4_da_ops in the case of
      delayed allocation; otherwise ext4_journaled_aops gets used by
      default, which doesn't handle delayed allocation properly.
      
      The advantage of using ext4_should_writeback_data() approach is that
      it should handle nobh better as well.
      
      Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
      Kumar for suggesting this approach.
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      5adfee9c
  6. 06 Jul, 2009 1 commit
  7. 13 Jul, 2009 2 commits
    • Jan Kara's avatar
      ext4: Fix truncation of symlinks after failed write · ffacfa7a
      Jan Kara authored
      Contents of long symlinks is written via standard write methods. So
      when the write fails, we add inode to orphan list. But symlinks don't
      have .truncate method defined so nobody properly removes them from the
      on disk orphan list.
      
      Fix this by calling ext4_truncate() directly instead of calling
      vmtruncate() (which is saner anyway since we don't need anything
      vmtruncate() does except from calling .truncate in these paths).  We
      also add inode to orphan list only if ext4_can_truncate() is true
      (currently, it can be false for symlinks when there are no blocks
      allocated) - otherwise orphan list processing will complain and
      ext4_truncate() will not remove inode from on-disk orphan list.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      ffacfa7a
    • Jan Kara's avatar
      jbd2: Fix a race between checkpointing code and journal_get_write_access() · f91d1d04
      Jan Kara authored
      The following race can happen:
      
       CPU1                          CPU2
                                     checkpointing code checks the buffer, adds
                                       it to an array for writeback
       do_get_write_access()
       ...
       lock_buffer()
       unlock_buffer()
                                     flush_batch() submits the buffer for IO
       __jbd2_journal_file_buffer()
      
      So a buffer under writeout is returned from
      do_get_write_access(). Since the filesystem code relies on the fact
      that journaled buffers cannot be written out, it does not take the
      buffer lock and so it can modify buffer while it is under
      writeout. That can lead to a filesystem corruption if we crash at the
      right moment.
      
      We fix the problem by clearing the buffer dirty bit under buffer_lock
      even if the buffer is on BJ_None list. Actually, we clear the dirty
      bit regardless the list the buffer is in and warn about the fact if
      the buffer is already journalled.
      
      Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>.
      Reported-by: default avatardingdinghua <dingdinghua85@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      f91d1d04
  8. 06 Jul, 2009 1 commit
  9. 13 Jul, 2009 1 commit
    • Eric Sandeen's avatar
      ext4: naturally align struct ext4_allocation_request · 726447d8
      Eric Sandeen authored
      As Ted noted, the ext4_allocation_request isn't well aligned.  Looking
      at it with pahole we're wasting space on 64-bit arches:
      
      struct ext4_allocation_request {
              struct inode *             inode;              /*     0     8 */
              ext4_lblk_t                logical;            /*     8     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
              ext4_fsblk_t               goal;               /*    16     8 */
              ext4_lblk_t                lleft;              /*    24     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
              ext4_fsblk_t               pleft;              /*    32     8 */
              ext4_lblk_t                lright;             /*    40     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
              ext4_fsblk_t               pright;             /*    48     8 */
              unsigned int               len;                /*    56     4 */
              unsigned int               flags;              /*    60     4 */
              /* --- cacheline 1 boundary (64 bytes) --- */
      
              /* size: 64, cachelines: 1, members: 9 */
              /* sum members: 52, holes: 3, sum holes: 12 */
      };
      
      Grouping 32-bit members together closes these holes and shrinks the
      structure by 12 bytes. which is important since ext4 can get on the
      hairy edge of stack overruns.
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      726447d8
  10. 06 Jul, 2009 2 commits
    • Eric Sandeen's avatar
      ext4: mark several more functions in mballoc.c as noinline · 089ceecc
      Eric Sandeen authored
      Ted noticed a stack-deep callchain through
      writepages->ext4_mb_regular_allocator->ext4_mb_init_cache->submit_bh ...
      
      With all the static functions in mballoc.c, gcc helpfully
      inlines for us, and we get something like this:
      
      ext4_mb_regular_allocator	(232 bytes stack)
      	ext4_mb_init_cache	(232 bytes stack)
      		submit_bh	(starts 464 deeper)
      
      the 2 ext4 functions here get several others inlined; by telling
      gcc not to inline them, we can save stack space for when we
      head off into submit_bh land and associated block layer callchains.
      The following noinlined functions are only called once, so this
      won't impact any other callchains:
      
      ext4_mb_regular_allocator 			(104) (was 232)
      	ext4_mb_find_by_goal			 (56) (noinlined)
      	ext4_mb_init_group			 (24) (noinlined)
      		ext4_mb_init_cache		(136) (was 232)
      			ext4_mb_generate_buddy	 (88) (noinlined)
      			ext4_mb_generate_from_pa (40) (noinlined)
      			submit_bh
      	ext4_mb_simple_scan_group		 (24) (noinlined)
      	ext4_mb_scan_aligned			 (56) (noinlined)
      	ext4_mb_complex_scan_group		 (40) (noinlined)
      	ext4_mb_try_best_found			 (24) (noinlined)
      
      now when we head off into submit_bh() we're only 264 bytes deeper
      in stack than when we entered ext4_mb_regular_allocator()
      (vs. 464 bytes before).  Every 200 bytes helps.  :)
      Signed-off-by: default avatarEric Sandeen <sandeen@redhat.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      089ceecc
    • Theodore Ts'o's avatar
      ext4: Fix potential reclaim deadlock when truncating partial block · f4a01017
      Theodore Ts'o authored
      The ext4_block_truncate_page() function previously called
      grab_cache_page(), which called find_or_create_page() with the
      __GFP_FS flag potentially set.  This could cause a deadlock if the
      system is low on memory and it attempts a memory reclaim, which could
      potentially call back into ext4.  So we need to call
      find_or_create_page() directly, and remove the __GFP_FP flag to avoid
      this potential deadlock.
      
      Thanks to Roland Dreier for reporting a lockdep warning which showed
      this problem.
      
      [20786.363249] =================================
      [20786.363257] [ INFO: inconsistent lock state ]
      [20786.363265] 2.6.31-2-generic #14~rbd4gitd960eea9
      [20786.363270] ---------------------------------
      [20786.363276] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      [20786.363285] http/8397 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [20786.363291]  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
      [20786.363314] {IN-RECLAIM_FS-W} state was registered at:
      [20786.363320]   [<ffffffff8108bef6>] mark_irqflags+0xc6/0x1a0
      [20786.363334]   [<ffffffff8108d347>] __lock_acquire+0x287/0x430
      [20786.363345]   [<ffffffff8108d595>] lock_acquire+0xa5/0x150
      [20786.363355]   [<ffffffff812008da>] jbd2_journal_start+0xfa/0x150
      [20786.363365]   [<ffffffff811d98a8>] ext4_journal_start_sb+0x58/0x90
      [20786.363377]   [<ffffffff811cce85>] ext4_delete_inode+0xc5/0x2c0
      [20786.363389]   [<ffffffff81146fa3>] generic_delete_inode+0xd3/0x1a0
      [20786.363401]   [<ffffffff81147095>] generic_drop_inode+0x25/0x30
      [20786.363411]   [<ffffffff81145ce2>] iput+0x62/0x70
      [20786.363420]   [<ffffffff81142878>] dentry_iput+0x98/0x110
      [20786.363429]   [<ffffffff81142a00>] d_kill+0x50/0x80
      [20786.363438]   [<ffffffff811444c5>] dput+0x95/0x180
      [20786.363447]   [<ffffffff8120de4b>] ecryptfs_d_release+0x2b/0x70
      [20786.363459]   [<ffffffff81142978>] d_free+0x28/0x60
      [20786.363468]   [<ffffffff81142a18>] d_kill+0x68/0x80
      [20786.363477]   [<ffffffff81142ad3>] prune_one_dentry+0xa3/0xc0
      [20786.363487]   [<ffffffff81142d61>] __shrink_dcache_sb+0x271/0x290
      [20786.363497]   [<ffffffff81142e89>] prune_dcache+0x109/0x1b0
      [20786.363506]   [<ffffffff81142f6f>] shrink_dcache_memory+0x3f/0x50
      [20786.363516]   [<ffffffff810f6d3d>] shrink_slab+0x12d/0x190
      [20786.363527]   [<ffffffff810f97d7>] balance_pgdat+0x4d7/0x640
      [20786.363537]   [<ffffffff810f9a57>] kswapd+0x117/0x170
      [20786.363546]   [<ffffffff810773ce>] kthread+0x9e/0xb0
      [20786.363558]   [<ffffffff8101430a>] child_rip+0xa/0x20
      [20786.363569]   [<ffffffffffffffff>] 0xffffffffffffffff
      [20786.363598] irq event stamp: 15997
      [20786.363603] hardirqs last  enabled at (15997): [<ffffffff81125f9d>] kmem_cache_alloc+0xfd/0x1a0
      [20786.363617] hardirqs last disabled at (15996): [<ffffffff81125f01>] kmem_cache_alloc+0x61/0x1a0
      [20786.363628] softirqs last  enabled at (15966): [<ffffffff810631ea>] __do_softirq+0x14a/0x220
      [20786.363641] softirqs last disabled at (15861): [<ffffffff8101440c>] call_softirq+0x1c/0x30
      [20786.363651] 
      [20786.363653] other info that might help us debug this:
      [20786.363660] 3 locks held by http/8397:
      [20786.363665]  #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<ffffffff8112ed24>] do_truncate+0x64/0x90
      [20786.363685]  #1:  (&sb->s_type->i_alloc_sem_key#5){+++++.}, at: [<ffffffff81147f90>] notify_change+0x250/0x350
      [20786.363707]  #2:  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
      [20786.363724] 
      [20786.363726] stack backtrace:
      [20786.363734] Pid: 8397, comm: http Tainted: G         C 2.6.31-2-generic #14~rbd4gitd960eea9
      [20786.363741] Call Trace:
      [20786.363752]  [<ffffffff8108ad7c>] print_usage_bug+0x18c/0x1a0
      [20786.363763]  [<ffffffff8108b0c0>] ? check_usage_backwards+0x0/0xb0
      [20786.363773]  [<ffffffff8108bad2>] mark_lock_irq+0xf2/0x280
      [20786.363783]  [<ffffffff8108bd97>] mark_lock+0x137/0x1d0
      [20786.363793]  [<ffffffff8108c03c>] mark_held_locks+0x6c/0xa0
      [20786.363803]  [<ffffffff8108c11f>] lockdep_trace_alloc+0xaf/0xe0
      [20786.363813]  [<ffffffff810efbac>] __alloc_pages_nodemask+0x7c/0x180
      [20786.363824]  [<ffffffff810e9411>] ? find_get_page+0x91/0xf0
      [20786.363835]  [<ffffffff8111d3b7>] alloc_pages_current+0x87/0xd0
      [20786.363845]  [<ffffffff810e9827>] __page_cache_alloc+0x67/0x70
      [20786.363856]  [<ffffffff810eb7df>] find_or_create_page+0x4f/0xb0
      [20786.363867]  [<ffffffff811cb3be>] ext4_block_truncate_page+0x3e/0x460
      [20786.363876]  [<ffffffff812008da>] ? jbd2_journal_start+0xfa/0x150
      [20786.363885]  [<ffffffff812008bb>] ? jbd2_journal_start+0xdb/0x150
      [20786.363895]  [<ffffffff811c6415>] ? ext4_meta_trans_blocks+0x75/0xf0
      [20786.363905]  [<ffffffff811e8d8b>] ext4_ext_truncate+0x1bb/0x1e0
      [20786.363916]  [<ffffffff811072c5>] ? unmap_mapping_range+0x75/0x290
      [20786.363926]  [<ffffffff811ccc28>] ext4_truncate+0x498/0x630
      [20786.363938]  [<ffffffff8129b4ce>] ? _raw_spin_unlock+0x5e/0xb0
      [20786.363947]  [<ffffffff81107306>] ? unmap_mapping_range+0xb6/0x290
      [20786.363957]  [<ffffffff8108c3ad>] ? trace_hardirqs_on+0xd/0x10
      [20786.363966]  [<ffffffff811ffe58>] ? jbd2_journal_stop+0x1f8/0x2e0
      [20786.363976]  [<ffffffff81107690>] vmtruncate+0xb0/0x110
      [20786.363986]  [<ffffffff81147c05>] inode_setattr+0x35/0x170
      [20786.363995]  [<ffffffff811c9906>] ext4_setattr+0x186/0x370
      [20786.364005]  [<ffffffff81147eab>] notify_change+0x16b/0x350
      [20786.364014]  [<ffffffff8112ed30>] do_truncate+0x70/0x90
      [20786.364021]  [<ffffffff8112f48b>] T.657+0xeb/0x110
      [20786.364021]  [<ffffffff8112f4be>] sys_ftruncate+0xe/0x10
      [20786.364021]  [<ffffffff81013132>] system_call_fastpath+0x16/0x1b
      Reported-by: default avatarRoland Dreier <roland@digitalvampire.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      f4a01017
  11. 21 Jun, 2009 2 commits
  12. 04 Jul, 2009 9 commits
  13. 03 Jul, 2009 12 commits