1. 19 May, 2016 4 commits
  2. 18 May, 2016 5 commits
  3. 17 May, 2016 15 commits
  4. 13 May, 2016 5 commits
    • Jan Kara's avatar
      ext4: pre-zero allocated blocks for DAX IO · 12735f88
      Jan Kara authored
      Currently ext4 treats DAX IO the same way as direct IO. I.e., it
      allocates unwritten extents before IO is done and converts unwritten
      extents afterwards. However this way DAX IO can race with page fault to
      the same area:
      
      ext4_ext_direct_IO()				dax_fault()
        dax_io()
          get_block() - allocates unwritten extent
          copy_from_iter_pmem()
      						  get_block() - converts
      						    unwritten block to
      						    written and zeroes it
      						    out
        ext4_convert_unwritten_extents()
      
      So data written with DAX IO gets lost. Similarly dax_new_buf() called
      from dax_io() can overwrite data that has been already written to the
      block via mmap.
      
      Fix the problem by using pre-zeroed blocks for DAX IO the same way as we
      use them for DAX mmap. The downside of this solution is that every
      allocating write writes each block twice (once zeros, once data). Fixing
      the race with locking is possible as well however we would need to
      lock-out faults for the whole range written to by DAX IO. And that is
      not easy to do without locking-out faults for the whole file which seems
      too aggressive.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      12735f88
    • Jan Kara's avatar
      ext4: refactor direct IO code · 914f82a3
      Jan Kara authored
      Currently ext4 direct IO handling is split between ext4_ext_direct_IO()
      and ext4_ind_direct_IO(). However the extent based function calls into
      the indirect based one for some cases and for example it is not able to
      handle file extending. Previously it was not also properly handling
      retries in case of ENOSPC errors. With DAX things would get even more
      contrieved so just refactor the direct IO code and instead of indirect /
      extent split do the split to read vs writes.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      914f82a3
    • Jan Kara's avatar
      ext4: fix race in transient ENOSPC detection · dbc427ce
      Jan Kara authored
      When there are blocks to free in the running transaction, block
      allocator can return ENOSPC although the filesystem has some blocks to
      free. We use ext4_should_retry_alloc() to force commit of the current
      transaction and return whether anything was committed so that it makes
      sense to retry the allocation. However the transaction may get committed
      after block allocation fails but before we call
      ext4_should_retry_alloc(). So ext4_should_retry_alloc() returns false
      because there is nothing to commit and we wrongly return ENOSPC.
      
      Fix the race by unconditionally returning 1 from ext4_should_retry_alloc()
      when we tried to commit a transaction. This should not add any
      unnecessary retries since we had a transaction running a while ago when
      trying to allocate blocks and we want to retry the allocation once that
      transaction has committed anyway.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      dbc427ce
    • Jan Kara's avatar
      ext4: handle transient ENOSPC properly for DAX · 7cb476f8
      Jan Kara authored
      ext4_dax_get_blocks() was accidentally omitted fixing get blocks
      handlers to properly handle transient ENOSPC errors. Fix it now to use
      ext4_get_blocks_trans() helper which takes care of these errors.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      7cb476f8
    • Jan Kara's avatar
      dax: call get_blocks() with create == 1 for write faults to unwritten extents · aef39ab1
      Jan Kara authored
      Currently, __dax_fault() does not call get_blocks() callback with create
      argument set, when we got back unwritten extent from the initial
      get_blocks() call during a write fault. This is because originally
      filesystems were supposed to convert unwritten extents to written ones
      using complete_unwritten() callback. Later this was abandoned in favor of
      using pre-zeroed blocks however the condition whether get_blocks() needs
      to be called with create == 1 remained.
      
      Fix the condition so that filesystems are not forced to zero-out and
      convert unwritten extents when get_blocks() is called with create == 0
      (which introduces unnecessary overhead for read faults and can be
      problematic as the filesystem may possibly be read-only).
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      aef39ab1
  5. 06 May, 2016 3 commits
  6. 05 May, 2016 4 commits
    • Nicolai Stange's avatar
      ext4: silence UBSAN in ext4_mb_init() · 935244cd
      Nicolai Stange authored
      Currently, in ext4_mb_init(), there's a loop like the following:
      
        do {
          ...
          offset += 1 << (sb->s_blocksize_bits - i);
          i++;
        } while (i <= sb->s_blocksize_bits + 1);
      
      Note that the updated offset is used in the loop's next iteration only.
      
      However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
      the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
      and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
        shift exponent 4294967295 is too large for 32-bit type 'int'
        [...]
        Call Trace:
         [<ffffffff818c4d25>] dump_stack+0xbc/0x117
         [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390
         [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0
         [<ffffffff814293c7>] ? create_cache+0x57/0x1f0
         [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0
         [<ffffffff821c2168>] ? mutex_lock+0x38/0x60
         [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50
         [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0
         [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0
         [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0
         [...]
      
      Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of offset is never used again.
      
      Silence UBSAN by introducing another variable, offset_incr, holding the
      next increment to apply to offset and adjust that one by right shifting it
      by one position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      935244cd
    • Nicolai Stange's avatar
      ext4: address UBSAN warning in mb_find_order_for_block() · b5cb316c
      Nicolai Stange authored
      Currently, in mb_find_order_for_block(), there's a loop like the following:
      
        while (order <= e4b->bd_blkbits + 1) {
          ...
          bb += 1 << (e4b->bd_blkbits - order);
        }
      
      Note that the updated bb is used in the loop's next iteration only.
      
      However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
      the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports
      
        UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
        shift exponent -1 is negative
        [...]
        Call Trace:
         [<ffffffff818c4d35>] dump_stack+0xbc/0x117
         [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
         [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e
         [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
         [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
         [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590
         [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80
         [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240
         [...]
      
      Unless compilers start to do some fancy transformations (which at least
      GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
      such calculated value of bb is never used again.
      
      Silence UBSAN by introducing another variable, bb_incr, holding the next
      increment to apply to bb and adjust that one by right shifting it by one
      position per loop iteration.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNicolai Stange <nicstange@gmail.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      b5cb316c
    • Jan Kara's avatar
      ext4: fix oops on corrupted filesystem · 74177f55
      Jan Kara authored
      When filesystem is corrupted in the right way, it can happen
      ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we
      subsequently remove inode from the in-memory orphan list. However this
      deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we
      leave i_orphan list_head with a stale content. Later we can look at this
      content causing list corruption, oops, or other issues. The reported
      trace looked like:
      
      WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100()
      list_del corruption, 0000000061c1d6e0->next is LIST_POISON1
      0000000000100100)
      CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250
      Stack:
       60462947 62219960 602ede24 62219960
       602ede24 603ca293 622198f0 602f02eb
       62219950 6002c12c 62219900 601b4d6b
      Call Trace:
       [<6005769c>] ? vprintk_emit+0x2dc/0x5c0
       [<602ede24>] ? printk+0x0/0x94
       [<600190bc>] show_stack+0xdc/0x1a0
       [<602ede24>] ? printk+0x0/0x94
       [<602ede24>] ? printk+0x0/0x94
       [<602f02eb>] dump_stack+0x2a/0x2c
       [<6002c12c>] warn_slowpath_common+0x9c/0xf0
       [<601b4d6b>] ? __list_del_entry+0x6b/0x100
       [<6002c254>] warn_slowpath_fmt+0x94/0xa0
       [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0
       [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0
       [<60023ebf>] ? set_signals+0x3f/0x50
       [<600a205a>] ? kmem_cache_free+0x10a/0x180
       [<602f4e88>] ? mutex_lock+0x18/0x30
       [<601b4d6b>] __list_del_entry+0x6b/0x100
       [<601177ec>] ext4_orphan_del+0x22c/0x2f0
       [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0
       [<6010b973>] ? ext4_truncate+0x383/0x390
       [<6010bc8b>] ext4_write_begin+0x30b/0x4b0
       [<6001bb50>] ? copy_from_user+0x0/0xb0
       [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0
       [<60072c4f>] generic_perform_write+0xaf/0x1e0
       [<600c4166>] ? file_update_time+0x46/0x110
       [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0
       [<6010030f>] ext4_file_write_iter+0x15f/0x470
       [<60094e10>] ? unlink_file_vma+0x0/0x70
       [<6009b180>] ? unlink_anon_vmas+0x0/0x260
       [<6008f169>] ? free_pgtables+0xb9/0x100
       [<600a6030>] __vfs_write+0xb0/0x130
       [<600a61d5>] vfs_write+0xa5/0x170
       [<600a63d6>] SyS_write+0x56/0xe0
       [<6029fcb0>] ? __libc_waitpid+0x0/0xa0
       [<6001b698>] handle_syscall+0x68/0x90
       [<6002633d>] userspace+0x4fd/0x600
       [<6002274f>] ? save_registers+0x1f/0x40
       [<60028bd7>] ? arch_prctl+0x177/0x1b0
       [<60017bd5>] fork_handler+0x85/0x90
      
      Fix the problem by using list_del_init() as we always should with
      i_orphan list.
      
      CC: stable@vger.kernel.org
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      74177f55
    • Seth Forshee's avatar
      ext4: fix check of dqget() return value in ext4_ioctl_setproject() · ff0bc084
      Seth Forshee authored
      A failed call to dqget() returns an ERR_PTR() and not null. Fix
      the check in ext4_ioctl_setproject() to handle this correctly.
      
      Fixes: 9b7365fc ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support")
      Cc: stable@vger.kernel.org # v4.5
      Signed-off-by: default avatarSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      ff0bc084
  7. 30 Apr, 2016 2 commits
    • Theodore Ts'o's avatar
      ext4: clean up error handling when orphan list is corrupted · 7827a7f6
      Theodore Ts'o authored
      Instead of just printing warning messages, if the orphan list is
      corrupted, declare the file system is corrupted.  If there are any
      reserved inodes in the orphaned inode list, declare the file system
      corrupted and stop right away to avoid doing more potential damage to
      the file system.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      7827a7f6
    • Theodore Ts'o's avatar
      ext4: fix hang when processing corrupted orphaned inode list · c9eb13a9
      Theodore Ts'o authored
      If the orphaned inode list contains inode #5, ext4_iget() returns a
      bad inode (since the bootloader inode should never be referenced
      directly).  Because of the bad inode, we end up processing the inode
      repeatedly and this hangs the machine.
      
      This can be reproduced via:
      
         mke2fs -t ext4 /tmp/foo.img 100
         debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
         mount -o loop /tmp/foo.img /mnt
      
      (But don't do this if you are using an unpatched kernel if you care
      about the system staying functional.  :-)
      
      This bug was found by the port of American Fuzzy Lop into the kernel
      to find file system problems[1].  (Since it *only* happens if inode #5
      shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
      surprising that AFL needed two hours before it found it.)
      
      [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf
      
      Cc: stable@vger.kernel.org
      Reported by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      c9eb13a9
  8. 27 Apr, 2016 1 commit
  9. 26 Apr, 2016 1 commit