- 03 Aug, 2015 2 commits
-
-
Lukas Czerner authored
commit 9705acd6 upstream.

On a delalloc-enabled file system, on the invalidatepage operation in ext4_da_page_release_reservation() we want to clear the delayed buffer and remove the extent covering the delayed buffer from the extent status tree.

However, currently there is a bug where on systems with page size > block size we will always remove extents from the start of the page, regardless of where the actual delayed buffers are positioned in the page. This leads to errors like this:

EXT4-fs warning (device loop0): ext4_da_release_space:1225: ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data blocks

This, however, can cause data loss at writeback time if the file system is in an ENOSPC condition, because we're releasing the reservation for someone else's delayed buffer.

Fix this by only removing extents that correspond to the part of the page we want to invalidate.

This problem is reproducible with the following fio recipe (however, I was only able to reproduce it with fio-2.1 or older):

[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/mnt/test
numjobs=20

[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2

[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2

[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2

[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2

[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2

[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2

[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2

[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2

Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
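The idea behind commit 9705acd6 (touch only the blocks covered by the invalidated part of the page) can be sketched in user space as follows. This is a simplified model, not the kernel code: `struct blk_range` and `invalidated_blocks()` are hypothetical names, and the sketch only computes which whole blocks lie inside the byte range, without walking buffer heads or checking the delayed flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: a run of logical blocks inside one page. */
struct blk_range {
    uint64_t first_lblk; /* first logical block fully inside the range */
    uint32_t count;      /* number of whole blocks, 0 if there are none */
};

/*
 * Map the byte range [offset, offset + length) of page `page_index` to the
 * logical blocks it fully covers. page_shift is log2(page size), blkbits is
 * log2(block size).
 */
static struct blk_range invalidated_blocks(uint64_t page_index,
                                           unsigned page_shift,
                                           unsigned blkbits,
                                           uint32_t offset, uint32_t length)
{
    assert(blkbits <= page_shift && page_shift < 31);
    assert((uint64_t)offset + length <= (1ull << page_shift));

    uint32_t blksize = 1u << blkbits;
    /* Round the start up and the end down to block boundaries. */
    uint32_t start = (offset + blksize - 1) >> blkbits;
    uint32_t end = (offset + length) >> blkbits;

    struct blk_range r;
    /* The first block of the page, plus the in-page block offset. */
    r.first_lblk = (page_index << (page_shift - blkbits)) + start;
    r.count = end > start ? end - start : 0;
    return r;
}
```

With 4k pages and 1k blocks, invalidating the second half of page 3 yields blocks 14 and 15, not blocks starting at 12 (the start of the page), which is the behaviour the bug description above complains about.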
-
Theodore Ts'o authored
commit bdf96838 upstream.

The commit cf108bca: "ext4: Invert the locking order of page_lock and transaction start" caused __ext4_journalled_writepage() to drop the page lock before the page was written back, as part of changing the locking order to jbd2_journal_start -> page_lock. However, this introduced a potential race if there was a truncate racing with the data=journalled writeback mode.

Fix this by grabbing the page lock after starting the journal handle, and then checking to see if the page had gotten truncated out from under us.

This fixes a number of different warnings or BUG_ON's when running xfstests generic/086 in data=journalled mode, including:

jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7c0, 164), jh->b_transaction ( (null), 0), jh->b_next_transaction ( (null), 0), jlist 0

- and -

kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
...
Call Trace:
 [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
 [<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
 [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
 [<c027d883>] ? lock_buffer+0x36/0x36
 [<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
 [<c0229139>] do_invalidatepage+0x22/0x26
 [<c0229198>] truncate_inode_page+0x5b/0x85
 [<c022934b>] truncate_inode_pages_range+0x156/0x38c
 [<c0229592>] truncate_inode_pages+0x11/0x15
 [<c022962d>] truncate_pagecache+0x55/0x71
 [<c02b913b>] ext4_setattr+0x4a9/0x560
 [<c01ca542>] ? current_kernel_time+0x10/0x44
 [<c026c4d8>] notify_change+0x1c7/0x2be
 [<c0256a00>] do_truncate+0x65/0x85
 [<c0226f31>] ? file_ra_state_init+0x12/0x29

- and -

WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396 irty_metadata+0x14a/0x1ae()
...
Call Trace:
 [<c01b879f>] ? console_unlock+0x3a1/0x3ce
 [<c082cbb4>] dump_stack+0x48/0x60
 [<c0178b65>] warn_slowpath_common+0x89/0xa0
 [<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
 [<c0178bef>] warn_slowpath_null+0x14/0x18
 [<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
 [<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
 [<c02b2f44>] write_end_fn+0x40/0x53
 [<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
 [<c02b59e7>] ext4_writepage+0x354/0x3b8
 [<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
 [<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
 [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
 [<c02b5a5b>] __writepage+0x10/0x2e
 [<c0225956>] write_cache_pages+0x22d/0x32c
 [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
 [<c02b6ee8>] ext4_writepages+0x102/0x607
 [<c019adfe>] ? sched_clock_local+0x10/0x10e
 [<c01a8a7c>] ? __lock_is_held+0x2e/0x44
 [<c01a8ad5>] ? lock_is_held+0x43/0x51
 [<c0226dff>] do_writepages+0x1c/0x29
 [<c0276bed>] __writeback_single_inode+0xc3/0x545
 [<c0277c07>] writeback_sb_inodes+0x21f/0x36d
 ...

Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 13 May, 2015 1 commit
-
-
Lukas Czerner authored
commit d2dc317d upstream.

Currently it is possible to lose a whole file system block worth of data when we hit a specific interaction between unwritten and delayed extents in the extent status tree.

The problem is that when we insert a delayed extent into the extent status tree, the only way to get rid of it is to write out the delayed buffer. However, there is a limitation in the extent status tree implementation: when inserting an unwritten extent, should there be even a single delayed block, the whole unwritten extent is marked as delayed.

At this point, there is no way to get rid of the delayed extents, because there are no delayed buffers to write out. So when we write into said unwritten extent we will convert it to written, but it still remains delayed.

When we try to write into that block later, ext4_da_map_blocks() will set the buffer new and delayed and map it to an invalid block, which causes the rest of the block to be zeroed, losing already written data.

For now we can fix this by simply not allowing the delayed status to be set on a written extent in the extent status tree. Also add a WARN_ON() to make sure that we notice if this happens in the future.

This problem can be easily reproduced by running the following xfs_io commands:

xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
          -c "falloc 0 131072" \
          -c "pwrite -S 0xbb 65536 2048" \
          -c "fsync" /mnt/test/fff

echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

In theory this can also be reproduced at random by running fsx, but that is not very reliable, though on machines with a bigger page size (like ppc) it can be seen more often (especially with xfstest generic/127).

Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
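The rule introduced by commit d2dc317d (never mark a written extent as delayed) can be illustrated with a small user-space sketch. The flag values and the `es_status_for_insert()` helper are hypothetical and chosen only for this example; the kernel applies the check inside its own extent status code and additionally warns when the combination is seen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status bits, for this sketch only. */
enum {
    ES_WRITTEN   = 1 << 0,
    ES_UNWRITTEN = 1 << 1,
    ES_DELAYED   = 1 << 2
};

/*
 * Decide the status to store for a newly inserted extent.
 * range_has_delayed says whether any block in the range is still delayed.
 */
static unsigned es_status_for_insert(unsigned status, bool range_has_delayed)
{
    /* An extent cannot be both written and unwritten. */
    assert(!((status & ES_WRITTEN) && (status & ES_UNWRITTEN)));

    /* Only non-written extents may inherit the delayed status. */
    if (range_has_delayed && !(status & ES_WRITTEN))
        status |= ES_DELAYED;
    return status;
}
```

Under this rule a written extent stays plain written even if the range previously contained a delayed block, so a later write does not treat the block as new and zero it.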
-
- 14 Nov, 2014 4 commits
-
-
Dmitry Monakhov authored
commit 9aa5d32b upstream.

Besides the fact that this replacement improves code readability, it also protects against errors caused by direct EXT4_SB(sb)->s_es manipulation, which may result in an attempt to use uninitialized csum machinery.

#Testcase_BEGIN
IMG=/dev/ram0
MNT=/mnt
mkfs.ext4 $IMG
mount $IMG $MNT
#Enable feature directly on disk, on mounted fs
tune2fs -O metadata_csum $IMG
# Provoke metadata update, likely result in OOPS
touch $MNT/test
umount $MNT
#Testcase_END

# Replacement script
@@
expression E;
@@
- EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+ ext4_has_metadata_csum(E)

https://bugzilla.kernel.org/show_bug.cgi?id=82201

Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
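A rough user-space model of why a helper is safer than testing the on-disk feature bit directly is shown below. The message above only says that the helper protects against using uninitialized csum machinery; the specific rule used here (trust the in-memory checksum driver set up at mount time, not the on-disk bit) is an assumption of this sketch, and `struct sb_model` and `has_metadata_csum()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Feature bit as assumed for this sketch. */
#define RO_COMPAT_METADATA_CSUM 0x0400u

/* Hypothetical, heavily reduced model of the in-memory superblock state. */
struct sb_model {
    unsigned ro_compat;      /* on-disk feature bits, may change under a mounted fs */
    const void *csum_driver; /* set at mount time only if checksums were enabled */
};

/* Checksums are usable only if the machinery was initialized at mount time. */
static bool has_metadata_csum(const struct sb_model *sb)
{
    assert(sb != NULL);
    return sb->csum_driver != NULL;
}
```

In the test case above, tune2fs flips the on-disk bit while the file system is mounted; a check based on `ro_compat` would then report checksums as enabled even though no driver was ever set up.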
-
Eric Sandeen authored
commit 0ff8947f upstream.

Delalloc write journal reservations only reserve 1 credit, to update the inode if necessary. However, it may happen once in a filesystem's lifetime that a file will cross the 2G threshold, and require the LARGE_FILE feature to be set in the superblock as well, if it was not set already.

This overruns the transaction reservation, and can be demonstrated simply on any ext4 filesystem without the LARGE_FILE feature already set:

dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
	conv=notrunc of=testfile
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
	conv=notrunc of=testfile

leads to:

EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28

Adjust the number of credits based on whether the flag is already set, and whether the current write may extend past the LARGE_FILE limit.

Signed-off-by:
Eric Sandeen <sandeen@redhat.com> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Reviewed-by:
Andreas Dilger <adilger@dilger.ca> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
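The credit adjustment described in commit 0ff8947f can be sketched as a small decision function. This is a user-space approximation under stated assumptions: `da_write_credits()` and `LARGE_FILE_LIMIT` are names made up for the example, and the limit is taken to be 2G - 1 bytes, matching the dd offsets quoted in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed largest file size that does not need the LARGE_FILE feature. */
#define LARGE_FILE_LIMIT 0x7fffffffULL

/*
 * Return the number of journal credits a delalloc write should reserve:
 * 1 for the inode, plus 1 if the superblock may have to be updated to
 * set LARGE_FILE.
 */
static int da_write_credits(bool large_file_set, uint64_t pos, uint32_t len)
{
    assert(pos <= UINT64_MAX - len); /* no overflow in pos + len */

    if (large_file_set)
        return 1; /* the superblock already carries the flag */
    if (pos + len <= LARGE_FILE_LIMIT)
        return 1; /* the write stays below the threshold */
    return 2;     /* inode plus superblock update */
}
```

For the reproducer above, the first dd (one byte at offset 2147483646) still fits in one credit, while the second (one byte at offset 2147483647) crosses the limit and needs two.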
-
Theodore Ts'o authored
commit f4bb2981 upstream. If there is a corrupted file system which has directory entries that point at reserved, metadata inodes, prohibit them from being used by treating them the same way we treat Boot Loader inodes --- that is, mark them to be bad inodes. This prohibits them from being opened, deleted, or modified via chmod, chown, utimes, etc. In particular, this prevents a corrupted file system which has a directory entry which points at the journal inode from being deleted and its blocks released, after which point Much Hilarity Ensues. Reported-by:
Sami Liedes <sami.liedes@iki.fi> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jan Kara authored
commit d6320cbf upstream. Use truncate_isize_extended() when hole is being created in a file so that ->page_mkwrite() will get called for the partial tail page if it is mmaped (see the first patch in the series for details). Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 05 Sep, 2014 1 commit
-
-
Dmitry Monakhov authored
commit 6603120e upstream. In case of delalloc block i_disksize may be less than i_size. So we have to update i_disksize each time we allocated and submitted some blocks beyond i_disksize. We weren't doing this on the error paths, so fix this. testcase: xfstest generic/019 Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 01 Jul, 2014 1 commit
-
-
Namjae Jeon authored
commit 1c8349a1 upstream. When we perform a data integrity sync we tag all the dirty pages with PAGECACHE_TAG_TOWRITE at the start of ext4_da_writepages. Later we check for this tag in write_cache_pages_da and create a struct mpage_da_data containing contiguously indexed pages tagged with this tag, and sync these pages with a call to mpage_da_map_and_submit. This process is done in a while loop until all the PAGECACHE_TAG_TOWRITE pages are synced. We also do journal start and stop in each iteration. journal_stop could initiate a journal commit, which would call ext4_writepage, which in turn will call ext4_bio_write_page even for delayed OR unwritten buffers. When ext4_bio_write_page is called for such buffers, it does not sync them, but it does clear the PAGECACHE_TAG_TOWRITE tag of the corresponding page, and hence these pages are also not synced by the currently running data integrity sync. We will end up with dirty pages although the sync is completed. This could cause potential data loss when the sync call is followed by a truncate_pagecache call, which is exactly the case in collapse_range. (It will cause a generic/127 failure in xfstests.) To avoid this issue, we can use set_page_writeback_keepwrite instead of set_page_writeback, which doesn't clear the TOWRITE tag. Signed-off-by:
Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by:
Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 May, 2014 2 commits
-
-
Theodore Ts'o authored
commit 622cad13 upstream.

The function ext4_update_i_disksize() is used in only one place, in the function mpage_map_and_submit_extent(). Move its code to simplify the code paths, and also move the call to ext4_mark_inode_dirty() into the i_data_sem's critical region, to be consistent with all of the other places where we update i_disksize. That way, we also keep the raw_inode's i_disksize protected, to avoid the following race:

CPU #1                                 CPU #2

down_write(&i_data_sem)
Modify i_disk_size
up_write(&i_data_sem)
                                       down_write(&i_data_sem)
                                       Modify i_disk_size
                                       Copy i_disk_size to on-disk inode
                                       up_write(&i_data_sem)
Copy i_disk_size to on-disk inode

Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kazuya Mio authored
commit 4adb6ab3 upstream. When we try to get 2^32-1 block of the file which has the extent (ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON in ext4_ext_put_gap_in_cache(). To avoid the problem, ext4_map_blocks() needs to check the file logical block number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot handle 2^32-1 because the maximum file logical block number is 2^32-2. Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid. So ext4_map_blocks() should also return the same errno. Signed-off-by:
Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
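The range check described in commit 4adb6ab3 amounts to rejecting the one logical block number that a 32-bit value can hold but the file cannot. A minimal sketch, with the hypothetical names `MAX_LBLK` and `check_lblk()`, and the maximum of 2^32-2 taken from the message above:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Largest valid file logical block number according to the message: 2^32-2. */
#define MAX_LBLK 0xfffffffeu

static_assert(MAX_LBLK == UINT32_MAX - 1u, "MAX_LBLK must be 2^32-2");

/* Return 0 if lblk can be mapped, -EIO if it is out of range. */
static int check_lblk(uint32_t lblk)
{
    return lblk > MAX_LBLK ? -EIO : 0;
}
```

Returning -EIO keeps the behaviour in line with what the message says ext4_ind_map_blocks() does for an invalid block number.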
-
- 31 Mar, 2014 1 commit
-
-
Theodore Ts'o authored
Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by:
John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
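The compare-and-exchange pattern described above can be sketched in portable C11. This is a user-space illustration using `<stdatomic.h>` in place of the kernel's cmpxchg(); the bit values and the names `S_APPEND_BIT`, `S_IMMUTABLE_BIT`, `MANAGED_MASK` and `set_managed_flags()` are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit values, for illustration only. */
#define S_APPEND_BIT    0x1u
#define S_IMMUTABLE_BIT 0x2u
#define MANAGED_MASK    (S_APPEND_BIT | S_IMMUTABLE_BIT)

/*
 * Replace the managed bits of *i_flags with new_bits in one atomic step,
 * so no reader ever observes the intermediate "all managed bits cleared"
 * state.
 */
static void set_managed_flags(_Atomic unsigned *i_flags, unsigned new_bits)
{
    assert((new_bits & ~MANAGED_MASK) == 0);

    unsigned old = atomic_load(i_flags);
    unsigned new_fl;

    do {
        /* Keep unrelated bits, swap in the new managed bits. */
        new_fl = (old & ~MANAGED_MASK) | new_bits;
        /* On failure, `old` is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(i_flags, &old, new_fl));
}
```

The clear-then-set sequence this replaces would be two separate stores, and a concurrent reader between them would see an immutable file without its immutable bit.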
-
- 26 Jan, 2014 1 commit
-
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 07 Jan, 2014 1 commit
-
-
Theodore Ts'o authored
This is harmless, since ext4_walk_page_buffers only passes the handle onto the callback function, and in this call site the function in question, bput_one(), doesn't actually use the handle. But there's no point passing in an invalid handle, and it creates a Coverity warning, so let's just clean it up. Addresses-Coverity-Id: #1091168 Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 06 Jan, 2014 2 commits
-
-
Yongqiang Yang authored
Can be reproduced by xfstests 62 with bigalloc and 128bit size inode. Signed-off-by:
Yongqiang Yang <yangyongqiang01@baidu.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com>
-
Zheng Liu authored
Since commit d23142c6, ext4 has supported punch hole on file systems with the bigalloc feature, but we forgot to enable it. This commit fixes that. Cc: Lukas Czerner <lczerner@redhat.com> Signed-off-by:
Zheng Liu <wenqing.lz@taobao.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 18 Dec, 2013 1 commit
-
-
Jan Kara authored
Akira-san has been reporting rare deadlocks of his machine when running xfstests test 269 on ext4 filesystem. The problem turned out to be in ext4_da_reserve_metadata() and ext4_da_reserve_space() which called ext4_should_retry_alloc() while holding i_data_sem. Since ext4_should_retry_alloc() can force a transaction commit, this is a lock ordering violation and leads to deadlocks. Fix the problem by just removing the retry loops. These functions should just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that function must take care of retrying after dropping all necessary locks. Reported-and-tested-by:
Akira Fujita <a-fujita@rs.jp.nec.com> Reviewed-by:
Zheng Liu <wenqing.lz@taobao.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 05 Dec, 2013 1 commit
-
-
Christoph Hellwig authored
Currently notify_change directly updates i_version for size updates, which not only is counter to how all other fields are updated through struct iattr, but also breaks XFS, which needs inode updates to happen under its own lock, and synchronized to the structure that gets written to the log. Remove the update in the common code, and add it to btrfs and ext4. XFS already does a proper update internally and currently gets a double update with the existing code. IMHO this is 3.13 and -stable material and should go in through the XFS tree. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Andreas Dilger <adilger@dilger.ca> Acked-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Dave Chinner <dchinner@redhat.com> Signed-off-by:
Chris Mason <clm@fb.com> Signed-off-by:
Ben Myers <bpm@sgi.com>
-
- 12 Nov, 2013 1 commit
-
-
Andreas Dilger authored
Return a non-zero st_blocks to userspace for statfs() and friends. Some versions of tar will assume that files with st_blocks == 0 do not contain any data and will skip reading them entirely. Signed-off-by:
Andreas Dilger <andreas.dilger@intel.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 30 Oct, 2013 1 commit
-
-
Ming Lei authored
Pair the two trace events to make troubleshooting writepages easier, and to make it more convenient to write a simple script to parse the traces. Cc: linux-ext4@vger.kernel.org Cc: Jan Kara <jack@suse.cz> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 17 Oct, 2013 1 commit
-
-
Ming Lei authored
Commit 4e7ea81d (ext4: restructure writeback path) introduces another performance regression on random write:

- one more page may be added to the ext4 extent in mpage_prepare_extent_to_map, and will be submitted for I/O, so nr_to_write will become -1 before 'done' is set

- worse, dirty pages may still be retrieved from the page cache after nr_to_write becomes negative, so lots of small chunks can be submitted to the block device when page writeback is catching up with the write path, and performance is hurt.

On one ARM A15 board with a SATA 3.0 SSD (CPU: 1.5GHz dual core, RAM: 2GB, SATA controller: 3.0Gbps), this patch can improve the result of the test below from 157MB/sec to 174MB/sec (>10%):

dd if=/dev/zero of=./z.img bs=8K count=512K

The above test is actually a prototype of the block write test in the bonnie++ utility.

This patch makes sure no more pages than nr_to_write can be added to the extent for mapping, so that nr_to_write won't become negative.

Cc: linux-ext4@vger.kernel.org Acked-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Ming Lei <ming.lei@canonical.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 16 Oct, 2013 1 commit
-
-
Jan Kara authored
Document give_up_on_write argument of mpage_map_and_submit_extent(). Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 16 Sep, 2013 1 commit
-
-
Jan Kara authored
The Linux Kernel Performance project guys have reported that commit 4e7ea81d introduces a performance regression for the following fio workload:

[global]
direct=0
ioengine=mmap
size=1500M
bs=4k
pre_read=1
numjobs=1
overwrite=1
loops=5
runtime=300
group_reporting
invalidate=0
directory=/mnt/
file_service_type=random:36
file_service_type=random:36

[job0]
startdelay=0
rw=randrw
filename=data0/f1:data0/f2

[job1]
startdelay=0
rw=randrw
filename=data0/f2:data0/f1
...

[job7]
startdelay=0
rw=randrw
filename=data0/f2:data0/f1

The culprit is that after the commit ext4_writepages() is more aggressive in writing back pages. Thus we have fewer consecutive dirty pages, resulting in more seeking. This increased aggressiveness is caused by a bug in the condition terminating ext4_writepages(). We start writing from the beginning of the file even if we should have terminated ext4_writepages() because wbc->nr_to_write <= 0.

After fixing the condition, the throughput of the fio workload is about 20% better than before the writeback reorganization.

Reported-by:
"Yan, Zheng" <zheng.z.yan@intel.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 12 Sep, 2013 1 commit
-
-
Kirill A. Shutemov authored
truncate_pagecache() doesn't care about old size since commit cedabed4 ("vfs: Fix vmtruncate() regression"). Let's drop it. Signed-off-by:
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Sep, 2013 1 commit
-
-
Christoph Hellwig authored
Add support to the core direct-io code to defer AIO completions to user context using a workqueue. This replaces opencoded and less efficient code in XFS and ext4 (we save a memory allocation for each direct IO) and will be needed to properly support O_(D)SYNC for AIO. The communication between the filesystem and the direct I/O code requires a new buffer head flag, which is a bit ugly but not avoidable until the direct I/O code stops abusing the buffer_head structure for communicating with the filesystems. Currently this creates a per-superblock unbound workqueue for these completions, which is taken from an earlier patch by Jan Kara. I'm not really convinced about this use and would prefer a "normal" global workqueue with a high concurrency limit, but this needs further discussion. JK: Fixed ext4 part, dynamic allocation of the workqueue. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- 28 Aug, 2013 2 commits
-
-
Anatol Pomozov authored
Signed-off-by:
Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
Dmitry Monakhov authored
Use wait_for_stable_page() instead of wait_on_page_writeback() Signed-off-by:
Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz>
-
- 17 Aug, 2013 6 commits
-
-
Jan Kara authored
The following race can lead to a loss of i_disksize update from truncate thus resulting in a wrong inode size if the inode size isn't updated again before inode is reclaimed:

ext4_setattr()                          mpage_map_and_submit_extent()
  EXT4_I(inode)->i_disksize = attr->ia_size;
  ...                                     ...
                                          disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
                                          /* False because i_size isn't
                                           * updated yet */
                                          if (disksize > i_size_read(inode))
                                          /* True, because i_disksize is
                                           * already truncated */
                                          if (disksize > EXT4_I(inode)->i_disksize)
                                            /* Overwrite i_disksize
                                             * update from truncate */
                                            ext4_update_i_disksize()
  i_size_write(inode, attr->ia_size);

For other places updating i_disksize such race cannot happen because i_mutex prevents these races. Writeback is the only place where we do not hold i_mutex and we cannot grab it there because of lock ordering.

We fix the race by doing both i_disksize and i_size update in truncate atomically under i_data_sem and in mpage_map_and_submit_extent() we move the check against i_size under i_data_sem as well.

Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
Jan Kara authored
Merge conditions in ext4_setattr() handling inode size changes, also move ext4_begin_ordered_truncate() call somewhat earlier because it simplifies error recovery in case of failure. Also add error handling in case i_disksize update fails. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
Jan Kara authored
Inode size can arbitrarily change while writeback is in progress. When ext4_writepages() has prepared a long extent for mapping and truncate then reduces i_size, mpage_map_and_submit_buffers() will always map just one buffer in a page instead of all of them due to the lblk < blocks check. So we end up not using all the blocks we've allocated (thus leaking them), and delalloc accounting also goes wrong, manifesting as a warning like: ext4_da_release_space:1333: ext4_da_release_space: ino 12, to_free 1 with only 0 reserved data blocks Note that the problem can happen only when blocksize < pagesize because otherwise we have only a single buffer in the page. Fix the problem by removing the size check from the mapping loop. We have an extent allocated so we have to use it all before checking for i_size. We also rename add_page_bufs_to_extent() to mpage_process_page_bufs() and make that function submit the page for IO if all buffers (up to EOF) in it are mapped. Reported-by:
Dave Jones <davej@redhat.com> Reported-by:
Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
Jan Kara authored
Currently the logic deciding whether the current buffer can be added to an extent of buffers to map is split between mpage_add_bh_to_extent() and add_page_bufs_to_extent(). Move the whole logic to mpage_add_bh_to_extent(), which makes things a bit more straightforward and makes the following i_size fixes easier. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
Theodore Ts'o authored
Don't use an unsigned long long for the es_status flags; this requires that we pass 64-bit values around which is painful on 32-bit systems. Instead pass the extent status flags around using the low 4 bits of an unsigned int, and shift them into place when we are reading or writing es_pblk. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Zheng Liu <wenqing.lz@taobao.com>
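The packing scheme described above (status flags carried as the low 4 bits of an unsigned int, shifted into the top of the 64-bit es_pblk on store and back down on load) can be sketched as follows. The shift of 60 and the helper names `es_pack()`, `es_status()` and `es_pblock()` are assumptions of this example, not taken from the message.

```c
#include <assert.h>
#include <stdint.h>

#define ES_SHIFT       60   /* assumed position of the status nibble */
#define ES_STATUS_BITS 0xFu /* the low 4 bits of an unsigned int */
#define ES_PBLK_MASK   ((1ull << ES_SHIFT) - 1)

/* Combine a physical block number and a 4-bit status into one 64-bit word. */
static uint64_t es_pack(uint64_t pblk, unsigned status)
{
    assert((status & ~ES_STATUS_BITS) == 0);
    assert((pblk & ~ES_PBLK_MASK) == 0);
    return ((uint64_t)status << ES_SHIFT) | pblk;
}

/* Extract the status as a plain unsigned int (cheap to pass on 32-bit). */
static unsigned es_status(uint64_t es_pblk)
{
    return (unsigned)(es_pblk >> ES_SHIFT);
}

/* Extract the physical block number. */
static uint64_t es_pblock(uint64_t es_pblk)
{
    return es_pblk & ES_PBLK_MASK;
}
```

Callers then pass around only the small `unsigned` status, and the 64-bit arithmetic is confined to the two places that read or write the packed word.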
-
Jan Kara authored
Commit 0713ed0c added a jbd2_journal_file_inode() call into ext4_block_zero_page_range(). However, that function gets called from the truncate path, and thus the inode needn't have a jinode attached: that happens in ext4_file_open(), but the file needn't have ever been opened since mount. Calling jbd2_journal_file_inode() without a jinode attached results in an oops. We fix the problem by attaching a jinode to the inode also in ext4_truncate() and ext4_punch_hole() when we are going to zero out partial blocks. Reported-by:
majianpeng <majianpeng@gmail.com> Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 29 Jul, 2013 1 commit
-
-
Zheng Liu authored
In commit 921f266b : ext4: add self-testing infrastructure to do a sanity check, some sanity checks were added in map_blocks to make sure 'retval == map->m_len'. Enable these checks by default and report any assertion failures using ext4_warning() and WARN_ON() since they can help us to figure out some bugs that are otherwise hard to hit. Signed-off-by:
Zheng Liu <wenqing.lz@taobao.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- 16 Jul, 2013 1 commit
-
-
Theodore Ts'o authored
If there are no items in the extent status tree, ext4_es_lru_add() is a no-op. So it is not sufficient to call ext4_es_lru_add() before we try to look up an entry in the extent status tree. We also need to call it at the end of ext4_ext_map_blocks(), after items have been added to the extent status tree. This could lead to inodes that have extent status trees but which are not in the LRU list, which means they won't get considered for eviction by the es_shrinker. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
-
- 13 Jul, 2013 1 commit
-
-
Theodore Ts'o authored
Replace "assertation" with "assertion" in lots and lots of debugging messages. Correct the comment stating when ext4_es_insert_extent() is used. It was no doubt true at one point, but it is no longer true... Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com>
-
- 06 Jul, 2013 1 commit
-
-
Jan Kara authored
The loop in mpage_map_and_submit_extent() is guaranteed to always run at least once since the caller of mpage_map_and_submit_extent() makes sure map->m_len > 0. So make that explicit using do-while instead of pure while which also silences the compiler warning about uninitialized 'err' variable. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Lukas Czerner <lczerner@redhat.com>
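The do-while argument above can be shown with a small stand-alone example. `map_chunk()` and `map_whole_extent()` are hypothetical stand-ins for the mapping loop, and the chunk size of 4 is arbitrary; the point is only that the loop body is guaranteed to run once, so `err` is always assigned before it is returned.

```c
#include <assert.h>

/* Hypothetical per-chunk mapping step: consumes up to 4 units, never fails. */
static int map_chunk(unsigned *remaining)
{
    unsigned step = *remaining < 4 ? *remaining : 4;

    *remaining -= step;
    return 0;
}

/* Map an extent of `len` units and report how many chunks were needed. */
static int map_whole_extent(unsigned len, unsigned *chunks)
{
    int err; /* assigned on the first pass, which always happens */

    assert(len > 0); /* the caller guarantees a non-empty extent */
    *chunks = 0;
    do {
        err = map_chunk(&len);
        if (err < 0)
            break;
        (*chunks)++;
    } while (len);

    return err;
}
```

With a plain `while (len)` loop the compiler cannot prove the body runs, so it may warn that `err` is used uninitialized; the do-while form states the guarantee in the code itself.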
-
- 01 Jul, 2013 3 commits
-
-
Theodore Ts'o authored
The function mpage_release_unused_pages() must only be called once; otherwise the kernel will BUG() when the second call to mpage_release_unused_pages() tries to unlock the pages which had been unlocked by the first call. Also restructure the error handling so that we only give up on writing the dirty pages in the case of ENOSPC where retrying the allocation won't help. Otherwise, a transient failure, such as a kmalloc() failure in calling ext4_map_blocks(), might cause us to give up on those pages, leading to a scary message in /var/log/messages plus data loss. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Reviewed-by:
Jan Kara <jack@suse.cz>
-
Lukas Czerner authored
Currently, if we pass a range into ext4_zero_partial_blocks() which covers an entire block, we would attempt to zero it even though we should only zero the unaligned part of a block. Fix this by checking whether the range covers the whole block and skipping zeroing if so. Signed-off-by:
Lukas Czerner <lczerner@redhat.com> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
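The intended behaviour of the ext4_zero_partial_blocks() fix by Lukas Czerner above (zero only the unaligned head and tail of a range, and nothing when the range is block aligned) can be sketched in user space. `struct zero_range` and `partial_ranges()` are hypothetical names, and the sketch uses an exclusive end offset, which may differ from how the kernel function expresses the range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a byte range that needs zeroing. */
struct zero_range {
    uint64_t start;
    uint64_t len;
};

/*
 * For the byte range [offset, offset + length), store the partial-block
 * pieces that need zeroing in out[] and return how many there are (0..2).
 * blocksize must be a power of two.
 */
static int partial_ranges(uint64_t offset, uint64_t length,
                          uint32_t blocksize, struct zero_range out[2])
{
    assert(blocksize && (blocksize & (blocksize - 1)) == 0);
    assert(length > 0);

    uint64_t mask = blocksize - 1;
    uint64_t end = offset + length; /* exclusive */
    uint64_t head = offset & mask;  /* bytes skipped at the start of the first block */
    uint64_t tail = end & mask;     /* bytes used in the last block */
    int n = 0;

    /* The whole range lies within a single block. */
    if ((offset & ~mask) == ((end - 1) & ~mask)) {
        if (head || tail) { /* skip if it covers the block exactly */
            out[n].start = offset;
            out[n].len = length;
            n++;
        }
        return n;
    }

    if (head) { /* unaligned start: zero up to the block boundary */
        out[n].start = offset;
        out[n].len = blocksize - head;
        n++;
    }
    if (tail) { /* unaligned end: zero from the last block boundary */
        out[n].start = end - tail;
        out[n].len = tail;
        n++;
    }
    return n;
}
```

A range of exactly one 4096-byte block returns 0 pieces, which corresponds to the "skip zeroing" case in the message; whole blocks are left to be handled by freeing them, not by zeroing.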
-
Theodore Ts'o authored
The function ext4_write_inline_data_end() can return an error. So we need to assign it to a signed integer variable to check for an error return (since copied is an unsigned int). Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
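The signedness problem described above is easy to reproduce outside the kernel. In this sketch `write_end()` is a hypothetical stand-in for a function that returns a byte count or a negative errno, and `finish_write()` shows the pattern of capturing the result in a signed variable before copying it back into the unsigned count.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical writer: returns bytes copied, or a negative errno on failure. */
static int write_end(unsigned copied, int fail)
{
    assert(copied <= INT_MAX);
    return fail ? -ENOSPC : (int)copied;
}

/* Return the number of bytes written, or a negative errno. */
static int finish_write(unsigned copied, int fail)
{
    /* Signed on purpose: an unsigned variable could never compare < 0. */
    int ret = write_end(copied, fail);

    if (ret < 0)
        return ret;         /* propagate the error */
    copied = (unsigned)ret; /* safe: ret is known to be non-negative */
    return (int)copied;
}
```

Had the result been stored straight into `copied`, -ENOSPC would have turned into a very large positive byte count and the error check would never fire.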
-