Commit 4ad6aa46 authored by Brian Foster, committed by Kent Overstreet

bcachefs: fix truncate overflow if folio is beyond EOF

generic/083 occasionally reproduces a panic caused by an overflow
when accessing the bch_folio_sector array of the folio being
processed by __bch2_truncate_folio(). The immediate cause of the
overflow is that the folio offset is beyond i_size, and therefore
the sector index calculation underflows on subtraction of the folio
offset.
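
The arithmetic behind the underflow can be sketched in isolation. This is only an illustration, not the kernel code: the helper names, the DEMO_ prefixed constants, and the fixed 4096-byte folio size are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT	9	/* 512-byte sectors, as in the ">> 9" of the patch */
#define DEMO_FOLIO_SIZE		4096	/* assumed folio size; real folios may be larger */

/* Byte offset of the last in-file byte of the folio, pre-fix formula. */
static int64_t old_tail_offset(int64_t i_size, int64_t folio_pos)
{
	int64_t folio_end = folio_pos + DEMO_FOLIO_SIZE;
	int64_t end = i_size < folio_end ? i_size : folio_end;

	assert(i_size >= 0 && folio_pos >= 0);
	/* Goes negative as soon as folio_pos >= i_size. */
	return end - folio_pos - 1;
}

/*
 * Sector index as computed before the fix. Right-shifting a negative
 * signed value is implementation-defined in C; GCC and Clang use an
 * arithmetic shift, so the result stays negative.
 */
static int64_t old_sector_index(int64_t i_size, int64_t folio_pos)
{
	return old_tail_offset(i_size, folio_pos) >> DEMO_SECTOR_SHIFT;
}
```

With i_size = 4096 and a folio at offset 8192, the tail offset is -4097 and the resulting index is negative, so it falls outside the 8 sector slots of a 4096-byte folio. A folio that straddles EOF (i_size = 5000, folio at 4096) yields index 1 as intended.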

One cause of this is mainly observed on nocow mounts. When nocow is
enabled, fallocate performs physical block allocation (as opposed to
block reservation in cow mode), which range_has_data() then
interprets as valid data that requires partial zeroing on truncate.
Therefore, if a post-eof zero range request lands across post-eof
preallocated blocks, __bch2_truncate_folio() may actually create a
post-eof folio in order to perform zeroing. To avoid this problem,
update range_has_data() to filter out unwritten blocks from folio
creation and partial zeroing.
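
The filtering idea can be shown with a simplified stand-in for the extent iteration. The struct and function below are hypothetical and exist only for this sketch; the real code walks btree keys and tests them with bkey_extent_is_data() and bkey_extent_is_unwritten().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of an extent for illustration only. */
struct demo_extent {
	bool is_data;	/* extent points at allocated blocks */
	bool unwritten;	/* preallocated (e.g. by fallocate) but never written */
};

/* Report data only if some extent is allocated and has been written. */
static bool demo_range_has_data(const struct demo_extent *ext, size_t n)
{
	assert(ext != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (ext[i].is_data && !ext[i].unwritten)
			return true;
	return false;
}
```

Under this model, a range covered only by unwritten preallocation reports no data, so truncate has no reason to instantiate a folio for it, while a range containing at least one written extent still does.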

Even though we should never create folios beyond EOF like this, the
mere existence of such folios is not necessarily a fatal error. Fix
up the truncate code to warn about this condition and not overflow
the sector array and possibly crash the system. The addition of this
warning without the corresponding unwritten extent fix has shown
that various other fstests are able to reproduce this problem fairly
frequently, but often in ways that don't necessarily result in a
kernel panic or a change in user observable behavior, and therefore
the problem goes undetected.
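
The clamped index calculation can be exercised on its own. As a hedged sketch, the helper name, the DEMO_ constants, and the fixed 4096-byte folio size are assumptions for this example, and the warning is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT	9	/* 512-byte sectors */
#define DEMO_FOLIO_SIZE		4096	/* assumed folio size */

/*
 * Sector index following the logic of the patched code: only clamp
 * the end position to i_size when the folio actually starts before EOF.
 */
static int64_t fixed_sector_index(int64_t i_size, int64_t folio_pos)
{
	int64_t end_pos = folio_pos + DEMO_FOLIO_SIZE;

	assert(i_size >= 0 && folio_pos >= 0);

	if (i_size > folio_pos)
		end_pos = i_size < end_pos ? i_size : end_pos;

	/* end_pos > folio_pos always holds here, so the offset is >= 0. */
	return (end_pos - folio_pos - 1) >> DEMO_SECTOR_SHIFT;
}
```

For a folio entirely beyond EOF (i_size = 4096, folio at 8192) this returns 7, the last sector of the folio, rather than a negative index. A folio straddling EOF (i_size = 5000, folio at 4096) still returns 1.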

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
parent 550a6a49
@@ -2783,7 +2783,7 @@ static inline int range_has_data(struct bch_fs *c, u32 subvol,
 		goto err;

 	for_each_btree_key_upto_norestart(&trans, iter, BTREE_ID_extents, start, end, 0, k, ret)
-		if (bkey_extent_is_data(k.k)) {
+		if (bkey_extent_is_data(k.k) && !bkey_extent_is_unwritten(k)) {
 			ret = 1;
 			break;
 		}
@@ -2809,6 +2809,7 @@ static int __bch2_truncate_folio(struct bch_inode_info *inode,
 	struct folio *folio;
 	s64 i_sectors_delta = 0;
 	int ret = 0;
+	loff_t end_pos;

 	folio = filemap_lock_folio(mapping, index);
 	if (!folio) {
@@ -2875,10 +2876,18 @@ static int __bch2_truncate_folio(struct bch_inode_info *inode,
 	/*
 	 * Caller needs to know whether this folio will be written out by
 	 * writeback - doing an i_size update if necessary - or whether it will
-	 * be responsible for the i_size update:
+	 * be responsible for the i_size update.
+	 *
+	 * Note that we shouldn't ever see a folio beyond EOF, but check and
+	 * warn if so. This has been observed by failure to clean up folios
+	 * after a short write and there's still a chance reclaim will fix
+	 * things up.
 	 */
-	ret = s->s[(min(inode->v.i_size, folio_end_pos(folio)) -
-		    folio_pos(folio) - 1) >> 9].state >= SECTOR_dirty;
+	WARN_ON_ONCE(folio_pos(folio) >= inode->v.i_size);
+	end_pos = folio_end_pos(folio);
+	if (inode->v.i_size > folio_pos(folio))
+		end_pos = min(inode->v.i_size, end_pos);
+	ret = s->s[(end_pos - folio_pos(folio) - 1) >> 9].state >= SECTOR_dirty;

 	folio_zero_segment(folio, start_offset, end_offset);