Commits · 590a809ff743e7bd890ba5fb36bc38e20a36de53 · Kirill Smelkov / linux

04 Aug, 2023 2 commits

jbd2: check 'jh->b_transaction' before removing it from checkpoint · 590a809f

Zhihao Cheng authored Jul 14, 2023

The following sequence will corrupt an ext4 image:
Step 1:
jbd2_journal_commit_transaction
 __jbd2_journal_insert_checkpoint(jh, commit_transaction)
 // Put jh into trans1->t_checkpoint_list
 journal->j_checkpoint_transactions = commit_transaction
 // Put trans1 into journal->j_checkpoint_transactions

Step 2:
do_get_write_access
 test_clear_buffer_dirty(bh) // clear buffer dirty, set jbd dirty
 __jbd2_journal_file_buffer(jh, transaction) // jh belongs to trans2

Step 3:
drop_cache
 journal_shrink_one_cp_list
  jbd2_journal_try_remove_checkpoint
   if (!trylock_buffer(bh))  // lock bh, true
   if (buffer_dirty(bh))     // buffer is not dirty
   __jbd2_journal_remove_checkpoint(jh)
   // remove jh from trans1->t_checkpoint_list

Step 4:
jbd2_log_do_checkpoint
 trans1 = journal->j_checkpoint_transactions
 // jh is not in trans1->t_checkpoint_list
 jbd2_cleanup_journal_tail(journal)  // trans1 is done

Step 5: Power cut; trans2 is not committed, so jh is lost on the next mount.

Fix it by checking 'jh->b_transaction' before removing jh from the checkpoint list.
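
The check described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the structures are simplified stand-ins for the kernel's, the `dirty` and `locked` flags model buffer_dirty(bh) and a failed trylock_buffer(bh), and try_remove_checkpoint() is a hypothetical name, not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct transaction { int tid; };

struct journal_head {
	struct transaction *b_transaction;    /* running/committing owner, or NULL */
	struct transaction *b_cp_transaction; /* checkpoint list owner, or NULL */
	bool dirty;                           /* models buffer_dirty(bh) */
	bool locked;                          /* models a failed trylock_buffer(bh) */
};

/*
 * Model of the fixed check: a buffer that still belongs to a transaction
 * (trans2 in the scenario above) must stay on the checkpoint list, even
 * though its dirty bit was already cleared in step 2.
 * Returns 0 after detaching jh from its checkpoint list, -EBUSY otherwise.
 */
static int try_remove_checkpoint(struct journal_head *jh)
{
	assert(jh != NULL);

	if (jh->b_transaction != NULL)
		return -EBUSY;
	if (jh->locked || jh->dirty)
		return -EBUSY;

	jh->b_cp_transaction = NULL;
	return 0;
}
```

With this check, step 3 returns -EBUSY for jh, so trans1 still holds the buffer when jbd2_log_do_checkpoint() runs in step 4.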

Cc: stable@kernel.org
Fixes: 46f881b5 ("jbd2: fix a race when checking checkpoint buffer busy")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230714025528.564988-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

jbd2: fix checkpoint cleanup performance regression · 373ac521

Zhang Yi authored Jul 14, 2023

journal_clean_one_cp_list() has been merged into
journal_shrink_one_cp_list(), but checkpoint buffer cleanup from the
committing process is only a best effort: it should stop scanning once it
meets a busy buffer, or else it causes a lot of useless buffer scans
and checks. We caught a performance regression when running the fs_mark
test below.

Test cmd:
 ./fs_mark  -d  scratch  -s  1024  -n  10000  -t  1  -D  100  -N  100

Before merging checkpoint buffer cleanup:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       8304.9            49033

After merging checkpoint buffer cleanup:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       7649.0            50012
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       2107.1            50871

After merging checkpoint buffer cleanup, the total loop count in
journal_shrink_one_cp_list() could reach 6,261,600+ (50,000+ ~
100,000+ in general), and most of those iterations are wasted. This patch
fixes it by passing 'shrink_type' into journal_shrink_one_cp_list() and
adding a new 'SHRINK_BUSY_STOP' to indicate that it should stop once it
meets a busy buffer. After the fix, the loop count drops back to 10,000+.

After this fix:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       8558.4            49109
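
The effect of the stop-on-busy policy on the loop count can be sketched with a small userspace model. This is a sketch under assumptions, not the kernel code: the array stands in for the checkpoint list, shrink_cp_list() and struct cp_buffer are hypothetical names, and SHRINK_BUSY_SKIP is assumed here as the contrasting "skip busy buffers and keep scanning" policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SHRINK_BUSY_STOP is named in the commit; SHRINK_BUSY_SKIP is assumed. */
enum shrink_type { SHRINK_BUSY_STOP, SHRINK_BUSY_SKIP };

/* Hypothetical stand-in for a buffer on a checkpoint list. */
struct cp_buffer {
	bool busy;     /* cannot be released right now */
	bool released; /* set once the scan has released it */
};

/*
 * Scan the list, releasing buffers that are not busy. Returns the number
 * of buffers examined (the "loop count") and stores the number released
 * in *freed.
 */
static size_t shrink_cp_list(struct cp_buffer *bufs, size_t n,
			     enum shrink_type type, size_t *freed)
{
	size_t scanned = 0;

	assert(bufs != NULL && freed != NULL);
	*freed = 0;

	for (size_t i = 0; i < n; i++) {
		scanned++;
		if (bufs[i].busy) {
			if (type == SHRINK_BUSY_STOP)
				break;    /* best effort: give up at first busy buffer */
			continue;         /* keep scanning past busy buffers */
		}
		bufs[i].released = true;
		(*freed)++;
	}
	return scanned;
}
```

In this model the committing path would pass SHRINK_BUSY_STOP, trading a few unreleased buffers for a bounded scan.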

Cc: stable@kernel.org
Fixes: b98dba27 ("jbd2: remove journal_clean_one_cp_list()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230714025528.564988-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

03 Aug, 2023 11 commits

ext4: correct some stale comment of criteria · 4eea9fbe

Kemeng Shi authored Aug 01, 2023

Criteria are now named CR_XXX; correct stale comments that still refer to
criteria by raw number.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-11-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: return found group directly in ext4_mb_choose_next_group_best_avail · bcb123ac

Kemeng Shi authored Aug 01, 2023

Return the good group as soon as it is found in the loop, which removes the
further check after the loop for whether a good group was found.
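
The pattern can be sketched generically as follows. This is an illustrative model only: struct group, find_good_group() and the "enough free blocks" test are hypothetical and do not reproduce the ext4 allocator's data structures or criteria.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block group. */
struct group {
	unsigned int id;
	unsigned int free_blocks;
};

/*
 * Return the first group with at least `needed` free blocks, or NULL.
 * Returning from inside the loop means no result variable has to be
 * initialized before the loop or re-checked after it.
 */
static struct group *find_good_group(struct group *groups, size_t n,
				     unsigned int needed)
{
	assert(groups != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (groups[i].free_blocks >= needed)
			return &groups[i];
	}
	return NULL;
}
```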
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-10-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: return found group directly in ext4_mb_choose_next_group_goal_fast · b50675a4

Kemeng Shi authored Aug 01, 2023

Return the good group as soon as it is found in the loop, which removes the
further check after the loop for whether a good group was found.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-9-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: remove unused ext4_{set}/{clear}_bit_atomic · f6c72fef

Kemeng Shi authored Aug 01, 2023

Remove ext4_set_bit_atomic and ext4_clear_bit_atomic which are defined but not
used.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-8-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: replace the traditional ternary conditional operator with max()/min() · de8bf0e5

Kemeng Shi authored Aug 01, 2023

Replace the traditional ternary conditional operator with max()/min().
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-7-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: remove unnecessary return for void function · ad635507

Kemeng Shi authored Aug 01, 2023

The return at the end of a void function is unnecessary, so remove it.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-6-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: use is_power_of_2 helper in ext4_mb_regular_allocator · bb60caa2

Kemeng Shi authored Aug 01, 2023

Use the more intuitive is_power_of_2 helper in ext4_mb_regular_allocator.
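
For readers unfamiliar with the helper, a userspace equivalent of the usual power-of-two test is sketched below. It is written from the well-known bit trick rather than copied from the kernel headers, so treat the exact signature as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A power of two has exactly one bit set, so clearing the lowest set bit
 * with n & (n - 1) leaves zero. Zero itself is not a power of two.
 */
static inline bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```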
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-5-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: return found group directly in ext4_mb_choose_next_group_p2_aligned · 919eb90c

Kemeng Shi authored Aug 01, 2023

Return the good group as soon as it is found in the loop, which removes the
unnecessary NULL initialization of grp and the further check after the loop
for whether a good group was found.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-4-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: avoid potential data overflow in next_linear_group · 60c672b7

Kemeng Shi authored Aug 01, 2023

ngroups is ext4_group_t (unsigned int), while next_linear_group treats it
as int. If ngroups is bigger than the maximum value an int can represent,
it will be treated as a negative number. Then "return group + 1 >= ngroups ? 0 :
group + 1;" may keep returning 0.
Switch int to ext4_group_t in next_linear_group to fix the overflow.
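
The overflow can be reproduced in userspace with a simplified model of the function. This sketch assumes a 32-bit int and the wrapping unsigned-to-int conversion that common compilers implement; the two-argument signature and the _int suffix are illustrative and not the kernel's.

```c
#include <assert.h>

typedef unsigned int ext4_group_t;

/* Before: an ngroups above INT_MAX arrives as a negative int. */
static int next_linear_group_int(int group, int ngroups)
{
	/* With ngroups < 0 the comparison is always true, so this returns 0. */
	return group + 1 >= ngroups ? 0 : group + 1;
}

/* After: the same expression evaluated in the unsigned group type. */
static ext4_group_t next_linear_group(ext4_group_t group, ext4_group_t ngroups)
{
	return group + 1 >= ngroups ? 0 : group + 1;
}
```

For example, with ngroups = 3000000000 the int version returns 0 for group 5, while the ext4_group_t version returns 6 and only wraps to 0 after the last group.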

Fixes: 196e402a ("ext4: improve cr 0 / cr 1 group scanning")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-3-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: correct grp validation in ext4_mb_good_group · a9ce5993

Kemeng Shi authored Aug 01, 2023

The group corruption check accesses the memory of grp and will trigger a
kernel crash if grp is NULL, so do the NULL check before the corruption check.
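
The ordering issue can be sketched as follows. This is a simplified model: struct group_info, the GRP_CORRUPT flag and good_group() are hypothetical names chosen for illustration, not the ext4 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GRP_CORRUPT 0x1UL /* hypothetical "bitmap corrupt" state flag */

/* Hypothetical stand-in for per-group allocator info. */
struct group_info {
	unsigned long state;
	unsigned int free_blocks;
};

/*
 * The NULL test must come first: the corruption test dereferences grp,
 * so evaluating it on a NULL pointer would crash.
 */
static bool good_group(const struct group_info *grp)
{
	if (grp == NULL)
		return false;
	if (grp->state & GRP_CORRUPT)
		return false;
	return grp->free_blocks > 0;
}
```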

Fixes: 5354b2af ("ext4: allow ext4_get_group_info() to fail")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230801143204.2284343-2-shikemeng@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: replace CR_FAST macro with inline function for readability · 304749c0

Ojaswin Mujoo authored Jun 30, 2023

Replace CR_FAST with ext4_mb_cr_expensive() inline function for better
readability. This function returns true if the criteria is one of the
expensive/slower ones where lots of disk IO/prefetching is acceptable.

No functional changes are intended in this patch.
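
A minimal sketch of what such a helper might look like is given below. The enumerator names and the threshold at which a criterion counts as "expensive" are assumptions made for illustration; only the helper name ext4_mb_cr_expensive() comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering: cheaper criteria first, more expensive ones last. */
enum criteria {
	CR_POWER2_ALIGNED,
	CR_GOAL_LEN_FAST,
	CR_BEST_AVAIL_LEN,
	CR_GOAL_LEN_SLOW,
	CR_ANY_FREE,
};

/*
 * True for the slower criteria, where heavy disk IO/prefetching is
 * acceptable. An inline function gives type checking and a descriptive
 * name at the call site, unlike a bare comparison macro.
 */
static inline bool ext4_mb_cr_expensive(enum criteria cr)
{
	return cr >= CR_GOAL_LEN_SLOW; /* assumed threshold */
}
```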
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20230630085927.140137-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

29 Jul, 2023 15 commits