1. 10 Aug, 2018 5 commits
    • Chao Yu's avatar
      f2fs: fix to avoid broken of dnode block list · 50fa53ec
      Chao Yu authored
      f2fs recovery flow is relying on dnode block link list, it means fsynced
      file recovery depends on previous dnode's persistence in the list, so
      during fsync() we should wait on all regular inode's dnode writebacked
      before issuing flush.
      
      By this way, we can avoid dnode block list being broken by out-of-order
      IO submission due to IO scheduler or driver.
      
      Sheng Yong helps to do the test with this patch:
      
      Target:/data (f2fs, -)
      64MB / 32768KB / 4KB / 8
      
      1 / PERSIST / Index
      
      Base:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	867.82		204.15		41440.03	41370.54	680.8		1025.94		1031.08
      2	871.87		205.87		41370.3		40275.2		791.14		1065.84		1101.7
      3	866.52		205.69		41795.67	40596.16	694.69		1037.16		1031.48
      Avg	868.7366667	205.2366667	41535.33333	40747.3		722.21		1042.98		1054.753333
      
      After:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	798.81		202.5		41143		40613.87	602.71		838.08		913.83
      2	805.79		206.47		40297.2		41291.46	604.44		840.75		924.27
      3	814.83		206.17		41209.57	40453.62	602.85		834.66		927.91
      Avg	806.4766667	205.0466667	40883.25667	40786.31667	603.3333333	837.83		922.0033333
      
      Patched/Original:
      	0.928332713	0.999074239	0.984300676	1.000957528	0.835398753	0.803303994	0.874141189
      
      It looks like atomic write will suffer performance regression.
      
      I suspect that the criminal is that we forcing to wait all dnode being in
      storage cache before we issue PREFLUSH+FUA.
      
      BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
      cause the problem: we will lose data of last transaction after SPO, even if
      atomic write return no error:
      
      - atomic_open();
      - write() P1, P2, P3;
      - atomic_commit();
       - writeback data: P1, P2, P3;
       - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with fsync_mark is
      writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
      last transaction.
       - preflush + fua;
      - power-cut
      
      If we don't wait dnode writeback for atomic_write:
      
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	779.91		206.03		41621.5		40333.16	716.9		1038.21		1034.85
      2	848.51		204.35		40082.44	39486.17	791.83		1119.96		1083.77
      3	772.12		206.27		41335.25	41599.65	723.29		1055.07		971.92
      Avg	800.18		205.55		41013.06333	40472.99333	744.0066667	1071.08		1030.18
      
      Patched/Original:
      	0.92108464	1.001526693	0.987425886	0.993268102	1.030180511	1.026942031	0.976702294
      
      SQLite's performance recovers.
      
      Jaegeuk:
      "Practically, I don't see db corruption becase of this. We can excuse to lose
      the last transaction."
      
      Finally, we decide to keep original implementation of atomic write interface
      sematics that we don't wait all dnode writeback before preflush+fua submission.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      50fa53ec
    • Gustavo A. R. Silva's avatar
      f2fs: use true and false for boolean values · 6e45f2a5
      Gustavo A. R. Silva authored
      Return statements in functions returning bool should use true or false
      instead of an integer value.
      
      This issue was detected with the help of Coccinelle.
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6e45f2a5
    • Chao Yu's avatar
      f2fs: fix to do sanity check with cp_pack_start_sum · e494c2f9
      Chao Yu authored
      After fuzzing, cp_pack_start_sum could be corrupted, so current log's
      summary info should be wrong due to loading incorrect summary block.
      Then, if segment's type in current log is exceeded NR_CURSEG_TYPE, it
      can lead accessing invalid dirty_i->dirty_segmap bitmap finally.
      
      Add sanity check for cp_pack_start_sum to fix this issue.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200419
      
      - Reproduce
      
      - Kernel message (f2fs-dev w/ KASAN)
      [ 3117.578432] F2FS-fs (loop0): Invalid log blocks per segment (8)
      
      [ 3117.578445] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
      [ 3117.581364] F2FS-fs (loop0): invalid crc_offset: 30716
      [ 3117.583564] WARNING: CPU: 1 PID: 1225 at fs/f2fs/checkpoint.c:90 __get_meta_page+0x448/0x4b0
      [ 3117.583570] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
      [ 3117.584014] CPU: 1 PID: 1225 Comm: mount Not tainted 4.17.0+ #1
      [ 3117.584017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [ 3117.584022] RIP: 0010:__get_meta_page+0x448/0x4b0
      [ 3117.584023] Code: 00 49 8d bc 24 84 00 00 00 e8 74 54 da ff 41 83 8c 24 84 00 00 00 08 4c 89 f6 4c 89 ef e8 c0 d9 95 00 48 89 ef e8 18 e3 00 00 <0f> 0b f0 80 4d 48 04 e9 0f fe ff ff 0f 0b 48 89 c7 48 89 04 24 e8
      [ 3117.584072] RSP: 0018:ffff88018eb678c0 EFLAGS: 00010286
      [ 3117.584082] RAX: ffff88018f0a6a78 RBX: ffffea0007a46600 RCX: ffffffff9314d1b2
      [ 3117.584085] RDX: ffffffff00000001 RSI: 0000000000000000 RDI: ffff88018f0a6a98
      [ 3117.584087] RBP: ffff88018ebe9980 R08: 0000000000000002 R09: 0000000000000001
      [ 3117.584090] R10: 0000000000000001 R11: ffffed00326e4450 R12: ffff880193722200
      [ 3117.584092] R13: ffff88018ebe9afc R14: 0000000000000206 R15: ffff88018eb67900
      [ 3117.584096] FS:  00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
      [ 3117.584098] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3117.584101] CR2: 00000000016f21b8 CR3: 0000000191c22000 CR4: 00000000000006e0
      [ 3117.584112] Call Trace:
      [ 3117.584121]  ? f2fs_set_meta_page_dirty+0x150/0x150
      [ 3117.584127]  ? f2fs_build_segment_manager+0xbf9/0x3190
      [ 3117.584133]  ? f2fs_npages_for_summary_flush+0x75/0x120
      [ 3117.584145]  f2fs_build_segment_manager+0xda8/0x3190
      [ 3117.584151]  ? f2fs_get_valid_checkpoint+0x298/0xa00
      [ 3117.584156]  ? f2fs_flush_sit_entries+0x10e0/0x10e0
      [ 3117.584184]  ? map_id_range_down+0x17c/0x1b0
      [ 3117.584188]  ? __put_user_ns+0x30/0x30
      [ 3117.584206]  ? find_next_bit+0x53/0x90
      [ 3117.584237]  ? cpumask_next+0x16/0x20
      [ 3117.584249]  f2fs_fill_super+0x1948/0x2b40
      [ 3117.584258]  ? f2fs_commit_super+0x1a0/0x1a0
      [ 3117.584279]  ? sget_userns+0x65e/0x690
      [ 3117.584296]  ? set_blocksize+0x88/0x130
      [ 3117.584302]  ? f2fs_commit_super+0x1a0/0x1a0
      [ 3117.584305]  mount_bdev+0x1c0/0x200
      [ 3117.584310]  mount_fs+0x5c/0x190
      [ 3117.584320]  vfs_kern_mount+0x64/0x190
      [ 3117.584330]  do_mount+0x2e4/0x1450
      [ 3117.584343]  ? lockref_put_return+0x130/0x130
      [ 3117.584347]  ? copy_mount_string+0x20/0x20
      [ 3117.584357]  ? kasan_unpoison_shadow+0x31/0x40
      [ 3117.584362]  ? kasan_kmalloc+0xa6/0xd0
      [ 3117.584373]  ? memcg_kmem_put_cache+0x16/0x90
      [ 3117.584377]  ? __kmalloc_track_caller+0x196/0x210
      [ 3117.584383]  ? _copy_from_user+0x61/0x90
      [ 3117.584396]  ? memdup_user+0x3e/0x60
      [ 3117.584401]  ksys_mount+0x7e/0xd0
      [ 3117.584405]  __x64_sys_mount+0x62/0x70
      [ 3117.584427]  do_syscall_64+0x73/0x160
      [ 3117.584440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 3117.584455] RIP: 0033:0x7f5693f14b9a
      [ 3117.584456] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
      [ 3117.584505] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [ 3117.584510] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
      [ 3117.584512] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
      [ 3117.584514] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [ 3117.584516] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
      [ 3117.584519] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
      [ 3117.584523] ---[ end trace a8e0d899985faf31 ]---
      [ 3117.685663] F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix.
      [ 3117.685673] F2FS-fs (loop0): recover_data: ino = 2 (i_size: recover) recovered = 1, err = 0
      [ 3117.685707] ==================================================================
      [ 3117.685955] BUG: KASAN: slab-out-of-bounds in __remove_dirty_segment+0xdd/0x1e0
      [ 3117.686175] Read of size 8 at addr ffff88018f0a63d0 by task mount/1225
      
      [ 3117.686477] CPU: 0 PID: 1225 Comm: mount Tainted: G        W         4.17.0+ #1
      [ 3117.686481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [ 3117.686483] Call Trace:
      [ 3117.686494]  dump_stack+0x71/0xab
      [ 3117.686512]  print_address_description+0x6b/0x290
      [ 3117.686517]  kasan_report+0x28e/0x390
      [ 3117.686522]  ? __remove_dirty_segment+0xdd/0x1e0
      [ 3117.686527]  __remove_dirty_segment+0xdd/0x1e0
      [ 3117.686532]  locate_dirty_segment+0x189/0x190
      [ 3117.686538]  f2fs_allocate_new_segments+0xa9/0xe0
      [ 3117.686543]  recover_data+0x703/0x2c20
      [ 3117.686547]  ? f2fs_recover_fsync_data+0x48f/0xd50
      [ 3117.686553]  ? ksys_mount+0x7e/0xd0
      [ 3117.686564]  ? policy_nodemask+0x1a/0x90
      [ 3117.686567]  ? policy_node+0x56/0x70
      [ 3117.686571]  ? add_fsync_inode+0xf0/0xf0
      [ 3117.686592]  ? blk_finish_plug+0x44/0x60
      [ 3117.686597]  ? f2fs_ra_meta_pages+0x38b/0x5e0
      [ 3117.686602]  ? find_inode_fast+0xac/0xc0
      [ 3117.686606]  ? f2fs_is_valid_blkaddr+0x320/0x320
      [ 3117.686618]  ? __radix_tree_lookup+0x150/0x150
      [ 3117.686633]  ? dqget+0x670/0x670
      [ 3117.686648]  ? pagecache_get_page+0x29/0x410
      [ 3117.686656]  ? kmem_cache_alloc+0x176/0x1e0
      [ 3117.686660]  ? f2fs_is_valid_blkaddr+0x11d/0x320
      [ 3117.686664]  f2fs_recover_fsync_data+0xc23/0xd50
      [ 3117.686670]  ? f2fs_space_for_roll_forward+0x60/0x60
      [ 3117.686674]  ? rb_insert_color+0x323/0x3d0
      [ 3117.686678]  ? f2fs_recover_orphan_inodes+0xa5/0x700
      [ 3117.686683]  ? proc_register+0x153/0x1d0
      [ 3117.686686]  ? f2fs_remove_orphan_inode+0x10/0x10
      [ 3117.686695]  ? f2fs_attr_store+0x50/0x50
      [ 3117.686700]  ? proc_create_single_data+0x52/0x60
      [ 3117.686707]  f2fs_fill_super+0x1d06/0x2b40
      [ 3117.686728]  ? f2fs_commit_super+0x1a0/0x1a0
      [ 3117.686735]  ? sget_userns+0x65e/0x690
      [ 3117.686740]  ? set_blocksize+0x88/0x130
      [ 3117.686745]  ? f2fs_commit_super+0x1a0/0x1a0
      [ 3117.686748]  mount_bdev+0x1c0/0x200
      [ 3117.686753]  mount_fs+0x5c/0x190
      [ 3117.686758]  vfs_kern_mount+0x64/0x190
      [ 3117.686762]  do_mount+0x2e4/0x1450
      [ 3117.686769]  ? lockref_put_return+0x130/0x130
      [ 3117.686773]  ? copy_mount_string+0x20/0x20
      [ 3117.686777]  ? kasan_unpoison_shadow+0x31/0x40
      [ 3117.686780]  ? kasan_kmalloc+0xa6/0xd0
      [ 3117.686786]  ? memcg_kmem_put_cache+0x16/0x90
      [ 3117.686790]  ? __kmalloc_track_caller+0x196/0x210
      [ 3117.686795]  ? _copy_from_user+0x61/0x90
      [ 3117.686801]  ? memdup_user+0x3e/0x60
      [ 3117.686804]  ksys_mount+0x7e/0xd0
      [ 3117.686809]  __x64_sys_mount+0x62/0x70
      [ 3117.686816]  do_syscall_64+0x73/0x160
      [ 3117.686824]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 3117.686829] RIP: 0033:0x7f5693f14b9a
      [ 3117.686830] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
      [ 3117.686887] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [ 3117.686892] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
      [ 3117.686894] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
      [ 3117.686896] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [ 3117.686899] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
      [ 3117.686901] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
      
      [ 3117.687005] Allocated by task 1225:
      [ 3117.687152]  kasan_kmalloc+0xa6/0xd0
      [ 3117.687157]  kmem_cache_alloc_trace+0xfd/0x200
      [ 3117.687161]  f2fs_build_segment_manager+0x2d09/0x3190
      [ 3117.687165]  f2fs_fill_super+0x1948/0x2b40
      [ 3117.687168]  mount_bdev+0x1c0/0x200
      [ 3117.687171]  mount_fs+0x5c/0x190
      [ 3117.687174]  vfs_kern_mount+0x64/0x190
      [ 3117.687177]  do_mount+0x2e4/0x1450
      [ 3117.687180]  ksys_mount+0x7e/0xd0
      [ 3117.687182]  __x64_sys_mount+0x62/0x70
      [ 3117.687186]  do_syscall_64+0x73/0x160
      [ 3117.687190]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [ 3117.687285] Freed by task 19:
      [ 3117.687412]  __kasan_slab_free+0x137/0x190
      [ 3117.687416]  kfree+0x8b/0x1b0
      [ 3117.687460]  ttm_bo_man_put_node+0x61/0x80 [ttm]
      [ 3117.687476]  ttm_bo_cleanup_refs+0x15f/0x250 [ttm]
      [ 3117.687492]  ttm_bo_delayed_delete+0x2f0/0x300 [ttm]
      [ 3117.687507]  ttm_bo_delayed_workqueue+0x17/0x50 [ttm]
      [ 3117.687528]  process_one_work+0x2f9/0x740
      [ 3117.687531]  worker_thread+0x78/0x6b0
      [ 3117.687541]  kthread+0x177/0x1c0
      [ 3117.687545]  ret_from_fork+0x35/0x40
      
      [ 3117.687638] The buggy address belongs to the object at ffff88018f0a6300
                      which belongs to the cache kmalloc-192 of size 192
      [ 3117.688014] The buggy address is located 16 bytes to the right of
                      192-byte region [ffff88018f0a6300, ffff88018f0a63c0)
      [ 3117.688382] The buggy address belongs to the page:
      [ 3117.688554] page:ffffea00063c2980 count:1 mapcount:0 mapping:ffff8801f3403180 index:0x0
      [ 3117.688788] flags: 0x17fff8000000100(slab)
      [ 3117.688944] raw: 017fff8000000100 ffffea00063c2840 0000000e0000000e ffff8801f3403180
      [ 3117.689166] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
      [ 3117.689386] page dumped because: kasan: bad access detected
      
      [ 3117.689653] Memory state around the buggy address:
      [ 3117.689816]  ffff88018f0a6280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [ 3117.690027]  ffff88018f0a6300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [ 3117.690239] >ffff88018f0a6380: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 3117.690448]                                                  ^
      [ 3117.690644]  ffff88018f0a6400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [ 3117.690868]  ffff88018f0a6480: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 3117.691077] ==================================================================
      [ 3117.691290] Disabling lock debugging due to kernel taint
      [ 3117.693893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [ 3117.694120] PGD 80000001f01bc067 P4D 80000001f01bc067 PUD 1d9638067 PMD 0
      [ 3117.694338] Oops: 0002 [#1] SMP KASAN PTI
      [ 3117.694490] CPU: 1 PID: 1225 Comm: mount Tainted: G    B   W         4.17.0+ #1
      [ 3117.694703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [ 3117.695073] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
      [ 3117.695246] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
      [ 3117.695793] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
      [ 3117.695969] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
      [ 3117.696182] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
      [ 3117.696391] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
      [ 3117.696604] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
      [ 3117.696813] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
      [ 3117.697032] FS:  00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
      [ 3117.697280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3117.702357] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0
      [ 3117.707235] Call Trace:
      [ 3117.712077]  locate_dirty_segment+0x189/0x190
      [ 3117.716891]  f2fs_allocate_new_segments+0xa9/0xe0
      [ 3117.721617]  recover_data+0x703/0x2c20
      [ 3117.726316]  ? f2fs_recover_fsync_data+0x48f/0xd50
      [ 3117.730957]  ? ksys_mount+0x7e/0xd0
      [ 3117.735573]  ? policy_nodemask+0x1a/0x90
      [ 3117.740198]  ? policy_node+0x56/0x70
      [ 3117.744829]  ? add_fsync_inode+0xf0/0xf0
      [ 3117.749487]  ? blk_finish_plug+0x44/0x60
      [ 3117.754152]  ? f2fs_ra_meta_pages+0x38b/0x5e0
      [ 3117.758831]  ? find_inode_fast+0xac/0xc0
      [ 3117.763448]  ? f2fs_is_valid_blkaddr+0x320/0x320
      [ 3117.768046]  ? __radix_tree_lookup+0x150/0x150
      [ 3117.772603]  ? dqget+0x670/0x670
      [ 3117.777159]  ? pagecache_get_page+0x29/0x410
      [ 3117.781648]  ? kmem_cache_alloc+0x176/0x1e0
      [ 3117.786067]  ? f2fs_is_valid_blkaddr+0x11d/0x320
      [ 3117.790476]  f2fs_recover_fsync_data+0xc23/0xd50
      [ 3117.794790]  ? f2fs_space_for_roll_forward+0x60/0x60
      [ 3117.799086]  ? rb_insert_color+0x323/0x3d0
      [ 3117.803304]  ? f2fs_recover_orphan_inodes+0xa5/0x700
      [ 3117.807563]  ? proc_register+0x153/0x1d0
      [ 3117.811766]  ? f2fs_remove_orphan_inode+0x10/0x10
      [ 3117.815947]  ? f2fs_attr_store+0x50/0x50
      [ 3117.820087]  ? proc_create_single_data+0x52/0x60
      [ 3117.824262]  f2fs_fill_super+0x1d06/0x2b40
      [ 3117.828367]  ? f2fs_commit_super+0x1a0/0x1a0
      [ 3117.832432]  ? sget_userns+0x65e/0x690
      [ 3117.836500]  ? set_blocksize+0x88/0x130
      [ 3117.840501]  ? f2fs_commit_super+0x1a0/0x1a0
      [ 3117.844420]  mount_bdev+0x1c0/0x200
      [ 3117.848275]  mount_fs+0x5c/0x190
      [ 3117.852053]  vfs_kern_mount+0x64/0x190
      [ 3117.855810]  do_mount+0x2e4/0x1450
      [ 3117.859441]  ? lockref_put_return+0x130/0x130
      [ 3117.862996]  ? copy_mount_string+0x20/0x20
      [ 3117.866417]  ? kasan_unpoison_shadow+0x31/0x40
      [ 3117.869719]  ? kasan_kmalloc+0xa6/0xd0
      [ 3117.872948]  ? memcg_kmem_put_cache+0x16/0x90
      [ 3117.876121]  ? __kmalloc_track_caller+0x196/0x210
      [ 3117.879333]  ? _copy_from_user+0x61/0x90
      [ 3117.882467]  ? memdup_user+0x3e/0x60
      [ 3117.885604]  ksys_mount+0x7e/0xd0
      [ 3117.888700]  __x64_sys_mount+0x62/0x70
      [ 3117.891742]  do_syscall_64+0x73/0x160
      [ 3117.894692]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 3117.897669] RIP: 0033:0x7f5693f14b9a
      [ 3117.900563] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
      [ 3117.906922] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [ 3117.910159] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a
      [ 3117.913469] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040
      [ 3117.916764] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [ 3117.920071] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040
      [ 3117.923393] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
      [ 3117.926680] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
      [ 3117.949979] CR2: 0000000000000000
      [ 3117.954283] ---[ end trace a8e0d899985faf32 ]---
      [ 3117.958575] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0
      [ 3117.962810] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7
      [ 3117.971789] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292
      [ 3117.976333] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000
      [ 3117.980926] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297
      [ 3117.985497] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb
      [ 3117.990098] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019
      [ 3117.994761] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0
      [ 3117.999392] FS:  00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
      [ 3118.004096] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3118.008816] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/segment.c#L775
      		if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
      			dirty_i->nr_dirty[t]--;
      Here dirty_i->dirty_segmap[t] can be NULL which leads to crash in test_and_clear_bit()
      
      Reported-by Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e494c2f9
    • Jaegeuk Kim's avatar
      f2fs: avoid f2fs_bug_on() in cp_error case · 8d714f8a
      Jaegeuk Kim authored
      There is a subtle race condition to invoke f2fs_bug_on() in shutdown tests. I've
      confirmed that the last checkpoint is preserved in consistent state, so it'd be
      fine to just return error at this moment.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      8d714f8a
    • Chao Yu's avatar
      f2fs: fix to clear PG_checked flag in set_page_dirty() · 66110abc
      Chao Yu authored
      PG_checked flag will be set on data page during GC, later, we can
      recognize such page by the flag and migrate page to cold segment.
      
      But previously, we don't clear this flag when invalidating data page,
      after page redirtying, we will write it into wrong log.
      
      Let's clear PG_checked flag in set_page_dirty() to avoid this.
      Signed-off-by: default avatarWeichao Guo <guoweichao@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      66110abc
  2. 01 Aug, 2018 31 commits
    • Chao Yu's avatar
      f2fs: fix to active page in lru list for read path · 82cf4f13
      Chao Yu authored
      If config CONFIG_F2FS_FAULT_INJECTION is on, for both read or write path
      we will call find_lock_page() to get the page, but for read path, it
      missed to passing FGP_ACCESSED to allocator to active the page in LRU
      list, result in being reclaimed in advance incorrectly, fix it.
      Reported-by: default avatarXianrong Zhou <zhouxianrong@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      82cf4f13
    • Chao Yu's avatar
      f2fs: don't keep meta pages used for block migration · 18767e62
      Chao Yu authored
      For migration of encrypted inode's block, we load data of encrypted block
      into meta inode's page cache, after checkpoint, those all intermediate
      pages should be clean, and no one will read them again, so let's just
      release them for more memory.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      18767e62
    • Chao Yu's avatar
      f2fs: fix to restrict mount condition when without CONFIG_QUOTA · 4ddc1b28
      Chao Yu authored
      Like quota_ino feature, we need to reject mounting RDWR with image
      which enables project_quota feature when there is no CONFIG_QUOTA
      be set in kernel.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4ddc1b28
    • Sheng Yong's avatar
      f2fs: quota: do not mount as RDWR without QUOTA if quota feature enabled · 00960c2c
      Sheng Yong authored
      If quota feature is enabled, quota is on by default. However, if
      CONFIG_QUOTA is not built in kernel, dquot entries will not get updated,
      which leads to quota inconsistency.
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      00960c2c
    • Sheng Yong's avatar
      f2fs: quota: fix incorrect comments · 76cf05d7
      Sheng Yong authored
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      76cf05d7
    • Sheng Yong's avatar
      f2fs: quota: decrease the lock granularity of statfs_project · 955ac6e5
      Sheng Yong authored
      According to fs/quota/dquot.c, `dq_data_lock' protects mem_dqinfo
      structures and modifications of dquot pointers in the inode, and
      `dquot->dq_dqb_lock' protects data from dq_dqb.
      
      We should use dquot->dq_dqb_lock in statfs_project instead of
      dq_dat_lock.
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      955ac6e5
    • Yunlong Song's avatar
      f2fs: add proc entry to show victim_secmap bitmap · 970e348d
      Yunlong Song authored
      This patch adds a new proc entry to show victim_secmap information in
      more detail, which is very helpful to know the get_victim candidate
      status clearly, and helpful to debug problems (e.g., some sections can
      not gc all of its blocks, since some blocks belong to atomic file,
      leaving victim_secmap with section bit setting, in extrem case, this
      will lead all bytes of victim_secmap setting with 0xff).
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      970e348d
    • Chao Yu's avatar
      f2fs: let checkpoint flush dnode page of regular · fd8c8caf
      Chao Yu authored
      Fsyncer will wait on all dnode pages of regular writeback before flushing,
      if there are async dnode pages blocked by IO scheduler, it may decrease
      fsync's performance.
      
      In this patch, we choose to let f2fs_balance_fs_bg() to trigger checkpoint
      to flush these dnode pages of regular, so async IO of dnode page can be
      elimitnated, making fsyncer only need to wait for sync IO.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      fd8c8caf
    • Yunlong Song's avatar
      f2fs: issue discard align to section in LFS mode · ad6672bb
      Yunlong Song authored
      For the case when sbi->segs_per_sec > 1 with lfs mode, take
      section:segment = 5 for example, if the section prefree_map is
      ...previous section | current section (1 1 0 1 1) | next section...,
      then the start = x, end = x + 1, after start = start_segno +
      sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
      their bitmap is still set, which will cause duplicated
      f2fs_issue_discard of this same section in the next write_checkpoint:
      
      round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
      then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
      write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
      prefree of NO.2 is cleared, and no discard issued
      
      round 2: rm data block NO.0, NO.1, NO.3, NO.4
      all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
      prefree_map: 1 1 0 1 1
      write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
      valid blocks of this section, so discard issued, but this time prefree
      bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
      
      round 3:
      write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
      0 0 0 0 0, no valid blocks of this section, so discard issued,
      this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
      this section is sent again...
      
      To fix this problem, we can align the start and end value to section
      boundary for fstrim and real-time discard operation, and decide to issue
      discard only when the whole section is invalid, which can issue discard
      aligned to section size as much as possible and avoid redundant discard.
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      ad6672bb
    • Jaegeuk Kim's avatar
      f2fs: don't allow any writes on aborted atomic writes · 455e3a58
      Jaegeuk Kim authored
      In order to prevent abusing atomic writes by abnormal users, we've added a
      threshold, 20% over memory footprint, which disallows further atomic writes.
      Previously, however, SQLite doesn't know the files became normal, so that
      it could write stale data and commit on revoked normal database file.
      
      Once f2fs detects such the abnormal behavior, this patch tries to avoid further
      writes in write_begin().
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      455e3a58
    • Chao Yu's avatar
      f2fs: restrict setting up inode.i_advise · 797c1cb5
      Chao Yu authored
      In order to give advise to f2fs to recognize hot/cold file, it is possible
      that we can set specific bit in inode.i_advise through setxattr(), but
      there are several bits which are used internally, such as encrypt_bit,
      keep_size_bit, they should never be changed through setxattr().
      
      So that this patch 1) adds FADVISE_MODIFIABLE_BITS to filter modifiable
      bits user given, 2) supports to clear {hot,cold}_file bits.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      797c1cb5
    • Yunlei He's avatar
      f2fs: fix wrong kernel message when recover fsync data on ro fs · e6b0b159
      Yunlei He authored
      This patch fix wrong message info for recover fsync data
      on readonly fs.
      Signed-off-by: default avatarYunlei He <heyunlei@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      e6b0b159
    • Chao Yu's avatar
      f2fs: clean up ioctl interface naming · 059c0648
      Chao Yu authored
      Romve redundant prefix 'f2fs_' in the middle of f2fs_ioc_f2fs_write_checkpoint().
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      059c0648
    • Chao Yu's avatar
      2079f115
    • Chao Yu's avatar
      f2fs: clean up with f2fs_encrypted_inode() · 5b72d5e0
      Chao Yu authored
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5b72d5e0
    • Chao Yu's avatar
      f2fs: clean up with get_current_nat_page · 80551d17
      Chao Yu authored
      Just cleanup, no logic change.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      80551d17
    • Chao Yu's avatar
      f2fs: kill EXT_TREE_VEC_SIZE · 6122003a
      Chao Yu authored
      Since commit 201ef5e0 ("f2fs: improve shrink performance of extent nodes"),
      there is no user of EXT_TREE_VEC_SIZE, just kill it for cleanup.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      6122003a
    • Hyunchul Lee's avatar
      f2fs: avoid duplicated permission check for "trusted." xattrs · 5d3ce4f7
      Hyunchul Lee authored
      Because xattr_permission already checks CAP_SYS_ADMIN
      capability, we don't need to check it.
      Signed-off-by: default avatarHyunchul Lee <cheol.lee@lge.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      5d3ce4f7
    • Chao Yu's avatar
      f2fs: fix to propagate error from __get_meta_page() · 7735730d
      Chao Yu authored
      If caller of __get_meta_page() can handle error, let's propagate error
      from __get_meta_page().
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      7735730d
    • Chao Yu's avatar
      f2fs: fix to do sanity check with i_extra_isize · 18dd6470
      Chao Yu authored
      If inode.i_extra_isize was fuzzed to an abnormal value, when
      calculating inline data size, the result will overflow, result
      in accessing invalid memory area when operating inline data.
      
      Let's do sanity check with i_extra_isize during inode loading
      for fixing.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200421
      
      - Reproduce
      
      - POC (poc.c)
          #define _GNU_SOURCE
          #include <sys/types.h>
          #include <sys/mount.h>
          #include <sys/mman.h>
          #include <sys/stat.h>
          #include <sys/xattr.h>
      
          #include <dirent.h>
          #include <errno.h>
          #include <error.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
      
          #include <linux/falloc.h>
          #include <linux/loop.h>
      
          static void activity(char *mpoint) {
      
            char *foo_bar_baz;
            char *foo_baz;
            char *xattr;
            int err;
      
            err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
            err = asprintf(&foo_baz, "%s/foo/baz", mpoint);
            err = asprintf(&xattr, "%s/foo/bar/xattr", mpoint);
      
            rename(foo_bar_baz, foo_baz);
      
            char buf2[113];
            memset(buf2, 0, sizeof(buf2));
            listxattr(xattr, buf2, sizeof(buf2));
            removexattr(xattr, "user.mime_type");
      
          }
      
          int main(int argc, char *argv[]) {
            activity(argv[1]);
            return 0;
          }
      
      - Kernel message
      Umount the image will leave the following message
      [ 2910.995489] F2FS-fs (loop0): Mounted with checkpoint version = 2
      [ 2918.416465] ==================================================================
      [ 2918.416807] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0xcb9/0x1a80
      [ 2918.417009] Read of size 4 at addr ffff88018efc2068 by task a.out/1229
      
      [ 2918.417311] CPU: 1 PID: 1229 Comm: a.out Not tainted 4.17.0+ #1
      [ 2918.417314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [ 2918.417323] Call Trace:
      [ 2918.417366]  dump_stack+0x71/0xab
      [ 2918.417401]  print_address_description+0x6b/0x290
      [ 2918.417407]  kasan_report+0x28e/0x390
      [ 2918.417411]  ? f2fs_iget+0xcb9/0x1a80
      [ 2918.417415]  f2fs_iget+0xcb9/0x1a80
      [ 2918.417422]  ? f2fs_lookup+0x2e7/0x580
      [ 2918.417425]  f2fs_lookup+0x2e7/0x580
      [ 2918.417433]  ? __recover_dot_dentries+0x400/0x400
      [ 2918.417447]  ? legitimize_path.isra.29+0x5a/0xa0
      [ 2918.417453]  __lookup_slow+0x11c/0x220
      [ 2918.417457]  ? may_delete+0x2a0/0x2a0
      [ 2918.417475]  ? deref_stack_reg+0xe0/0xe0
      [ 2918.417479]  ? __lookup_hash+0xb0/0xb0
      [ 2918.417483]  lookup_slow+0x3e/0x60
      [ 2918.417488]  walk_component+0x3ac/0x990
      [ 2918.417492]  ? generic_permission+0x51/0x1e0
      [ 2918.417495]  ? inode_permission+0x51/0x1d0
      [ 2918.417499]  ? pick_link+0x3e0/0x3e0
      [ 2918.417502]  ? link_path_walk+0x4b1/0x770
      [ 2918.417513]  ? _raw_spin_lock_irqsave+0x25/0x50
      [ 2918.417518]  ? walk_component+0x990/0x990
      [ 2918.417522]  ? path_init+0x2e6/0x580
      [ 2918.417526]  path_lookupat+0x13f/0x430
      [ 2918.417531]  ? trailing_symlink+0x3a0/0x3a0
      [ 2918.417534]  ? do_renameat2+0x270/0x7b0
      [ 2918.417538]  ? __kasan_slab_free+0x14c/0x190
      [ 2918.417541]  ? do_renameat2+0x270/0x7b0
      [ 2918.417553]  ? kmem_cache_free+0x85/0x1e0
      [ 2918.417558]  ? do_renameat2+0x270/0x7b0
      [ 2918.417563]  filename_lookup+0x13c/0x280
      [ 2918.417567]  ? filename_parentat+0x2b0/0x2b0
      [ 2918.417572]  ? kasan_unpoison_shadow+0x31/0x40
      [ 2918.417575]  ? kasan_kmalloc+0xa6/0xd0
      [ 2918.417593]  ? strncpy_from_user+0xaa/0x1c0
      [ 2918.417598]  ? getname_flags+0x101/0x2b0
      [ 2918.417614]  ? path_listxattr+0x87/0x110
      [ 2918.417619]  path_listxattr+0x87/0x110
      [ 2918.417623]  ? listxattr+0xc0/0xc0
      [ 2918.417637]  ? mm_fault_error+0x1b0/0x1b0
      [ 2918.417654]  do_syscall_64+0x73/0x160
      [ 2918.417660]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 2918.417676] RIP: 0033:0x7f2f3a3480d7
      [ 2918.417677] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48
      [ 2918.417732] RSP: 002b:00007fff4095b7d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000c2
      [ 2918.417744] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2f3a3480d7
      [ 2918.417746] RDX: 0000000000000071 RSI: 00007fff4095b810 RDI: 000000000126a0c0
      [ 2918.417749] RBP: 00007fff4095b890 R08: 000000000126a010 R09: 0000000000000000
      [ 2918.417751] R10: 00000000000001ab R11: 0000000000000206 R12: 00000000004005e0
      [ 2918.417753] R13: 00007fff4095b990 R14: 0000000000000000 R15: 0000000000000000
      
      [ 2918.417853] Allocated by task 329:
      [ 2918.418002]  kasan_kmalloc+0xa6/0xd0
      [ 2918.418007]  kmem_cache_alloc+0xc8/0x1e0
      [ 2918.418023]  mempool_init_node+0x194/0x230
      [ 2918.418027]  mempool_init+0x12/0x20
      [ 2918.418042]  bioset_init+0x2bd/0x380
      [ 2918.418052]  blk_alloc_queue_node+0xe9/0x540
      [ 2918.418075]  dm_create+0x2c0/0x800
      [ 2918.418080]  dev_create+0xd2/0x530
      [ 2918.418083]  ctl_ioctl+0x2a3/0x5b0
      [ 2918.418087]  dm_ctl_ioctl+0xa/0x10
      [ 2918.418092]  do_vfs_ioctl+0x13e/0x8c0
      [ 2918.418095]  ksys_ioctl+0x66/0x70
      [ 2918.418098]  __x64_sys_ioctl+0x3d/0x50
      [ 2918.418102]  do_syscall_64+0x73/0x160
      [ 2918.418106]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [ 2918.418204] Freed by task 0:
      [ 2918.418301] (stack is not available)
      
      [ 2918.418521] The buggy address belongs to the object at ffff88018efc0000
                      which belongs to the cache biovec-max of size 8192
      [ 2918.418894] The buggy address is located 104 bytes to the right of
                      8192-byte region [ffff88018efc0000, ffff88018efc2000)
      [ 2918.419257] The buggy address belongs to the page:
      [ 2918.419431] page:ffffea00063bf000 count:1 mapcount:0 mapping:ffff8801f2242540 index:0x0 compound_mapcount: 0
      [ 2918.419702] flags: 0x17fff8000008100(slab|head)
      [ 2918.419879] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801f2242540
      [ 2918.420101] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
      [ 2918.420322] page dumped because: kasan: bad access detected
      
      [ 2918.420599] Memory state around the buggy address:
      [ 2918.420764]  ffff88018efc1f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 2918.420975]  ffff88018efc1f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 2918.421194] >ffff88018efc2000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 2918.421406]                                                           ^
      [ 2918.421627]  ffff88018efc2080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 2918.421838]  ffff88018efc2100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 2918.422046] ==================================================================
      [ 2918.422264] Disabling lock debugging due to kernel taint
      [ 2923.901641] BUG: unable to handle kernel paging request at ffff88018f0db000
      [ 2923.901884] PGD 22226a067 P4D 22226a067 PUD 222273067 PMD 18e642063 PTE 800000018f0db061
      [ 2923.902120] Oops: 0003 [#1] SMP KASAN PTI
      [ 2923.902274] CPU: 1 PID: 1231 Comm: umount Tainted: G    B             4.17.0+ #1
      [ 2923.902490] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [ 2923.902761] RIP: 0010:__memset+0x24/0x30
      [ 2923.902906] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
      [ 2923.903446] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206
      [ 2923.903622] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb
      [ 2923.903833] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc
      [ 2923.904062] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c
      [ 2923.904273] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000
      [ 2923.904485] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80
      [ 2923.904693] FS:  00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
      [ 2923.904937] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2923.910080] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0
      [ 2923.914930] Call Trace:
      [ 2923.919724]  f2fs_truncate_inline_inode+0x114/0x170
      [ 2923.924487]  f2fs_truncate_blocks+0x11b/0x7c0
      [ 2923.929178]  ? f2fs_truncate_data_blocks+0x10/0x10
      [ 2923.933834]  ? dqget+0x670/0x670
      [ 2923.938437]  ? f2fs_destroy_extent_tree+0xd6/0x270
      [ 2923.943107]  ? __radix_tree_lookup+0x2f/0x150
      [ 2923.947772]  f2fs_truncate+0xd4/0x1a0
      [ 2923.952491]  f2fs_evict_inode+0x5ab/0x610
      [ 2923.957204]  evict+0x15f/0x280
      [ 2923.961898]  __dentry_kill+0x161/0x250
      [ 2923.966634]  shrink_dentry_list+0xf3/0x250
      [ 2923.971897]  shrink_dcache_parent+0xa9/0x100
      [ 2923.976561]  ? shrink_dcache_sb+0x1f0/0x1f0
      [ 2923.981177]  ? wait_for_completion+0x8a/0x210
      [ 2923.985781]  ? migrate_swap_stop+0x2d0/0x2d0
      [ 2923.990332]  do_one_tree+0xe/0x40
      [ 2923.994735]  shrink_dcache_for_umount+0x3a/0xa0
      [ 2923.999077]  generic_shutdown_super+0x3e/0x1c0
      [ 2924.003350]  kill_block_super+0x4b/0x70
      [ 2924.007619]  deactivate_locked_super+0x65/0x90
      [ 2924.011812]  cleanup_mnt+0x5c/0xa0
      [ 2924.015995]  task_work_run+0xce/0xf0
      [ 2924.020174]  exit_to_usermode_loop+0x115/0x120
      [ 2924.024293]  do_syscall_64+0x12f/0x160
      [ 2924.028479]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 2924.032709] RIP: 0033:0x7fa8b2868487
      [ 2924.036888] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48
      [ 2924.045750] RSP: 002b:00007ffc39824d58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [ 2924.050190] RAX: 0000000000000000 RBX: 00000000008ea030 RCX: 00007fa8b2868487
      [ 2924.054604] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000008f4360
      [ 2924.058940] RBP: 00000000008f4360 R08: 0000000000000000 R09: 0000000000000014
      [ 2924.063186] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007fa8b2d7183c
      [ 2924.067418] R13: 0000000000000000 R14: 00000000008ea210 R15: 00007ffc39824fe0
      [ 2924.071534] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy
      [ 2924.098044] CR2: ffff88018f0db000
      [ 2924.102520] ---[ end trace a8e0d899985faf31 ]---
      [ 2924.107012] RIP: 0010:__memset+0x24/0x30
      [ 2924.111448] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
      [ 2924.120724] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206
      [ 2924.125312] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb
      [ 2924.129931] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc
      [ 2924.134537] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c
      [ 2924.139175] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000
      [ 2924.143825] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80
      [ 2924.148500] FS:  00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000
      [ 2924.153247] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 2924.158003] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0
      [ 2924.164641] BUG: Bad rss-counter state mm:00000000fa04621e idx:0 val:4
      [ 2924.170007] BUG: Bad rss-counter
      tate mm:00000000fa04621e idx:1 val:2
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/inline.c#L78
      	memset(addr + from, 0, MAX_INLINE_DATA(inode) - from);
      Here the length can be negative.
      
      Reported-by Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      18dd6470
    • Yunlong Song's avatar
      f2fs: blk_finish_plug of submit_bio in lfs mode · 66415cee
      Yunlong Song authored
      Expand the blk_finish_plug action from blkzoned to normal lfs mode,
      since plug will cause the out-of-order IO submission, which is not
      friendly to flash in lfs mode.
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      66415cee
    • Yunlong Song's avatar
      f2fs: do not set free of current section · 3611ce99
      Yunlong Song authored
      For the case when sbi->segs_per_sec > 1, take section:segment = 5 for
      example, if segment 1 is just used and allocate new segment 2, and the
      blocks of segment 1 is invalidated, at this time, the previous code will
      use __set_test_and_free to free the free_secmap and free_sections++,
      this is not correct since it is still a current section, so fix it.
      Signed-off-by: default avatarYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      3611ce99
    • Daniel Rosenberg's avatar
      f2fs: Keep alloc_valid_block_count in sync · 36b877af
      Daniel Rosenberg authored
      If we attempt to request more blocks than we have room for, we try to
      instead request as much as we can, however, alloc_valid_block_count
      is not decremented to match the new value, allowing it to drift higher
      until the next checkpoint. This always decrements it when the requested
      amount cannot be fulfilled.
      Signed-off-by: default avatarDaniel Rosenberg <drosen@google.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      36b877af
    • Chao Yu's avatar
      f2fs: issue small discard by LBA order · 20ee4382
      Chao Yu authored
      For small granularity discard which size is smaller than 64KB, if we
      issue those kind of discards orderly by size, their IOs will be spread
      into entire logical address, so that in FTL, L2P table will be updated
      randomly, result bad wear rate in the table.
      
      In this patch, we choose to issue small discard by LBA order, by this
      way, we can expect that L2P table updates from adjacent discard IOs can
      be merged in the cache, so it can reduce lifetime wearing of flash.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      20ee4382
    • Chao Yu's avatar
      f2fs: stop issuing discard immediately if there is queued IO · 522d1711
      Chao Yu authored
      For background discard policy, even if there is queued user IO, still
      we will check max_requests times for next discard entry, it is unneeded,
      let's just stop this round submission immediately.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      522d1711
    • Chao Yu's avatar
      f2fs: clean up with IS_INODE() · 4c6b56c0
      Chao Yu authored
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4c6b56c0
    • Chao Yu's avatar
      f2fs: detect bug_on in f2fs_wait_discard_bios · 2482c432
      Chao Yu authored
      Add bug_on to detect potential non-empty discard wait list.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      2482c432
    • Randy Dunlap's avatar
      f2fs: fix defined but not used build warnings · cb15d1e4
      Randy Dunlap authored
      Fix build warnings in f2fs when CONFIG_PROC_FS is not enabled
      by marking the unused functions as __maybe_unused.
      
      ../fs/f2fs/sysfs.c:519:12: warning: 'segment_info_seq_show' defined but not used [-Wunused-function]
      ../fs/f2fs/sysfs.c:546:12: warning: 'segment_bits_seq_show' defined but not used [-Wunused-function]
      ../fs/f2fs/sysfs.c:570:12: warning: 'iostat_info_seq_show' defined but not used [-Wunused-function]
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Chao Yu <yuchao0@huawei.com>
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      cb15d1e4
    • Chao Yu's avatar
      f2fs: enable real-time discard by default · a39e5365
      Chao Yu authored
      f2fs is focused on flash based storage, so let's enable real-time
      discard by default, if user don't want to enable it, 'nodiscard'
      mount option should be used on mount.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      a39e5365
    • Chao Yu's avatar
      f2fs: fix to detect looped node chain correctly · 82902c06
      Chao Yu authored
      Below dmesg was printed when testing generic/388 of fstest:
      
      F2FS-fs (zram1): find_fsync_dnodes: detect looped node chain, blkaddr:526615, next:526616
      F2FS-fs (zram1): Cannot recover all fsync data errno=-22
      F2FS-fs (zram1): Mounted with checkpoint version = 22300d0e
      F2FS-fs (zram1): find_fsync_dnodes: detect looped node chain, blkaddr:526615, next:526616
      F2FS-fs (zram1): Cannot recover all fsync data errno=-22
      
      The reason is that we initialize free_blocks with free blocks of
      filesystem, so if filesystem is full, free_blocks can be zero,
      below condition will be true, so that, it will fail recovery.
      
      if (++loop_cnt >= free_blocks ||
      	blkaddr == next_blkaddr_of_node(page))
      
      To fix this issue, initialize free_blocks with correct value which
      includes over-privision blocks.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      82902c06
    • Chao Yu's avatar
      f2fs: fix to do sanity check with block address in main area · c9b60788
      Chao Yu authored
      This patch add to do sanity check with below field:
      - cp_pack_total_block_count
      - blkaddr of data/node
      - extent info
      
      - Overview
      BUG() in verify_block_addr() when writing to a corrupted f2fs image
      
      - Reproduce (4.18 upstream kernel)
      
      - POC (poc.c)
      
      static void activity(char *mpoint) {
      
        char *foo_bar_baz;
        int err;
      
        static int buf[8192];
        memset(buf, 0, sizeof(buf));
      
        err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
      
        int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
        if (fd >= 0) {
          write(fd, (char *)buf, sizeof(buf));
          fdatasync(fd);
          close(fd);
        }
      }
      
      int main(int argc, char *argv[]) {
        activity(argv[1]);
        return 0;
      }
      
      - Kernel message
      [  689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3
      [  699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240
      [  699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4
      [  699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240
      [  699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d
      [  699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202
      [  699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113
      [  699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540
      [  699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8
      [  699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540
      [  699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450
      [  699.729154] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.729156] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.729171] Call Trace:
      [  699.729192]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.729203]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.729238]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.729269]  ? __radix_tree_replace+0xa3/0x120
      [  699.729276]  __write_data_page+0x5c7/0xe30
      [  699.729291]  ? kasan_check_read+0x11/0x20
      [  699.729310]  ? page_mapped+0x8a/0x110
      [  699.729321]  ? page_mkclean+0xe9/0x160
      [  699.729327]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.729331]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.729345]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.729351]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.729358]  ? __write_data_page+0xe30/0xe30
      [  699.729374]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.729380]  ? kasan_check_write+0x14/0x20
      [  699.729391]  ? _raw_spin_lock+0x17/0x40
      [  699.729403]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.729413]  ? iov_iter_advance+0x113/0x640
      [  699.729418]  ? f2fs_write_end+0x133/0x2e0
      [  699.729423]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.729428]  f2fs_write_data_pages+0x329/0x520
      [  699.729433]  ? generic_perform_write+0x250/0x320
      [  699.729438]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729454]  ? current_time+0x110/0x110
      [  699.729459]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.729464]  do_writepages+0x37/0xb0
      [  699.729468]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.729472]  ? do_writepages+0x37/0xb0
      [  699.729478]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.729483]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.729496]  ? __vfs_write+0x2b2/0x410
      [  699.729501]  file_write_and_wait_range+0x66/0xb0
      [  699.729506]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.729511]  ? truncate_partial_data_page+0x290/0x290
      [  699.729521]  ? __sb_end_write+0x30/0x50
      [  699.729526]  ? vfs_write+0x20f/0x260
      [  699.729530]  f2fs_sync_file+0x9a/0xb0
      [  699.729534]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.729548]  vfs_fsync_range+0x68/0x100
      [  699.729554]  ? __fget_light+0xc9/0xe0
      [  699.729558]  do_fsync+0x3d/0x70
      [  699.729562]  __x64_sys_fdatasync+0x24/0x30
      [  699.729585]  do_syscall_64+0x78/0x170
      [  699.729595]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.729613] RIP: 0033:0x7f9bf930d800
      [  699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.729687] ---[ end trace 4ce02f25ff7d3df5 ]---
      [  699.729782] ------------[ cut here ]------------
      [  699.729785] kernel BUG at fs/f2fs/segment.h:654!
      [  699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI
      [  699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G        W         4.18.0-rc1+ #4
      [  699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.748683] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.750293] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.752874] Call Trace:
      [  699.753386]  ? f2fs_inplace_write_data+0x93/0x240
      [  699.754341]  f2fs_inplace_write_data+0xd2/0x240
      [  699.755271]  f2fs_do_write_data_page+0x2e2/0xe00
      [  699.756214]  ? f2fs_should_update_outplace+0xd0/0xd0
      [  699.757215]  ? memcg_drain_all_list_lrus+0x280/0x280
      [  699.758209]  ? __radix_tree_replace+0xa3/0x120
      [  699.759164]  __write_data_page+0x5c7/0xe30
      [  699.760002]  ? kasan_check_read+0x11/0x20
      [  699.760823]  ? page_mapped+0x8a/0x110
      [  699.761573]  ? page_mkclean+0xe9/0x160
      [  699.762345]  ? f2fs_do_write_data_page+0xe00/0xe00
      [  699.763332]  ? invalid_page_referenced_vma+0x130/0x130
      [  699.764374]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.765347]  f2fs_write_cache_pages+0x4ca/0x860
      [  699.766276]  ? __write_data_page+0xe30/0xe30
      [  699.767161]  ? percpu_counter_add_batch+0x22/0xa0
      [  699.768112]  ? kasan_check_write+0x14/0x20
      [  699.768951]  ? _raw_spin_lock+0x17/0x40
      [  699.769739]  ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30
      [  699.770885]  ? iov_iter_advance+0x113/0x640
      [  699.771743]  ? f2fs_write_end+0x133/0x2e0
      [  699.772569]  ? balance_dirty_pages_ratelimited+0x239/0x640
      [  699.773680]  f2fs_write_data_pages+0x329/0x520
      [  699.774603]  ? generic_perform_write+0x250/0x320
      [  699.775544]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.776510]  ? current_time+0x110/0x110
      [  699.777299]  ? f2fs_preallocate_blocks+0x1ef/0x370
      [  699.778279]  do_writepages+0x37/0xb0
      [  699.779026]  ? f2fs_write_cache_pages+0x860/0x860
      [  699.779978]  ? do_writepages+0x37/0xb0
      [  699.780755]  __filemap_fdatawrite_range+0x19a/0x1f0
      [  699.781746]  ? delete_from_page_cache_batch+0x4e0/0x4e0
      [  699.782820]  ? __vfs_write+0x2b2/0x410
      [  699.783597]  file_write_and_wait_range+0x66/0xb0
      [  699.784540]  f2fs_do_sync_file+0x1f9/0xd90
      [  699.785381]  ? truncate_partial_data_page+0x290/0x290
      [  699.786415]  ? __sb_end_write+0x30/0x50
      [  699.787204]  ? vfs_write+0x20f/0x260
      [  699.787941]  f2fs_sync_file+0x9a/0xb0
      [  699.788694]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.789572]  vfs_fsync_range+0x68/0x100
      [  699.790360]  ? __fget_light+0xc9/0xe0
      [  699.791128]  do_fsync+0x3d/0x70
      [  699.791779]  __x64_sys_fdatasync+0x24/0x30
      [  699.792614]  do_syscall_64+0x78/0x170
      [  699.793371]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  699.794406] RIP: 0033:0x7f9bf930d800
      [  699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24
      [  699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      [  699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  699.817079] ---[ end trace 4ce02f25ff7d3df6 ]---
      [  699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730
      [  699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0
      [  699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283
      [  699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef
      [  699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c
      [  699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55
      [  699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940
      [  699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001
      [  699.831192] FS:  00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  699.832793] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0
      [  699.835556] ==================================================================
      [  699.837029] BUG: KASAN: stack-out-of-bounds in update_stack_state+0x38c/0x3e0
      [  699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309
      
      [  699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G      D W         4.18.0-rc1+ #4
      [  699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  699.843475] Call Trace:
      [  699.843982]  dump_stack+0x7b/0xb5
      [  699.844661]  print_address_description+0x70/0x290
      [  699.845607]  kasan_report+0x291/0x390
      [  699.846351]  ? update_stack_state+0x38c/0x3e0
      [  699.853831]  __asan_load8+0x54/0x90
      [  699.854569]  update_stack_state+0x38c/0x3e0
      [  699.855428]  ? __read_once_size_nocheck.constprop.7+0x20/0x20
      [  699.856601]  ? __save_stack_trace+0x5e/0x100
      [  699.857476]  unwind_next_frame.part.5+0x18e/0x490
      [  699.858448]  ? unwind_dump+0x290/0x290
      [  699.859217]  ? clear_page_dirty_for_io+0x332/0x450
      [  699.860185]  __unwind_start+0x106/0x190
      [  699.860974]  __save_stack_trace+0x5e/0x100
      [  699.861808]  ? __save_stack_trace+0x5e/0x100
      [  699.862691]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.863525]  save_stack_trace+0x1f/0x30
      [  699.864312]  save_stack+0x46/0xd0
      [  699.864993]  ? __alloc_pages_slowpath+0x1420/0x1420
      [  699.865990]  ? flush_tlb_mm_range+0x15e/0x220
      [  699.866889]  ? kasan_check_write+0x14/0x20
      [  699.867724]  ? __dec_node_state+0x92/0xb0
      [  699.868543]  ? lock_page_memcg+0x85/0xf0
      [  699.869350]  ? unlock_page_memcg+0x16/0x80
      [  699.870185]  ? page_remove_rmap+0x198/0x520
      [  699.871048]  ? mark_page_accessed+0x133/0x200
      [  699.871930]  ? _cond_resched+0x1a/0x50
      [  699.872700]  ? unmap_page_range+0xcd4/0xe50
      [  699.873551]  ? rb_next+0x58/0x80
      [  699.874217]  ? rb_next+0x58/0x80
      [  699.874895]  __kasan_slab_free+0x13c/0x1a0
      [  699.875734]  ? unlink_anon_vmas+0xba/0x2c0
      [  699.876563]  kasan_slab_free+0xe/0x10
      [  699.877315]  kmem_cache_free+0x89/0x1e0
      [  699.878095]  unlink_anon_vmas+0xba/0x2c0
      [  699.878913]  free_pgtables+0x101/0x1b0
      [  699.879677]  exit_mmap+0x146/0x2a0
      [  699.880378]  ? __ia32_sys_munmap+0x50/0x50
      [  699.881214]  ? kasan_check_read+0x11/0x20
      [  699.882052]  ? mm_update_next_owner+0x322/0x380
      [  699.882985]  mmput+0x8b/0x1d0
      [  699.883602]  do_exit+0x43a/0x1390
      [  699.884288]  ? mm_update_next_owner+0x380/0x380
      [  699.885212]  ? f2fs_sync_file+0x9a/0xb0
      [  699.885995]  ? f2fs_do_sync_file+0xd90/0xd90
      [  699.886877]  ? vfs_fsync_range+0x68/0x100
      [  699.887694]  ? __fget_light+0xc9/0xe0
      [  699.888442]  ? do_fsync+0x3d/0x70
      [  699.889118]  ? __x64_sys_fdatasync+0x24/0x30
      [  699.889996]  rewind_stack_do_exit+0x17/0x20
      [  699.890860] RIP: 0033:0x7f9bf930d800
      [  699.891585] Code: Bad RIP value.
      [  699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
      [  699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800
      [  699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003
      [  699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000
      [  699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610
      [  699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
      
      [  699.901241] The buggy address belongs to the page:
      [  699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  699.903811] flags: 0x2ffff0000000000()
      [  699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000
      [  699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000
      [  699.907673] page dumped because: kasan: bad access detected
      
      [  699.909108] Memory state around the buggy address:
      [  699.910077]  ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00
      [  699.911528]  ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2
      [  699.914392]                                                              ^
      [  699.915758]  ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00
      [  699.917193]  ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00
      [  699.918634] ==================================================================
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644
      
      Reported-by Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      c9b60788
  3. 29 Jul, 2018 4 commits
    • Chao Yu's avatar
      f2fs: fix to skip GC if type in SSA and SIT is inconsistent · 10d255c3
      Chao Yu authored
      If segment type in SSA and SIT is inconsistent, we will encounter below
      BUG_ON during GC, to avoid this panic, let's just skip doing GC on such
      segment.
      
      The bug is triggered with image reported in below link:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200223
      
      [  388.060262] ------------[ cut here ]------------
      [  388.060268] kernel BUG at /home/y00370721/git/devf2fs/gc.c:989!
      [  388.061172] invalid opcode: 0000 [#1] SMP
      [  388.061773] Modules linked in: f2fs(O) bluetooth ecdh_generic xt_tcpudp iptable_filter ip_tables x_tables lp ttm drm_kms_helper drm intel_rapl sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel fb_sys_fops ppdev aes_x86_64 syscopyarea crypto_simd sysfillrect parport_pc joydev sysimgblt glue_helper parport cryptd i2c_piix4 serio_raw mac_hid btrfs hid_generic usbhid hid raid6_pq psmouse pata_acpi floppy
      [  388.064247] CPU: 7 PID: 4151 Comm: f2fs_gc-7:0 Tainted: G           O    4.13.0-rc1+ #26
      [  388.065306] Hardware name: Xen HVM domU, BIOS 4.1.2_115-900.260_ 11/06/2015
      [  388.066058] task: ffff880201583b80 task.stack: ffffc90004d7c000
      [  388.069948] RIP: 0010:do_garbage_collect+0xcc8/0xcd0 [f2fs]
      [  388.070766] RSP: 0018:ffffc90004d7fc68 EFLAGS: 00010202
      [  388.071783] RAX: ffff8801ed227000 RBX: 0000000000000001 RCX: ffffea0007b489c0
      [  388.072700] RDX: ffff880000000000 RSI: 0000000000000001 RDI: ffffea0007b489c0
      [  388.073607] RBP: ffffc90004d7fd58 R08: 0000000000000003 R09: ffffea0007b489dc
      [  388.074619] R10: 0000000000000000 R11: 0052782ab317138d R12: 0000000000000018
      [  388.075625] R13: 0000000000000018 R14: ffff880211ceb000 R15: ffff880211ceb000
      [  388.076687] FS:  0000000000000000(0000) GS:ffff880214fc0000(0000) knlGS:0000000000000000
      [  388.083277] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  388.084536] CR2: 0000000000e18c60 CR3: 00000001ecf2e000 CR4: 00000000001406e0
      [  388.085748] Call Trace:
      [  388.086690]  ? find_next_bit+0xb/0x10
      [  388.088091]  f2fs_gc+0x1a8/0x9d0 [f2fs]
      [  388.088888]  ? lock_timer_base+0x7d/0xa0
      [  388.090213]  ? try_to_del_timer_sync+0x44/0x60
      [  388.091698]  gc_thread_func+0x342/0x4b0 [f2fs]
      [  388.092892]  ? wait_woken+0x80/0x80
      [  388.094098]  kthread+0x109/0x140
      [  388.095010]  ? f2fs_gc+0x9d0/0x9d0 [f2fs]
      [  388.096043]  ? kthread_park+0x60/0x60
      [  388.097281]  ret_from_fork+0x25/0x30
      [  388.098401] Code: ff ff 48 83 e8 01 48 89 44 24 58 e9 27 f8 ff ff 48 83 e8 01 e9 78 fc ff ff 48 8d 78 ff e9 17 fb ff ff 48 83 ef 01 e9 4d f4 ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
      [  388.100864] RIP: do_garbage_collect+0xcc8/0xcd0 [f2fs] RSP: ffffc90004d7fc68
      [  388.101810] ---[ end trace 81c73d6e6b7da61d ]---
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      10d255c3
    • Chao Yu's avatar
      f2fs: try grabbing node page lock aggressively in sync scenario · 4b270a8c
      Chao Yu authored
      In synchronous scenario, like in checkpoint(), we are going to flush
      dirty node pages to device synchronously, we can easily failed
      writebacking node page due to trylock_page() failure, especially in
      condition of intensive lock competition, which can cause long latency
      of checkpoint(). So let's use lock_page() in synchronous scenario to
      avoid this issue.
      Signed-off-by: default avatarYunlei He <heyunlei@huawei.com>
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      4b270a8c
    • Sahitya Tummala's avatar
      f2fs: show the fsync_mode=nobarrier mount option · dc132802
      Sahitya Tummala authored
      This patch shows the fsync_mode=nobarrier mount option in
      f2fs_show_options().
      Signed-off-by: default avatarSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      dc132802
    • Yunlei He's avatar
      f2fs: check the right return value of memory alloc function · 68c43a23
      Yunlei He authored
      This patch check the right return value of memory alloc function
      Signed-off-by: default avatarYunlei He <heyunlei@huawei.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      68c43a23