1. 26 Jan, 2019 16 commits
    • Jaegeuk Kim's avatar
      f2fs: put directory inodes before checkpoint in roll-forward recovery · 9213c2b5
      Jaegeuk Kim authored
      commit 9e1e6df4 upstream.
      
      Before checkpoint, we'd be better drop any inodes.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9213c2b5
    • Tiezhu Yang's avatar
      f2fs: introduce get_checkpoint_version for cleanup · 2f958b8e
      Tiezhu Yang authored
      commit fc0065ad upstream.
      
      There exists almost same codes when get the value of pre_version
      and cur_version in function validate_checkpoint, this patch adds
      get_checkpoint_version to clean up redundant codes.
      Signed-off-by: default avatarTiezhu Yang <kernelpatch@126.com>
      Reviewed-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Backported to 4.4: f2fs_crc_valid() doesn't take an f2fs_sb_info pointer]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f958b8e
    • Jaegeuk Kim's avatar
      f2fs: use crc and cp version to determine roll-forward recovery · 65b9d532
      Jaegeuk Kim authored
      commit a468f0ef upstream.
      
      Previously, we used cp_version only to detect recoverable dnodes.
      In order to avoid same garbage cp_version, we needed to truncate the next
      dnode during checkpoint, resulting in additional discard or data write.
      If we can distinguish this by using crc in addition to cp_version, we can
      remove this overhead.
      
      There is backward compatibility concern where it changes node_footer layout.
      So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
      detect new layout. New layout will be activated only when this flag is set.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Backported to 4.4:
       - Deleted code is slightly different
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      65b9d532
    • Chao Yu's avatar
      f2fs: avoid unneeded loop in build_sit_entries · 5dfb9eb6
      Chao Yu authored
      commit d600af23 upstream.
      
      When building each sit entry in cache, firstly, we will load it from
      sit page, and then check all entries in sit journal, if there is one
      updated entry in journal, cover cached entry with the journaled one.
      
      Actually, most of check operation is unneeded since we only need
      to update cached entries with journaled entries in batch, so
      changing the flow as below for more efficient:
      1. load all sit entries into cache from sit pages;
      2. update sit entries with journal.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Backported to 4.4:
       - Keep using curseg->curseg_mutex for serialisation
       - Use sum instead of journal
       - Don't add f2fs_discard_en() condition]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5dfb9eb6
    • Yunlei He's avatar
      f2fs: not allow to write illegal blkaddr · 6ef26eb1
      Yunlei He authored
      commit bb413d6a upstream.
      
      we came across an error as below:
      
      [build_nat_area_bitmap:1710] nid[0x    1718] addr[0x         1c18ddc] ino[0x    1718]
      [build_nat_area_bitmap:1710] nid[0x    1719] addr[0x         1c193d5] ino[0x    1719]
      [build_nat_area_bitmap:1710] nid[0x    171a] addr[0x         1c1736e] ino[0x    171a]
      [build_nat_area_bitmap:1710] nid[0x    171b] addr[0x        58b3ee8f] ino[0x815f92ed]
      [build_nat_area_bitmap:1710] nid[0x    171c] addr[0x         fcdc94b] ino[0x49366377]
      [build_nat_area_bitmap:1710] nid[0x    171d] addr[0x        7cd2facf] ino[0xb3c55300]
      [build_nat_area_bitmap:1710] nid[0x    171e] addr[0x        bd4e25d0] ino[0x77c34c09]
      
      ... ...
      
      [build_nat_area_bitmap:1710] nid[0x    1718] addr[0x         1c18ddc] ino[0x    1718]
      [build_nat_area_bitmap:1710] nid[0x    1719] addr[0x         1c193d5] ino[0x    1719]
      [build_nat_area_bitmap:1710] nid[0x    171a] addr[0x         1c1736e] ino[0x    171a]
      [build_nat_area_bitmap:1710] nid[0x    171b] addr[0x        58b3ee8f] ino[0x815f92ed]
      [build_nat_area_bitmap:1710] nid[0x    171c] addr[0x         fcdc94b] ino[0x49366377]
      [build_nat_area_bitmap:1710] nid[0x    171d] addr[0x        7cd2facf] ino[0xb3c55300]
      [build_nat_area_bitmap:1710] nid[0x    171e] addr[0x        bd4e25d0] ino[0x77c34c09]
      
      One nat block may be stepped by a data block, so this patch forbid to
      write if the blkaddr is illegal
      Signed-off-by: default avatarYunlei He <heyunlei@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ef26eb1
    • Chao Yu's avatar
      f2fs: fix to avoid reading out encrypted data in page cache · 70c35785
      Chao Yu authored
      commit 78682f79 upstream.
      
      For encrypted inode, if user overwrites data of the inode, f2fs will read
      encrypted data into page cache, and then do the decryption.
      
      However reader can race with overwriter, and it will see encrypted data
      which has not been decrypted by overwriter yet. Fix it by moving decrypting
      work to background and keep page non-uptodated until data is decrypted.
      
      Thread A				Thread B
      - f2fs_file_write_iter
       - __generic_file_write_iter
        - generic_perform_write
         - f2fs_write_begin
          - f2fs_submit_page_bio
      					- generic_file_read_iter
      					 - do_generic_file_read
      					  - lock_page_killable
      					  - unlock_page
      					  - copy_page_to_iter
      					  hit the encrypted data in updated page
          - lock_page
          - fscrypt_decrypt_page
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Backported to 4.4:
       - Keep using f2fs_crypto functions instead of generic fscrypt API
       - Use PAGE_CACHE_SIZE instead of PAGE_SIZE
       - Use submit_bio() instead of __submit_bio()
       - In f2fs_write_begin(), use dn.data_blkaddr instead of blkaddr
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70c35785
    • Chao Yu's avatar
      f2fs: fix inode cache leak · 4aa4ce1c
      Chao Yu authored
      commit f61cce5b upstream.
      
      When testing f2fs with inline_dentry option, generic/342 reports:
      VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds.  Have a nice day...
      
      After rmmod f2fs module, kenrel shows following dmesg:
       =============================================================================
       BUG f2fs_inode_cache (Tainted: G           O   ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       Disabling lock debugging due to kernel taint
       INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
        c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
        616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c11c7276>] slab_err+0x76/0x80
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
        [<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
        [<c1198a38>] kmem_cache_destroy+0x158/0x1f0
        [<c176b43d>] ? mutex_unlock+0xd/0x10
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
       INFO: Object 0xd1e6d9e0 @offset=6624
       kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
       CPU: 3 PID: 7455 Comm: rmmod Tainted: G    B      O    4.6.0-rc4+ #16
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
        00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
        c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
        f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
       Call Trace:
        [<c13a83a0>] dump_stack+0x5f/0x8f
        [<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
        [<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
        [<c10f596c>] SyS_delete_module+0x16c/0x1d0
        [<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
        [<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
        [<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
        [<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
        [<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
        [<c176d888>] sysenter_past_esp+0x45/0x74
      
      The reason is: in recovery flow, we use delayed iput mechanism for directory
      which has recovered dentry block. It means the reference of inode will be
      held until last dirty dentry page being writebacked.
      
      But when we mount f2fs with inline_dentry option, during recovery, dirent
      may only be recovered into dir inode page rather than dentry page, so there
      are no chance for us to release inode reference in ->writepage when
      writebacking last dentry page.
      
      We can call paired iget/iput explicityly for inline_dentry case, but for
      non-inline_dentry case, iput will call writeback_single_inode to write all
      data pages synchronously, but during recovery, ->writepages of f2fs skips
      writing all pages, result in losing dirent.
      
      This patch fixes this issue by obsoleting old mechanism, and introduce a
      new dir_list to hold all directory inodes which has recovered datas until
      finishing recovery.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Backported to 4.4:
       - Deleted add_dirty_dir_inode() function is different
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4aa4ce1c
    • Chao Yu's avatar
      f2fs: factor out fsync inode entry operations · 6d07c0f4
      Chao Yu authored
      commit 3f8ab270 upstream.
      
      Factor out fsync inode entry operations into {add,del}_fsync_inode.
      Signed-off-by: default avatarChao Yu <yuchao0@huawei.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d07c0f4
    • Jaegeuk Kim's avatar
      f2fs: remove an obsolete variable · 8f7c4fb9
      Jaegeuk Kim authored
      commit fb58ae22 upstream.
      
      This patch removes an obsolete variable used in add_free_nid.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Picked as dependency of commit 30a61ddf "f2fs: fix race condition
       in between free nid allocator/initializer"]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f7c4fb9
    • Jaegeuk Kim's avatar
      f2fs: give -EINVAL for norecovery and rw mount · 1499d39b
      Jaegeuk Kim authored
      commit 6781eabb upstream.
      
      Once detecting something to recover, f2fs should stop mounting, given norecovery
      and rw mount options.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1499d39b
    • Chao Yu's avatar
      f2fs: fix to convert inline directory correctly · 523972a6
      Chao Yu authored
      With below serials, we will lose parts of dirents:
      
      1) mount f2fs with inline_dentry option
      2) echo 1 > /sys/fs/f2fs/sdX/dir_level
      3) mkdir dir
      4) touch 180 files named [1-180] in dir
      5) touch 181 in dir
      6) echo 3 > /proc/sys/vm/drop_caches
      7) ll dir
      
      ls: cannot access 2: No such file or directory
      ls: cannot access 4: No such file or directory
      ls: cannot access 5: No such file or directory
      ls: cannot access 6: No such file or directory
      ls: cannot access 8: No such file or directory
      ls: cannot access 9: No such file or directory
      ...
      total 360
      drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./
      drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../
      -rw-r--r-- 1 root root    0 Feb 19 15:12 1
      -rw-r--r-- 1 root root    0 Feb 19 15:12 10
      -rw-r--r-- 1 root root    0 Feb 19 15:12 100
      -????????? ? ?    ?       ?            ? 101
      -????????? ? ?    ?       ?            ? 102
      -????????? ? ?    ?       ?            ? 103
      ...
      
      The reason is: when doing the inline dir conversion, we didn't consider
      that directory has hierarchical hash structure which can be configured
      through sysfs interface 'dir_level'.
      
      By default, dir_level of directory inode is 0, it means we have one bucket
      in hash table located in first level, all dirents will be hashed in this
      bucket, so it has no problem for us to do the duplication simply between
      inline dentry page and converted normal dentry page.
      
      However, if we configured dir_level with the value N (greater than 0), it
      will expand the bucket number of first level hash table by 2^N - 1, it
      hashs dirents into different buckets according their hash value, if we
      still move all dirents to first bucket, it makes incorrent locating for
      inline dirents, the result is, although we can iterate all dirents through
      ->readdir, we can't stat some of them in ->lookup which based on hash
      table searching.
      
      This patch fixes this issue by rehashing dirents into correct position
      when converting inline directory.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Backported to 4.4:
       - Keep using f2fs_crypto functions instead of generic fscrypt API
       - Use remove_dirty_dir_inode() instead of remove_dirty_inode()
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      523972a6
    • Shawn Lin's avatar
      f2fs: move sanity checking of cp into get_valid_checkpoint · 8c5dfff5
      Shawn Lin authored
      commit 984ec63c upstream.
      
      >From the function name of get_valid_checkpoint, it seems to return
      the valid cp or NULL for caller to check. If no valid one is found,
      f2fs_fill_super will print the err log. But if get_valid_checkpoint
      get one valid(the return value indicate that it's valid, however actually
      it is invalid after sanity checking), then print another similar err
      log. That seems strange. Let's keep sanity checking inside the procedure
      of geting valid cp. Another improvement we gained from this move is
      that even the large volume is supported, we check the cp in advanced
      to skip the following procedure if failing the sanity checking.
      Signed-off-by: default avatarShawn Lin <shawn.lin@rock-chips.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c5dfff5
    • Jaegeuk Kim's avatar
      f2fs: cover more area with nat_tree_lock · 87a099c6
      Jaegeuk Kim authored
      commit a5131193 upstream.
      
      There was a subtle bug on nat cache management which incurs wrong nid allocation
      or wrong block addresses when try_to_free_nats is triggered heavily.
      This patch enlarges the previous coverage of nat_tree_lock to avoid data race.
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87a099c6
    • Chao Yu's avatar
      f2fs: clean up argument of recover_data · 139211c6
      Chao Yu authored
      commit b7973f23 upstream.
      
      In recover_data, value of argument 'type' will be CURSEG_WARM_NODE all
      the time, remove it for cleanup.
      Signed-off-by: default avatarChao Yu <chao2.yu@samsung.com>
      Signed-off-by: default avatarJaegeuk Kim <jaegeuk@kernel.org>
      [bwh: Picked as dependency of commit 6781eabb "f2fs: give -EINVAL for
       norecovery and rw mount"]
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      139211c6
    • Oliver Hartkopp's avatar
      can: gw: ensure DLC boundaries after CAN frame modification · 693ae291
      Oliver Hartkopp authored
      commit 0aaa8137 upstream.
      
      Muyu Yu provided a POC where user root with CAP_NET_ADMIN can create a CAN
      frame modification rule that makes the data length code a higher value than
      the available CAN frame data size. In combination with a configured checksum
      calculation where the result is stored relatively to the end of the data
      (e.g. cgw_csum_xor_rel) the tail of the skb (e.g. frag_list pointer in
      skb_shared_info) can be rewritten which finally can cause a system crash.
      
      Michael Kubecek suggested to drop frames that have a DLC exceeding the
      available space after the modification process and provided a patch that can
      handle CAN FD frames too. Within this patch we also limit the length for the
      checksum calculations to the maximum of Classic CAN data length (8).
      
      CAN frames that are dropped by these additional checks are counted with the
      CGW_DELETED counter which indicates misconfigurations in can-gw rules.
      
      This fixes CVE-2019-3701.
      Reported-by: default avatarMuyu Yu <ieatmuttonchuan@gmail.com>
      Reported-by: default avatarMarcus Meissner <meissner@suse.de>
      Suggested-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Tested-by: default avatarMuyu Yu <ieatmuttonchuan@gmail.com>
      Tested-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-stable <stable@vger.kernel.org> # >= v3.2
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      693ae291
    • Dmitry Safonov's avatar
      tty/ldsem: Wake up readers after timed out down_write() · d93216e5
      Dmitry Safonov authored
      commit 231f8fd0 upstream.
      
      ldsem_down_read() will sleep if there is pending writer in the queue.
      If the writer times out, readers in the queue should be woken up,
      otherwise they may miss a chance to acquire the semaphore until the last
      active reader will do ldsem_up_read().
      
      There was a couple of reports where there was one active reader and
      other readers soft locked up:
        Showing all locks held in the system:
        2 locks held by khungtaskd/17:
         #0:  (rcu_read_lock){......}, at: watchdog+0x124/0x6d1
         #1:  (tasklist_lock){.+.+..}, at: debug_show_all_locks+0x72/0x2d3
        2 locks held by askfirst/123:
         #0:  (&tty->ldisc_sem){.+.+.+}, at: ldsem_down_read+0x46/0x58
         #1:  (&ldata->atomic_read_lock){+.+...}, at: n_tty_read+0x115/0xbe4
      
      Prevent readers wait for active readers to release ldisc semaphore.
      
      Link: lkml.kernel.org/r/20171121132855.ajdv4k6swzhvktl6@wfg-t540p.sh.intel.com
      Link: lkml.kernel.org/r/20180907045041.GF1110@shao2-debian
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Reported-by: default avatarkernel test robot <rong.a.chen@intel.com>
      Signed-off-by: default avatarDmitry Safonov <dima@arista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d93216e5
  2. 16 Jan, 2019 24 commits