1. 16 Jan, 2015 34 commits
  2. 08 Jan, 2015 6 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.14.28 · c3b70f0b
      Greg Kroah-Hartman authored
      c3b70f0b
    • Filipe Manana's avatar
      Btrfs: fix fs corruption on transaction abort if device supports discard · b91261aa
      Filipe Manana authored
      commit 678886bd upstream.
      
      When we abort a transaction we iterate over all the ranges marked as dirty
      in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
      from those trees, add them back (unpin) to the free space caches and, if
      the fs was mounted with "-o discard", perform a discard on those regions.
      Also, after adding the regions to the free space caches, a fitrim ioctl call
      can see those ranges in a block group's free space cache and perform a discard
      on the ranges, so the same issue can happen without "-o discard" as well.
      
      This causes corruption, affecting one or multiple btree nodes (in the worst
      case leaving the fs unmountable) because some of those ranges (the ones in
      the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are
      referred by the last committed super block - breaking the rule that anything
      that was committed by a transaction is untouched until the next transaction
      commits successfully.
      
      I ran into this while running in a loop (for several hours) the fstest that
      I recently submitted:
      
        [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim
      
      The corruption always happened when a transaction aborted and then fsck complained
      like this:
      
         _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
         *** fsck.btrfs output ***
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         Check tree block failed, want=94945280, have=0
         read block failed check_tree_block
         Couldn't open file system
      
      In this case 94945280 corresponded to the root of a tree.
      Using frace what I observed was the following sequence of steps happened:
      
         1) transaction N started, fs_info->pinned_extents pointed to
            fs_info->freed_extents[0];
      
         2) node/eb 94945280 is created;
      
         3) eb is persisted to disk;
      
         4) transaction N commit starts, fs_info->pinned_extents now points to
            fs_info->freed_extents[1], and transaction N completes;
      
         5) transaction N + 1 starts;
      
         6) eb is COWed, and btrfs_free_tree_block() called for this eb;
      
         7) eb range (94945280 to 94945280 + 16Kb) is added to
            fs_info->pinned_extents (fs_info->freed_extents[1]);
      
         8) Something goes wrong in transaction N + 1, like hitting ENOSPC
            for example, and the transaction is aborted, turning the fs into
            readonly mode. The stack trace I got for example:
      
            [112065.253935]  [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
            [112065.254271]  [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
            [112065.254567]  [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
            [112065.261674]  [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
            [112065.261922]  [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
            [112065.262211]  [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
            [112065.262545]  [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
            [112065.262771]  [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
            [112065.263105]  [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
            (...)
            [112065.264493] ---[ end trace dd7903a975a31a08 ]---
            [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
            [112065.264997] BTRFS info (device sdc): forced readonly
      
         9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
            fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
            turn calls btrfs_destroy_pinned_extent();
      
         10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
             marked as dirty in fs_info->freed_extents[], and for each one
             it calls discard, if the fs was mounted with "-o discard", and
             adds the range to the free space cache of the respective block
             group;
      
         11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
             sees the free space entries and performs a discard;
      
         12) After an umount and mount (or fsck), our eb's location on disk was full
             of zeroes, and it should have been untouched, because it was marked as
             dirty in the fs_info->pinned_extents tree, and therefore used by the
             trees that the last committed superblock points to.
      
      Fix this by not performing a discard and not adding the ranges to the free space
      caches - it's useless from this point since the fs is now in readonly mode and
      we won't write free space caches to disk anymore (otherwise we would leak space)
      nor any new superblock. By not adding the ranges to the free space caches, it
      prevents other code paths from allocating that space and write to it as well,
      therefore being safer and simpler.
      
      This isn't a new problem, as it's been present since 2011 (git commit
      acce952b).
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b91261aa
    • Josef Bacik's avatar
      Btrfs: do not move em to modified list when unpinning · d61b7a2d
      Josef Bacik authored
      commit a2804695 upstream.
      
      We use the modified list to keep track of which extents have been modified so we
      know which ones are candidates for logging at fsync() time.  Newly modified
      extents are added to the list at modification time, around the same time the
      ordered extent is created.  We do this so that we don't have to wait for ordered
      extents to complete before we know what we need to log.  The problem is when
      something like this happens
      
      log extent 0-4k on inode 1
      copy csum for 0-4k from ordered extent into log
      sync log
      commit transaction
      log some other extent on inode 1
      ordered extent for 0-4k completes and adds itself onto modified list again
      log changed extents
      see ordered extent for 0-4k has already been logged
      	at this point we assume the csum has been copied
      sync log
      crash
      
      On replay we will see the extent 0-4k in the log, drop the original 0-4k extent
      which is the same one that we are replaying which also drops the csum, and then
      we won't find the csum in the log for that bytenr.  This of course causes us to
      have errors about not having csums for certain ranges of our inode.  So remove
      the modified list manipulation in unpin_extent_cache, any modified extents
      should have been added well before now, and we don't want them re-logged.  This
      fixes my test that I could reliably reproduce this problem with.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d61b7a2d
    • Michael Halcrow's avatar
      eCryptfs: Remove buggy and unnecessary write in file name decode routine · a306ae6a
      Michael Halcrow authored
      commit 94208064 upstream.
      
      Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
      end of the allocated buffer during encrypted filename decoding. This
      fix corrects the issue by getting rid of the unnecessary 0 write when
      the current bit offset is 2.
      Signed-off-by: default avatarMichael Halcrow <mhalcrow@google.com>
      Reported-by: default avatarDmitry Chernenkov <dmitryc@google.com>
      Suggested-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a306ae6a
    • Tyler Hicks's avatar
      eCryptfs: Force RO mount when encrypted view is enabled · 2f531cca
      Tyler Hicks authored
      commit 332b122d upstream.
      
      The ecryptfs_encrypted_view mount option greatly changes the
      functionality of an eCryptfs mount. Instead of encrypting and decrypting
      lower files, it provides a unified view of the encrypted files in the
      lower filesystem. The presence of the ecryptfs_encrypted_view mount
      option is intended to force a read-only mount and modifying files is not
      supported when the feature is in use. See the following commit for more
      information:
      
        e77a56dd [PATCH] eCryptfs: Encrypted passthrough
      
      This patch forces the mount to be read-only when the
      ecryptfs_encrypted_view mount option is specified by setting the
      MS_RDONLY flag on the superblock. Additionally, this patch removes some
      broken logic in ecryptfs_open() that attempted to prevent modifications
      of files when the encrypted view feature was in use. The check in
      ecryptfs_open() was not sufficient to prevent file modifications using
      system calls that do not operate on a file descriptor.
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Reported-by: default avatarPriya Bansal <p.bansal@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f531cca
    • Jan Kara's avatar
      udf: Verify symlink size before loading it · 5c90d036
      Jan Kara authored
      commit a1d47b26 upstream.
      
      UDF specification allows arbitrarily large symlinks. However we support
      only symlinks at most one block large. Check the length of the symlink
      so that we don't access memory beyond end of the symlink block.
      Reported-by: default avatarCarl Henrik Lunde <chlunde@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5c90d036