1. 28 Feb, 2020 10 commits
    • Btrfs: fix race between using extent maps and merging them · 3e507157
      Filipe Manana authored
      commit ac05ca91 upstream.
      
      We have a few cases where we allow an extent map that is in an extent map
      tree to be merged with other extents in the tree. Such cases include the
      unpinning of an extent after the respective ordered extent completed or
      after logging an extent during a fast fsync. This can lead to subtle and
      dangerous problems because when doing the merge some other task might be
      using the same extent map and as a consequence see an inconsistent state
      of the extent map - for example, it sees the new length but has already
      seen the old start offset.
      
      With luck this triggers a BUG_ON(), and not some silent bug, such as the
      following one in __do_readpage():
      
        $ cat -n fs/btrfs/extent_io.c
        3061  static int __do_readpage(struct extent_io_tree *tree,
        3062                           struct page *page,
        (...)
        3127                  em = __get_extent_map(inode, page, pg_offset, cur,
        3128                                        end - cur + 1, get_extent, em_cached);
        3129                  if (IS_ERR_OR_NULL(em)) {
        3130                          SetPageError(page);
        3131                          unlock_extent(tree, cur, end);
        3132                          break;
        3133                  }
        3134                  extent_offset = cur - em->start;
        3135                  BUG_ON(extent_map_end(em) <= cur);
        (...)
      
      Consider the following example scenario, where we end up hitting the
      BUG_ON() in __do_readpage().
      
      We have an inode with a size of 8KiB and 2 extent maps:
      
        extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by
                  a previous transaction
      
        extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
                  persisted but writeback started for it already. The extent map
                  is pinned since there's writeback and an ordered extent in
                  progress, so it can not be merged with extent map A yet
      
      The following sequence of steps leads to the BUG_ON():
      
      1) The ordered extent for extent B completes, the respective page gets its
         writeback bit cleared and the extent map is unpinned, at that point it
         is not yet merged with extent map A because it's in the list of modified
         extents;
      
      2) Due to memory pressure, or some other reason, the MM subsystem releases
         the page corresponding to extent B - btrfs_releasepage() is called and
         returns 1, meaning the page can be released as it's not dirty, not under
         writeback anymore and the extent range is not locked in the inode's
         iotree. However the extent map is not released, either because we are
         not in a context that allows memory allocations to block or because the
         inode's size is smaller than 16MiB - in this case our inode has a size
         of 8KiB;
      
      3) Task B needs to read extent B and ends up at __do_readpage() through
         the btrfs_readpage() callback. At __do_readpage() it gets a reference
         to extent map B;
      
      4) Task A, doing a fast fsync, calls clear_em_logging() against extent map B
         while holding the write lock on the inode's extent map tree - this
         results in try_merge_map() being called and since it's possible to merge
         extent map B with extent map A now (the extent map B was removed from
         the list of modified extents), the merging begins - it sets extent map
         B's start offset to 0 (was 4KiB), but before it increments the map's
         length to 8KiB (4KiB + 4KiB), task B is at:
      
         BUG_ON(extent_map_end(em) <= cur);
      
         The call to extent_map_end() sees the extent map has a start of 0
         and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
         the BUG_ON() is triggered.
      
      So it's dangerous to modify an extent map that is in the tree, because some
      other task might have got a reference to it before and might still be using
      it, and needs to see a consistent map while using it. Generally this is very
      rare since most paths that look up and use extent maps also have the file range
      locked in the inode's iotree. The fsync path is pretty much the only
      exception where we don't do it to avoid serialization with concurrent
      reads.
      
      Fix this by not allowing an extent map to be merged if it's being used
      by tasks other than the one attempting to merge the extent map (when the
      reference count of the extent map is greater than 2).
      
      Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
      Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: improve explanation of a mount failure caused by a misconfigured kernel · 2d99bc58
      Theodore Ts'o authored
      commit d65d87a0 upstream.
      
      If CONFIG_QFMT_V2 is not enabled, but CONFIG_QUOTA is enabled, when a
      user tries to mount a file system with the quota or project quota
      enabled, the kernel will emit a very confusing message:
      
          EXT4-fs warning (device vdc): ext4_enable_quotas:5914: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix.
          EXT4-fs (vdc): mount failed
      
      We will now report an explanatory message indicating which kernel
      configuration options have to be enabled, to avoid customer/sysadmin
      confusion.
      
      Link: https://lore.kernel.org/r/20200215012738.565735-1-tytso@mit.edu
      Google-Bug-Id: 149093531
      Fixes: 7c319d32 ("ext4: make quota as first class supported feature")
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: fix checksum errors with indexed dirs · 3f3beb23
      Jan Kara authored
      commit 48a34311 upstream.
      
      DIR_INDEX has been introduced as a compat ext4 feature. That means that
      even kernels / tools that don't understand the feature may modify the
      filesystem. This works because for kernels not understanding indexed dir
      format, internal htree nodes appear just as empty directory entries.
      Index dir aware kernels then check the htree structure is still
      consistent before using the data. This all worked reasonably well until
      metadata checksums were introduced. The problem is that these
      effectively made DIR_INDEX only ro-compatible because internal htree
      nodes store checksums in a different place than normal directory blocks.
      Thus any modification ignorant to DIR_INDEX (or just clearing
      EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
      and trigger kernel errors. So we have to be more careful when dealing
      with indexed directories on filesystems with checksumming enabled.
      
      1) We just disallow loading any directory inodes with EXT4_INDEX_FL when
      DIR_INDEX is not enabled. This is harsh but it should be very rare (it
      means someone disabled DIR_INDEX on existing filesystem and didn't run
      e2fsck), e2fsck can fix the problem, and we don't want to answer the
      difficult question: "Should we rather corrupt the directory more or
      should we ignore that DIR_INDEX feature is not set?"
      
      2) When we find out htree structure is corrupted (but the filesystem and
      the directory should support htrees), we continue just ignoring htree
      information for reading but we refuse to add new entries to the
      directory to avoid corrupting it more.
      
      Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz
      Fixes: dbe89444 ("ext4: Calculate and verify checksums for htree nodes")
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ext4: don't assume that mmp_nodename/bdevname have NUL · 46cc9ff7
      Andreas Dilger authored
      commit 14c9ca05 upstream.
      
      Don't assume that the mmp_nodename and mmp_bdevname strings are NUL
      terminated, since they are filled in by snprintf(), which is not
      guaranteed to do so.
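      
      One common way to handle a fixed-size field that may lack a terminating
      NUL is to bound the read with a "%.*s" precision. The sketch below is a
      userspace illustration under that assumption; format_field() is a
      hypothetical helper, not a function from the ext4 MMP code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format a fixed-size character field that may not be NUL terminated.
 * "%.*s" reads at most 'field_size' bytes and stops early if it meets a NUL,
 * so the formatter never runs past the end of the field.
 * Returns what snprintf() returns: the length the full output would need.
 */
static int format_field(char *out, size_t out_size,
                        const char *field, size_t field_size)
{
    assert(out != NULL && field != NULL && out_size > 0);
    return snprintf(out, out_size, "%.*s", (int)field_size, field);
}
```

      Passing sizeof() of the field as the precision keeps the bound tied to
      the field's declared size, so an 8-byte field holding exactly 8
      characters is printed in full without reading a ninth byte.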
      
      Link: https://lore.kernel.org/r/1580076215-1048-1-git-send-email-adilger@dilger.ca
      Signed-off-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: usb-audio: Apply sample rate quirk for Audioengine D1 · 34951b0d
      Arvind Sankar authored
      commit 93f9d1a4 upstream.
      
      The Audioengine D1 (0x2912:0x30c8) does support reading the sample rate,
      but it returns the rate in byte-reversed order.
      
      When setting sampling rate, the driver produces these warning messages:
      [168840.944226] usb 3-2.2: current rate 4500480 is different from the runtime rate 44100
      [168854.930414] usb 3-2.2: current rate 8436480 is different from the runtime rate 48000
      [168905.185825] usb 3-2.1.2: current rate 30465 is different from the runtime rate 96000
      
      As can be seen from the hexadecimal conversion, the current rate read
      back is byte-reversed from the rate that was set.
      
      44100 == 0x00ac44, 4500480 == 0x44ac00
      48000 == 0x00bb80, 8436480 == 0x80bb00
      96000 == 0x017700,   30465 == 0x007701
      
      Rather than implementing a new quirk to reverse the order, just skip
      checking the rate to avoid spamming the log.
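      
      The hexadecimal comparison above can be checked with a small helper
      that swaps the outer bytes of a 24-bit value. This is only an
      illustration of the arithmetic; reverse_bytes24() is a hypothetical name
      and, as the message says, the driver does not implement such a reversal:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 24-bit value (three low-order bytes). */
static uint32_t reverse_bytes24(uint32_t v)
{
    assert(v <= 0xffffff);
    return ((v & 0x0000ff) << 16) | /* lowest byte becomes the highest */
           (v & 0x00ff00) |         /* middle byte stays in place */
           ((v & 0xff0000) >> 16);  /* highest byte becomes the lowest */
}
```

      Applying the helper to 44100, 48000 and 96000 yields 4500480, 8436480
      and 30465, the values reported in the warning messages.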
      
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200211162235.1639889-1-nivedita@alum.mit.edu
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ecryptfs: fix a memory leak bug in ecryptfs_init_messaging() · 2aa170c8
      Wenwen Wang authored
      commit b4a81b87 upstream.
      
      In ecryptfs_init_messaging(), if the allocation for 'ecryptfs_msg_ctx_arr'
      fails, the previously allocated 'ecryptfs_daemon_hash' is not deallocated,
      leading to a memory leak bug. To fix this issue, free
      'ecryptfs_daemon_hash' before returning the error.
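      
      The pattern behind this fix can be sketched in userspace C: when a
      second allocation fails, release the first one before returning the
      error. The struct and function names below are hypothetical stand-ins,
      and the fail_second parameter exists only to simulate the failing
      allocation:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two allocations made during init. */
struct messaging_model {
    void *daemon_hash;
    void *msg_ctx_arr;
};

/*
 * Allocate both buffers. If the second allocation fails, free the first
 * before returning so nothing is leaked. 'fail_second' simulates that failure.
 */
static int messaging_model_init(struct messaging_model *m,
                                size_t hash_size, size_t arr_size,
                                int fail_second)
{
    assert(m != NULL);

    m->msg_ctx_arr = NULL;
    m->daemon_hash = malloc(hash_size);
    if (!m->daemon_hash)
        return -ENOMEM;

    m->msg_ctx_arr = fail_second ? NULL : malloc(arr_size);
    if (!m->msg_ctx_arr) {
        /* The step the fix adds: undo the earlier allocation. */
        free(m->daemon_hash);
        m->daemon_hash = NULL;
        return -ENOMEM;
    }
    return 0;
}
```

      Resetting the pointer to NULL after freeing it is a choice made for this
      sketch so that a caller cannot free the buffer a second time.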
      
      Cc: stable@vger.kernel.org
      Fixes: 88b4a07e ("[PATCH] eCryptfs: Public key transport mechanism")
      Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ecryptfs: fix a memory leak bug in parse_tag_1_packet() · f3ee3bad
      Wenwen Wang authored
      commit fe2e082f upstream.
      
      In parse_tag_1_packet(), if tag 1 packet contains a key larger than
      ECRYPTFS_MAX_ENCRYPTED_KEY_BYTES, no cleanup is executed, leading to a
      memory leak on the allocated 'auth_tok_list_item'. To fix this issue, go to
      the label 'out_free' to perform the cleanup work.
      
      Cc: stable@vger.kernel.org
      Fixes: dddfa461 ("[PATCH] eCryptfs: Public key; packet management")
      Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: hda: Use scnprintf() for printing texts for sysfs/procfs · f67043b6
      Takashi Iwai authored
      commit 44eeb081 upstream.
      
      Some code in HD-audio driver calls snprintf() in a loop and still
      expects the return value to be the actually written size, while
      snprintf() returns the would-be length instead.  When the
      given buffer limit is small, this leads to a buffer overflow.
      
      Use scnprintf() for addressing those issues.  It returns the actually
      written size unlike snprintf().
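      
      The difference can be sketched in userspace C. scnprintf_model() below
      is a hypothetical approximation of the kernel's scnprintf() built on
      vsnprintf(), and append_words() is an illustrative loop, not code from
      the HD-audio driver:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Approximation of scnprintf(): returns the number of characters actually
 * stored in 'buf' (excluding the NUL), never the would-be length.
 */
static int scnprintf_model(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    if (size == 0)
        return 0;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (n < 0)
        return 0;
    /* On truncation only size - 1 characters were stored. */
    return (size_t)n >= size ? (int)(size - 1) : n;
}

/* Append 'count' copies of " word" to 'buf' without running past its end. */
static int append_words(char *buf, size_t size, const char *word, int count)
{
    int len = 0;

    for (int i = 0; i < count; i++)
        len += scnprintf_model(buf + len, size - (size_t)len, " %s", word);
    return len;
}
```

      Because each call reports only what was stored, len can never exceed
      size - 1. With plain snprintf() in the same loop, len would keep growing
      by the would-be length, and buf + len would eventually point past the
      end of the buffer.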
      
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200218091409.27162-1-tiwai@suse.de
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: emulate RDPID · 01b7a509
      Paolo Bonzini authored
      commit fb6d4d34 upstream.
      
      This is encoded as F3 0F C7 /7 with a register argument.  The register
      argument is the second array in the group9 GroupDual, while F3 is the
      fourth element of a Prefix.
      
      Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/vdso: Use RDPID in preference to LSL when available · ab19f949
      Andy Lutomirski authored
      commit a582c540 upstream.
      
      RDPID is a new instruction that reads MSR_TSC_AUX quickly.  This
      should be considerably faster than reading the GDT.  Add a
      cpufeature for it and use it from __vdso_getcpu() when available.
      
      Tested-by: Megha Dey <megha.dey@intel.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4f6c3a22012d10f1c65b9ca15800e01b42c7d39d.1479320367.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 14 Feb, 2020 30 commits