1. 27 Nov, 2018 8 commits
  2. 21 Nov, 2018 32 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.164 · 8e592f4a
      Greg Kroah-Hartman authored
      8e592f4a
    • Clint Taylor's avatar
      drm/i915/hdmi: Add HDMI 2.0 audio clock recovery N values · 29a23181
      Clint Taylor authored
      commit 65034931 upstream.
      
      HDMI 2.0 594Mhz modes were incorrectly selecting 25.200Mhz Automatic N value
      mode instead of HDMI specification values.
      
      V2: Fix 88.2 Hz N value
      
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarClint Taylor <clinton.a.taylor@intel.com>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/1540493521-1746-2-git-send-email-clinton.a.taylor@intel.com
      (cherry picked from commit 5a400aa3)
      Signed-off-by: default avatarJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29a23181
    • Stanislav Lisovskiy's avatar
      drm/dp_mst: Check if primary mstb is null · 8ef21c40
      Stanislav Lisovskiy authored
      commit 23d80039 upstream.
      
      Unfortunately drm_dp_get_mst_branch_device which is called from both
      drm_dp_mst_handle_down_rep and drm_dp_mst_handle_up_rep seem to rely
      on that mgr->mst_primary is not NULL, which seem to be wrong as it can be
      cleared with simultaneous mode set, if probing fails or in other case.
      mgr->lock mutex doesn't protect against that as it might just get
      assigned to NULL right before, not simultaneously.
      
      There are currently bugs 107738, 108616 bugs which crash in
      drm_dp_get_mst_branch_device, caused by this issue.
      
      v2: Refactored the code, as it was nicely noticed.
          Fixed Bugzilla bug numbers(second was 108616, but not 108816)
          and added links.
      
      [changed title and added stable cc]
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Signed-off-by: default avatarStanislav Lisovskiy <stanislav.lisovskiy@intel.com>
      Cc: stable@vger.kernel.org
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108616
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107738
      Link: https://patchwork.freedesktop.org/patch/msgid/20181109090012.24438-1-stanislav.lisovskiy@intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ef21c40
    • Marc Zyngier's avatar
      drm/rockchip: Allow driver to be shutdown on reboot/kexec · 67dbeda8
      Marc Zyngier authored
      commit 7f3ef5de upstream.
      
      Leaving the DRM driver enabled on reboot or kexec has the annoying
      effect of leaving the display generating transactions whilst the
      IOMMU has been shut down.
      
      In turn, the IOMMU driver (which shares its interrupt line with
      the VOP) starts warning either on shutdown or when entering the
      secondary kernel in the kexec case (nothing is expected on that
      front).
      
      A cheap way of ensuring that things are nicely shut down is to
      register a shutdown callback in the platform driver.
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Tested-by: default avatarVicente Bergas <vicencb@gmail.com>
      Signed-off-by: default avatarHeiko Stuebner <heiko@sntech.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180805124807.18169-1-marc.zyngier@arm.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67dbeda8
    • Mike Kravetz's avatar
      mm: migration: fix migration of huge PMD shared pages · b026c7ee
      Mike Kravetz authored
      commit 017b1660 upstream.
      
      The page migration code employs try_to_unmap() to try and unmap the source
      page.  This is accomplished by using rmap_walk to find all vmas where the
      page is mapped.  This search stops when page mapcount is zero.  For shared
      PMD huge pages, the page map count is always 1 no matter the number of
      mappings.  Shared mappings are tracked via the reference count of the PMD
      page.  Therefore, try_to_unmap stops prematurely and does not completely
      unmap all mappings of the source page.
      
      This problem can result is data corruption as writes to the original
      source page can happen after contents of the page are copied to the target
      page.  Hence, data is lost.
      
      This problem was originally seen as DB corruption of shared global areas
      after a huge page was soft offlined due to ECC memory errors.  DB
      developers noticed they could reproduce the issue by (hotplug) offlining
      memory used to back huge pages.  A simple testcase can reproduce the
      problem by creating a shared PMD mapping (note that this must be at least
      PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
      migrate_pages() to migrate process pages between nodes while continually
      writing to the huge pages being migrated.
      
      To fix, have the try_to_unmap_one routine check for huge PMD sharing by
      calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a shared
      mapping it will be 'unshared' which removes the page table entry and drops
      the reference on the PMD page.  After this, flush caches and TLB.
      
      mmu notifiers are called before locking page tables, but we can not be
      sure of PMD sharing until page tables are locked.  Therefore, check for
      the possibility of PMD sharing before locking so that notifiers can
      prepare for the worst possible case.
      
      Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com
      [mike.kravetz@oracle.com: make _range_in_vma() a static inline]
        Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com
      Fixes: 39dde65c ("shared page table for hugetlb page")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b026c7ee
    • Mike Kravetz's avatar
      hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444! · 575361a2
      Mike Kravetz authored
      commit 5e41540c upstream.
      
      This bug has been experienced several times by the Oracle DB team.  The
      BUG is in remove_inode_hugepages() as follows:
      
      	/*
      	 * If page is mapped, it was faulted in after being
      	 * unmapped in caller.  Unmap (again) now after taking
      	 * the fault mutex.  The mutex will prevent faults
      	 * until we finish removing the page.
      	 *
      	 * This race can only happen in the hole punch case.
      	 * Getting here in a truncate operation is a bug.
      	 */
      	if (unlikely(page_mapped(page))) {
      		BUG_ON(truncate_op);
      
      In this case, the elevated map count is not the result of a race.
      Rather it was incorrectly incremented as the result of a bug in the huge
      pmd sharing code.  Consider the following:
      
       - Process A maps a hugetlbfs file of sufficient size and alignment
         (PUD_SIZE) that a pmd page could be shared.
      
       - Process B maps the same hugetlbfs file with the same size and
         alignment such that a pmd page is shared.
      
       - Process B then calls mprotect() to change protections for the mapping
         with the shared pmd. As a result, the pmd is 'unshared'.
      
       - Process B then calls mprotect() again to chage protections for the
         mapping back to their original value. pmd remains unshared.
      
       - Process B then forks and process C is created. During the fork
         process, we do dup_mm -> dup_mmap -> copy_page_range to copy page
         tables. Copying page tables for hugetlb mappings is done in the
         routine copy_hugetlb_page_range.
      
      In copy_hugetlb_page_range(), the destination pte is obtained by:
      
      	dst_pte = huge_pte_alloc(dst, addr, sz);
      
      If pmd sharing is possible, the returned pointer will be to a pte in an
      existing page table.  In the situation above, process C could share with
      either process A or process B.  Since process A is first in the list,
      the returned pte is a pointer to a pte in process A's page table.
      
      However, the check for pmd sharing in copy_hugetlb_page_range is:
      
      	/* If the pagetables are shared don't copy or take references */
      	if (dst_pte == src_pte)
      		continue;
      
      Since process C is sharing with process A instead of process B, the
      above test fails.  The code in copy_hugetlb_page_range which follows
      assumes dst_pte points to a huge_pte_none pte.  It copies the pte entry
      from src_pte to dst_pte and increments this map count of the associated
      page.  This is how we end up with an elevated map count.
      
      To solve, check the dst_pte entry for huge_pte_none.  If !none, this
      implies PMD sharing so do not copy.
      
      Link: http://lkml.kernel.org/r/20181105212315.14125-1-mike.kravetz@oracle.com
      Fixes: c5c99429 ("fix hugepages leak due to pagetable page sharing")
      Signed-off-by: default avatarMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      575361a2
    • Guenter Roeck's avatar
      configfs: replace strncpy with memcpy · 9546d660
      Guenter Roeck authored
      commit 1823342a upstream.
      
      gcc 8.1.0 complains:
      
      fs/configfs/symlink.c:67:3: warning:
      	'strncpy' output truncated before terminating nul copying as many
      	bytes from a string as its length
      fs/configfs/symlink.c: In function 'configfs_get_link':
      fs/configfs/symlink.c:63:13: note: length computed here
      
      Using strncpy() is indeed less than perfect since the length of data to
      be copied has already been determined with strlen(). Replace strncpy()
      with memcpy() to address the warning and optimize the code a little.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNobuhiro Iwamatsu <nobuhiro.iwamatsu@cybertrust.co.jp>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9546d660
    • Miklos Szeredi's avatar
      fuse: fix leaked notify reply · 6023d16f
      Miklos Szeredi authored
      commit 7fabaf30 upstream.
      
      fuse_request_send_notify_reply() may fail if the connection was reset for
      some reason (e.g. fs was unmounted).  Don't leak request reference in this
      case.  Besides leaking memory, this resulted in fc->num_waiting not being
      decremented and hence fuse_wait_aborted() left in a hanging and unkillable
      state.
      
      Fixes: 2d45ba38 ("fuse: add retrieve request")
      Fixes: b8f95e5d ("fuse: umount should wait for all requests")
      Reported-and-tested-by: syzbot+6339eda9cb4ebbc4c37b@syzkaller.appspotmail.com
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org> #v2.6.36
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6023d16f
    • Maciej W. Rozycki's avatar
      rtc: hctosys: Add missing range error reporting · 453e01cc
      Maciej W. Rozycki authored
      commit 7ce9a992 upstream.
      
      Fix an issue with the 32-bit range error path in `rtc_hctosys' where no
      error code is set and consequently the successful preceding call result
      from `rtc_read_time' is propagated to `rtc_hctosys_ret'.  This in turn
      makes any subsequent call to `hctosys_show' incorrectly report in sysfs
      that the system time has been set from this RTC while it has not.
      
      Set the error to ERANGE then if we can't express the result due to an
      overflow.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Fixes: b3a5ac42 ("rtc: hctosys: Ensure system time doesn't overflow time_t")
      Cc: stable@vger.kernel.org # 4.17+
      Signed-off-by: default avatarAlexandre Belloni <alexandre.belloni@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      453e01cc
    • Frank Sorenson's avatar
      sunrpc: correct the computation for page_ptr when truncating · 4293fbc2
      Frank Sorenson authored
      commit 5d7a5bcb upstream.
      
      When truncating the encode buffer, the page_ptr is getting
      advanced, causing the next page to be skipped while encoding.
      The page is still included in the response, so the response
      contains a page of bogus data.
      
      We need to adjust the page_ptr backwards to ensure we encode
      the next page into the correct place.
      
      We saw this triggered when concurrent directory modifications caused
      nfsd4_encode_direct_fattr() to return nfserr_noent, and the resulting
      call to xdr_truncate_encode() corrupted the READDIR reply.
      Signed-off-by: default avatarFrank Sorenson <sorenson@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4293fbc2
    • Eric W. Biederman's avatar
      mount: Prevent MNT_DETACH from disconnecting locked mounts · f7e6ee2e
      Eric W. Biederman authored
      commit 9c8e0a1b upstream.
      
      Timothy Baldwin <timbaldwin@fastmail.co.uk> wrote:
      > As per mount_namespaces(7) unprivileged users should not be able to look under mount points:
      >
      >   Mounts that come as a single unit from more privileged mount are locked
      >   together and may not be separated in a less privileged mount namespace.
      >
      > However they can:
      >
      > 1. Create a mount namespace.
      > 2. In the mount namespace open a file descriptor to the parent of a mount point.
      > 3. Destroy the mount namespace.
      > 4. Use the file descriptor to look under the mount point.
      >
      > I have reproduced this with Linux 4.16.18 and Linux 4.18-rc8.
      >
      > The setup:
      >
      > $ sudo sysctl kernel.unprivileged_userns_clone=1
      > kernel.unprivileged_userns_clone = 1
      > $ mkdir -p A/B/Secret
      > $ sudo mount -t tmpfs hide A/B
      >
      >
      > "Secret" is indeed hidden as expected:
      >
      > $ ls -lR A
      > A:
      > total 0
      > drwxrwxrwt 2 root root 40 Feb 12 21:08 B
      >
      > A/B:
      > total 0
      >
      >
      > The attack revealing "Secret":
      >
      > $ unshare -Umr sh -c "exec unshare -m ls -lR /proc/self/fd/4/ 4<A"
      > /proc/self/fd/4/:
      > total 0
      > drwxr-xr-x 3 root root 60 Feb 12 21:08 B
      >
      > /proc/self/fd/4/B:
      > total 0
      > drwxr-xr-x 2 root root 40 Feb 12 21:08 Secret
      >
      > /proc/self/fd/4/B/Secret:
      > total 0
      
      I tracked this down to put_mnt_ns running passing UMOUNT_SYNC and
      disconnecting all of the mounts in a mount namespace.  Fix this by
      factoring drop_mounts out of drop_collected_mounts and passing
      0 instead of UMOUNT_SYNC.
      
      There are two possible behavior differences that result from this.
      - No longer setting UMOUNT_SYNC will no longer set MNT_SYNC_UMOUNT on
        the vfsmounts being unmounted.  This effects the lazy rcu walk by
        kicking the walk out of rcu mode and forcing it to be a non-lazy
        walk.
      - No longer disconnecting locked mounts will keep some mounts around
        longer as they stay because the are locked to other mounts.
      
      There are only two users of drop_collected mounts: audit_tree.c and
      put_mnt_ns.
      
      In audit_tree.c the mounts are private and there are no rcu lazy walks
      only calls to iterate_mounts. So the changes should have no effect
      except for a small timing effect as the connected mounts are disconnected.
      
      In put_mnt_ns there may be references from process outside the mount
      namespace to the mounts.  So the mounts remaining connected will
      be the bug fix that is needed.  That rcu walks are allowed to continue
      appears not to be a problem especially as the rcu walk change was about
      an implementation detail not about semantics.
      
      Cc: stable@vger.kernel.org
      Fixes: 5ff9d8a6 ("vfs: Lock in place mounts from more privileged users")
      Reported-by: default avatarTimothy Baldwin <timbaldwin@fastmail.co.uk>
      Tested-by: default avatarTimothy Baldwin <timbaldwin@fastmail.co.uk>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7e6ee2e
    • Eric W. Biederman's avatar
      mount: Don't allow copying MNT_UNBINDABLE|MNT_LOCKED mounts · 6258099f
      Eric W. Biederman authored
      commit df7342b2 upstream.
      
      Jonathan Calmels from NVIDIA reported that he's able to bypass the
      mount visibility security check in place in the Linux kernel by using
      a combination of the unbindable property along with the private mount
      propagation option to allow a unprivileged user to see a path which
      was purposefully hidden by the root user.
      
      Reproducer:
        # Hide a path to all users using a tmpfs
        root@castiana:~# mount -t tmpfs tmpfs /sys/devices/
        root@castiana:~#
      
        # As an unprivileged user, unshare user namespace and mount namespace
        stgraber@castiana:~$ unshare -U -m -r
      
        # Confirm the path is still not accessible
        root@castiana:~# ls /sys/devices/
      
        # Make /sys recursively unbindable and private
        root@castiana:~# mount --make-runbindable /sys
        root@castiana:~# mount --make-private /sys
      
        # Recursively bind-mount the rest of /sys over to /mnnt
        root@castiana:~# mount --rbind /sys/ /mnt
      
        # Access our hidden /sys/device as an unprivileged user
        root@castiana:~# ls /mnt/devices/
        breakpoint cpu cstate_core cstate_pkg i915 intel_pt isa kprobe
        LNXSYSTM:00 msr pci0000:00 platform pnp0 power software system
        tracepoint uncore_arb uncore_cbox_0 uncore_cbox_1 uprobe virtual
      
      Solve this by teaching copy_tree to fail if a mount turns out to be
      both unbindable and locked.
      
      Cc: stable@vger.kernel.org
      Fixes: 5ff9d8a6 ("vfs: Lock in place mounts from more privileged users")
      Reported-by: default avatarJonathan Calmels <jcalmels@nvidia.com>
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6258099f
    • Eric W. Biederman's avatar
      mount: Retest MNT_LOCKED in do_umount · 5becfca9
      Eric W. Biederman authored
      commit 25d202ed upstream.
      
      It was recently pointed out that the one instance of testing MNT_LOCKED
      outside of the namespace_sem is in ksys_umount.
      
      Fix that by adding a test inside of do_umount with namespace_sem and
      the mount_lock held.  As it helps to fail fails the existing test is
      maintained with an additional comment pointing out that it may be racy
      because the locks are not held.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Fixes: 5ff9d8a6 ("vfs: Lock in place mounts from more privileged users")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5becfca9
    • Vasily Averin's avatar
      ext4: fix buffer leak in __ext4_read_dirblock() on error path · 954ff046
      Vasily Averin authored
      commit de59fae0 upstream.
      
      Fixes: dc6982ff ("ext4: refactor code to read directory blocks ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.9
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      954ff046
    • Vasily Averin's avatar
      ext4: fix buffer leak in ext4_xattr_move_to_block() on error path · 27acd8f8
      Vasily Averin authored
      commit 6bdc9977 upstream.
      
      Fixes: 3f2571c1 ("ext4: factor out xattr moving")
      Fixes: 6dd4ee7c ("ext4: Expand extra_inodes space per ...")
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.23
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27acd8f8
    • Vasily Averin's avatar
      ext4: release bs.bh before re-using in ext4_xattr_block_find() · 0d9902d4
      Vasily Averin authored
      commit 45ae932d upstream.
      
      bs.bh was taken in previous ext4_xattr_block_find() call,
      it should be released before re-using
      
      Fixes: 7e01c8e5 ("ext3/4: fix uninitialized bs in ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.26
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d9902d4
    • Theodore Ts'o's avatar
      ext4: fix possible leak of sbi->s_group_desc_leak in error path · 761ff77d
      Theodore Ts'o authored
      commit 9e463084 upstream.
      
      Fixes: bfe0a5f4 ("ext4: add more mount time checks of the superblock")
      Reported-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 4.18
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      761ff77d
    • Theodore Ts'o's avatar
      ext4: avoid possible double brelse() in add_new_gdb() on error path · 6371a05a
      Theodore Ts'o authored
      commit 4f32c38b upstream.
      
      Fixes: b4097142 ("ext4: add error checking to calls to ...")
      Reported-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.38
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6371a05a
    • Vasily Averin's avatar
      ext4: fix missing cleanup if ext4_alloc_flex_bg_array() fails while resizing · 7f168837
      Vasily Averin authored
      commit f348e224 upstream.
      
      Fixes: 117fff10 ("ext4: grow the s_flex_groups array as needed ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f168837
    • Vasily Averin's avatar
      ext4: avoid buffer leak in ext4_orphan_add() after prior errors · e85d624f
      Vasily Averin authored
      commit feaf264c upstream.
      
      Fixes: d745a8c2 ("ext4: reduce contention on s_orphan_lock")
      Fixes: 6e3617e5 ("ext4: Handle non empty on-disk orphan link")
      Cc: Dmitry Monakhov <dmonakhov@gmail.com>
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.34
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e85d624f
    • Vasily Averin's avatar
      ext4: fix possible inode leak in the retry loop of ext4_resize_fs() · 36ab0ab9
      Vasily Averin authored
      commit db6aee62 upstream.
      
      Fixes: 1c6bd717 ("ext4: convert file system to meta_bg if needed ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      36ab0ab9
    • Vasily Averin's avatar
      ext4: avoid potential extra brelse in setup_new_flex_group_blocks() · d507dfb5
      Vasily Averin authored
      commit 9e402893 upstream.
      
      Currently bh is set to NULL only during first iteration of for cycle,
      then this pointer is not cleared after end of using.
      Therefore rollback after errors can lead to extra brelse(bh) call,
      decrements bh counter and later trigger an unexpected warning in __brelse()
      
      Patch moves brelse() calls in body of cycle to exclude requirement of
      brelse() call in rollback.
      
      Fixes: 33afdcc5 ("ext4: add a function which sets up group blocks ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.3+
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d507dfb5
    • Vasily Averin's avatar
      ext4: add missing brelse() add_new_gdb_meta_bg()'s error path · 47aa49f6
      Vasily Averin authored
      commit 61a9c11e upstream.
      
      Fixes: 01f795f9 ("ext4: add online resizing support for meta_bg ...")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 3.7
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      47aa49f6
    • Vasily Averin's avatar
      ext4: add missing brelse() in set_flexbg_block_bitmap()'s error path · 88cc2d51
      Vasily Averin authored
      commit cea57941 upstream.
      
      Fixes: 33afdcc5 ("ext4: add a function which sets up group blocks ...")
      Cc: stable@kernel.org # 3.3
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      88cc2d51
    • Vasily Averin's avatar
      ext4: add missing brelse() update_backups()'s error path · 2eefc694
      Vasily Averin authored
      commit ea0abbb6 upstream.
      
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Signed-off-by: default avatarVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # 2.6.19
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2eefc694
    • Michael Kelley's avatar
      clockevents/drivers/i8253: Add support for PIT shutdown quirk · 989e44d6
      Michael Kelley authored
      commit 35b69a42 upstream.
      
      Add support for platforms where pit_shutdown() doesn't work because of a
      quirk in the PIT emulation. On these platforms setting the counter register
      to zero causes the PIT to start running again, negating the shutdown.
      
      Provide a global variable that controls whether the counter register is
      zero'ed, which platform specific code can override.
      Signed-off-by: default avatarMichael Kelley <mikelley@microsoft.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
      Cc: "devel@linuxdriverproject.org" <devel@linuxdriverproject.org>
      Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>
      Cc: "virtualization@lists.linux-foundation.org" <virtualization@lists.linux-foundation.org>
      Cc: "jgross@suse.com" <jgross@suse.com>
      Cc: "akataria@vmware.com" <akataria@vmware.com>
      Cc: "olaf@aepfle.de" <olaf@aepfle.de>
      Cc: "apw@canonical.com" <apw@canonical.com>
      Cc: vkuznets <vkuznets@redhat.com>
      Cc: "jasowang@redhat.com" <jasowang@redhat.com>
      Cc: "marcelo.cerri@canonical.com" <marcelo.cerri@canonical.com>
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1541303219-11142-2-git-send-email-mikelley@microsoft.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      989e44d6
    • Filipe Manana's avatar
      Btrfs: fix data corruption due to cloning of eof block · f02c3c34
      Filipe Manana authored
      commit ac765f83 upstream.
      
      We currently allow cloning a range from a file which includes the last
      block of the file even if the file's size is not aligned to the block
      size. This is fine and useful when the destination file has the same size,
      but when it does not and the range ends somewhere in the middle of the
      destination file, it leads to corruption because the bytes between the EOF
      and the end of the block have undefined data (when there is support for
      discard/trimming they have a value of 0x00).
      
      Example:
      
       $ mkfs.btrfs -f /dev/sdb
       $ mount /dev/sdb /mnt
      
       $ export foo_size=$((256 * 1024 + 100))
       $ xfs_io -f -c "pwrite -S 0x3c 0 $foo_size" /mnt/foo
       $ xfs_io -f -c "pwrite -S 0xb5 0 1M" /mnt/bar
      
       $ xfs_io -c "reflink /mnt/foo 0 512K $foo_size" /mnt/bar
      
       $ od -A d -t x1 /mnt/bar
       0000000 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
       *
       0524288 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c 3c
       *
       0786528 3c 3c 3c 3c 00 00 00 00 00 00 00 00 00 00 00 00
       0786544 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       *
       0790528 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5 b5
       *
       1048576
      
      The bytes in the range from 786532 (512Kb + 256Kb + 100 bytes) to 790527
      (512Kb + 256Kb + 4Kb - 1) got corrupted, having now a value of 0x00 instead
      of 0xb5.
      
      This is similar to the problem we had for deduplication that got recently
      fixed by commit de02b9f6 ("Btrfs: fix data corruption when
      deduplicating between different files").
      
      Fix this by not allowing such operations to be performed and return the
      errno -EINVAL to user space. This is what XFS is doing as well at the VFS
      level. This change however now makes us return -EINVAL instead of
      -EOPNOTSUPP for cases where the source range maps to an inline extent and
      the destination range's end is smaller then the destination file's size,
      since the detection of inline extents is done during the actual process of
      dropping file extent items (at __btrfs_drop_extents()). Returning the
      -EINVAL error is done early on and solely based on the input parameters
      (offsets and length) and destination file's size. This makes us consistent
      with XFS and anyone else supporting cloning since this case is now checked
      at a higher level in the VFS and is where the -EINVAL will be returned
      from starting with kernel 4.20 (the VFS changed was introduced in 4.20-rc1
      by commit 07d19dc9 ("vfs: avoid problematic remapping requests into
      partial EOF block"). So this change is more geared towards stable kernels,
      as it's unlikely the new VFS checks get removed intentionally.
      
      A test case for fstests follows soon, as well as an update to filter
      existing tests that expect -EOPNOTSUPP to accept -EINVAL as well.
      
      CC: <stable@vger.kernel.org> # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f02c3c34
    • H. Peter Anvin (Intel)'s avatar
      arch/alpha, termios: implement BOTHER, IBSHIFT and termios2 · d2db9e2e
      H. Peter Anvin (Intel) authored
      commit d0ffb805 upstream.
      
      Alpha has had c_ispeed and c_ospeed, but still set speeds in c_cflags
      using arbitrary flags. Because BOTHER is not defined, the general
      Linux code doesn't allow setting arbitrary baud rates, and because
      CBAUDEX == 0, we can have an array overrun of the baud_rate[] table in
      drivers/tty/tty_baudrate.c if (c_cflags & CBAUD) == 037.
      
      Resolve both problems by #defining BOTHER to 037 on Alpha.
      
      However, userspace still needs to know if setting BOTHER is actually
      safe given legacy kernels (does anyone actually care about that on
      Alpha anymore?), so enable the TCGETS2/TCSETS*2 ioctls on Alpha, even
      though they use the same structure. Define struct termios2 just for
      compatibility; it is the exact same structure as struct termios. In a
      future patchset, this will be cleaned up so the uapi headers are
      usable from libc.
      Signed-off-by: default avatarH. Peter Anvin (Intel) <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Eugene Syromiatnikov <esyr@redhat.com>
      Cc: <linux-alpha@vger.kernel.org>
      Cc: <linux-serial@vger.kernel.org>
      Cc: Johan Hovold <johan@kernel.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2db9e2e
    • H. Peter Anvin's avatar
      termios, tty/tty_baudrate.c: fix buffer overrun · 505bc0f3
      H. Peter Anvin authored
      commit 991a2519 upstream.
      
      On architectures with CBAUDEX == 0 (Alpha and PowerPC), the code in tty_baudrate.c does
      not do any limit checking on the tty_baudrate[] array, and in fact a
      buffer overrun is possible on both architectures. Add a limit check to
      prevent that situation.
      
      This will be followed by a much bigger cleanup/simplification patch.
      Signed-off-by: default avatarH. Peter Anvin (Intel) <hpa@zytor.com>
      Requested-by: default avatarCc: Johan Hovold <johan@kernel.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Eugene Syromiatnikov <esyr@redhat.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      505bc0f3
    • Arnd Bergmann's avatar
      mtd: docg3: don't set conflicting BCH_CONST_PARAMS option · c7d2166d
      Arnd Bergmann authored
      commit be2e1c9d upstream.
      
      I noticed during the creation of another bugfix that the BCH_CONST_PARAMS
      option that is set by DOCG3 breaks setting variable parameters for any
      other users of the BCH library code.
      
      The only other user we have today is the MTD_NAND software BCH
      implementation (most flash controllers use hardware BCH these days
      and are not affected). I considered removing BCH_CONST_PARAMS entirely
      because of the inherent conflict, but according to the description in
      lib/bch.c there is a significant performance benefit in keeping it.
      
      To avoid the immediate problem of the conflict between MTD_NAND_BCH
      and DOCG3, this only sets the constant parameters if MTD_NAND_BCH
      is disabled, which should fix the problem for all cases that
      are affected. This should also work for all stable kernels.
      
      Note that there is only one machine that actually seems to use the
      DOCG3 driver (arch/arm/mach-pxa/mioa701.c), so most users should have
      the driver disabled, but it almost certainly shows up if we wanted
      to test random kernels on machines that use software BCH in MTD.
      
      Fixes: d13d19ec ("mtd: docg3: add ECC correction code")
      Cc: stable@vger.kernel.org
      Cc: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarBoris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7d2166d
    • Andrea Arcangeli's avatar
      mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings · 877813e0
      Andrea Arcangeli authored
      commit ac5b2c18 upstream.
      
      THP allocation might be really disruptive when allocated on NUMA system
      with the local node full or hard to reclaim.  Stefan has posted an
      allocation stall report on 4.12 based SLES kernel which suggests the
      same issue:
      
        kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null)
        kvm cpuset=/ mems_allowed=0-1
        CPU: 10 PID: 84752 Comm: kvm Tainted: G        W 4.12.0+98-ph <a href="/view.php?id=1" title="[geschlossen] Integration Ramdisk" class="resolved">0000001</a> SLE15 (unreleased)
        Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017
        Call Trace:
         dump_stack+0x5c/0x84
         warn_alloc+0xe0/0x180
         __alloc_pages_slowpath+0x820/0xc90
         __alloc_pages_nodemask+0x1cc/0x210
         alloc_pages_vma+0x1e5/0x280
         do_huge_pmd_wp_page+0x83f/0xf00
         __handle_mm_fault+0x93d/0x1060
         handle_mm_fault+0xc6/0x1b0
         __do_page_fault+0x230/0x430
         do_page_fault+0x2a/0x70
         page_fault+0x7b/0x80
         [...]
        Mem-Info:
        active_anon:126315487 inactive_anon:1612476 isolated_anon:5
         active_file:60183 inactive_file:245285 isolated_file:0
         unevictable:15657 dirty:286 writeback:1 unstable:0
         slab_reclaimable:75543 slab_unreclaimable:2509111
         mapped:81814 shmem:31764 pagetables:370616 bounce:0
         free:32294031 free_pcp:6233 free_cma:0
        Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
        Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
      
      The defrag mode is "madvise" and from the above report it is clear that
      the THP has been allocated for MADV_HUGEPAGA vma.
      
      Andrea has identified that the main source of the problem is
      __GFP_THISNODE usage:
      
      : The problem is that direct compaction combined with the NUMA
      : __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very
      : hard the local node, instead of failing the allocation if there's no
      : THP available in the local node.
      :
      : Such logic was ok until __GFP_THISNODE was added to the THP allocation
      : path even with MPOL_DEFAULT.
      :
      : The idea behind the __GFP_THISNODE addition, is that it is better to
      : provide local memory in PAGE_SIZE units than to use remote NUMA THP
      : backed memory. That largely depends on the remote latency though, on
      : threadrippers for example the overhead is relatively low in my
      : experience.
      :
      : The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in
      : extremely slow qemu startup with vfio, if the VM is larger than the
      : size of one host NUMA node. This is because it will try very hard to
      : unsuccessfully swapout get_user_pages pinned pages as result of the
      : __GFP_THISNODE being set, instead of falling back to PAGE_SIZE
      : allocations and instead of trying to allocate THP on other nodes (it
      : would be even worse without vfio type1 GUP pins of course, except it'd
      : be swapping heavily instead).
      
      Fix this by removing __GFP_THISNODE for THP requests which are
      requesting the direct reclaim.  This effectivelly reverts 5265047a
      on the grounds that the zone/node reclaim was known to be disruptive due
      to premature reclaim when there was memory free.  While it made sense at
      the time for HPC workloads without NUMA awareness on rare machines, it
      was ultimately harmful in the majority of cases.  The existing behaviour
      is similar, if not as widespare as it applies to a corner case but
      crucially, it cannot be tuned around like zone_reclaim_mode can.  The
      default behaviour should always be to cause the least harm for the
      common case.
      
      If there are specialised use cases out there that want zone_reclaim_mode
      in specific cases, then it can be built on top.  Longterm we should
      consider a memory policy which allows for the node reclaim like behavior
      for the specific memory ranges which would allow a
      
      [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com
      
      Mel said:
      
      : Both patches look correct to me but I'm responding to this one because
      : it's the fix.  The change makes sense and moves further away from the
      : severe stalling behaviour we used to see with both THP and zone reclaim
      : mode.
      :
      : I put together a basic experiment with usemem configured to reference a
      : buffer multiple times that is 80% the size of main memory on a 2-socket
      : box with symmetric node sizes and defrag set to "always".  The defrag
      : setting is not the default but it would be functionally similar to
      : accessing a buffer with madvise(MADV_HUGEPAGE).  Usemem is configured to
      : reference the buffer multiple times and while it's not an interesting
      : workload, it would be expected to complete reasonably quickly as it fits
      : within memory.  The results were;
      :
      : usemem
      :                                   vanilla           noreclaim-v1
      : Amean     Elapsd-1       42.78 (   0.00%)       26.87 (  37.18%)
      : Amean     Elapsd-3       27.55 (   0.00%)        7.44 (  73.00%)
      : Amean     Elapsd-4        5.72 (   0.00%)        5.69 (   0.45%)
      :
      : This shows the elapsed time in seconds for 1 thread, 3 threads and 4
      : threads referencing buffers 80% the size of memory.  With the patches
      : applied, it's 37.18% faster for the single thread and 73% faster with two
      : threads.  Note that 4 threads showing little difference does not indicate
      : the problem is related to thread counts.  It's simply the case that 4
      : threads gets spread so their workload mostly fits in one node.
      :
      : The overall view from /proc/vmstats is more startling
      :
      :                          4.19.0-rc1  4.19.0-rc1
      :                             vanillanoreclaim-v1r1
      : Minor Faults               35593425      708164
      : Major Faults                 484088          36
      : Swap Ins                    3772837           0
      : Swap Outs                   3932295           0
      :
      : Massive amounts of swap in/out without the patch
      :
      : Direct pages scanned        6013214           0
      : Kswapd pages scanned              0           0
      : Kswapd pages reclaimed            0           0
      : Direct pages reclaimed      4033009           0
      :
      : Lots of reclaim activity without the patch
      :
      : Kswapd efficiency              100%        100%
      : Kswapd velocity               0.000       0.000
      : Direct efficiency               67%        100%
      : Direct velocity           11191.956       0.000
      :
      : Mostly from direct reclaim context as you'd expect without the patch.
      :
      : Page writes by reclaim  3932314.000       0.000
      : Page writes file                 19           0
      : Page writes anon            3932295           0
      : Page reclaim immediate        42336           0
      :
      : Writes from reclaim context is never good but the patch eliminates it.
      :
      : We should never have default behaviour to thrash the system for such a
      : basic workload.  If zone reclaim mode behaviour is ever desired but on a
      : single task instead of a global basis then the sensible option is to build
      : a mempolicy that enforces that behaviour.
      
      This was a severe regression compared to previous kernels that made
      important workloads unusable and it starts when __GFP_THISNODE was
      added to THP allocations under MADV_HUGEPAGE.  It is not a significant
      risk to go to the previous behavior before __GFP_THISNODE was added, it
      worked like that for years.
      
      This was simply an optimization to some lucky workloads that can fit in
      a single node, but it ended up breaking the VM for others that can't
      possibly fit in a single node, so going back is safe.
      
      [mhocko@suse.com: rewrote the changelog based on the one from Andrea]
      Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org
      Fixes: 5265047a ("mm, thp: really limit transparent hugepage allocation to local node")
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Reported-by: default avatarStefan Priebe <s.priebe@profihost.ag>
      Debugged-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Reviewed-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Tested-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: <stable@vger.kernel.org>	[4.1+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      877813e0
    • Changwei Ge's avatar
      ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entry · 298ed64f
      Changwei Ge authored
      commit 29aa3016 upstream.
      
      Somehow, file system metadata was corrupted, which causes
      ocfs2_check_dir_entry() to fail in function ocfs2_dir_foreach_blk_el().
      
      According to the original design intention, if above happens we should
      skip the problematic block and continue to retrieve dir entry.  But
      there is obviouse misuse of brelse around related code.
      
      After failure of ocfs2_check_dir_entry(), current code just moves to
      next position and uses the problematic buffer head again and again
      during which the problematic buffer head is released for multiple times.
      I suppose, this a serious issue which is long-lived in ocfs2.  This may
      cause other file systems which is also used in a the same host insane.
      
      So we should also consider about bakcporting this patch into linux
      -stable.
      
      Link: http://lkml.kernel.org/r/HK2PR06MB045211675B43EED794E597B6D56E0@HK2PR06MB0452.apcprd06.prod.outlook.comSigned-off-by: default avatarChangwei Ge <ge.changwei@h3c.com>
      Suggested-by: default avatarChangkuo Shi <shi.changkuo@h3c.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      298ed64f