1. 14 Sep, 2019 4 commits
    • Paolo Bonzini's avatar
      Merge tag 'kvm-s390-master-5.3-1' of... · a9c20bb0
      Paolo Bonzini authored
      Merge tag 'kvm-s390-master-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
      
      KVM: s390: Fixes for 5.3
      
      - prevent a user triggerable oops in the migration code
      - do not leak kernel stack content
      a9c20bb0
    • Sean Christopherson's avatar
      KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot · 002c5f73
      Sean Christopherson authored
      James Harvey reported a livelock that was introduced by commit
      d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when
      removing a memslot"").
      
      The livelock occurs because kvm_mmu_zap_all() as it exists today will
      voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
      to add shadow pages.  With enough vCPUs, kvm_mmu_zap_all() can get stuck
      in an infinite loop as it can never zap all pages before observing lock
      contention or the need to reschedule.  The equivalent of kvm_mmu_zap_all()
      that was in use at the time of the reverted commit (4e103134, "KVM:
      x86/mmu: Zap only the relevant pages when removing a memslot") employed
      a fast invalidate mechanism and was not susceptible to the above livelock.
      
      There are three ways to fix the livelock:
      
      - Reverting the revert (commit d012a06a) is not a viable option as
        the revert is needed to fix a regression that occurs when the guest has
        one or more assigned devices.  It's unlikely we'll root cause the device
        assignment regression soon enough to fix the regression timely.
      
      - Remove the conditional reschedule from kvm_mmu_zap_all().  However, although
        removing the reschedule would be a smaller code change, it's less safe
        in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
        the wild for flushing memslots since the fast invalidate mechanism was
        introduced by commit 6ca18b69 ("KVM: x86: use the fast way to
        invalidate all pages"), back in 2013.
      
      - Reintroduce the fast invalidate mechanism and use it when zapping shadow
        pages in response to a memslot being deleted/moved, which is what this
        patch does.
      
      For all intents and purposes, this is a revert of commit ea145aac
      ("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
      commit 7390de1e ("Revert "KVM: x86: use the fast way to invalidate
      all pages""), i.e. restores the behavior of commit 5304b8d3 ("KVM:
      MMU: fast invalidate all pages") and commit 6ca18b69 ("KVM: x86:
      use the fast way to invalidate all pages") respectively.
      
      Fixes: d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
      Reported-by: default avatarJames Harvey <jamespharvey20@gmail.com>
      Cc: Alex Willamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      002c5f73
    • Fuqian Huang's avatar
      KVM: x86: work around leak of uninitialized stack contents · 541ab2ae
      Fuqian Huang authored
      Emulation of VMPTRST can incorrectly inject a page fault
      when passed an operand that points to an MMIO address.
      The page fault will use uninitialized kernel stack memory
      as the CR2 and error code.
      
      The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
      exit to userspace; however, it is not an easy fix, so for now just ensure
      that the error code and CR2 are zero.
      Signed-off-by: default avatarFuqian Huang <huangfq.daxian@gmail.com>
      Cc: stable@vger.kernel.org
      [add comment]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      541ab2ae
    • Paolo Bonzini's avatar
      KVM: nVMX: handle page fault in vmread · f7eea636
      Paolo Bonzini authored
      The implementation of vmread to memory is still incomplete, as it
      lacks the ability to do vmread to I/O memory just like vmptrst.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      f7eea636
  2. 13 Sep, 2019 2 commits
    • Linus Torvalds's avatar
      Merge branch 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · a7f89616
      Linus Torvalds authored
      Pull cgroup fix from Tejun Heo:
       "Roman found and fixed a bug in the cgroup2 freezer which allows new
        child cgroup to escape frozen state"
      
      * 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: freezer: fix frozen state inheritance
        kselftests: cgroup: add freezer mkdir test
      a7f89616
    • Linus Torvalds's avatar
      Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 1b304a1a
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
       "Here are two fixes, one of them urgent fixing a bug introduced in 5.2
        and reported by many users. It took time to identify the root cause,
        catching the 5.3 release is higly desired also to push the fix to 5.2
        stable tree.
      
        The bug is a mess up of return values after adding proper error
        handling and honestly the kind of bug that can cause sleeping
        disorders until it's caught. My appologies to everybody who was
        affected.
      
        Summary of what could happen:
      
        1) either a hang when committing a transaction, if this happens
           there's no risk of corruption, still the hang is very inconvenient
           and can't be resolved without a reboot
      
        2) writeback for some btree nodes may never be started and we end up
           committing a transaction without noticing that, this is really
           serious and that will lead to the "parent transid verify failed"
           messages"
      
      * tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
        Btrfs: fix assertion failure during fsync and use of stale transaction
      1b304a1a
  3. 12 Sep, 2019 15 commits
    • Roman Gushchin's avatar
      cgroup: freezer: fix frozen state inheritance · 97a61369
      Roman Gushchin authored
      If a new child cgroup is created in the frozen cgroup hierarchy
      (one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
      flag should be set. Otherwise if a process will be attached to the
      child cgroup, it won't become frozen.
      
      The problem can be reproduced with the test_cgfreezer_mkdir test.
      
      This is the output before this patch:
        ~/test_freezer
        ok 1 test_cgfreezer_simple
        ok 2 test_cgfreezer_tree
        ok 3 test_cgfreezer_forkbomb
        Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
        not ok 4 test_cgfreezer_mkdir
        ok 5 test_cgfreezer_rmdir
        ok 6 test_cgfreezer_migrate
        ok 7 test_cgfreezer_ptrace
        ok 8 test_cgfreezer_stopped
        ok 9 test_cgfreezer_ptraced
        ok 10 test_cgfreezer_vfork
      
      And with this patch:
        ~/test_freezer
        ok 1 test_cgfreezer_simple
        ok 2 test_cgfreezer_tree
        ok 3 test_cgfreezer_forkbomb
        ok 4 test_cgfreezer_mkdir
        ok 5 test_cgfreezer_rmdir
        ok 6 test_cgfreezer_migrate
        ok 7 test_cgfreezer_ptrace
        ok 8 test_cgfreezer_stopped
        ok 9 test_cgfreezer_ptraced
        ok 10 test_cgfreezer_vfork
      Reported-by: default avatarMark Crossen <mcrossen@fb.com>
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Fixes: 76f969e8 ("cgroup: cgroup v2 freezer")
      Cc: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      97a61369
    • Roman Gushchin's avatar
      kselftests: cgroup: add freezer mkdir test · 44e9d308
      Roman Gushchin authored
      Add a new cgroup freezer selftest, which checks that if a cgroup is
      frozen, their new child cgroups will properly inherit the frozen
      state.
      
      It creates a parent cgroup, freezes it, creates a child cgroup
      and populates it with a dummy process. Then it checks that both
      parent and child cgroup are frozen.
      Signed-off-by: default avatarRoman Gushchin <guro@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      44e9d308
    • Chris Wilson's avatar
      Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()" · 505a8ec7
      Chris Wilson authored
      The userptr put_pages can be called from inside try_to_unmap, and so
      enters with the page lock held on one of the object's backing pages. We
      cannot take the page lock ourselves for fear of recursion.
      Reported-by: default avatarLionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reported-by: default avatarMartin Wilck <Martin.Wilck@suse.com>
      Reported-by: default avatarLeo Kraav <leho@kraav.com>
      Fixes: aa56a292 ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
      References: https://bugzilla.kernel.org/show_bug.cgi?id=203317Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      505a8ec7
    • Linus Torvalds's avatar
      Merge tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux · 98dcb386
      Linus Torvalds authored
      Pull clone3 fix from Christian Brauner:
       "This is a last-minute bugfix for clone3() that should go in before we
        release 5.3 with clone3().
      
        clone3() did not verify that the exit_signal argument was set to a
        valid signal. This can be used to cause a crash by specifying a signal
        greater than NSIG. e.g. -1.
      
        The commit from Eugene adds a check to copy_clone_args_from_user() to
        verify that the exit signal is limited by CSIGNAL as with legacy
        clone() and that the signal is valid. With this we don't get the
        legacy clone behavior were an invalid signal could be handed down and
        would only be detected and then ignored in do_notify_parent(). Users
        of clone3() will now get a proper error right when they pass an
        invalid exit signal. Note, that this is not a change in user-visible
        behavior since no kernel with clone3() has been released yet"
      
      * tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
        fork: block invalid exit signals with clone3()
      98dcb386
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 95217783
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "A KVM guest fix, and a kdump kernel relocation errors fix"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/timer: Force PIT initialization when !X86_FEATURE_ARAT
        x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors
      95217783
    • Eugene Syromiatnikov's avatar
      fork: block invalid exit signals with clone3() · a0eb9abd
      Eugene Syromiatnikov authored
      Previously, higher 32 bits of exit_signal fields were lost when copied
      to the kernel args structure (that uses int as a type for the respective
      field). Moreover, as Oleg has noted, exit_signal is used unchecked, so
      it has to be checked for sanity before use; for the legacy syscalls,
      applying CSIGNAL mask guarantees that it is at least non-negative;
      however, there's no such thing is done in clone3() code path, and that
      can break at least thread_group_leader.
      
      This commit adds a check to copy_clone_args_from_user() to verify that
      the exit signal is limited by CSIGNAL as with legacy clone() and that
      the signal is valid. With this we don't get the legacy clone behavior
      were an invalid signal could be handed down and would only be detected
      and ignored in do_notify_parent(). Users of clone3() will now get a
      proper error when they pass an invalid exit signal. Note, that this is
      not user-visible behavior since no kernel with clone3() has been
      released yet.
      
      The following program will cause a splat on a non-fixed clone3() version
      and will fail correctly on a fixed version:
      
       #define _GNU_SOURCE
       #include <linux/sched.h>
       #include <linux/types.h>
       #include <sched.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <sys/syscall.h>
       #include <sys/wait.h>
       #include <unistd.h>
      
       int main(int argc, char *argv[])
       {
              pid_t pid = -1;
              struct clone_args args = {0};
              args.exit_signal = -1;
      
              pid = syscall(__NR_clone3, &args, sizeof(struct clone_args));
              if (pid < 0)
                      exit(EXIT_FAILURE);
      
              if (pid == 0)
                      exit(EXIT_SUCCESS);
      
              wait(NULL);
      
              exit(EXIT_SUCCESS);
       }
      
      Fixes: 7f192e3c ("fork: add clone3")
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Suggested-by: default avatarOleg Nesterov <oleg@redhat.com>
      Suggested-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: default avatarEugene Syromiatnikov <esyr@redhat.com>
      Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com
      [christian.brauner@ubuntu.com: simplify check and rework commit message]
      Signed-off-by: default avatarChristian Brauner <christian.brauner@ubuntu.com>
      a0eb9abd
    • Thomas Huth's avatar
      KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl · 53936b5b
      Thomas Huth authored
      When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject
      an interrupt, we convert them from the legacy struct kvm_s390_interrupt
      to the new struct kvm_s390_irq via the s390int_to_s390irq() function.
      However, this function does not take care of all types of interrupts
      that we can inject into the guest later (see do_inject_vcpu()). Since we
      do not clear out the s390irq values before calling s390int_to_s390irq(),
      there is a chance that we copy random data from the kernel stack which
      could be leaked to the userspace later.
      
      Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT
      interrupt: s390int_to_s390irq() does not handle it, and the function
      __inject_pfault_init() later copies irq->u.ext which contains the
      random kernel stack data. This data can then be leaked either to
      the guest memory in __deliver_pfault_init(), or the userspace might
      retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl.
      
      Fix it by handling that interrupt type in s390int_to_s390irq(), too,
      and by making sure that the s390irq struct is properly pre-initialized.
      And while we're at it, make sure that s390int_to_s390irq() now
      directly returns -EINVAL for unknown interrupt types, so that we
      immediately get a proper error code in case we add more interrupt
      types to do_inject_vcpu() without updating s390int_to_s390irq()
      sometime in the future.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarJanosch Frank <frankja@linux.ibm.com>
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Link: https://lore.kernel.org/kvm/20190912115438.25761-1-thuth@redhat.comSigned-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      53936b5b
    • Filipe Manana's avatar
      Btrfs: fix unwritten extent buffers and hangs on future writeback attempts · 18dfa711
      Filipe Manana authored
      The lock_extent_buffer_io() returns 1 to the caller to tell it everything
      went fine and the callers needs to start writeback for the extent buffer
      (submit a bio, etc), 0 to tell the caller everything went fine but it does
      not need to start writeback for the extent buffer, and a negative value if
      some error happened.
      
      When it's about to return 1 it tries to lock all pages, and if a try lock
      on a page fails, and we didn't flush any existing bio in our "epd", it
      calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
      an error. The page might have been locked elsewhere, not with the goal
      of starting writeback of the extent buffer, and even by some code other
      than btrfs, like page migration for example, so it does not mean the
      writeback of the extent buffer was already started by some other task,
      so returning a 0 tells the caller (btree_write_cache_pages()) to not
      start writeback for the extent buffer. Note that epd might currently have
      either no bio, so flush_write_bio() returns 0 (success) or it might have
      a bio for another extent buffer with a lower index (logical address).
      
      Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
      extent buffer and writeback is never started for the extent buffer,
      future attempts to writeback the extent buffer will hang forever waiting
      on that bit to be cleared, since it can only be cleared after writeback
      completes. Such hang is reported with a trace like the following:
      
        [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
        [49887.347059]       Not tainted 5.2.13-gentoo #2
        [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
        [49887.347064] Call Trace:
        [49887.347069]  ? __schedule+0x265/0x830
        [49887.347071]  ? bit_wait+0x50/0x50
        [49887.347072]  ? bit_wait+0x50/0x50
        [49887.347074]  schedule+0x24/0x90
        [49887.347075]  io_schedule+0x3c/0x60
        [49887.347077]  bit_wait_io+0x8/0x50
        [49887.347079]  __wait_on_bit+0x6c/0x80
        [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
        [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
        [49887.347084]  ? var_wake_function+0x20/0x20
        [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
        [49887.347089]  btree_write_cache_pages+0x18e/0x340
        [49887.347091]  do_writepages+0x29/0xb0
        [49887.347093]  ? kmem_cache_free+0x132/0x160
        [49887.347095]  ? convert_extent_bit+0x544/0x680
        [49887.347097]  filemap_fdatawrite_range+0x70/0x90
        [49887.347099]  btrfs_write_marked_extents+0x53/0x120
        [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
        [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
        [49887.347103]  ? start_transaction+0x33e/0x500
        [49887.347105]  transaction_kthread+0x139/0x15c
      
      So fix this by not overwriting the return value (ret) with the result
      from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
      bit in case flush_write_bio() returns an error, otherwise it will hang
      any future attempts to writeback the extent buffer, and undo all work
      done before (set back EXTENT_BUFFER_DIRTY, etc).
      
      This is a regression introduced in the 5.2 kernel.
      
      Fixes: 2e3c2513 ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
      Fixes: f4340622 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
      Reported-by: default avatarZdenek Sojka <zsojka@seznam.cz>
      Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#uReported-by: default avatarStefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#tReported-by: default avatarDrazen Kacar <drazen.kacar@oradian.com>
      Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      18dfa711
    • Filipe Manana's avatar
      Btrfs: fix assertion failure during fsync and use of stale transaction · 410f954c
      Filipe Manana authored
      Sometimes when fsync'ing a file we need to log that other inodes exist and
      when we need to do that we acquire a reference on the inodes and then drop
      that reference using iput() after logging them.
      
      That generally is not a problem except if we end up doing the final iput()
      (dropping the last reference) on the inode and that inode has a link count
      of 0, which can happen in a very short time window if the logging path
      gets a reference on the inode while it's being unlinked.
      
      In that case we end up getting the eviction callback, btrfs_evict_inode(),
      invoked through the iput() call chain which needs to drop all of the
      inode's items from its subvolume btree, and in order to do that, it needs
      to join a transaction at the helper function evict_refill_and_join().
      However because the task previously started a transaction at the fsync
      handler, btrfs_sync_file(), it has current->journal_info already pointing
      to a transaction handle and therefore evict_refill_and_join() will get
      that transaction handle from btrfs_join_transaction(). From this point on,
      two different problems can happen:
      
      1) evict_refill_and_join() will often change the transaction handle's
         block reserve (->block_rsv) and set its ->bytes_reserved field to a
         value greater than 0. If evict_refill_and_join() never commits the
         transaction, the eviction handler ends up decreasing the reference
         count (->use_count) of the transaction handle through the call to
         btrfs_end_transaction(), and after that point we have a transaction
         handle with a NULL ->block_rsv (which is the value prior to the
         transaction join from evict_refill_and_join()) and a ->bytes_reserved
         value greater than 0. If after the eviction/iput completes the inode
         logging path hits an error or it decides that it must fallback to a
         transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
         non-zero value from btrfs_log_dentry_safe(), and because of that
         non-zero value it tries to commit the transaction using a handle with
         a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
         the transaction commit hit an assertion failure at
         btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
         the ->block_rsv is NULL. The produced stack trace for that is like the
         following:
      
         [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
         [192922.917553] ------------[ cut here ]------------
         [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
         [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
         [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
         [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
         [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
         (...)
         [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
         [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
         [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
         [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
         [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
         [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
         [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
         [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
         [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         [192922.925105] Call Trace:
         [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
         [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
         [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
         [192922.926731]  do_fsync+0x38/0x60
         [192922.927138]  __x64_sys_fdatasync+0x13/0x20
         [192922.927543]  do_syscall_64+0x60/0x1c0
         [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
         (...)
         [192922.934077] ---[ end trace f00808b12068168f ]---
      
      2) If evict_refill_and_join() decides to commit the transaction, it will
         be able to do it, since the nested transaction join only increments the
         transaction handle's ->use_count reference counter and it does not
         prevent the transaction from getting committed. This means that after
         eviction completes, the fsync logging path will be using a transaction
         handle that refers to an already committed transaction. What happens
         when using such a stale transaction can be unpredictable, we are at
         least having a use-after-free on the transaction handle itself, since
         the transaction commit will call kmem_cache_free() against the handle
         regardless of its ->use_count value, or we can end up silently losing
         all the updates to the log tree after that iput() in the logging path,
         or using a transaction handle that in the meanwhile was allocated to
         another task for a new transaction, etc, pretty much unpredictable
         what can happen.
      
      In order to fix both of them, instead of using iput() during logging, use
      btrfs_add_delayed_iput(), so that the logging path of fsync never drops
      the last reference on an inode, that step is offloaded to a safe context
      (usually the cleaner kthread).
      
      The assertion failure issue was sporadically triggered by the test case
      generic/475 from fstests, which loads the dm error target while fsstress
      is running, which lead to fsync failing while logging inodes with -EIO
      errors and then trying later to commit the transaction, triggering the
      assertion failure.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      410f954c
    • Igor Mammedov's avatar
      KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset() · 13a17cc0
      Igor Mammedov authored
      If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on memslot before calling
      kvm_s390_vm_start_migration(), kernel will oops with:
      
        Unable to handle kernel pointer dereference in virtual kernel address space
        Failing address: 0000000000000000 TEID: 0000000000000483
        Fault in home space mode while using kernel ASCE.
        AS:0000000002a2000b R2:00000001bff8c00b R3:00000001bff88007 S:00000001bff91000 P:000000000000003d
        Oops: 0004 ilc:2 [#1] SMP
        ...
        Call Trace:
        ([<001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm])
         [<001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm]
         [<001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm]
         [<00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58
         [<00000000008bb284>] ksys_ioctl+0x8c/0xb8
         [<00000000008bb2e2>] sys_ioctl+0x32/0x40
         [<000000000175552c>] system_call+0x2b8/0x2d8
        INFO: lockdep is turned off.
        Last Breaking-Event-Address:
         [<0000000000dbaf60>] __memset+0xc/0xa0
      
      due to ms->dirty_bitmap being NULL, which might crash the host.
      
      Make sure that ms->dirty_bitmap is set before using it or
      return -EINVAL otherwise.
      
      Cc: <stable@vger.kernel.org>
      Fixes: afdad616 ("KVM: s390: Fix storage attributes migration with memory slots")
      Signed-off-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Link: https://lore.kernel.org/kvm/20190911075218.29153-1-imammedo@redhat.com/Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: default avatarClaudio Imbrenda <imbrenda@linux.ibm.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Reviewed-by: default avatarJanosch Frank <frankja@linux.ibm.com>
      Signed-off-by: default avatarJanosch Frank <frankja@linux.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      13a17cc0
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · ad32b480
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "Last minute bugfixes.
      
        A couple of security things.
      
        And an error handling bugfix that is never encountered by most people,
        but that also makes it kind of safe to push at the last minute, and it
        helps push the fix to stable a bit sooner"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vhost: make sure log_num < in_num
        vhost: block speculation of translated descriptors
        virtio_ring: fix unmap of indirect descriptors
      ad32b480
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6dcf6a4e
      Linus Torvalds authored
      Pull perf fix from Ingo Molnar:
       "Fix an initialization bug in the hw-breakpoints, which triggered on
        the ARM platform"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization
      6dcf6a4e
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 95779fe8
      Linus Torvalds authored
      Pull irq fix from Ingo Molnar:
       "Fix a race in the IRQ resend mechanism, which can result in a NULL
        dereference crash"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Prevent NULL pointer dereference in resend_irqs()
      95779fe8
    • Linus Torvalds's avatar
      Merge tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 840ce8f8
      Linus Torvalds authored
      Pull pin control fix from Linus Walleij:
       "Hopefully last pin control fix: a single patch for some Aspeed
        problems. The BMCs are much happier now"
      
      * tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: aspeed: Fix spurious mux failures on the AST2500
      840ce8f8
    • Linus Torvalds's avatar
      Merge tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 9c09f623
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
       "I don't really like to send so many fixes at the very last minute, but
        the bug-sport activity is unpredictable.
      
        Four fixes, three are -stable material that will go everywhere, one is
        for the current cycle:
      
         - An ACPI DSDT error fixup of the type we always see and Hans
           invariably gets to fix.
      
         - A OF quirk fix for the current release (v5.3)
      
         - Some consistency checks on the userspace ABI.
      
         - A memory leak"
      
      * tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist
        gpiolib: of: fix fallback quirks handling
        gpio: fix line flag validation in lineevent_create
        gpio: fix line flag validation in linehandle_create
        gpio: mockup: add missing single_release()
      9c09f623
  4. 11 Sep, 2019 4 commits
  5. 10 Sep, 2019 3 commits
  6. 09 Sep, 2019 5 commits
  7. 08 Sep, 2019 5 commits
  8. 07 Sep, 2019 2 commits
    • Linus Torvalds's avatar
      Revert "x86/apic: Include the LDR when clearing out APIC registers" · 950b07c1
      Linus Torvalds authored
      This reverts commit 558682b5.
      
      Chris Wilson reports that it breaks his CPU hotplug test scripts.  In
      particular, it breaks offlining and then re-onlining the boot CPU, which
      we treat specially (and the BIOS does too).
      
      The symptoms are that we can offline the CPU, but it then does not come
      back online again:
      
          smpboot: CPU 0 is now offline
          smpboot: Booting Node 0 Processor 0 APIC 0x0
          smpboot: do_boot_cpu failed(-1) to wakeup CPU#0
      
      Thomas says he knows why it's broken (my personal suspicion: our magic
      handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix
      is to just revert it, since we've never touched the LDR bits before, and
      it's not worth the risk to do anything else at this stage.
      
      [ Hotpluging of the boot CPU is special anyway, and should be off by
        default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the
        cpu0_hotplug kernel parameter.
      
        In general you should not do it, and it has various known limitations
        (hibernate and suspend require the boot CPU, for example).
      
        But it should work, even if the boot CPU is special and needs careful
        treatment       - Linus ]
      
      Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/Reported-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Bandan Das <bsd@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      950b07c1
    • Arnd Bergmann's avatar
      ipc: fix sparc64 ipc() wrapper · fb377eb8
      Arnd Bergmann authored
      Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl
      to a commit from my y2038 series in linux-5.1, as I missed the custom
      sys_ipc() wrapper that sparc64 uses in place of the generic version that
      I patched.
      
      The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel
      now do not allow being called with the IPC_64 flag any more, resulting
      in a -EINVAL error when they don't recognize the command.
      
      Instead, the correct way to do this now is to call the internal
      ksys_old_{sem,shm,msg}ctl() functions to select the API version.
      
      As we generally move towards these functions anyway, change all of
      sparc_ipc() to consistently use those in place of the sys_*() versions,
      and move the required ksys_*() declarations into linux/syscalls.h
      
      The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
      errors when ipc is disabled.
      Reported-by: default avatarMatt Turner <mattst88@gmail.com>
      Fixes: 275f2214 ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
      Cc: stable@vger.kernel.org
      Tested-by: default avatarMatt Turner <mattst88@gmail.com>
      Tested-by: default avatarAnatoly Pugachev <matorola@gmail.com>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      fb377eb8