1. 09 Jun, 2019 40 commits
    • Roberto Sassu's avatar
      ima: show rules with IMA_INMASK correctly · b2164f6c
      Roberto Sassu authored
      commit 8cdc23a3 upstream.
      
      Show the '^' character when a policy rule has flag IMA_INMASK.
      
      Fixes: 80eae209 ("IMA: allow reading back the current IMA policy")
      Signed-off-by: default avatarRoberto Sassu <roberto.sassu@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2164f6c
    • Petr Vorel's avatar
      ima: fix wrong signed policy requirement when not appraising · 407eb63f
      Petr Vorel authored
      commit f4001947 upstream.
      
      Kernel booted just with ima_policy=tcb (not with
      ima_policy=appraise_tcb) shouldn't require signed policy.
      
      Regression found with LTP test ima_policy.sh.
      
      Fixes: c52657d9 ("ima: refactor ima_init_policy()")
      Cc: stable@vger.kernel.org  (linux-5.0)
      Signed-off-by: default avatarPetr Vorel <pvorel@suse.cz>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      407eb63f
    • Scott Wood's avatar
      x86/ima: Check EFI_RUNTIME_SERVICES before using · c59e2a64
      Scott Wood authored
      commit 558b523d upstream.
      
      Checking efi_enabled(EFI_BOOT) is not sufficient to ensure that
      EFI runtime services are available, e.g. if efi=noruntime is used.
      
      Without this, I get an oops on a PREEMPT_RT kernel where efi=noruntime is
      the default.
      
      Fixes: 399574c6 ("x86/ima: retry detecting secure boot mode")
      Cc: stable@vger.kernel.org  (linux-5.0)
      Signed-off-by: default avatarScott Wood <swood@redhat.com>
      Signed-off-by: default avatarMimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c59e2a64
    • Jonathan Corbet's avatar
      doc: Cope with Sphinx logging deprecations · bd411a62
      Jonathan Corbet authored
      commit 096ea522 upstream.
      
      Recent versions of sphinx will emit messages like:
      
        Documentation/sphinx/kerneldoc.py:103:
           RemovedInSphinx20Warning: app.warning() is now deprecated.
           Use sphinx.util.logging instead.
      
      Switch to sphinx.util.logging to make this unsightly message go away.
      Alas, that interface was only added in version 1.6, so we have to add a
      version check to keep things working with older sphinxes.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd411a62
    • Jonathan Corbet's avatar
      doc: Cope with the deprecation of AutoReporter · 32a4c9e0
      Jonathan Corbet authored
      commit 2404dad1 upstream.
      
      AutoReporter is going away; recent versions of sphinx emit a warning like:
      
        Documentation/sphinx/kerneldoc.py:125:
            RemovedInSphinx20Warning: AutodocReporter is now deprecated.
            Use sphinx.util.docutils.switch_source_input() instead.
      
      Make the switch.  But switch_source_input() only showed up in 1.7, so we
      have to do ugly version checks to keep things working in older versions.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32a4c9e0
    • Jonathan Corbet's avatar
      docs: Fix conf.py for Sphinx 2.0 · d6fe20f2
      Jonathan Corbet authored
      commit 3bc80884 upstream.
      
      Our version check in Documentation/conf.py never envisioned a world where
      Sphinx moved beyond 1.x.  Now that the unthinkable has happened, fix our
      version check to handle higher version numbers correctly.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6fe20f2
    • Catalin Marinas's avatar
      arm64: Fix the arm64_personality() syscall wrapper redirection · ff6503c8
      Catalin Marinas authored
      commit 00377277 upstream.
      
      Following commit 4378a7d4 ("arm64: implement syscall wrappers"), the
      syscall function names gained the '__arm64_' prefix. Ensure that we
      have the correct #define for redirecting a default syscall through a
      wrapper.
      
      Fixes: 4378a7d4 ("arm64: implement syscall wrappers")
      Cc: <stable@vger.kernel.org> # 4.19.x-
      Acked-by: default avatarMark Rutland <mark.rutland@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff6503c8
    • Suzuki K Poulose's avatar
      mm, compaction: make sure we isolate a valid PFN · 6b036c70
      Suzuki K Poulose authored
      commit e577c8b6 upstream.
      
      When we have holes in a normal memory zone, we could endup having
      cached_migrate_pfns which may not necessarily be valid, under heavy memory
      pressure with swapping enabled ( via __reset_isolation_suitable(),
      triggered by kswapd).
      
      Later if we fail to find a page via fast_isolate_freepages(), we may end
      up using the migrate_pfn we started the search with, as valid page.  This
      could lead to accessing NULL pointer derefernces like below, due to an
      invalid mem_section pointer.
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
       Mem abort info:
         ESR = 0x96000004
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000004
         CM = 0, WnR = 0
       user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
       [0000000000000008] pgd=0000000000000000
       Internal error: Oops: 96000004 [#1] SMP
       ...
       CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
       Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
       pstate: 60000005 (nZCv daif -PAN -UAO)
       pc : set_pfnblock_flags_mask+0x58/0xe8
       lr : compaction_alloc+0x300/0x950
       [...]
       Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
       Call trace:
        set_pfnblock_flags_mask+0x58/0xe8
        compaction_alloc+0x300/0x950
        migrate_pages+0x1a4/0xbb0
        compact_zone+0x750/0xde8
        compact_zone_order+0xd8/0x118
        try_to_compact_pages+0xb4/0x290
        __alloc_pages_direct_compact+0x84/0x1e0
        __alloc_pages_nodemask+0x5e0/0xe18
        alloc_pages_vma+0x1cc/0x210
        do_huge_pmd_anonymous_page+0x108/0x7c8
        __handle_mm_fault+0xdd4/0x1190
        handle_mm_fault+0x114/0x1c0
        __get_user_pages+0x198/0x3c0
        get_user_pages_unlocked+0xb4/0x1d8
        __gfn_to_pfn_memslot+0x12c/0x3b8
        gfn_to_pfn_prot+0x4c/0x60
        kvm_handle_guest_abort+0x4b0/0xcd8
        handle_exit+0x140/0x1b8
        kvm_arch_vcpu_ioctl_run+0x260/0x768
        kvm_vcpu_ioctl+0x490/0x898
        do_vfs_ioctl+0xc4/0x898
        ksys_ioctl+0x8c/0xa0
        __arm64_sys_ioctl+0x28/0x38
        el0_svc_common+0x74/0x118
        el0_svc_handler+0x38/0x78
        el0_svc+0x8/0xc
       Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
       ---[ end trace af6a35219325a9b6 ]---
      
      The issue was reported on an arm64 server with 128GB with holes in the
      zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
      running 100 KVM guest instances.
      
      This patch fixes the issue by ensuring that the page belongs to a valid
      PFN when we fallback to using the lower limit of the scan range upon
      failure in fast_isolate_freepages().
      
      Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com
      Fixes: 5a811889 ("mm, compaction: use free lists to quickly locate a migration target")
      Signed-off-by: default avatarSuzuki K Poulose <suzuki.poulose@arm.com>
      Reported-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b036c70
    • Eric W. Biederman's avatar
      signal/arm64: Use force_sig not force_sig_fault for SIGKILL · 013d8fcb
      Eric W. Biederman authored
      commit d76cac67 upstream.
      
      I don't think this is userspace visible but SIGKILL does not have
      any si_codes that use the fault member of the siginfo union.  Correct
      this the simple way and call force_sig instead of force_sig_fault when
      the signal is SIGKILL.
      
      The two know places where synchronous SIGKILL are generated are
      do_bad_area and fpsimd_save.  The call paths to force_sig_fault are:
      do_bad_area
        arm64_force_sig_fault
          force_sig_fault
      force_signal_inject
        arm64_notify_die
          arm64_force_sig_fault
             force_sig_fault
      
      Which means correcting this in arm64_force_sig_fault is enough
      to ensure the arm64 code is not misusing the generic code, which
      could lead to maintenance problems later.
      
      Cc: stable@vger.kernel.org
      Cc: Dave Martin <Dave.Martin@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: af40ff68 ("arm64: signal: Ensure si_code is valid for all fault signals")
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      013d8fcb
    • Zhenliang Wei's avatar
      kernel/signal.c: trace_signal_deliver when signal_group_exit · 02ea8867
      Zhenliang Wei authored
      commit 98af37d6 upstream.
      
      In the fixes commit, removing SIGKILL from each thread signal mask and
      executing "goto fatal" directly will skip the call to
      "trace_signal_deliver".  At this point, the delivery tracking of the
      SIGKILL signal will be inaccurate.
      
      Therefore, we need to add trace_signal_deliver before "goto fatal" after
      executing sigdelset.
      
      Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info.
      
      Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com
      Fixes: cf43a757 ("signal: Restore the stop PTRACE_EVENT_EXIT")
      Signed-off-by: default avatarZhenliang Wei <weizhenliang@huawei.com>
      Reviewed-by: default avatarChristian Brauner <christian@brauner.io>
      Reviewed-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Ivan Delalande <colona@arista.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Deepa Dinamani <deepa.kernel@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02ea8867
    • Nathan Chancellor's avatar
      kasan: initialize tag to 0xff in __kasan_kmalloc · 2257b040
      Nathan Chancellor authored
      commit 0600597c upstream.
      
      When building with -Wuninitialized and CONFIG_KASAN_SW_TAGS unset, Clang
      warns:
      
      mm/kasan/common.c:484:40: warning: variable 'tag' is uninitialized when
      used here [-Wuninitialized]
              kasan_unpoison_shadow(set_tag(object, tag), size);
                                                    ^~~
      
      set_tag ignores tag in this configuration but clang doesn't realize it at
      this point in its pipeline, as it points to arch_kasan_set_tag as being
      the point where it is used, which will later be expanded to (void
      *)(object) without a use of tag.  Initialize tag to 0xff, as it removes
      this warning and doesn't change the meaning of the code.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/465
      Link: http://lkml.kernel.org/r/20190502163057.6603-1-natechancellor@gmail.com
      Fixes: 7f94ffbc ("kasan: add hooks implementation for tag-based mode")
      Signed-off-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Reviewed-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2257b040
    • Jiri Slaby's avatar
      memcg: make it work on sparse non-0-node systems · 77036834
      Jiri Slaby authored
      commit 3e858996 upstream.
      
      We have a single node system with node 0 disabled:
        Scanning NUMA topology in Northbridge 24
        Number of physical nodes 2
        Skipping disabled node 0
        Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
        NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
      
      This causes crashes in memcg when system boots:
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        #PF error: [normal kernel read fault]
      ...
        RIP: 0010:list_lru_add+0x94/0x170
      ...
        Call Trace:
         d_lru_add+0x44/0x50
         dput.part.34+0xfc/0x110
         __fput+0x108/0x230
         task_work_run+0x9f/0xc0
         exit_to_usermode_loop+0xf5/0x100
      
      It is reproducible as far as 4.12.  I did not try older kernels.  You have
      to have a new enough systemd, e.g.  241 (the reason is unknown -- was not
      investigated).  Cannot be reproduced with systemd 234.
      
      The system crashes because the size of lru array is never updated in
      memcg_update_all_list_lrus and the reads are past the zero-sized array,
      causing dereferences of random memory.
      
      The root cause are list_lru_memcg_aware checks in the list_lru code.  The
      test in list_lru_memcg_aware is broken: it assumes node 0 is always
      present, but it is not true on some systems as can be seen above.
      
      So fix this by avoiding checks on node 0.  Remember the memcg-awareness by
      a bool flag in struct list_lru.
      
      Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
      Fixes: 60d3fd32 ("list_lru: introduce per-memcg lists")
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Suggested-by: default avatarVladimir Davydov <vdavydov.dev@gmail.com>
      Acked-by: default avatarVladimir Davydov <vdavydov.dev@gmail.com>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      77036834
    • Chris Down's avatar
      mm, memcg: consider subtrees in memory.events · f9610998
      Chris Down authored
      commit 9852ae3f upstream.
      
      memory.stat and other files already consider subtrees in their output, and
      we should too in order to not present an inconsistent interface.
      
      The current situation is fairly confusing, because people interacting with
      cgroups expect hierarchical behaviour in the vein of memory.stat,
      cgroup.events, and other files.  For example, this causes confusion when
      debugging reclaim events under low, as currently these always read "0" at
      non-leaf memcg nodes, which frequently causes people to misdiagnose breach
      behaviour.  The same confusion applies to other counters in this file when
      debugging issues.
      
      Aggregation is done at write time instead of at read-time since these
      counters aren't hot (unlike memory.stat which is per-page, so it does it
      at read time), and it makes sense to bundle this with the file
      notifications.
      
      After this patch, events are propagated up the hierarchy:
      
          [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
          low 0
          high 0
          max 0
          oom 0
          oom_kill 0
          [root@ktst ~]# systemd-run -p MemoryMax=1 true
          Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
          [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
          low 0
          high 0
          max 7
          oom 1
          oom_kill 1
      
      As this is a change in behaviour, this can be reverted to the old
      behaviour by mounting with the `memory_localevents' flag set.  However, we
      use the new behaviour by default as there's a lack of evidence that there
      are any current users of memory.events that would find this change
      undesirable.
      
      akpm: this is a behaviour change, so Cc:stable.  THis is so that
      forthcoming distros which use cgroup v2 are more likely to pick up the
      revised behaviour.
      
      Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.nameSigned-off-by: default avatarChris Down <chris@chrisdown.name>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9610998
    • Joe Burmeister's avatar
      tty: max310x: Fix external crystal register setup · 15ff2c6a
      Joe Burmeister authored
      commit 5d24f455 upstream.
      
      The datasheet states:
      
        Bit 4: ClockEnSet the ClockEn bit high to enable an external clocking
      (crystal or clock generator at XIN). Set the ClockEn bit to 0 to disable
      clocking
        Bit 1: CrystalEnSet the CrystalEn bit high to enable the crystal
      oscillator. When using an external clock source at XIN, CrystalEn must
      be set low.
      
      The bit 4, MAX310X_CLKSRC_EXTCLK_BIT, should be set and was not.
      
      This was required to make the MAX3107 with an external crystal on our
      board able to send or receive data.
      Signed-off-by: default avatarJoe Burmeister <joe.burmeister@devtank.co.uk>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      15ff2c6a
    • Jorge Ramirez-Ortiz's avatar
      tty: serial: msm_serial: Fix XON/XOFF · 5f436d8d
      Jorge Ramirez-Ortiz authored
      commit 61c0e379 upstream.
      
      When the tty layer requests the uart to throttle, the current code
      executing in msm_serial will trigger "Bad mode in Error Handler" and
      generate an invalid stack frame in pstore before rebooting (that is if
      pstore is indeed configured: otherwise the user shall just notice a
      reboot with no further information dumped to the console).
      
      This patch replaces the PIO byte accessor with the word accessor
      already used in PIO mode.
      
      Fixes: 68252424 ("tty: serial: msm: Support big-endian CPUs")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
      Reviewed-by: default avatarBjorn Andersson <bjorn.andersson@linaro.org>
      Reviewed-by: default avatarStephen Boyd <swboyd@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f436d8d
    • Masahisa Kojima's avatar
      i2c: synquacer: fix synquacer_i2c_doxfer() return value · 38338dca
      Masahisa Kojima authored
      commit ff937890 upstream.
      
      master_xfer should return the number of messages successfully
      processed.
      
      Fixes: 0d676a6c ("i2c: add support for Socionext SynQuacer I2C controller")
      Cc: <stable@vger.kernel.org> # v4.19+
      Signed-off-by: default avatarOkamoto Satoru <okamoto.satoru@socionext.com>
      Signed-off-by: default avatarMasahisa Kojima <masahisa.kojima@linaro.org>
      Acked-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38338dca
    • Vadim Pasternak's avatar
      i2c: mlxcpld: Fix wrong initialization order in probe · 1d7920d3
      Vadim Pasternak authored
      commit 13067ef7 upstream.
      
      Fix wrong order in probing routine initialization - field `base_addr'
      is used before it's initialized. Move assignment of 'priv->base_addr`
      to the beginning, prior the call to mlxcpld_i2c_read_comm().
      Wrong order caused the first read of capability register to be executed
      at wrong offset 0x0 instead of 0x2000. By chance it was a "good
      garbage" at 0x0 offset.
      
      Fixes: 313ce648 ("i2c: mlxcpld: Add support for extended transaction length for i2c-mlxcpld")
      Signed-off-by: default avatarVadim Pasternak <vadimp@mellanox.com>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d7920d3
    • Lyude Paul's avatar
      drm/nouveau/i2c: Disable i2c bus access after ->fini() · cd68344b
      Lyude Paul authored
      commit 342406e4 upstream.
      
      For a while, we've had the problem of i2c bus access not grabbing
      a runtime PM ref when it's being used in userspace by i2c-dev, resulting
      in nouveau spamming the kernel log with errors if anything attempts to
      access the i2c bus while the GPU is in runtime suspend. An example:
      
      [  130.078386] nouveau 0000:01:00.0: i2c: aux 000d: begin idle timeout ffffffff
      
      Since the GPU is in runtime suspend, the MMIO region that the i2c bus is
      on isn't accessible. On x86, the standard behavior for accessing an
      unavailable MMIO region is to just return ~0.
      
      Except, that turned out to be a lie. While computers with a clean
      concious will return ~0 in this scenario, some machines will actually
      completely hang a CPU on certian bad MMIO accesses. This was witnessed
      with someone's Lenovo ThinkPad P50, where sensors-detect attempting to
      access the i2c bus while the GPU was suspended would result in a CPU
      hang:
      
        CPU: 5 PID: 12438 Comm: sensors-detect Not tainted 5.0.0-0.rc4.git3.1.fc30.x86_64 #1
        Hardware name: LENOVO 20EQS64N17/20EQS64N17, BIOS N1EET74W (1.47 ) 11/21/2017
        RIP: 0010:ioread32+0x2b/0x30
        Code: 81 ff ff ff 03 00 77 20 48 81 ff 00 00 01 00 76 05 0f b7 d7 ed c3
        48 c7 c6 e1 0c 36 96 e8 2d ff ff ff b8 ff ff ff ff c3 8b 07 <c3> 0f 1f
        40 00 49 89 f0 48 81 fe ff ff 03 00 76 04 40 88 3e c3 48
        RSP: 0018:ffffaac3c5007b48 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13
        RAX: 0000000001111000 RBX: 0000000001111000 RCX: 0000043017a97186
        RDX: 0000000000000aaa RSI: 0000000000000005 RDI: ffffaac3c400e4e4
        RBP: ffff9e6443902c00 R08: ffffaac3c400e4e4 R09: ffffaac3c5007be7
        R10: 0000000000000004 R11: 0000000000000001 R12: ffff9e6445dd0000
        R13: 000000000000e4e4 R14: 00000000000003c4 R15: 0000000000000000
        FS:  00007f253155a740(0000) GS:ffff9e644f600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00005630d1500358 CR3: 0000000417c44006 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         g94_i2c_aux_xfer+0x326/0x850 [nouveau]
         nvkm_i2c_aux_i2c_xfer+0x9e/0x140 [nouveau]
         __i2c_transfer+0x14b/0x620
         i2c_smbus_xfer_emulated+0x159/0x680
         ? _raw_spin_unlock_irqrestore+0x1/0x60
         ? rt_mutex_slowlock.constprop.0+0x13d/0x1e0
         ? __lock_is_held+0x59/0xa0
         __i2c_smbus_xfer+0x138/0x5a0
         i2c_smbus_xfer+0x4f/0x80
         i2cdev_ioctl_smbus+0x162/0x2d0 [i2c_dev]
         i2cdev_ioctl+0x1db/0x2c0 [i2c_dev]
         do_vfs_ioctl+0x408/0x750
         ksys_ioctl+0x5e/0x90
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x60/0x1e0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7f25317f546b
        Code: 0f 1e fa 48 8b 05 1d da 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
        ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
        f0 ff ff 73 01 c3 48 8b 0d ed d9 0c 00 f7 d8 64 89 01 48
        RSP: 002b:00007ffc88caab68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        RAX: ffffffffffffffda RBX: 00005630d0fe7260 RCX: 00007f25317f546b
        RDX: 00005630d1598e80 RSI: 0000000000000720 RDI: 0000000000000003
        RBP: 00005630d155b968 R08: 0000000000000001 R09: 00005630d15a1da0
        R10: 0000000000000070 R11: 0000000000000246 R12: 00005630d1598e80
        R13: 00005630d12f3d28 R14: 0000000000000720 R15: 00005630d12f3ce0
        watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sensors-detect:12438]
      
      Yikes! While I wanted to try to make it so that accessing an i2c bus on
      nouveau would wake up the GPU as needed, airlied pointed out that pretty
      much any usecase for userspace accessing an i2c bus on a GPU (mainly for
      the DDC brightness control that some displays have) is going to only be
      useful while there's at least one display enabled on the GPU anyway, and
      the GPU never sleeps while there's displays running.
      
      Since teaching the i2c bus to wake up the GPU on userspace accesses is a
      good deal more difficult than it might seem, mostly due to the fact that
      we have to use the i2c bus during runtime resume of the GPU, we instead
      opt for the easiest solution: don't let userspace access i2c busses on
      the GPU at all while it's in runtime suspend.
      
      Changes since v1:
      * Also disable i2c busses that run over DP AUX
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd68344b
    • Thomas Huth's avatar
      KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID · 1f41c93a
      Thomas Huth authored
      commit a86cb413 upstream.
      
      KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all
      architectures. However, on s390x, the amount of usable CPUs is determined
      during runtime - it is depending on the features of the machine the code
      is running on. Since we are using the vcpu_id as an index into the SCA
      structures that are defined by the hardware (see e.g. the sca_add_vcpu()
      function), it is not only the amount of CPUs that is limited by the hard-
      ware, but also the range of IDs that we can use.
      Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too.
      So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common
      code into the architecture specific code, and on s390x we have to return
      the same value here as for KVM_CAP_MAX_VCPUS.
      This problem has been discovered with the kvm_create_max_vcpus selftest.
      With this change applied, the selftest now passes on s390x, too.
      Reviewed-by: default avatarAndrew Jones <drjones@redhat.com>
      Reviewed-by: default avatarCornelia Huck <cohuck@redhat.com>
      Reviewed-by: default avatarDavid Hildenbrand <david@redhat.com>
      Signed-off-by: default avatarThomas Huth <thuth@redhat.com>
      Message-Id: <20190523164309.13345-9-thuth@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      1f41c93a
    • Hui Wang's avatar
      ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops · 8f3b8eea
      Hui Wang authored
      commit 9cb40eb1 upstream.
      
      We met another Acer Aspire laptop which has the problem on the
      headset-mic, the Pin 0x19 is not set the corret configuration for a
      mic and the pin presence can't be detected too after plugging a
      headset. Kailang suggested that we should set the coeff to enable the
      mic and apply the ALC269_FIXUP_LIFEBOOK_EXTMIC. After doing that,
      both headset-mic presence and headset-mic work well.
      
      The existing ALC255_FIXUP_ACER_MIC_NO_PRESENCE set the headset-mic
      jack to be a phantom jack. Now since the jack can support presence
      unsol event, let us imporve it to set the jack to be a normal jack.
      
      https://bugs.launchpad.net/bugs/1821269
      Fixes: 5824ce8d ("ALSA: hda/realtek - Add support for Acer Aspire E5-475 headset mic")
      Cc: Chris Chiu <chiu@endlessm.com>
      CC: Daniel Drake <drake@endlessm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f3b8eea
    • Kailang Yang's avatar
      ALSA: hda/realtek - Set default power save node to 0 · 0031bae9
      Kailang Yang authored
      commit 317d9313 upstream.
      
      I measured power consumption between power_save_node=1 and power_save_node=0.
      It's almost the same.
      Codec will enter to runtime suspend and suspend.
      That pin also will enter to D3. Don't need to enter to D3 by single pin.
      So, Disable power_save_node as default. It will avoid more issues.
      Windows Driver also has not this option at runtime PM.
      Signed-off-by: default avatarKailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0031bae9
    • Takashi Iwai's avatar
      ALSA: line6: Assure canceling delayed work at disconnection · cf02204f
      Takashi Iwai authored
      commit 0b074ab7 upstream.
      
      The current code performs the cancel of a delayed work at the late
      stage of disconnection procedure, which may lead to the access to the
      already cleared state.
      
      This patch assures to call cancel_delayed_work_sync() at the beginning
      of the disconnection procedure for avoiding that race.  The delayed
      work object is now assigned in the common line6 object instead of its
      derivative, so that we can call cancel_delayed_work_sync().
      
      Along with the change, the startup function is called via the new
      callback instead.  This will make it easier to port other LINE6
      drivers to use the delayed work for startup in later patches.
      
      Reported-by: syzbot+5255458d5e0a2b10bbb9@syzkaller.appspotmail.com
      Fixes: 7f84ff68 ("ALSA: line6: toneport: Fix broken usage of timer for delayed execution")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cf02204f
    • Thiago Jung Bauermann's avatar
      powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load() · e1801d51
      Thiago Jung Bauermann authored
      commit 8b909e35 upstream.
      
      Commit b6664ba4 ("s390, kexec_file: drop arch_kexec_mem_walk()")
      changed kexec_add_buffer() to skip searching for a memory location if
      kexec_buf.mem is already set, and use the address that is there.
      
      In powerpc code we reuse a kexec_buf variable for loading both the
      kernel and the initramfs by resetting some of the fields between those
      uses, but not mem. This causes kexec_add_buffer() to try to load the
      kernel at the same address where initramfs will be loaded, which is
      naturally rejected:
      
        # kexec -s -l --initrd initramfs vmlinuz
        kexec_file_load failed: Invalid argument
      
      Setting the mem field before every call to kexec_add_buffer() fixes
      this regression.
      
      Fixes: b6664ba4 ("s390, kexec_file: drop arch_kexec_mem_walk()")
      Cc: stable@vger.kernel.org # v5.0+
      Signed-off-by: default avatarThiago Jung Bauermann <bauerman@linux.ibm.com>
      Reviewed-by: default avatarDave Young <dyoung@redhat.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e1801d51
    • Ravi Bangoria's avatar
      powerpc/perf: Fix MMCRA corruption by bhrb_filter · 72294049
      Ravi Bangoria authored
      commit 3202e35e upstream.
      
      Consider a scenario where user creates two events:
      
        1st event:
          attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
          attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
          fd = perf_event_open(attr, 0, 1, -1, 0);
      
        This sets cpuhw->bhrb_filter to 0 and returns valid fd.
      
        2nd event:
          attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
          attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
          fd = perf_event_open(attr, 0, 1, -1, 0);
      
        It overrides cpuhw->bhrb_filter to -1 and returns with error.
      
      Now if power_pmu_enable() gets called by any path other than
      power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1.
      
      Fixes: 3925f46b ("powerpc/perf: Enable branch stack sampling framework")
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: default avatarRavi Bangoria <ravi.bangoria@linux.ibm.com>
      Reviewed-by: default avatarMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72294049
    • Suraj Jitindar Singh's avatar
      KVM: PPC: Book3S HV: Restore SPRG3 in kvmhv_p9_guest_entry() · effe3c9c
      Suraj Jitindar Singh authored
      commit d724c9e5 upstream.
      
      The sprgs are a set of 4 general purpose sprs provided for software use.
      SPRG3 is special in that it can also be read from userspace. Thus it is
      used on linux to store the cpu and numa id of the process to speed up
      syscall access to this information.
      
      This register is overwritten with the guest value on kvm guest entry,
      and so needs to be restored on exit again. Thus restore the value on
      the guest exit path in kvmhv_p9_guest_entry().
      
      Cc: stable@vger.kernel.org # v4.20+
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: default avatarSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      effe3c9c
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Fix lockdep warning when entering guest on POWER9 · b297e5b0
      Paul Mackerras authored
      commit 1b28d553 upstream.
      
      Commit 3309bec8 ("KVM: PPC: Book3S HV: Fix lockdep warning when
      entering the guest") moved calls to trace_hardirqs_{on,off} in the
      entry path used for HPT guests.  Similar code exists in the new
      streamlined entry path used for radix guests on POWER9.  This makes
      the same change there, so as to avoid lockdep warnings such as this:
      
      [  228.686461] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      [  228.686480] WARNING: CPU: 116 PID: 3803 at ../kernel/locking/lockdep.c:4219 check_flags.part.23+0x21c/0x270
      [  228.686544] Modules linked in: vhost_net vhost xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat
      +xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter
      +ebtables ip6table_filter ip6_tables iptable_filter fuse kvm_hv kvm at24 ipmi_powernv regmap_i2c ipmi_devintf
      +uio_pdrv_genirq ofpart ipmi_msghandler uio powernv_flash mtd ibmpowernv opal_prd ip_tables ext4 mbcache jbd2 btrfs
      +zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor
      +raid6_pq raid1 raid0 ses sd_mod enclosure scsi_transport_sas ast i2c_opal i2c_algo_bit drm_kms_helper syscopyarea
      +sysfillrect sysimgblt fb_sys_fops ttm drm i40e e1000e cxl aacraid tg3 drm_panel_orientation_quirks i2c_core
      [  228.686859] CPU: 116 PID: 3803 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc1-xive+ #42
      [  228.686911] NIP:  c0000000001b394c LR: c0000000001b3948 CTR: c000000000bfad20
      [  228.686963] REGS: c000200cdb50f570 TRAP: 0700   Not tainted  (5.2.0-rc1-xive+)
      [  228.687001] MSR:  9000000002823033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 48222222  XER: 20040000
      [  228.687060] CFAR: c000000000116db0 IRQMASK: 1
      [  228.687060] GPR00: c0000000001b3948 c000200cdb50f800 c0000000015e7600 000000000000002e
      [  228.687060] GPR04: 0000000000000001 c0000000001c71a0 000000006e655f73 72727563284e4f5f
      [  228.687060] GPR08: 0000200e60680000 0000000000000000 c000200cdb486180 0000000000000000
      [  228.687060] GPR12: 0000000000002000 c000200fff61a680 0000000000000000 00007fffb75c0000
      [  228.687060] GPR16: 0000000000000000 0000000000000000 c0000000017d6900 c000000001124900
      [  228.687060] GPR20: 0000000000000074 c008000006916f68 0000000000000074 0000000000000074
      [  228.687060] GPR24: ffffffffffffffff ffffffffffffffff 0000000000000003 c000200d4b600000
      [  228.687060] GPR28: c000000001627e58 c000000001489908 c000000001627e58 c000000002304de0
      [  228.687377] NIP [c0000000001b394c] check_flags.part.23+0x21c/0x270
      [  228.687415] LR [c0000000001b3948] check_flags.part.23+0x218/0x270
      [  228.687466] Call Trace:
      [  228.687488] [c000200cdb50f800] [c0000000001b3948] check_flags.part.23+0x218/0x270 (unreliable)
      [  228.687542] [c000200cdb50f870] [c0000000001b6548] lock_is_held_type+0x188/0x1c0
      [  228.687595] [c000200cdb50f8d0] [c0000000001d939c] rcu_read_lock_sched_held+0xdc/0x100
      [  228.687646] [c000200cdb50f900] [c0000000001dd704] rcu_note_context_switch+0x304/0x340
      [  228.687701] [c000200cdb50f940] [c0080000068fcc58] kvmhv_run_single_vcpu+0xdb0/0x1120 [kvm_hv]
      [  228.687756] [c000200cdb50fa20] [c0080000068fd5b0] kvmppc_vcpu_run_hv+0x5e8/0xe40 [kvm_hv]
      [  228.687816] [c000200cdb50faf0] [c0080000071797dc] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [  228.687863] [c000200cdb50fb10] [c0080000071755dc] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
      [  228.687916] [c000200cdb50fba0] [c008000007165ccc] kvm_vcpu_ioctl+0x424/0x838 [kvm]
      [  228.687957] [c000200cdb50fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
      [  228.687995] [c000200cdb50fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
      [  228.688033] [c000200cdb50fe00] [c000000000434768] sys_ioctl+0x28/0x80
      [  228.688072] [c000200cdb50fe20] [c00000000000b888] system_call+0x5c/0x70
      [  228.688109] Instruction dump:
      [  228.688142] 4bf6342d 60000000 0fe00000 e8010080 7c0803a6 4bfffe60 3c82ff87 3c62ff87
      [  228.688196] 388472d0 3863d738 4bf63405 60000000 <0fe00000> 4bffff4c 3c82ff87 3c62ff87
      [  228.688251] irq event stamp: 205
      [  228.688287] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
      [  228.688344] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
      [  228.688412] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
      [  228.688464] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210
      [  228.688513] ---[ end trace eb16f6260022a812 ]---
      [  228.688548] possible reason: unannotated irqs-off.
      [  228.688571] irq event stamp: 205
      [  228.688607] hardirqs last  enabled at (205): [<c0080000068fc1b4>] kvmhv_run_single_vcpu+0x30c/0x1120 [kvm_hv]
      [  228.688664] hardirqs last disabled at (204): [<c0080000068fbff0>] kvmhv_run_single_vcpu+0x148/0x1120 [kvm_hv]
      [  228.688719] softirqs last  enabled at (180): [<c000000000c0b2ac>] __do_softirq+0x4ac/0x5d4
      [  228.688758] softirqs last disabled at (169): [<c000000000122aa8>] irq_exit+0x1f8/0x210
      
      Cc: stable@vger.kernel.org # v4.20+
      Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests")
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: default avatarCédric Le Goater <clg@kaod.org>
      Tested-by: default avatarCédric Le Goater <clg@kaod.org>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b297e5b0
    • Cédric Le Goater's avatar
      KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts · b625c158
      Cédric Le Goater authored
      commit ef974020 upstream.
      
      The passthrough interrupts are defined at the host level and their IRQ
      data should not be cleared unless specifically deconfigured (shutdown)
      by the host. They differ from the IPI interrupts which are allocated
      by the XIVE KVM device and reserved to the guest usage only.
      
      This fixes a host crash when destroying a VM in which a PCI adapter
      was passed-through. In this case, the interrupt is cleared and freed
      by the KVM device and then shutdown by vfio at the host level.
      
      [ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
      [ 1007.360285] Faulting instruction address: 0xc00000000009da34
      [ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
      [ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
      [ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
      [ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
      [ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
      [ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
      [ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
      [ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
      [ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
      [ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
      [ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
      [ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
      [ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
      [ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
      [ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
      [ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
      [ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
      [ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
      [ 1007.360732] Call Trace:
      [ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
      [ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
      [ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
      [ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
      [ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
      [ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
      [ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
      [ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
      [ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
      [ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
      [ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
      [ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
      [ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
      [ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
      [ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
      [ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
      [ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
      [ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
      [ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
      [ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
      [ 1007.361159] Instruction dump:
      [ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
      [ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022
      
      Cc: stable@vger.kernel.org # v4.12+
      Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
      Signed-off-by: default avatarCédric Le Goater <clg@kaod.org>
      Signed-off-by: default avatarGreg Kurz <groug@kaod.org>
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b625c158
    • Harald Freudenberger's avatar
      s390/crypto: fix possible sleep during spinlock aquired · c0fc8aff
      Harald Freudenberger authored
      commit 1c2c7029 upstream.
      
      This patch fixes a complain about possible sleep during
      spinlock aquired
      "BUG: sleeping function called from invalid context at
      include/crypto/algapi.h:426"
      for the ctr(aes) and ctr(des) s390 specific ciphers.
      
      Instead of using a spinlock this patch introduces a mutex
      which is save to be held in sleeping context. Please note
      a deadlock is not possible as mutex_trylock() is used.
      Signed-off-by: default avatarHarald Freudenberger <freude@linux.ibm.com>
      Reported-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0fc8aff
    • Harald Freudenberger's avatar
      s390/crypto: fix gcm-aes-s390 selftest failures · 0de686f7
      Harald Freudenberger authored
      commit bef9f0ba upstream.
      
      The current kernel uses improved crypto selftests. These
      tests showed that the current implementation of gcm-aes-s390
      is not able to deal with chunks of output buffers which are
      not a multiple of 16 bytes. This patch introduces a rework
      of the gcm aes s390 scatter walk handling which now is able
      to handle any input and output scatter list chunk sizes
      correctly.
      
      Code has been verified by the crypto selftests, the tcrypt
      kernel module and additional tests ran via the af_alg interface.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: default avatarJulian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: default avatarPatrick Steuer <steuer@linux.ibm.com>
      Signed-off-by: default avatarHarald Freudenberger <freude@linux.ibm.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0de686f7
    • Sean Nyekjaer's avatar
      iio: adc: ti-ads8688: fix timestamp is not updated in buffer · 273ec0ff
      Sean Nyekjaer authored
      commit e6d12298 upstream.
      
      When using the hrtimer iio trigger timestamp isn't updated.
      If we use iio_get_time_ns it is updated correctly.
      
      Fixes: 2a864877 ("iio: adc: ti-ads8688: add trigger and buffer support")
      Signed-off-by: default avatarSean Nyekjaer <sean@geanix.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      273ec0ff
    • Tomer Maimon's avatar
      iio: adc: modify NPCM ADC read reference voltage · fb304841
      Tomer Maimon authored
      commit 4e63ed6b upstream.
      
      Checking if regulator is valid before reading
      NPCM ADC regulator voltage to avoid system crash
      in a case the regulator is not valid.
      Signed-off-by: default avatarTomer Maimon <tmaimon77@gmail.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fb304841
    • Vincent Stehlé's avatar
      iio: adc: ads124: avoid buffer overflow · 6b84e38b
      Vincent Stehlé authored
      commit 0db8aa49 upstream.
      
      When initializing the priv->data array starting from index 1, there is one
      less element to consider than when initializing the full array.
      
      Fixes: e717f8c6 ("iio: adc: Add the TI ads124s08 ADC code")
      Signed-off-by: default avatarVincent Stehlé <vincent.stehle@laposte.net>
      Reviewed-by: default avatarMukesh Ojha <mojha@codeaurora.org>
      Reviewed-by: default avatarDan Murphy <dmurphy@ti.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b84e38b
    • Ruslan Babayev's avatar
      iio: dac: ds4422/ds4424 fix chip verification · 642ec93f
      Ruslan Babayev authored
      commit 60f22086 upstream.
      
      The ds4424_get_value function takes channel number as it's 3rd
      argument and translates it internally into I2C address using
      DS4424_DAC_ADDR macro. The caller ds4424_verify_chip was passing an
      already translated I2C address as its last argument.
      Signed-off-by: default avatarRuslan Babayev <ruslan@babayev.com>
      Cc: xe-linux-external@cisco.com
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: default avatarJonathan Cameron <Jonathan.Cameron@huawei.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      642ec93f
    • Qu Wenruo's avatar
      btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON() · 19b65aac
      Qu Wenruo authored
      commit 30d40577 upstream.
      
      [BUG]
      When a fs has orphan reloc tree along with unfinished balance:
        ...
              item 16 key (TREE_RELOC ROOT_ITEM FS_TREE) itemoff 12090 itemsize 439
                      generation 12 root_dirid 256 bytenr 300400640 level 1 refs 0 <<<
                      lastsnap 8 byte_limit 0 bytes_used 1359872 flags 0x0(none)
                      uuid 7c48d938-33a3-4aae-ab19-6e5c9d406e46
              item 17 key (BALANCE TEMPORARY_ITEM 0) itemoff 11642 itemsize 448
                      temporary item objectid BALANCE offset 0
                      balance status flags 14
      
      Then at mount time, we can hit the following kernel BUG_ON():
        BTRFS info (device dm-3): relocating block group 298844160 flags metadata|dup
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/relocation.c:1413!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 1 PID: 897 Comm: btrfs-balance Tainted: G           O      5.2.0-rc1-custom #15
        RIP: 0010:create_reloc_root+0x1eb/0x200 [btrfs]
        Call Trace:
         btrfs_init_reloc_root+0x96/0xb0 [btrfs]
         record_root_in_trans+0xb2/0xe0 [btrfs]
         btrfs_record_root_in_trans+0x55/0x70 [btrfs]
         select_reloc_root+0x7e/0x230 [btrfs]
         do_relocation+0xc4/0x620 [btrfs]
         relocate_tree_blocks+0x592/0x6a0 [btrfs]
         relocate_block_group+0x47b/0x5d0 [btrfs]
         btrfs_relocate_block_group+0x183/0x2f0 [btrfs]
         btrfs_relocate_chunk+0x4e/0xe0 [btrfs]
         btrfs_balance+0x864/0xfa0 [btrfs]
         balance_kthread+0x3b/0x50 [btrfs]
         kthread+0x123/0x140
         ret_from_fork+0x27/0x50
      
      [CAUSE]
      In btrfs, reloc trees are used to record swapped tree blocks during
      balance.
      Reloc tree either get merged (replace old tree blocks of its parent
      subvolume) in next transaction if its ref is 1 (fresh).
      Or is already merged and will be cleaned up if its ref is 0 (orphan).
      
      After commit d2311e69 ("btrfs: relocation: Delay reloc tree deletion
      after merge_reloc_roots"), reloc tree cleanup is delayed until one block
      group is balanced.
      
      Since fresh reloc roots are recorded during merge, as long as there
      is no power loss, those orphan reloc roots converted from fresh ones are
      handled without problem.
      
      However when power loss happens, orphan reloc roots can be recorded
      on-disk, thus at next mount time, we will have orphan reloc roots from
      on-disk data directly, and ignored by clean_dirty_subvols() routine.
      
      Then when background balance starts to balance another block group, and
      needs to create new reloc root for the same root, btrfs_insert_item()
      returns -EEXIST, and trigger that BUG_ON().
      
      [FIX]
      For orphan reloc roots, also queue them to rc->dirty_subvol_roots, so
      all reloc roots no matter orphan or not, can be cleaned up properly and
      avoid above BUG_ON().
      
      And to cooperate with above change, clean_dirty_subvols() will check if
      the queued root is a reloc root or a subvol root.
      For a subvol root, do the old work, and for a orphan reloc root, clean it
      up.
      
      Fixes: d2311e69 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
      CC: stable@vger.kernel.org # 5.1
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      19b65aac
    • Filipe Manana's avatar
      Btrfs: incremental send, fix file corruption when no-holes feature is enabled · 8a7c6f19
      Filipe Manana authored
      commit 6b1f72e5 upstream.
      
      When using the no-holes feature, if we have a file with prealloc extents
      with a start offset beyond the file's eof, doing an incremental send can
      cause corruption of the file due to incorrect hole detection. Such case
      requires that the prealloc extent(s) exist in both the parent and send
      snapshots, and that a hole is punched into the file that covers all its
      extents that do not cross the eof boundary.
      
      Example reproducer:
      
        $ mkfs.btrfs -f -O no-holes /dev/sdb
        $ mount /dev/sdb /mnt/sdb
      
        $ xfs_io -f -c "pwrite -S 0xab 0 500K" /mnt/sdb/foobar
        $ xfs_io -c "falloc -k 1200K 800K" /mnt/sdb/foobar
      
        $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/base
      
        $ btrfs send -f /tmp/base.snap /mnt/sdb/base
      
        $ xfs_io -c "fpunch 0 500K" /mnt/sdb/foobar
      
        $ btrfs subvolume snapshot -r /mnt/sdb /mnt/sdb/incr
      
        $ btrfs send -p /mnt/sdb/base -f /tmp/incr.snap /mnt/sdb/incr
      
        $ md5sum /mnt/sdb/incr/foobar
        816df6f64deba63b029ca19d880ee10a   /mnt/sdb/incr/foobar
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt/sdc
      
        $ btrfs receive -f /tmp/base.snap /mnt/sdc
        $ btrfs receive -f /tmp/incr.snap /mnt/sdc
      
        $ md5sum /mnt/sdc/incr/foobar
        cf2ef71f4a9e90c2f6013ba3b2257ed2   /mnt/sdc/incr/foobar
      
          --> Different checksum, because the prealloc extent beyond the
              file's eof confused the hole detection code and it assumed
              a hole starting at offset 0 and ending at the offset of the
              prealloc extent (1200Kb) instead of ending at the offset
              500Kb (the file's size).
      
      Fix this by ensuring we never cross the file's size when issuing the
      write operations for a hole.
      
      Fixes: 16e7549f ("Btrfs: incompatible format change to remove hole extents")
      CC: stable@vger.kernel.org # 3.14+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a7c6f19
    • Qu Wenruo's avatar
      btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference · afd72278
      Qu Wenruo authored
      commit 57949d03 upstream.
      
      [BUG]
      When mounting a fs with reloc tree and has qgroup enabled, it can cause
      NULL pointer dereference at mount time:
      
        BUG: kernel NULL pointer dereference, address: 00000000000000a8
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP NOPTI
        RIP: 0010:btrfs_qgroup_add_swapped_blocks+0x186/0x300 [btrfs]
        Call Trace:
         replace_path.isra.23+0x685/0x900 [btrfs]
         merge_reloc_root+0x26e/0x5f0 [btrfs]
         merge_reloc_roots+0x10a/0x1a0 [btrfs]
         btrfs_recover_relocation+0x3cd/0x420 [btrfs]
         open_ctree+0x1bc8/0x1ed0 [btrfs]
         btrfs_mount_root+0x544/0x680 [btrfs]
         legacy_get_tree+0x34/0x60
         vfs_get_tree+0x2d/0xf0
         fc_mount+0x12/0x40
         vfs_kern_mount.part.12+0x61/0xa0
         vfs_kern_mount+0x13/0x20
         btrfs_mount+0x16f/0x860 [btrfs]
         legacy_get_tree+0x34/0x60
         vfs_get_tree+0x2d/0xf0
         do_mount+0x81f/0xac0
         ksys_mount+0xbf/0xe0
         __x64_sys_mount+0x25/0x30
         do_syscall_64+0x65/0x240
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [CAUSE]
      In btrfs_recover_relocation(), we don't have enough info to determine
      which block group we're relocating, but only to merge existing reloc
      trees.
      
      Thus in btrfs_recover_relocation(), rc->block_group is NULL.
      btrfs_qgroup_add_swapped_blocks() hasn't taken this into consideration,
      and causes a NULL pointer dereference.
      
      The bug is introduced by commit 3d0174f7 ("btrfs: qgroup: Only trace
      data extents in leaves if we're relocating data block group"), and
      later qgroup refactoring still keeps this optimization.
      
      [FIX]
      Thankfully in the context of btrfs_recover_relocation(), there is no
      other progress can modify tree blocks, thus those swapped tree blocks
      pair will never affect qgroup numbers, no matter whatever we set for
      block->trace_leaf.
      
      So we only need to check if @bg is NULL before accessing @bg->flags.
      Reported-by: default avatarJuan Erbes <jerbes@gmail.com>
      Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1134806
      Fixes: 3d0174f7 ("btrfs: qgroup: Only trace data extents in leaves if we're relocating data block group")
      CC: stable@vger.kernel.org # 4.20+
      Signed-off-by: default avatarQu Wenruo <wqu@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afd72278
    • Dennis Zhou's avatar
      btrfs: correct zstd workspace manager lock to use spin_lock_bh() · c2d03b61
      Dennis Zhou authored
      commit fee13fe9 upstream.
      
      The btrfs zstd workspace manager uses a background timer to reclaim not
      recently used workspaces. I used spin_lock() from this context which
      should have been caught with lockdep, but was not. This deadlock was
      reported in bugzilla. The fix is to switch the zstd wsm lock to use
      spin_lock_bh() from the softirq context.
      
      This happened quite relibably on ppc64, unlike on other architectures.
      
        [  313.402874] ================================
        [  313.402875] WARNING: inconsistent lock state
        [  313.402879] 5.1.0-rc7 #1 Not tainted
        [  313.402880] --------------------------------
        [  313.402882] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
        [  313.402885] swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
        [  313.402888] 0000000080d1120c (&(&wsm.lock)->rlock){+.?.}, at: .zstd_reclaim_timer_fn+0x40/0x230
        [  313.402895] {SOFTIRQ-ON-W} state was registered at:
        [  313.402899]   .lock_acquire+0xd0/0x240
        [  313.402903]   ._raw_spin_lock+0x34/0x60
        [  313.402906]   .zstd_get_workspace+0xd0/0x360
        [  313.402908]   .end_compressed_bio_read+0x3b8/0x540
        [  313.402911]   .bio_endio+0x174/0x2c0
        [  313.402914]   .end_workqueue_fn+0x4c/0x70
        [  313.402917]   .normal_work_helper+0x138/0x7e0
        [  313.402920]   .process_one_work+0x324/0x790
        [  313.402922]   .worker_thread+0x68/0x570
        [  313.402925]   .kthread+0x19c/0x1b0
        [  313.402928]   .ret_from_kernel_thread+0x58/0x78
        [  313.402930] irq event stamp: 2629216
        [  313.402933] hardirqs last  enabled at (2629216): [<c0000000009da738>] ._raw_spin_unlock_irq+0x38/0x60
        [  313.402936] hardirqs last disabled at (2629215): [<c0000000009da4c4>] ._raw_spin_lock_irq+0x24/0x70
        [  313.402939] softirqs last  enabled at (2629212): [<c0000000000af9fc>] .irq_enter+0x8c/0xd0
        [  313.402942] softirqs last disabled at (2629213): [<c0000000000afb58>] .irq_exit+0x118/0x170
        [  313.402944]
      		 other info that might help us debug this:
        [  313.402945]  Possible unsafe locking scenario:
      
        [  313.402947]        CPU0
        [  313.402948]        ----
        [  313.402949]   lock(&(&wsm.lock)->rlock);
        [  313.402951]   <Interrupt>
        [  313.402952]     lock(&(&wsm.lock)->rlock);
        [  313.402954]
      		  *** DEADLOCK ***
      
        [  313.402957] 1 lock held by swapper/5/0:
        [  313.402958]  #0: 000000004b612042 ((&wsm.timer)){+.-.}, at: .call_timer_fn+0x0/0x3c0
        [  313.402963]
      		 stack backtrace:
        [  313.402967] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.1.0-rc7 #1
        [  313.402968] Call Trace:
        [  313.402972] [c0000007fa262e70] [c0000000009b3294] .dump_stack+0xe0/0x15c (unreliable)
        [  313.402975] [c0000007fa262f10] [c000000000125548] .print_usage_bug+0x348/0x390
        [  313.402978] [c0000007fa262fd0] [c000000000125cb4] .mark_lock+0x724/0x930
        [  313.402981] [c0000007fa263080] [c000000000126c20] .__lock_acquire+0xc90/0x16a0
        [  313.402984] [c0000007fa2631b0] [c000000000128040] .lock_acquire+0xd0/0x240
        [  313.402987] [c0000007fa263280] [c0000000009da2b4] ._raw_spin_lock+0x34/0x60
        [  313.402990] [c0000007fa263300] [c00000000054b0b0] .zstd_reclaim_timer_fn+0x40/0x230
        [  313.402993] [c0000007fa2633d0] [c000000000158b38] .call_timer_fn+0xc8/0x3c0
        [  313.402996] [c0000007fa2634a0] [c000000000158f74] .expire_timers+0x144/0x260
        [  313.402999] [c0000007fa263550] [c000000000159178] .run_timer_softirq+0xe8/0x230
        [  313.403002] [c0000007fa263680] [c0000000009db288] .__do_softirq+0x188/0x5d4
        [  313.403004] [c0000007fa263790] [c0000000000afb58] .irq_exit+0x118/0x170
        [  313.403008] [c0000007fa263800] [c000000000028d88] .timer_interrupt+0x158/0x430
        [  313.403012] [c0000007fa2638b0] [c0000000000091d4] decrementer_common+0x134/0x140
        [  313.403017] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
      		     LR = .arch_local_irq_restore.part.0+0x68/0x80
        [  313.403020] [c0000007fa263bb0] [c00000000001a3ac] .arch_local_irq_restore.part.0+0x2c/0x80 (unreliable)
        [  313.403024] [c0000007fa263c30] [c0000000007bbbcc] .cpuidle_enter_state+0xec/0x670
        [  313.403027] [c0000007fa263d00] [c0000000000f5130] .call_cpuidle+0x40/0x90
        [  313.403031] [c0000007fa263d70] [c0000000000f554c] .do_idle+0x2dc/0x3a0
        [  313.403034] [c0000007fa263e30] [c0000000000f59ac] .cpu_startup_entry+0x2c/0x30
        [  313.403037] [c0000007fa263ea0] [c000000000045674] .start_secondary+0x644/0x650
        [  313.403041] [c0000007fa263f90] [c00000000000ad5c] start_secondary_prolog+0x10/0x14
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203517
      Fixes: 3f93aef5 ("btrfs: add zstd compression level support")
      CC: stable@vger.kernel.org # 5.1+
      Signed-off-by: default avatarDennis Zhou <dennis@kernel.org>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2d03b61
    • Filipe Manana's avatar
      Btrfs: fix fsync not persisting changed attributes of a directory · a1d13d87
      Filipe Manana authored
      commit 60d9f503 upstream.
      
      While logging an inode we follow its ancestors and for each one we mark
      it as logged in the current transaction, even if we have not logged it.
      As a consequence if we change an attribute of an ancestor, such as the
      UID or GID for example, and then explicitly fsync it, we end up not
      logging the inode at all despite returning success to user space, which
      results in the attribute being lost if a power failure happens after
      the fsync.
      
      Sample reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/dir
        $ chown 6007:6007 /mnt/dir
      
        $ sync
      
        $ chown 9003:9003 /mnt/dir
        $ touch /mnt/dir/file
        $ xfs_io -c fsync /mnt/dir/file
      
        # fsync our directory after fsync'ing the new file, should persist the
        # new values for the uid and gid.
        $ xfs_io -c fsync /mnt/dir
      
        <power failure>
      
        $ mount /dev/sdb /mnt
        $ stat -c %u:%g /mnt/dir
        6007:6007
      
          --> should be 9003:9003, the uid and gid were not persisted, despite
              the explicit fsync on the directory prior to the power failure
      
      Fix this by not updating the logged_trans field of ancestor inodes when
      logging an inode, since we have not logged them. Let only future calls to
      btrfs_log_inode() to mark inodes as logged.
      
      This could be triggered by my recent fsync fuzz tester for fstests, for
      which an fstests patch exists titled "fstests: generic, fsync fuzz tester
      with fsstress".
      
      Fixes: 12fcfd22 ("Btrfs: tree logging unlink/rename fixes")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1d13d87
    • Filipe Manana's avatar
      Btrfs: fix race updating log root item during fsync · 1e6d76d4
      Filipe Manana authored
      commit 06989c79 upstream.
      
      When syncing the log, the final phase of a fsync operation, we need to
      either create a log root's item or update the existing item in the log
      tree of log roots, and that depends on the current value of the log
      root's log_transid - if it's 1 we need to create the log root item,
      otherwise it must exist already and we update it. Since there is no
      synchronization between updating the log_transid and checking it for
      deciding whether the log root's item needs to be created or updated, we
      end up with a tiny race window that results in attempts to update the
      item to fail because the item was not yet created:
      
                    CPU 1                                    CPU 2
      
        btrfs_sync_log()
      
          lock root->log_mutex
      
          set log root's log_transid to 1
      
          unlock root->log_mutex
      
                                                     btrfs_sync_log()
      
                                                       lock root->log_mutex
      
                                                       sets log root's
                                                       log_transid to 2
      
                                                       unlock root->log_mutex
      
          update_log_root()
      
            sees log root's log_transid
            with a value of 2
      
              calls btrfs_update_root(),
              which fails with -EUCLEAN
              and causes transaction abort
      
      Until recently the race lead to a BUG_ON at btrfs_update_root(), but after
      the recent commit 7ac1e464 ("btrfs: Don't panic when we can't find a
      root key") we just abort the current transaction.
      
      A sample trace of the BUG_ON() on a SLE12 kernel:
      
        ------------[ cut here ]------------
        kernel BUG at ../fs/btrfs/root-tree.c:157!
        Oops: Exception in kernel mode, sig: 5 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        (...)
        Supported: Yes, External
        CPU: 78 PID: 76303 Comm: rtas_errd Tainted: G                 X 4.4.156-94.57-default #1
        task: c00000ffa906d010 ti: c00000ff42b08000 task.ti: c00000ff42b08000
        NIP: d000000036ae5cdc LR: d000000036ae5cd8 CTR: 0000000000000000
        REGS: c00000ff42b0b860 TRAP: 0700   Tainted: G                 X  (4.4.156-94.57-default)
        MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 22444484  XER: 20000000
        CFAR: d000000036aba66c SOFTE: 1
        GPR00: d000000036ae5cd8 c00000ff42b0bae0 d000000036bda220 0000000000000054
        GPR04: 0000000000000001 0000000000000000 c00007ffff8d37c8 0000000000000000
        GPR08: c000000000e19c00 0000000000000000 0000000000000000 3736343438312079
        GPR12: 3930373337303434 c000000007a3a800 00000000007fffff 0000000000000023
        GPR16: c00000ffa9d26028 c00000ffa9d261f8 0000000000000010 c00000ffa9d2ab28
        GPR20: c00000ff42b0bc48 0000000000000001 c00000ff9f0d9888 0000000000000001
        GPR24: c00000ffa9d26000 c00000ffa9d261e8 c00000ffa9d2a800 c00000ff9f0d9888
        GPR28: c00000ffa9d26028 c00000ffa9d2aa98 0000000000000001 c00000ffa98f5b20
        NIP [d000000036ae5cdc] btrfs_update_root+0x25c/0x4e0 [btrfs]
        LR [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs]
        Call Trace:
        [c00000ff42b0bae0] [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs] (unreliable)
        [c00000ff42b0bba0] [d000000036b53610] btrfs_sync_log+0x2d0/0xc60 [btrfs]
        [c00000ff42b0bce0] [d000000036b1785c] btrfs_sync_file+0x44c/0x4e0 [btrfs]
        [c00000ff42b0bd80] [c00000000032e300] vfs_fsync_range+0x70/0x120
        [c00000ff42b0bdd0] [c00000000032e44c] do_fsync+0x5c/0xb0
        [c00000ff42b0be10] [c00000000032e8dc] SyS_fdatasync+0x2c/0x40
        [c00000ff42b0be30] [c000000000009488] system_call+0x3c/0x100
        Instruction dump:
        7f43d378 4bffebb9 60000000 88d90008 3d220000 e8b90000 3b390009 e87a01f0
        e8898e08 e8f90000 4bfd48e5 60000000 <0fe00000> e95b0060 39200004 394a0ea0
        ---[ end trace 8f2dc8f919cabab8 ]---
      
      So fix this by doing the check of log_transid and updating or creating the
      log root's item while holding the root's log_mutex.
      
      Fixes: 7237f183 ("Btrfs: fix tree logs parallel sync")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e6d76d4
    • Filipe Manana's avatar
      Btrfs: fix wrong ctime and mtime of a directory after log replay · a48f8ac4
      Filipe Manana authored
      commit 5338e43a upstream.
      
      When replaying a log that contains a new file or directory name that needs
      to be added to its parent directory, we end up updating the mtime and the
      ctime of the parent directory to the current time after we have set their
      values to the correct ones (set at fsync time), efectivelly losing them.
      
      Sample reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        $ mkdir /mnt/dir
        $ touch /mnt/dir/file
      
        # fsync of the directory is optional, not needed
        $ xfs_io -c fsync /mnt/dir
        $ xfs_io -c fsync /mnt/dir/file
      
        $ stat -c %Y /mnt/dir
        1557856079
      
        <power failure>
      
        $ sleep 3
        $ mount /dev/sdb /mnt
        $ stat -c %Y /mnt/dir
        1557856082
      
          --> should have been 1557856079, the mtime is updated to the current
              time when replaying the log
      
      Fix this by not updating the mtime and ctime to the current time at
      btrfs_add_link() when we are replaying a log tree.
      
      This could be triggered by my recent fsync fuzz tester for fstests, for
      which an fstests patch exists titled "fstests: generic, fsync fuzz tester
      with fsstress".
      
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: default avatarNikolay Borisov <nborisov@suse.com>
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a48f8ac4