1. 10 Jul, 2019 40 commits
    • Paul Menzel's avatar
      nfsd: Fix overflow causing non-working mounts on 1 TB machines · 94f536c7
      Paul Menzel authored
      commit 3b2d4dcf upstream.
      
      Since commit 10a68cdf (nfsd: fix performance-limiting session
      calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with
      1 TB of memory cannot be mounted anymore. The mount just hangs on the
      client.
      
      The gist of commit 10a68cdf is the change below.
      
          -avail = clamp_t(int, avail, slotsize, avail/3);
          +avail = clamp_t(int, avail, slotsize, total_avail/3);
      
      Here are the macros.
      
          #define min_t(type, x, y)       __careful_cmp((type)(x), (type)(y), <)
          #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
      
      `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts
      the values to `int`, which for 32-bit integers can only hold values
      −2,147,483,648 (−2^31) through 2,147,483,647 (2^31 − 1).
      
      `avail` (in the function signature) is just 65536, so that no overflow
      was happening. Before the commit the assignment would result in 21845,
      and `num = 4`.
      
      When using `total_avail`, it is causing the assignment to be
      18446744072226137429 (printed as %lu), and `num` is then 4164608182.
      
      My next guess is, that `nfsd_drc_mem_used` is then exceeded, and the
      server thinks there is no memory available any more for this client.
      
      Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long`
      fixes the issue.
      
      Now, `avail = 65536` (before commit 10a68cdf `avail = 21845`), but
      `num = 4` remains the same.
      
      Fixes: c54f24e3 (nfsd: fix performance-limiting session calculation)
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94f536c7
    • Wanpeng Li's avatar
      KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC · 03dbbec3
      Wanpeng Li authored
      commit bb34e690 upstream.
      
      Thomas reported that:
      
       | Background:
       |
       |    In preparation of supporting IPI shorthands I changed the CPU offline
       |    code to software disable the local APIC instead of just masking it.
       |    That's done by clearing the APIC_SPIV_APIC_ENABLED bit in the APIC_SPIV
       |    register.
       |
       | Failure:
       |
       |    When the CPU comes back online the startup code triggers occasionally
       |    the warning in apic_pending_intr_clear(). That complains that the IRRs
       |    are not empty.
       |
       |    The offending vector is the local APIC timer vector who's IRR bit is set
       |    and stays set.
       |
       | It took me quite some time to reproduce the issue locally, but now I can
       | see what happens.
       |
       | It requires apicv_enabled=0, i.e. full apic emulation. With apicv_enabled=1
       | (and hardware support) it behaves correctly.
       |
       | Here is the series of events:
       |
       |     Guest CPU
       |
       |     goes down
       |
       |       native_cpu_disable()
       |
       | 			apic_soft_disable();
       |
       |     play_dead()
       |
       |     ....
       |
       |     startup()
       |
       |       if (apic_enabled())
       |         apic_pending_intr_clear()	<- Not taken
       |
       |      enable APIC
       |
       |         apic_pending_intr_clear()	<- Triggers warning because IRR is stale
       |
       | When this happens then the deadline timer or the regular APIC timer -
       | happens with both, has fired shortly before the APIC is disabled, but the
       | interrupt was not serviced because the guest CPU was in an interrupt
       | disabled region at that point.
       |
       | The state of the timer vector ISR/IRR bits:
       |
       |     	     	       	        ISR     IRR
       | before apic_soft_disable()    0	      1
       | after apic_soft_disable()     0	      1
       |
       | On startup		      		 0	      1
       |
       | Now one would assume that the IRR is cleared after the INIT reset, but this
       | happens only on CPU0.
       |
       | Why?
       |
       | Because our CPU0 hotplug is just for testing to make sure nothing breaks
       | and goes through an NMI wakeup vehicle because INIT would send it through
       | the boots-trap code which is not really working if that CPU was not
       | physically unplugged.
       |
       | Now looking at a real world APIC the situation in that case is:
       |
       |     	     	       	      	ISR     IRR
       | before apic_soft_disable()    0	      1
       | after apic_soft_disable()     0	      1
       |
       | On startup		      		 0	      0
       |
       | Why?
       |
       | Once the dying CPU reenables interrupts the pending interrupt gets
       | delivered as a spurious interupt and then the state is clear.
       |
       | While that CPU0 hotplug test case is surely an esoteric issue, the APIC
       | emulation is still wrong, Even if the play_dead() code would not enable
       | interrupts then the pending IRR bit would turn into an ISR .. interrupt
       | when the APIC is reenabled on startup.
      
      From SDM 10.4.7.2 Local APIC State After It Has Been Software Disabled
      * Pending interrupts in the IRR and ISR registers are held and require
        masking or handling by the CPU.
      
      In Thomas's testing, hardware cpu will not respect soft disable LAPIC
      when IRR has already been set or APICv posted-interrupt is in flight,
      so we can skip soft disable APIC checking when clearing IRR and set ISR,
      continue to respect soft disable APIC when attempting to set IRR.
      Reported-by: default avatarRong Chen <rong.a.chen@intel.com>
      Reported-by: default avatarFeng Tang <feng.tang@intel.com>
      Reported-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rong Chen <rong.a.chen@intel.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarWanpeng Li <wanpengli@tencent.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      03dbbec3
    • Paolo Bonzini's avatar
      KVM: x86: degrade WARN to pr_warn_ratelimited · d2bacadc
      Paolo Bonzini authored
      commit 3f16a5c3 upstream.
      
      This warning can be triggered easily by userspace, so it should certainly not
      cause a panic if panic_on_warn is set.
      
      Reported-by: syzbot+c03f30b4f4c46bdf8575@syzkaller.appspotmail.com
      Suggested-by: default avatarAlexander Potapenko <glider@google.com>
      Acked-by: default avatarAlexander Potapenko <glider@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2bacadc
    • Martin Schwidefsky's avatar
      s390/mm: fix pxd_bad with folded page tables · 06157573
      Martin Schwidefsky authored
      [ Upstream commit c9f62152 ]
      
      With git commit d1874a0c
      "s390/mm: make the pxd_offset functions more robust" and a 2-level page
      table it can now happen that pgd_bad() gets asked to verify a large
      segment table entry. If the entry is marked as dirty pgd_bad() will
      incorrectly return true.
      
      Change the pgd_bad(), p4d_bad(), pud_bad() and pmd_bad() functions to
      first verify the table type, return false if the table level is lower
      than what the function is suppossed to check, return true if the table
      level is too high, and otherwise check the relevant region and segment
      table bits. pmd_bad() has to check against ~SEGMENT_ENTRY_BITS for
      normal page table pointers or ~SEGMENT_ENTRY_BITS_LARGE for large
      segment table entries. Same for pud_bad() which has to check against
      ~_REGION_ENTRY_BITS or ~_REGION_ENTRY_BITS_LARGE.
      
      Fixes: d1874a0c ("s390/mm: make the pxd_offset functions more robust")
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      06157573
    • Linus Torvalds's avatar
      tty: rocket: fix incorrect forward declaration of 'rp_init()' · 3e9e67d0
      Linus Torvalds authored
      [ Upstream commit 423ea325 ]
      
      Make the forward declaration actually match the real function
      definition, something that previous versions of gcc had just ignored.
      
      This is another patch to fix new warnings from gcc-9 before I start the
      merge window pulls.  I don't want to miss legitimate new warnings just
      because my system update brought a new compiler with new warnings.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      3e9e67d0
    • Nikolay Borisov's avatar
      btrfs: Ensure replaced device doesn't have pending chunk allocation · 723e3866
      Nikolay Borisov authored
      commit debd1c06 upstream.
      
      Recent FITRIM work, namely bbbf7243 ("btrfs: combine device update
      operations during transaction commit") combined the way certain
      operations are recoded in a transaction. As a result an ASSERT was added
      in dev_replace_finish to ensure the new code works correctly.
      Unfortunately I got reports that it's possible to trigger the assert,
      meaning that during a device replace it's possible to have an unfinished
      chunk allocation on the source device.
      
      This is supposed to be prevented by the fact that a transaction is
      committed before finishing the replace oepration and alter acquiring the
      chunk mutex. This is not sufficient since by the time the transaction is
      committed and the chunk mutex acquired it's possible to allocate a chunk
      depending on the workload being executed on the replaced device. This
      bug has been present ever since device replace was introduced but there
      was never code which checks for it.
      
      The correct way to fix is to ensure that there is no pending device
      modification operation when the chunk mutex is acquire and if there is
      repeat transaction commit. Unfortunately it's not possible to just
      exclude the source device from btrfs_fs_devices::dev_alloc_list since
      this causes ENOSPC to be hit in transaction commit.
      
      Fixing that in another way would need to add special cases to handle the
      last writes and forbid new ones. The looped transaction fix is more
      obvious, and can be easily backported. The runtime of dev-replace is
      long so there's no noticeable delay caused by that.
      Reported-by: default avatarDavid Sterba <dsterba@suse.com>
      Fixes: 391cd9df ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Reviewed-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarDavid Sterba <dsterba@suse.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      723e3866
    • Shakeel Butt's avatar
      mm/vmscan.c: prevent useless kswapd loops · d4ad26ed
      Shakeel Butt authored
      commit dffcac2c upstream.
      
      In production we have noticed hard lockups on large machines running
      large jobs due to kswaps hoarding lru lock within isolate_lru_pages when
      sc->reclaim_idx is 0 which is a small zone.  The lru was couple hundred
      GiBs and the condition (page_zonenum(page) > sc->reclaim_idx) in
      isolate_lru_pages() was basically skipping GiBs of pages while holding
      the LRU spinlock with interrupt disabled.
      
      On further inspection, it seems like there are two issues:
      
      (1) If kswapd on the return from balance_pgdat() could not sleep (i.e.
          node is still unbalanced), the classzone_idx is unintentionally set
          to 0 and the whole reclaim cycle of kswapd will try to reclaim only
          the lowest and smallest zone while traversing the whole memory.
      
      (2) Fundamentally isolate_lru_pages() is really bad when the
          allocation has woken kswapd for a smaller zone on a very large machine
          running very large jobs.  It can hoard the LRU spinlock while skipping
          over 100s of GiBs of pages.
      
      This patch only fixes (1).  (2) needs a more fundamental solution.  To
      fix (1), in the kswapd context, if pgdat->kswapd_classzone_idx is
      invalid use the classzone_idx of the previous kswapd loop otherwise use
      the one the waker has requested.
      
      Link: http://lkml.kernel.org/r/20190701201847.251028-1-shakeelb@google.com
      Fixes: e716f2eb ("mm, vmscan: prevent kswapd sleeping prematurely due to mismatched classzone_idx")
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Reviewed-by: default avatarYang Shi <yang.shi@linux.alibaba.com>
      Acked-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d4ad26ed
    • Petr Mladek's avatar
      ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code() · d0a2a510
      Petr Mladek authored
      commit d5b844a2 upstream.
      
      The commit 9f255b63 ("module: Fix livepatch/ftrace module text
      permissions race") causes a possible deadlock between register_kprobe()
      and ftrace_run_update_code() when ftrace is using stop_machine().
      
      The existing dependency chain (in reverse order) is:
      
      -> #1 (text_mutex){+.+.}:
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             __mutex_lock+0x88/0x908
             mutex_lock_nested+0x32/0x40
             register_kprobe+0x254/0x658
             init_kprobes+0x11a/0x168
             do_one_initcall+0x70/0x318
             kernel_init_freeable+0x456/0x508
             kernel_init+0x22/0x150
             ret_from_fork+0x30/0x34
             kernel_thread_starter+0x0/0xc
      
      -> #0 (cpu_hotplug_lock.rw_sem){++++}:
             check_prev_add+0x90c/0xde0
             validate_chain.isra.21+0xb32/0xd70
             __lock_acquire+0x4b8/0x928
             lock_acquire+0x102/0x230
             cpus_read_lock+0x62/0xd0
             stop_machine+0x2e/0x60
             arch_ftrace_update_code+0x2e/0x40
             ftrace_run_update_code+0x40/0xa0
             ftrace_startup+0xb2/0x168
             register_ftrace_function+0x64/0x88
             klp_patch_object+0x1a2/0x290
             klp_enable_patch+0x554/0x980
             do_one_initcall+0x70/0x318
             do_init_module+0x6e/0x250
             load_module+0x1782/0x1990
             __s390x_sys_finit_module+0xaa/0xf0
             system_call+0xd8/0x2d0
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(text_mutex);
                                     lock(cpu_hotplug_lock.rw_sem);
                                     lock(text_mutex);
        lock(cpu_hotplug_lock.rw_sem);
      
      It is similar problem that has been solved by the commit 2d1e38f5
      ("kprobes: Cure hotplug lock ordering issues"). Many locks are involved.
      To be on the safe side, text_mutex must become a low level lock taken
      after cpu_hotplug_lock.rw_sem.
      
      This can't be achieved easily with the current ftrace design.
      For example, arm calls set_all_modules_text_rw() already in
      ftrace_arch_code_modify_prepare(), see arch/arm/kernel/ftrace.c.
      This functions is called:
      
        + outside stop_machine() from ftrace_run_update_code()
        + without stop_machine() from ftrace_module_enable()
      
      Fortunately, the problematic fix is needed only on x86_64. It is
      the only architecture that calls set_all_modules_text_rw()
      in ftrace path and supports livepatching at the same time.
      
      Therefore it is enough to move text_mutex handling from the generic
      kernel/trace/ftrace.c into arch/x86/kernel/ftrace.c:
      
         ftrace_arch_code_modify_prepare()
         ftrace_arch_code_modify_post_process()
      
      This patch basically reverts the ftrace part of the problematic
      commit 9f255b63 ("module: Fix livepatch/ftrace module
      text permissions race"). And provides x86_64 specific-fix.
      
      Some refactoring of the ftrace code will be needed when livepatching
      is implemented for arm or nds32. These architectures call
      set_all_modules_text_rw() and use stop_machine() at the same time.
      
      Link: http://lkml.kernel.org/r/20190627081334.12793-1-pmladek@suse.com
      
      Fixes: 9f255b63 ("module: Fix livepatch/ftrace module text permissions race")
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reported-by: default avatarMiroslav Benes <mbenes@suse.cz>
      Reviewed-by: default avatarMiroslav Benes <mbenes@suse.cz>
      Reviewed-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarPetr Mladek <pmladek@suse.com>
      [
        As reviewed by Miroslav Benes <mbenes@suse.cz>, removed return value of
        ftrace_run_update_code() as it is a void function.
      ]
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d0a2a510
    • Robert Beckett's avatar
      drm/imx: only send event on crtc disable if kept disabled · 3d2a8000
      Robert Beckett authored
      commit 5aeab2bf upstream.
      
      The event will be sent as part of the vblank enable during the modeset
      if the crtc is not being kept disabled.
      
      Fixes: 5f2f9115 ("drm/imx: atomic phase 3 step 1: Use atomic configuration")
      Signed-off-by: default avatarRobert Beckett <bob.beckett@collabora.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d2a8000
    • Robert Beckett's avatar
      drm/imx: notify drm core before sending event during crtc disable · c61d1981
      Robert Beckett authored
      commit 78c68e8f upstream.
      
      Notify drm core before sending pending events during crtc disable.
      This fixes the first event after disable having an old stale timestamp
      by having drm_crtc_vblank_off update the timestamp to now.
      
      This was seen while debugging weston log message:
      Warning: computed repaint delay is insane: -8212 msec
      
      This occurred due to:
      1. driver starts up
      2. fbcon comes along and restores fbdev, enabling vblank
      3. vblank_disable_fn fires via timer disabling vblank, keeping vblank
      seq number and time set at current value
      (some time later)
      4. weston starts and does a modeset
      5. atomic commit disables crtc while it does the modeset
      6. ipu_crtc_atomic_disable sends vblank with old seq number and time
      
      Fixes: a4744786 ("drm/imx: fix crtc vblank state regression")
      Signed-off-by: default avatarRobert Beckett <bob.beckett@collabora.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarPhilipp Zabel <p.zabel@pengutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c61d1981
    • Lucas Stach's avatar
      drm/etnaviv: add missing failure path to destroy suballoc · 7f426730
      Lucas Stach authored
      commit be132e13 upstream.
      
      When something goes wrong in the GPU init after the cmdbuf suballocator
      has been constructed, we fail to destroy it properly. This causes havok
      later when the GPU is unbound due to a module unload or similar.
      
      Fixes: e66774dd (drm/etnaviv: add cmdbuf suballocator)
      Signed-off-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Tested-by: default avatarRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f426730
    • Gerd Hoffmann's avatar
      drm/virtio: move drm_connector_update_edid_property() call · 9455763a
      Gerd Hoffmann authored
      commit 41de4be6 upstream.
      
      drm_connector_update_edid_property can sleep, we must not
      call it while holding a spinlock.  Move the callsite.
      
      Fixes: b4b01b49 ("drm/virtio: add edid support")
      Reported-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: default avatarGerd Hoffmann <kraxel@redhat.com>
      Tested-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      Tested-by: default avatarCornelia Huck <cohuck@redhat.com>
      Acked-by: default avatarCornelia Huck <cohuck@redhat.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20190405044602.2334-1-kraxel@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9455763a
    • Alex Deucher's avatar
      drm/amdgpu/gfx9: use reset default for PA_SC_FIFO_SIZE · f728b21e
      Alex Deucher authored
      commit 25f09f85 upstream.
      
      Recommended by the hw team.
      Reviewed-and-Tested-by: default avatarHuang Rui <ray.huang@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f728b21e
    • Lyude Paul's avatar
      drm/amdgpu: Don't skip display settings in hwmgr_resume() · 3f232903
      Lyude Paul authored
      commit 688f3d1e upstream.
      
      I'm not entirely sure why this is, but for some reason:
      
      921935dc ("drm/amd/powerplay: enforce display related settings only on needed")
      
      Breaks runtime PM resume on the Radeon PRO WX 3100 (Lexa) in one the
      pre-production laptops I have. The issue manifests as the following
      messages in dmesg:
      
      [drm] UVD and UVD ENC initialized successfully.
      amdgpu 0000:3b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vce1 test failed (-110)
      [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
      [drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
      
      And happens after about 6-10 runtime PM suspend/resume cycles (sometimes
      sooner, if you're lucky!). Unfortunately I can't seem to pin down
      precisely which part in psm_adjust_power_state_dynamic that is causing
      the issue, but not skipping the display setting setup seems to fix it.
      Hopefully if there is a better fix for this, this patch will spark
      discussion around it.
      
      Fixes: 921935dc ("drm/amd/powerplay: enforce display related settings only on needed")
      Cc: Evan Quan <evan.quan@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Rex Zhu <Rex.Zhu@amd.com>
      Cc: Likun Gao <Likun.Gao@amd.com>
      Cc: <stable@vger.kernel.org> # v5.1+
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f232903
    • Evan Quan's avatar
      drm/amd/powerplay: use hardware fan control if no powerplay fan table · 5150d80f
      Evan Quan authored
      commit f78c581e upstream.
      
      Otherwise, you may get divided-by-zero error or corrput the SMU fan
      control feature.
      Signed-off-by: default avatarEvan Quan <evan.quan@amd.com>
      Reviewed-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Tested-by: default avatarSlava Abramov <slava.abramov@amd.com>
      Acked-by: default avatarSlava Abramov <slava.abramov@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5150d80f
    • Chris Wilson's avatar
      drm/i915/ringbuffer: EMIT_INVALIDATE *before* switch context · e7d14cc8
      Chris Wilson authored
      commit c84c9029 upstream.
      
      Despite what I think the prm recommends, commit f2253bd9
      ("drm/i915/ringbuffer: EMIT_INVALIDATE after switch context") turned out
      to be a huge mistake when enabling Ironlake contexts as the GPU would
      hang on either a MI_FLUSH or PIPE_CONTROL immediately following the
      MI_SET_CONTEXT of an active mesa context (more vanilla contexts, e.g.
      simple rendercopies with igt, do not suffer).
      
      Ville found the following clue,
      
        "[DevCTG+]: For the invalidate operation of the pipe control, the
         following pointers are affected. The
         invalidate operation affects the restore of these packets. If the pipe
         control invalidate operation is completed
         before the context save, the indirect pointers will not be restored from
         memory.
         1. Pipeline State Pointer
         2. Media State Pointer
         3. Constant Buffer Packet"
      
      which suggests by us emitting the INVALIDATE prior to the MI_SET_CONTEXT,
      we prevent the context-restore from chasing the dangling pointers within
      the image, and explains why this likely prevents the GPU hang.
      Signed-off-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-1-chris@chris-wilson.co.uk
      (cherry picked from commit 928f8f42 in drm-intel-next)
      Cc: stable@vger.kernel.org
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111014
      Fixes: f2253bd9 ("drm/i915/ringbuffer: EMIT_INVALIDATE after switch context")
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7d14cc8
    • Ard Biesheuvel's avatar
      arm64: kaslr: keep modules inside module region when KASAN is enabled · 18cd20c0
      Ard Biesheuvel authored
      commit 6f496a55 upstream.
      
      When KASLR and KASAN are both enabled, we keep the modules where they
      are, and randomize the placement of the kernel so it is within 2 GB
      of the module region. The reason for this is that putting modules in
      the vmalloc region (like we normally do when KASLR is enabled) is not
      possible in this case, given that the entire vmalloc region is already
      backed by KASAN zero shadow pages, and so allocating dedicated KASAN
      shadow space as required by loaded modules is not possible.
      
      The default module allocation window is set to [_etext - 128MB, _etext]
      in kaslr.c, which is appropriate for KASLR kernels booted without a
      seed or with 'nokaslr' on the command line. However, as it turns out,
      it is not quite correct for the KASAN case, since it still intersects
      the vmalloc region at the top, where attempts to allocate shadow pages
      will collide with the KASAN zero shadow pages, causing a WARN() and all
      kinds of other trouble. So cap the top end to MODULES_END explicitly
      when running with KASAN.
      
      Cc: <stable@vger.kernel.org> # 4.9+
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Tested-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarWill Deacon <will@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      18cd20c0
    • Joshua Scott's avatar
      ARM: dts: armada-xp-98dx3236: Switch to armada-38x-uart serial node · 09cc58c8
      Joshua Scott authored
      commit 80031361 upstream.
      
      Switch to the "marvell,armada-38x-uart" driver variant to empty
      the UART buffer before writing to the UART_LCR register.
      Signed-off-by: default avatarJoshua Scott <joshua.scott@alliedtelesis.co.nz>
      Tested-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>.
      Cc: stable@vger.kernel.org
      Fixes: 43e28ba8 ("ARM: dts: Use armada-370-xp as a base for armada-xp-98dx3236")
      Signed-off-by: default avatarGregory CLEMENT <gregory.clement@bootlin.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      09cc58c8
    • Eiichi Tsukata's avatar
      tracing/snapshot: Resize spare buffer if size changed · ce3ce345
      Eiichi Tsukata authored
      commit 46cc0b44 upstream.
      
      Current snapshot implementation swaps two ring_buffers even though their
      sizes are different from each other, that can cause an inconsistency
      between the contents of buffer_size_kb file and the current buffer size.
      
      For example:
      
        # cat buffer_size_kb
        7 (expanded: 1408)
        # echo 1 > events/enable
        # grep bytes per_cpu/cpu0/stats
        bytes: 1441020
        # echo 1 > snapshot             // current:1408, spare:1408
        # echo 123 > buffer_size_kb     // current:123,  spare:1408
        # echo 1 > snapshot             // current:1408, spare:123
        # grep bytes per_cpu/cpu0/stats
        bytes: 1443700
        # cat buffer_size_kb
        123                             // != current:1408
      
      And also, a similar per-cpu case hits the following WARNING:
      
      Reproducer:
      
        # echo 1 > per_cpu/cpu0/snapshot
        # echo 123 > buffer_size_kb
        # echo 1 > per_cpu/cpu0/snapshot
      
      WARNING:
      
        WARNING: CPU: 0 PID: 1946 at kernel/trace/trace.c:1607 update_max_tr_single.part.0+0x2b8/0x380
        Modules linked in:
        CPU: 0 PID: 1946 Comm: bash Not tainted 5.2.0-rc6 #20
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
        RIP: 0010:update_max_tr_single.part.0+0x2b8/0x380
        Code: ff e8 dc da f9 ff 0f 0b e9 88 fe ff ff e8 d0 da f9 ff 44 89 ee bf f5 ff ff ff e8 33 dc f9 ff 41 83 fd f5 74 96 e8 b8 da f9 ff <0f> 0b eb 8d e8 af da f9 ff 0f 0b e9 bf fd ff ff e8 a3 da f9 ff 48
        RSP: 0018:ffff888063e4fca0 EFLAGS: 00010093
        RAX: ffff888066214380 RBX: ffffffff99850fe0 RCX: ffffffff964298a8
        RDX: 0000000000000000 RSI: 00000000fffffff5 RDI: 0000000000000005
        RBP: 1ffff1100c7c9f96 R08: ffff888066214380 R09: ffffed100c7c9f9b
        R10: ffffed100c7c9f9a R11: 0000000000000003 R12: 0000000000000000
        R13: 00000000ffffffea R14: ffff888066214380 R15: ffffffff99851060
        FS:  00007f9f8173c700(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000714dc0 CR3: 0000000066fa6000 CR4: 00000000000006f0
        Call Trace:
         ? trace_array_printk_buf+0x140/0x140
         ? __mutex_lock_slowpath+0x10/0x10
         tracing_snapshot_write+0x4c8/0x7f0
         ? trace_printk_init_buffers+0x60/0x60
         ? selinux_file_permission+0x3b/0x540
         ? tracer_preempt_off+0x38/0x506
         ? trace_printk_init_buffers+0x60/0x60
         __vfs_write+0x81/0x100
         vfs_write+0x1e1/0x560
         ksys_write+0x126/0x250
         ? __ia32_sys_read+0xb0/0xb0
         ? do_syscall_64+0x1f/0x390
         do_syscall_64+0xc1/0x390
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This patch adds resize_buffer_duplicate_size() to check if there is a
      difference between current/spare buffer sizes and resize a spare buffer
      if necessary.
      
      Link: http://lkml.kernel.org/r/20190625012910.13109-1-devel@etsukata.com
      
      Cc: stable@vger.kernel.org
      Fixes: ad909e21 ("tracing: Add internal tracing_snapshot() functions")
      Signed-off-by: default avatarEiichi Tsukata <devel@etsukata.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce3ce345
    • Oleg Nesterov's avatar
      swap_readpage(): avoid blk_wake_io_task() if !synchronous · d82bf47a
      Oleg Nesterov authored
      commit 87518530 upstream.
      
      swap_readpage() sets waiter = bio->bi_private even if synchronous = F,
      this means that the caller can get the spurious wakeup after return.
      
      This can be fatal if blk_wake_io_task() does
      set_current_state(TASK_RUNNING) after the caller does
      set_special_state(), in the worst case the kernel can crash in
      do_task_dead().
      
      Link: http://lkml.kernel.org/r/20190704160301.GA5956@redhat.com
      Fixes: 0619317f ("block: add polled wakeup task helper")
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Reviewed-by: default avatarJens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d82bf47a
    • Eric Biggers's avatar
      fs/userfaultfd.c: disable irqs for fault_pending and event locks · 61ff807f
      Eric Biggers authored
      commit cbcfa130 upstream.
      
      When IOCB_CMD_POLL is used on a userfaultfd, aio_poll() disables IRQs
      and takes kioctx::ctx_lock, then userfaultfd_ctx::fd_wqh.lock.
      
      This may have to wait for userfaultfd_ctx::fd_wqh.lock to be released by
      userfaultfd_ctx_read(), which in turn can be waiting for
      userfaultfd_ctx::fault_pending_wqh.lock or
      userfaultfd_ctx::event_wqh.lock.
      
      But elsewhere the fault_pending_wqh and event_wqh locks are taken with
      IRQs enabled.  Since the IRQ handler may take kioctx::ctx_lock, lockdep
      reports that a deadlock is possible.
      
      Fix it by always disabling IRQs when taking the fault_pending_wqh and
      event_wqh locks.
      
      Commit ae62c16e ("userfaultfd: disable irqs when taking the
      waitqueue lock") didn't fix this because it only accounted for the
      fd_wqh lock, not the other locks nested inside it.
      
      Link: http://lkml.kernel.org/r/20190627075004.21259-1-ebiggers@kernel.org
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Reported-by: syzbot+fab6de82892b6b9c6191@syzkaller.appspotmail.com
      Reported-by: syzbot+53c0b767f7ca0dc0c451@syzkaller.appspotmail.com
      Reported-by: syzbot+a3accb352f9c22041cfa@syzkaller.appspotmail.com
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.19+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61ff807f
    • Herbert Xu's avatar
      lib/mpi: Fix karactx leak in mpi_powm · 3e421061
      Herbert Xu authored
      commit c8ea9fce upstream.
      
      Sometimes mpi_powm will leak karactx because a memory allocation
      failure causes a bail-out that skips the freeing of karactx.  This
      patch moves the freeing of karactx to the end of the function like
      everything else so that it can't be skipped.
      
      Reported-by: syzbot+f7baccc38dcc1e094e77@syzkaller.appspotmail.com
      Fixes: cdec9cb5 ("crypto: GnuPG based MPI lib - source files...")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Reviewed-by: default avatarEric Biggers <ebiggers@kernel.org>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e421061
    • Jan Kara's avatar
      dax: Fix xarray entry association for mixed mappings · 7b6532ed
      Jan Kara authored
      commit 1571c029 upstream.
      
      When inserting entry into xarray, we store mapping and index in
      corresponding struct pages for memory error handling. When it happened
      that one process was mapping file at PMD granularity while another
      process at PTE granularity, we could wrongly deassociate PMD range and
      then reassociate PTE range leaving the rest of struct pages in PMD range
      without mapping information which could later cause missed notifications
      about memory errors. Fix the problem by calling the association /
      deassociation code if and only if we are really going to update the
      xarray (deassociating and associating zero or empty entries is just
      no-op so there's no reason to complicate the code with trying to avoid
      the calls for these cases).
      
      Cc: <stable@vger.kernel.org>
      Fixes: d2c997c0 ("fs, dax: use page->mapping to warn if truncate...")
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b6532ed
    • Dennis Wassenberg's avatar
      ALSA: hda/realtek - Change front mic location for Lenovo M710q · 786db71d
      Dennis Wassenberg authored
      commit bef33e19 upstream.
      
      On M710q Lenovo ThinkCentre machine, there are two front mics,
      we change the location for one of them to avoid conflicts.
      Signed-off-by: default avatarDennis Wassenberg <dennis.wassenberg@secunet.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      786db71d
    • Richard Sailer's avatar
      ALSA: hda/realtek: Add quirks for several Clevo notebook barebones · 32a4cd4c
      Richard Sailer authored
      commit 503d90b3 upstream.
      
      This adds 4 SND_PCI_QUIRK(...) lines for several barebone models of the ODM
      Clevo. The model names are written in regex syntax to describe/match all clevo
      models that are similar enough and use the same PCI SSID that this fixup works
      for them.
      
      Additionally the lines regarding SSID 0x96e1 and 0x97e1 didn't fix audio for the
      all our Clevo notebooks using these SSIDs (models Clevo P960* and P970*) since
      ALC1220_FIXP_CLEVO_PB51ED_PINS swapped pins that are not necesarry to be
      swapped. This patch initiates ALC1220_FIXUP_CLEVO_P950 instead for these model
      and fixes the audio.
      
      Fixes: 80690a27 ("ALSA: hda/realtek - Add quirk for Tuxedo XC 1509")
      Signed-off-by: default avatarRichard Sailer <rs@tuxedocomputers.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      32a4cd4c
    • Colin Ian King's avatar
      ALSA: usb-audio: fix sign unintended sign extension on left shifts · 97001b39
      Colin Ian King authored
      commit 2acf5a3e upstream.
      
      There are a couple of left shifts of unsigned 8 bit values that
      first get promoted to signed ints and hence get sign extended
      on the shift if the top bit of the 8 bit values are set. Fix
      this by casting the 8 bit values to unsigned ints to stop the
      unintentional sign extension.
      
      Addresses-Coverity: ("Unintended sign extension")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97001b39
    • Takashi Iwai's avatar
      ALSA: line6: Fix write on zero-sized buffer · 308e490d
      Takashi Iwai authored
      commit 34501219 upstream.
      
      LINE6 drivers allocate the buffers based on the value returned from
      usb_maxpacket() calls.  The manipulated device may return zero for
      this, and this results in the kmalloc() with zero size (and it may
      succeed) while the other part of the driver code writes the packet
      data with the fixed size -- which eventually overwrites.
      
      This patch adds a simple sanity check for the invalid buffer size for
      avoiding that problem.
      
      Reported-by: syzbot+219f00fb49874dcaea17@syzkaller.appspotmail.com
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      308e490d
    • Takashi Sakamoto's avatar
      ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages · 139b5516
      Takashi Sakamoto authored
      commit 7fbd1753 upstream.
      
      In IEC 61883-6, 8 MIDI data streams are multiplexed into single
      MIDI conformant data channel. The index of stream is calculated by
      modulo 8 of the value of data block counter.
      
      In fireworks, the value of data block counter in CIP header has a quirk
      with firmware version v5.0.0, v5.7.3 and v5.8.0. This brings ALSA
      IEC 61883-1/6 packet streaming engine to miss detection of MIDI
      messages.
      
      This commit fixes the miss detection to modify the value of data block
      counter for the modulo calculation.
      
      For maintainers, this bug exists since a commit 18f5ed36 ("ALSA:
      fireworks/firewire-lib: add support for recent firmware quirk") in Linux
      kernel v4.2. There're many changes since the commit.  This fix can be
      backported to Linux kernel v4.4 or later. I tagged a base commit to the
      backport for your convenience.
      
      Besides, my work for Linux kernel v5.3 brings heavy code refactoring and
      some structure members are renamed in 'sound/firewire/amdtp-stream.h'.
      The content of this patch brings conflict when merging -rc tree with
      this patch and the latest tree. I request maintainers to solve the
      conflict to replace 'tx_first_dbc' with 'ctx_data.tx.first_dbc'.
      
      Fixes: df075fee ("ALSA: firewire-lib: complete AM824 data block processing layer")
      Cc: <stable@vger.kernel.org> # v4.4+
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      139b5516
    • Colin Ian King's avatar
      ALSA: seq: fix incorrect order of dest_client/dest_ports arguments · ff487e21
      Colin Ian King authored
      commit c3ea60c2 upstream.
      
      There are two occurrances of a call to snd_seq_oss_fill_addr where
      the dest_client and dest_port arguments are in the wrong order. Fix
      this by swapping them around.
      
      Addresses-Coverity: ("Arguments in wrong order")
      Signed-off-by: default avatarColin Ian King <colin.king@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ff487e21
    • Vincent Whitchurch's avatar
      crypto: cryptd - Fix skcipher instance memory leak · 5e961e89
      Vincent Whitchurch authored
      commit 1a0fad63 upstream.
      
      cryptd_skcipher_free() fails to free the struct skcipher_instance
      allocated in cryptd_create_skcipher(), leading to a memory leak.  This
      is detected by kmemleak on bootup on ARM64 platforms:
      
       unreferenced object 0xffff80003377b180 (size 1024):
         comm "cryptomgr_probe", pid 822, jiffies 4294894830 (age 52.760s)
         backtrace:
           kmem_cache_alloc_trace+0x270/0x2d0
           cryptd_create+0x990/0x124c
           cryptomgr_probe+0x5c/0x1e8
           kthread+0x258/0x318
           ret_from_fork+0x10/0x1c
      
      Fixes: 4e0958d1 ("crypto: cryptd - Add support for skcipher")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarVincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e961e89
    • Eric Biggers's avatar
      crypto: user - prevent operating on larval algorithms · 5781d162
      Eric Biggers authored
      commit 21d4120e upstream.
      
      Michal Suchanek reported [1] that running the pcrypt_aead01 test from
      LTP [2] in a loop and holding Ctrl-C causes a NULL dereference of
      alg->cra_users.next in crypto_remove_spawns(), via crypto_del_alg().
      The test repeatedly uses CRYPTO_MSG_NEWALG and CRYPTO_MSG_DELALG.
      
      The crash occurs when the instance that CRYPTO_MSG_DELALG is trying to
      unregister isn't a real registered algorithm, but rather is a "test
      larval", which is a special "algorithm" added to the algorithms list
      while the real algorithm is still being tested.  Larvals don't have
      initialized cra_users, so that causes the crash.  Normally pcrypt_aead01
      doesn't trigger this because CRYPTO_MSG_NEWALG waits for the algorithm
      to be tested; however, CRYPTO_MSG_NEWALG returns early when interrupted.
      
      Everything else in the "crypto user configuration" API has this same bug
      too, i.e. it inappropriately allows operating on larval algorithms
      (though it doesn't look like the other cases can cause a crash).
      
      Fix this by making crypto_alg_match() exclude larval algorithms.
      
      [1] https://lkml.kernel.org/r/20190625071624.27039-1-msuchanek@suse.de
      [2] https://github.com/linux-test-project/ltp/blob/20190517/testcases/kernel/crypto/pcrypt_aead01.cReported-by: default avatarMichal Suchanek <msuchanek@suse.de>
      Fixes: a38f7907 ("crypto: Add userspace configuration API")
      Cc: <stable@vger.kernel.org> # v3.2+
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5781d162
    • Jann Horn's avatar
      ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME · 49887fc3
      Jann Horn authored
      commit 6994eefb upstream.
      
      Fix two issues:
      
      When called for PTRACE_TRACEME, ptrace_link() would obtain an RCU
      reference to the parent's objective credentials, then give that pointer
      to get_cred().  However, the object lifetime rules for things like
      struct cred do not permit unconditionally turning an RCU reference into
      a stable reference.
      
      PTRACE_TRACEME records the parent's credentials as if the parent was
      acting as the subject, but that's not the case.  If a malicious
      unprivileged child uses PTRACE_TRACEME and the parent is privileged, and
      at a later point, the parent process becomes attacker-controlled
      (because it drops privileges and calls execve()), the attacker ends up
      with control over two processes with a privileged ptrace relationship,
      which can be abused to ptrace a suid binary and obtain root privileges.
      
      Fix both of these by always recording the credentials of the process
      that is requesting the creation of the ptrace relationship:
      current_cred() can't change under us, and current is the proper subject
      for access control.
      
      This change is theoretically userspace-visible, but I am not aware of
      any code that it will actually break.
      
      Fixes: 64b875f7 ("ptrace: Capture the ptracer's creds not PT_PTRACE_CAP")
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      49887fc3
    • Wei Li's avatar
      ftrace: Fix NULL pointer dereference in free_ftrace_func_mapper() · 1c7c91a9
      Wei Li authored
      [ Upstream commit 04e03d9a ]
      
      The mapper may be NULL when called from register_ftrace_function_probe()
      with probe->data == NULL.
      
      This issue can be reproduced as follow (it may be covered by compiler
      optimization sometime):
      
      / # cat /sys/kernel/debug/tracing/set_ftrace_filter
      #### all functions enabled ####
      / # echo foo_bar:dump > /sys/kernel/debug/tracing/set_ftrace_filter
      [  206.949100] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      [  206.952402] Mem abort info:
      [  206.952819]   ESR = 0x96000006
      [  206.955326]   Exception class = DABT (current EL), IL = 32 bits
      [  206.955844]   SET = 0, FnV = 0
      [  206.956272]   EA = 0, S1PTW = 0
      [  206.956652] Data abort info:
      [  206.957320]   ISV = 0, ISS = 0x00000006
      [  206.959271]   CM = 0, WnR = 0
      [  206.959938] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000419f3a000
      [  206.960483] [0000000000000000] pgd=0000000411a87003, pud=0000000411a83003, pmd=0000000000000000
      [  206.964953] Internal error: Oops: 96000006 [#1] SMP
      [  206.971122] Dumping ftrace buffer:
      [  206.973677]    (ftrace buffer empty)
      [  206.975258] Modules linked in:
      [  206.976631] Process sh (pid: 281, stack limit = 0x(____ptrval____))
      [  206.978449] CPU: 10 PID: 281 Comm: sh Not tainted 5.2.0-rc1+ #17
      [  206.978955] Hardware name: linux,dummy-virt (DT)
      [  206.979883] pstate: 60000005 (nZCv daif -PAN -UAO)
      [  206.980499] pc : free_ftrace_func_mapper+0x2c/0x118
      [  206.980874] lr : ftrace_count_free+0x68/0x80
      [  206.982539] sp : ffff0000182f3ab0
      [  206.983102] x29: ffff0000182f3ab0 x28: ffff8003d0ec1700
      [  206.983632] x27: ffff000013054b40 x26: 0000000000000001
      [  206.984000] x25: ffff00001385f000 x24: 0000000000000000
      [  206.984394] x23: ffff000013453000 x22: ffff000013054000
      [  206.984775] x21: 0000000000000000 x20: ffff00001385fe28
      [  206.986575] x19: ffff000013872c30 x18: 0000000000000000
      [  206.987111] x17: 0000000000000000 x16: 0000000000000000
      [  206.987491] x15: ffffffffffffffb0 x14: 0000000000000000
      [  206.987850] x13: 000000000017430e x12: 0000000000000580
      [  206.988251] x11: 0000000000000000 x10: cccccccccccccccc
      [  206.988740] x9 : 0000000000000000 x8 : ffff000013917550
      [  206.990198] x7 : ffff000012fac2e8 x6 : ffff000012fac000
      [  206.991008] x5 : ffff0000103da588 x4 : 0000000000000001
      [  206.991395] x3 : 0000000000000001 x2 : ffff000013872a28
      [  206.991771] x1 : 0000000000000000 x0 : 0000000000000000
      [  206.992557] Call trace:
      [  206.993101]  free_ftrace_func_mapper+0x2c/0x118
      [  206.994827]  ftrace_count_free+0x68/0x80
      [  206.995238]  release_probe+0xfc/0x1d0
      [  206.995555]  register_ftrace_function_probe+0x4a8/0x868
      [  206.995923]  ftrace_trace_probe_callback.isra.4+0xb8/0x180
      [  206.996330]  ftrace_dump_callback+0x50/0x70
      [  206.996663]  ftrace_regex_write.isra.29+0x290/0x3a8
      [  206.997157]  ftrace_filter_write+0x44/0x60
      [  206.998971]  __vfs_write+0x64/0xf0
      [  206.999285]  vfs_write+0x14c/0x2f0
      [  206.999591]  ksys_write+0xbc/0x1b0
      [  206.999888]  __arm64_sys_write+0x3c/0x58
      [  207.000246]  el0_svc_common.constprop.0+0x408/0x5f0
      [  207.000607]  el0_svc_handler+0x144/0x1c8
      [  207.000916]  el0_svc+0x8/0xc
      [  207.003699] Code: aa0003f8 a9025bf5 aa0103f5 f946ea80 (f9400303)
      [  207.008388] ---[ end trace 7b6d11b5f542bdf1 ]---
      [  207.010126] Kernel panic - not syncing: Fatal exception
      [  207.011322] SMP: stopping secondary CPUs
      [  207.013956] Dumping ftrace buffer:
      [  207.014595]    (ftrace buffer empty)
      [  207.015632] Kernel Offset: disabled
      [  207.017187] CPU features: 0x002,20006008
      [  207.017985] Memory Limit: none
      [  207.019825] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Link: http://lkml.kernel.org/r/20190606031754.10798-1-liwei391@huawei.comSigned-off-by: default avatarWei Li <liwei391@huawei.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      1c7c91a9
    • Josh Poimboeuf's avatar
      module: Fix livepatch/ftrace module text permissions race · 77dd6d8e
      Josh Poimboeuf authored
      [ Upstream commit 9f255b63 ]
      
      It's possible for livepatch and ftrace to be toggling a module's text
      permissions at the same time, resulting in the following panic:
      
        BUG: unable to handle page fault for address: ffffffffc005b1d9
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0003) - permissions violation
        PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
        Oops: 0003 [#1] PREEMPT SMP PTI
        CPU: 1 PID: 453 Comm: insmod Tainted: G           O  K   5.2.0-rc1-a188339c #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
        RIP: 0010:apply_relocate_add+0xbe/0x14c
        Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
        RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
        RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
        RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
        RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
        R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
        R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
        FS:  00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         klp_init_object_loaded+0x10f/0x219
         ? preempt_latency_start+0x21/0x57
         klp_enable_patch+0x662/0x809
         ? virt_to_head_page+0x3a/0x3c
         ? kfree+0x8c/0x126
         patch_init+0x2ed/0x1000 [livepatch_test02]
         ? 0xffffffffc0060000
         do_one_initcall+0x9f/0x1c5
         ? kmem_cache_alloc_trace+0xc4/0xd4
         ? do_init_module+0x27/0x210
         do_init_module+0x5f/0x210
         load_module+0x1c41/0x2290
         ? fsnotify_path+0x3b/0x42
         ? strstarts+0x2b/0x2b
         ? kernel_read+0x58/0x65
         __do_sys_finit_module+0x9f/0xc3
         ? __do_sys_finit_module+0x9f/0xc3
         __x64_sys_finit_module+0x1a/0x1c
         do_syscall_64+0x52/0x61
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The above panic occurs when loading two modules at the same time with
      ftrace enabled, where at least one of the modules is a livepatch module:
      
      CPU0					CPU1
      klp_enable_patch()
        klp_init_object_loaded()
          module_disable_ro()
          					ftrace_module_enable()
      					  ftrace_arch_code_modify_post_process()
      				    	    set_all_modules_text_ro()
            klp_write_object_relocations()
              apply_relocate_add()
      	  *patches read-only code* - BOOM
      
      A similar race exists when toggling ftrace while loading a livepatch
      module.
      
      Fix it by ensuring that the livepatch and ftrace code patching
      operations -- and their respective permissions changes -- are protected
      by the text_mutex.
      
      Link: http://lkml.kernel.org/r/ab43d56ab909469ac5d2520c5d944ad6d4abd476.1560474114.git.jpoimboe@redhat.comReported-by: default avatarJohannes Erdfelt <johannes@erdfelt.com>
      Fixes: 444d13ff ("modules: add ro_after_init support")
      Acked-by: default avatarJessica Yu <jeyu@kernel.org>
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.com>
      Reviewed-by: default avatarMiroslav Benes <mbenes@suse.cz>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      77dd6d8e
    • Vasily Gorbik's avatar
      tracing: avoid build warning with HAVE_NOP_MCOUNT · b61581d5
      Vasily Gorbik authored
      [ Upstream commit cbdaeaf0 ]
      
      Selecting HAVE_NOP_MCOUNT enables -mnop-mcount (if gcc supports it)
      and sets CC_USING_NOP_MCOUNT. Reuse __is_defined (which is suitable for
      testing CC_USING_* defines) to avoid conditional compilation and fix
      the following gcc 9 warning on s390:
      
      kernel/trace/ftrace.c:2514:1: warning: ‘ftrace_code_disable’ defined
      but not used [-Wunused-function]
      
      Link: http://lkml.kernel.org/r/patch.git-1a82d13f33ac.your-ad-here.call-01559732716-ext-6629@work.hours
      
      Fixes: 2f4df001 ("tracing: Add -mcount-nop option support")
      Signed-off-by: default avatarVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      b61581d5
    • swkhack's avatar
      mm/mlock.c: change count_mm_mlocked_page_nr return type · 12006942
      swkhack authored
      [ Upstream commit 0874bb49 ]
      
      On a 64-bit machine the value of "vma->vm_end - vma->vm_start" may be
      negative when using 32 bit ints and the "count >> PAGE_SHIFT"'s result
      will be wrong.  So change the local variable and return value to
      unsigned long to fix the problem.
      
      Link: http://lkml.kernel.org/r/20190513023701.83056-1-swkhack@gmail.com
      Fixes: 0cf2f6f6 ("mm: mlock: check against vma for actual mlock() size")
      Signed-off-by: default avatarswkhack <swkhack@gmail.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      12006942
    • Manuel Traut's avatar
      scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE · 80d567f7
      Manuel Traut authored
      [ Upstream commit c04e32e9 ]
      
      At least for ARM64 kernels compiled with the crosstoolchain from
      Debian/stretch or with the toolchain from kernel.org the line number is
      not decoded correctly by 'decode_stacktrace.sh':
      
        $ echo "[  136.513051]  f1+0x0/0xc [kcrash]" | \
          CROSS_COMPILE=/opt/gcc-8.1.0-nolibc/aarch64-linux/bin/aarch64-linux- \
         ./scripts/decode_stacktrace.sh /scratch/linux-arm64/vmlinux \
                                        /scratch/linux-arm64 \
                                        /nfs/debian/lib/modules/4.20.0-devel
        [  136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:68) kcrash
      
      If addr2line from the toolchain is used the decoded line number is correct:
      
        [  136.513051] f1 (/linux/drivers/staging/kcrash/kcrash.c:57) kcrash
      
      Link: http://lkml.kernel.org/r/20190527083425.3763-1-manut@linutronix.deSigned-off-by: default avatarManuel Traut <manut@linutronix.de>
      Acked-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      80d567f7
    • Joel Savitz's avatar
      cpuset: restore sanity to cpuset_cpus_allowed_fallback() · 8f4b554b
      Joel Savitz authored
      [ Upstream commit d477f8c2 ]
      
      In the case that a process is constrained by taskset(1) (i.e.
      sched_setaffinity(2)) to a subset of available cpus, and all of those are
      subsequently offlined, the scheduler will set tsk->cpus_allowed to
      the current value of task_cs(tsk)->effective_cpus.
      
      This is done via a call to do_set_cpus_allowed() in the context of
      cpuset_cpus_allowed_fallback() made by the scheduler when this case is
      detected. This is the only call made to cpuset_cpus_allowed_fallback()
      in the latest mainline kernel.
      
      However, this is not sane behavior.
      
      I will demonstrate this on a system running the latest upstream kernel
      with the following initial configuration:
      
      	# grep -i cpu /proc/$$/status
      	Cpus_allowed:	ffffffff,fffffff
      	Cpus_allowed_list:	0-63
      
      (Where cpus 32-63 are provided via smt.)
      
      If we limit our current shell process to cpu2 only and then offline it
      and reonline it:
      
      	# taskset -p 4 $$
      	pid 2272's current affinity mask: ffffffffffffffff
      	pid 2272's new affinity mask: 4
      
      	# echo off > /sys/devices/system/cpu/cpu2/online
      	# dmesg | tail -3
      	[ 2195.866089] process 2272 (bash) no longer affine to cpu2
      	[ 2195.872700] IRQ 114: no longer affine to CPU2
      	[ 2195.879128] smpboot: CPU 2 is now offline
      
      	# echo on > /sys/devices/system/cpu/cpu2/online
      	# dmesg | tail -1
      	[ 2617.043572] smpboot: Booting Node 0 Processor 2 APIC 0x4
      
      We see that our current process now has an affinity mask containing
      every cpu available on the system _except_ the one we originally
      constrained it to:
      
      	# grep -i cpu /proc/$$/status
      	Cpus_allowed:   ffffffff,fffffffb
      	Cpus_allowed_list:      0-1,3-63
      
      This is not sane behavior, as the scheduler can now not only place the
      process on previously forbidden cpus, it can't even schedule it on
      the cpu it was originally constrained to!
      
      Other cases result in even more exotic affinity masks. Take for instance
      a process with an affinity mask containing only cpus provided by smt at
      the moment that smt is toggled, in a configuration such as the following:
      
      	# taskset -p f000000000 $$
      	# grep -i cpu /proc/$$/status
      	Cpus_allowed:	000000f0,00000000
      	Cpus_allowed_list:	36-39
      
      A double toggle of smt results in the following behavior:
      
      	# echo off > /sys/devices/system/cpu/smt/control
      	# echo on > /sys/devices/system/cpu/smt/control
      	# grep -i cpus /proc/$$/status
      	Cpus_allowed:	ffffff00,ffffffff
      	Cpus_allowed_list:	0-31,40-63
      
      This is even less sane than the previous case, as the new affinity mask
      excludes all smt-provided cpus with ids less than those that were
      previously in the affinity mask, as well as those that were actually in
      the mask.
      
      With this patch applied, both of these cases end in the following state:
      
      	# grep -i cpu /proc/$$/status
      	Cpus_allowed:	ffffffff,ffffffff
      	Cpus_allowed_list:	0-63
      
      The original policy is discarded. Though not ideal, it is the simplest way
      to restore sanity to this fallback case without reinventing the cpuset
      wheel that rolls down the kernel just fine in cgroup v2. A user who wishes
      for the previous affinity mask to be restored in this fallback case can use
      that mechanism instead.
      
      This patch modifies scheduler behavior by instead resetting the mask to
      task_cs(tsk)->cpus_allowed by default, and cpu_possible mask in legacy
      mode. I tested the cases above on both modes.
      
      Note that the scheduler uses this fallback mechanism if and only if
      _every_ other valid avenue has been traveled, and it is the last resort
      before calling BUG().
      Suggested-by: default avatarWaiman Long <longman@redhat.com>
      Suggested-by: default avatarPhil Auld <pauld@redhat.com>
      Signed-off-by: default avatarJoel Savitz <jsavitz@redhat.com>
      Acked-by: default avatarPhil Auld <pauld@redhat.com>
      Acked-by: default avatarWaiman Long <longman@redhat.com>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8f4b554b
    • Will Deacon's avatar
      arm64: tlbflush: Ensure start/end of address range are aligned to stride · e2beb74a
      Will Deacon authored
      [ Upstream commit 01d57485 ]
      
      Since commit 3d65b6bb ("arm64: tlbi: Set MAX_TLBI_OPS to
      PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
      perform more than PTRS_PER_PTE invalidation instructions in a single
      call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
      code does not ensure that the end address of the range is rounded-up
      to the stride when freeing intermediate page tables in pXX_free_tlb(),
      which defeats our range checking.
      
      Align the bounds passed into __flush_tlb_range().
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reported-by: default avatarHanjun Guo <guohanjun@huawei.com>
      Tested-by: default avatarHanjun Guo <guohanjun@huawei.com>
      Reviewed-by: default avatarHanjun Guo <guohanjun@huawei.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      e2beb74a
    • Linus Walleij's avatar
      i2c: pca-platform: Fix GPIO lookup code · 60bc55de
      Linus Walleij authored
      [ Upstream commit a0cac264 ]
      
      The devm_gpiod_request_gpiod() call will add "-gpios" to
      any passed connection ID before looking it up.
      
      I do not think the reset GPIO on this platform is named
      "reset-gpios-gpios" but rather "reset-gpios" in the device
      tree, so fix this up so that we get a proper reset GPIO
      handle.
      
      Also drop the inclusion of the legacy GPIO header.
      
      Fixes: 0e8ce93b ("i2c: pca-platform: add devicetree awareness")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Reviewed-by: default avatarChris Packham <chris.packham@alliedtelesis.co.nz>
      Signed-off-by: default avatarWolfram Sang <wsa@the-dreams.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      60bc55de