1. 24 Sep, 2021 23 commits
  2. 23 Sep, 2021 17 commits
    • Merge tag 'drm-intel-fixes-2021-09-23' of... · ef88d7a8
      Dave Airlie authored
      Merge tag 'drm-intel-fixes-2021-09-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      drm/i915 fixes for v5.15-rc3:
      - Fix ADL-P memory bandwidth parameters
      - Fix memory corruption due to a double free
      - Fix memory leak in DMC firmware handling
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/87o88jbk3o.fsf@intel.com
      ef88d7a8
    • Merge tag 'amd-drm-fixes-5.15-2021-09-23' of... · 22a94600
      Dave Airlie authored
      Merge tag 'amd-drm-fixes-5.15-2021-09-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
      
      amd-drm-fixes-5.15-2021-09-23:
      
      amdgpu:
      - Update MAINTAINERS entry for powerplay
      - Fix empty macros
      - SI DPM fix
      
      amdkfd:
      - SVM fixes
      - DMA mapping fix
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexander.deucher@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20210923211330.20725-1-alexander.deucher@amd.com
      22a94600
    • Merge tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · f9e36107
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - regression fix for leak of transaction handle after verity rollback
         failure
      
       - properly reset device last error between mounts
      
       - improve one error handling case when checksumming bios
      
       - fixup confusing displayed size of space info free space
      
      * tag 'for-5.15-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: prevent __btrfs_dump_space_info() to underflow its free space
        btrfs: fix mount failure due to past and transient device flush error
        btrfs: fix transaction handle leak after verity rollback failure
        btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling
      f9e36107
    • Merge tag 'selinux-pr-20210923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 831c9bd3
      Linus Torvalds authored
      Pull SELinux/Smack fixes from Paul Moore:
       "Another single-patch pull request for SELinux, as well as Smack.
      
        This fixes some credential misuse and is explained reasonably well in
        the patch description"
      
      * tag 'selinux-pr-20210923' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux,smack: fix subjective/objective credential use mixups
      831c9bd3
    • drm/amdkfd: fix svm_migrate_fini warning · 197ae177
      Philip Yang authored
      The device manager releases device-specific resources when a driver
      disconnects from a device, so the devm_memunmap_pages and
      devm_release_mem_region calls in svm_migrate_fini are redundant.
      
      They cause the warning trace below after patch "drm/amdgpu: Split
      amdgpu_device_fini into early and late", so remove function
      svm_migrate_fini.
      
      BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
      
      WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
      devm_release_action+0x51/0x60
      Call Trace:
          ? memunmap_pages+0x360/0x360
          svm_migrate_fini+0x2d/0x60 [amdgpu]
          kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
          amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
          amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
          amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
          drm_dev_release+0x20/0x40 [drm]
          release_nodes+0x196/0x1e0
          device_release_driver_internal+0x104/0x1d0
          driver_detach+0x47/0x90
          bus_remove_driver+0x7a/0xd0
          pci_unregister_driver+0x3d/0x90
          amdgpu_exit+0x11/0x20 [amdgpu]
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      197ae177
    • drm/amdkfd: handle svm migrate init error · 7d668720
      Philip Yang authored
      If svm migration init failed to create pgmap for device memory, set
      pgmap type to 0 to disable device SVM support capability.
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      7d668720
    • drm/amd/pm: Update intermediate power state for SI · ab39d3ce
      Lijo Lazar authored
      Update the current state as boot state during dpm initialization.
      During the subsequent initialization, set_power_state gets called to
      transition to the final power state. set_power_state refers to values
      from the current state and without current state populated, it could
      result in NULL pointer dereference.
      
      For example, on platforms where PCI speed change is supported through the ACPI
      ATCS method, the link speed of current state needs to be queried before
      deciding on changing to final power state's link speed. The logic to query
      ATCS-support was broken on certain platforms. The issue became visible
      when broken ATCS-support logic got fixed with commit
      f9b7f370 ("drm/amdgpu/acpi: make ATPX/ATCS structures global (v2)").
      
      Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1698
      Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org
      ab39d3ce
    • drm/amdkfd: fix dma mapping leaking warning · f6325118
      Philip Yang authored
      For xnack off, the restore work DMA-unmaps the previous system memory
      page and DMA-maps the updated system memory page to update the GPU
      mapping. This is not a DMA mapping leak, so remove the WARN_ONCE for
      DMA mapping leaks.
      
      prange->dma_addr stores the VRAM page pfn after the range has migrated
      to VRAM, so the VRAM page should not be DMA-unmapped when updating the
      GPU mapping or removing the prange. Add helper
      svm_is_valid_dma_mapping_addr to check for VRAM pages and error cases.
      
      Mask out the SVM_RANGE_VRAM_DOMAIN flag in dma_addr before calling the
      amdgpu vm update to avoid BUG_ON(*addr & 0xFFFF00000000003FULL), and set
      it again immediately after. This flag is used later to identify the page
      type when DMA-unmapping system memory pages.
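
The mask-and-restore step described in this message can be sketched as follows. This is a minimal illustration, not the driver code: the bit position of the flag is an assumption, and `update_mapping()` and `map_page()` are hypothetical stand-ins for the amdgpu vm update and its caller. Only the rejected-bits mask is taken from the BUG_ON() quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: the VRAM marker lives in bit 0 of the entry. */
#define VRAM_DOMAIN_FLAG  ((uint64_t)1 << 0)
/* Bits the update path rejects, copied from the BUG_ON() in the message. */
#define INVALID_ADDR_BITS 0xFFFF00000000003FULL

/* Hypothetical stand-in for the vm update: fails on any rejected bit. */
int update_mapping(const uint64_t *addr)
{
	assert(addr);
	return (*addr & INVALID_ADDR_BITS) ? -1 : 0;
}

/* Clear the flag for the duration of the update, then put it back. */
int map_page(uint64_t *addr)
{
	uint64_t domain;
	int ret;

	assert(addr);
	domain = *addr & VRAM_DOMAIN_FLAG;  /* remember the page type */
	*addr &= ~VRAM_DOMAIN_FLAG;         /* hand over a clean address */
	ret = update_mapping(addr);
	*addr |= domain;                    /* restore for the later unmap path */
	return ret;
}
```

Saving the flag in a local before clearing it keeps the entry's page type intact for later readers, while the update path only ever sees an address with the low bits clear.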
      
      Fixes: 1d5dbfe6 ("drm/amdkfd: classify and map mixed svm range pages in GPU")
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      f6325118
    • drm/amdkfd: SVM map to gpus check vma boundary · 7beb26dc
      Philip Yang authored
      An SVM range may include multiple VMAs with different vm_flags. If the
      prange page index is the last page of the VMA offset + npages, update the
      GPU mapping to create the GPU page table with the same VMA access
      permission.
      Signed-off-by: Philip Yang <Philip.Yang@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      7beb26dc
    • MAINTAINERS: fix up entry for AMD Powerplay · 6de0653f
      Alex Deucher authored
      Fix the path to cover both the older powerplay infrastructure
      and the newer SwSMU infrastructure.
      Reviewed-by: Evan Quan <evan.quan@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      6de0653f
    • drm/amd/display: fix empty debug macros · c48977f0
      Arnd Bergmann authored
      Using an empty macro expansion as a conditional expression
      produces a W=1 warning:
      
      drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c: In function 'dce_aux_transfer_with_retries':
      drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:775:156: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
        775 |                                                                 "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_DEFER");
            |                                                                                                                                                            ^
      drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_aux.c:783:155: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
        783 |                                                                 "dce_aux_transfer_with_retries: AUX_RET_SUCCESS: AUX_TRANSACTION_REPLY_I2C_OVER_AUX_NACK");
            |                                                                                                                                                           ^
      
      Expand it to "do { } while (0)" instead to make the expression
      more robust and avoid the warning.
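
The difference between the two expansions can be seen in a small stand-alone sketch. The macro names `TRACE_EMPTY` and `TRACE_FIXED` and the function `count_defers()` are hypothetical; the driver's actual macro names are not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Expands to nothing: `if (cond) TRACE_EMPTY("x");` becomes `if (cond) ;`,
 * which GCC reports under -Wempty-body. */
#define TRACE_EMPTY(...)

/* Expands to one complete statement, so the `if` body is never empty. */
#define TRACE_FIXED(...) do { } while (0)

/* Counts deferred replies; the trace call compiles to a no-op. */
int count_defers(const bool *deferred, int n)
{
	int defers = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (deferred[i])
			TRACE_FIXED("reply %d deferred", i);
		if (deferred[i])
			defers++;
	}
	return defers;
}
```

Swapping `TRACE_FIXED` for `TRACE_EMPTY` in the loop leaves the behavior unchanged but brings back the empty `if` body that the warning is about.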
      
      Fixes: 56aca230 ("drm/amd/display: Add AUX I2C tracing.")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      c48977f0
    • Revert "ACPI: Add memory semantics to acpi_os_map_memory()" · 12064c17
      Jia He authored
      This reverts commit 437b38c5.
      
      The memory semantics added in commit 437b38c5 cause a SystemMemory
      Operation region whose address range is not described in the EFI memory
      map to be mapped as NormalNC memory on arm64 platforms (through
      acpi_os_map_memory() in acpi_ex_system_memory_space_handler()).
      
      This triggers the following abort on an ARM64 Ampere eMAG machine,
      because presumably the physical address range area backing the Opregion
      does not support NormalNC memory attributes driven on the bus.
      
       Internal error: synchronous external abort: 96000410 [#1] SMP
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0+ #462
       Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 0.14 02/22/2019
       pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [...snip...]
       Call trace:
        acpi_ex_system_memory_space_handler+0x26c/0x2c8
        acpi_ev_address_space_dispatch+0x228/0x2c4
        acpi_ex_access_region+0x114/0x268
        acpi_ex_field_datum_io+0x128/0x1b8
        acpi_ex_extract_from_field+0x14c/0x2ac
        acpi_ex_read_data_from_field+0x190/0x1b8
        acpi_ex_resolve_node_to_value+0x1ec/0x288
        acpi_ex_resolve_to_value+0x250/0x274
        acpi_ds_evaluate_name_path+0xac/0x124
        acpi_ds_exec_end_op+0x90/0x410
        acpi_ps_parse_loop+0x4ac/0x5d8
        acpi_ps_parse_aml+0xe0/0x2c8
        acpi_ps_execute_method+0x19c/0x1ac
        acpi_ns_evaluate+0x1f8/0x26c
        acpi_ns_init_one_device+0x104/0x140
        acpi_ns_walk_namespace+0x158/0x1d0
        acpi_ns_initialize_devices+0x194/0x218
        acpi_initialize_objects+0x48/0x50
        acpi_init+0xe0/0x498
      
      If the Opregion address range is not present in the EFI memory map there
      is no way for us to determine the memory attributes to use to map it -
      defaulting to NormalNC does not work (and it is not correct on a memory
      region that may have read side-effects) and therefore commit
      437b38c5 should be reverted, which means reverting back to the
      original behavior whereby address ranges that are mapped using
      acpi_os_map_memory() default to the safe Device-nGnRnE attributes on
      ARM64 if the mapped address range is not defined in the EFI memory map.
      
      Fixes: 437b38c5 ("ACPI: Add memory semantics to acpi_os_map_memory()")
      Signed-off-by: Jia He <justin.he@arm.com>
      Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      12064c17
    • Merge tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm · f10f0481
      Linus Torvalds authored
      Pull rseq fixes from Paolo Bonzini:
       "A fix for a bug with restartable sequences and KVM.
      
        KVM's handling of TIF_NOTIFY_RESUME, e.g. for task migration, clears
        the flag without informing rseq and leads to stale data in userspace's
        rseq struct"
      
      * tag 'for-linus-rseq' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: selftests: Remove __NR_userfaultfd syscall fallback
        KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
        tools: Move x86 syscall number fallbacks to .../uapi/
        entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
        KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
      f10f0481
    • Merge tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9bc62afe
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Current release - regressions:
      
         - dsa: bcm_sf2: fix array overrun in bcm_sf2_num_active_ports()
      
        Previous releases - regressions:
      
         - introduce a shutdown method to mdio device drivers, and make DSA
           switch drivers compatible with masters disappearing on shutdown;
           preventing infinite reference wait
      
         - fix issues in mdiobus users related to ->shutdown vs ->remove
      
         - virtio-net: fix pages leaking when building skb in big mode
      
         - xen-netback: correct success/error reporting for the
           SKB-with-fraglist
      
         - dsa: tear down devlink port regions when tearing down the devlink
           port on error
      
         - nexthop: fix division by zero while replacing a resilient group
      
         - hns3: check queue, vf, vlan ids range before using
      
        Previous releases - always broken:
      
         - napi: fix race against netpoll causing NAPI getting stuck
      
         - mlx4_en: ensure link operstate is updated even if link comes up
           before netdev registration
      
         - bnxt_en: fix TX timeout when TX ring size is set to the smallest
      
         - enetc: fix illegal access when reading affinity_hint; prevent oops
           on sysfs access
      
         - mtk_eth_soc: avoid creating duplicate offload entries
      
        Misc:
      
         - core: correct the sock::sk_lock.owned lockdep annotations"
      
      * tag 'net-5.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
        atlantic: Fix issue in the pm resume flow.
        net/mlx4_en: Don't allow aRFS for encapsulated packets
        net: mscc: ocelot: fix forwarding from BLOCKING ports remaining enabled
        net: ethernet: mtk_eth_soc: avoid creating duplicate offload entries
        nfc: st-nci: Add SPI ID matching DT compatible
        MAINTAINERS: remove Guvenc Gulce as net/smc maintainer
        nexthop: Fix memory leaks in nexthop notification chain listeners
        mptcp: ensure tx skbs always have the MPTCP ext
        qed: rdma - don't wait for resources under hw error recovery flow
        s390/qeth: fix deadlock during failing recovery
        s390/qeth: Fix deadlock in remove_discipline
        s390/qeth: fix NULL deref in qeth_clear_working_pool_list()
        net: dsa: realtek: register the MDIO bus under devres
        net: dsa: don't allocate the slave_mii_bus using devres
        Doc: networking: Fox a typo in ice.rst
        net: dsa: fix dsa_tree_setup error path
        net/smc: fix 'workqueue leaked lock' in smc_conn_abort_work
        net/smc: add missing error check in smc_clc_prfx_set()
        net: hns3: fix a return value error in hclge_get_reset_status()
        net: hns3: check vlan id before using it
        ...
      9bc62afe
    • memcg: flush lruvec stats in the refault · 1f828223
      Shakeel Butt authored
      Prior to the commit 7e1c0d6f ("memcg: switch lruvec stats to rstat")
      and the commit aa48e47e ("memcg: infrastructure to flush memcg
      stats"), each lruvec memcg stat could be off by (nr_cgroups * nr_cpus *
      32) at worst and for an unbounded amount of time.  The commit 7e1c0d6f
      moved the lruvec stats to rstat infrastructure and the commit
      aa48e47e bounded the error for all the lruvec stats to (nr_cpus *
      32) at worst for at most 2 seconds.  More specifically it decoupled the
      number of stats and the number of cgroups from the error rate.
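
To give a feel for the two worst-case bounds quoted above, here is a small worked sketch. The function names and the machine size used in the usage note (100 cgroups, 64 CPUs) are illustrative assumptions; only the formulas and the batch size of 32 come from the message.

```c
#include <assert.h>

/* Per-CPU batching threshold named in the message. */
#define STAT_BATCH 32UL

/* Old bound: every cgroup on every CPU may hold a full unflushed batch. */
unsigned long old_worst_error(unsigned long nr_cgroups, unsigned long nr_cpus)
{
	assert(nr_cgroups > 0 && nr_cpus > 0);
	return nr_cgroups * nr_cpus * STAT_BATCH;
}

/* New bound: independent of the number of cgroups. */
unsigned long new_worst_error(unsigned long nr_cpus)
{
	assert(nr_cpus > 0);
	return nr_cpus * STAT_BATCH;
}
```

For the assumed 100 cgroups on 64 CPUs, `old_worst_error(100, 64)` is 204800 while `new_worst_error(64)` is 2048, so the bound shrinks by exactly the cgroup count.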
      
      However, this reduction in error comes at the cost of triggering the
      slowpath of the stats update more frequently.  Previously in the slowpath
      the kernel added the stats up the memcg tree.  After aa48e47e, the
      kernel triggers the async lruvec stats flush through queue_work().  This
      caused regression reports from the 0day kernel bot [1] as well as from
      the phoronix test suite [2].
      
      We tried two options to fix the regression:
      
       1) Increase the threshold to trigger the slowpath in lruvec stats
          update codepath from 32 to 512.
      
       2) Remove the slowpath from lruvec stats update codepath and instead
          flush the stats in the page refault codepath. The assumption is that
          the kernel flushes the stats in a timely manner, so the update tree
          would be small in the refault codepath and would not cause a
          performance impact.
      
      Following are the results of will-it-scale/page_fault[1|2|3] benchmark
      on four settings i.e.  (1) 5.15-rc1 as baseline (2) 5.15-rc1 with
      aa48e47e and 7e1c0d6f reverted (3) 5.15-rc1 with option-1
      (4) 5.15-rc1 with option-2.
      
        test       (1)      (2)               (3)               (4)
        pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
        pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
        pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)
      
      From the above result, it seems like the option-2 not only solves the
      regression but also improves the performance for at least these
      benchmarks.
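
The percentages in the table appear to be gains relative to the baseline column (1). A minimal sketch of that calculation, with a hypothetical helper name, lets a reader recheck any cell:

```c
#include <assert.h>

/* Percentage gain of `result` over `baseline`; positive means faster. */
double gain_percent(long baseline, long result)
{
	assert(baseline > 0);
	return 100.0 * (double)(result - baseline) / (double)baseline;
}
```

For example, `gain_percent(368563, 406277)` reproduces the pg_f1 figure for setting (2) to within rounding, and `gain_percent(500853, 576083)` does the same for pg_f3 under option-2.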
      
      Feng Tang (intel) ran the aim7 benchmark with these two options and
      confirms that option-1 reduces the regression but option-2 removes the
      regression.
      
      Michael Larabel (phoronix) ran multiple benchmarks with these options
      and reported the results at [3] and it shows for most benchmarks
      option-2 removes the regression introduced by the commit aa48e47e
      ("memcg: infrastructure to flush memcg stats").
      
      Based on the experiment results, this patch proposes option-2 as the
      solution to resolve the regression.
      
      Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
      Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
      Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
      Fixes: aa48e47e ("memcg: infrastructure to flush memcg stats")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Tested-by: Michael Larabel <Michael@phoronix.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f828223
    • selinux,smack: fix subjective/objective credential use mixups · a3727a8b
      Paul Moore authored
      Jann Horn reported a problem with commit eb1231f7 ("selinux:
      clarify task subjective and objective credentials") where some LSM
      hooks were attempting to access the subjective credentials of a task
      other than the current task.  Generally speaking, it is not safe to
      access another task's subjective credentials and doing so can cause
      a number of problems.
      
      Further, while looking into the problem, I realized that Smack was
      suffering from a similar problem brought about by a similar commit
      1fb057dc ("smack: differentiate between subjective and objective
      task credentials").
      
      This patch addresses this problem by restoring the use of the task's
      objective credentials in those cases where the task is other than the
      current executing task.  Not only does this resolve the problem
      reported by Jann, it is arguably the correct thing to do in these
      cases.
      
      Cc: stable@vger.kernel.org
      Fixes: eb1231f7 ("selinux: clarify task subjective and objective credentials")
      Fixes: 1fb057dc ("smack: differentiate between subjective and objective task credentials")
      Reported-by: Jann Horn <jannh@google.com>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      a3727a8b
    • arm64: Restore forced disabling of KPTI on ThunderX · 22b70e6f
      dann frazier authored
      A noted side-effect of commit 0c6c2d36 ("arm64: Generate cpucaps.h")
      is that cpucaps are now sorted, changing the enumeration order. This
      assumed no dependencies between cpucaps, which turned out not to be true
      in one case. UNMAP_KERNEL_AT_EL0 currently needs to be processed after
      WORKAROUND_CAVIUM_27456. ThunderX systems are incompatible with KPTI, so
      unmap_kernel_at_el0() bails if WORKAROUND_CAVIUM_27456 is set. But because
      of the sorting, WORKAROUND_CAVIUM_27456 will not yet have been considered
      when unmap_kernel_at_el0() checks for it, so the kernel tries to
      run with KPTI and quickly falls over.
      
      Because all ThunderX implementations have homogeneous CPUs, we can remove
      this dependency by just checking the current CPU for the erratum.
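
The ordering problem and the fix can be illustrated with a toy model. Everything below is a simplified sketch with hypothetical names, not the arm64 cpufeature code: it reduces the KPTI decision to "enable unless the erratum applies" and evaluates two capabilities in sorted order.

```c
#include <assert.h>
#include <stdbool.h>

/* Sorted order puts UNMAP_KERNEL_AT_EL0 before WORKAROUND_CAVIUM_27456. */
enum cap { CAP_UNMAP_KERNEL_AT_EL0, CAP_WORKAROUND_CAVIUM_27456, CAP_COUNT };

struct cpu {
	bool has_cavium_erratum;  /* what the current CPU actually reports */
	bool detected[CAP_COUNT]; /* filled in as capabilities are evaluated */
};

/* Order-dependent check: trusts a capability that may not be evaluated yet. */
bool wants_kpti_by_cap(const struct cpu *c)
{
	return !c->detected[CAP_WORKAROUND_CAVIUM_27456];
}

/* Order-independent check: asks the current CPU about the erratum directly. */
bool wants_kpti_by_cpu(const struct cpu *c)
{
	return !c->has_cavium_erratum;
}

/* Evaluate capabilities in enum order, as a sorted table would. */
void evaluate_caps(struct cpu *c, bool (*wants_kpti)(const struct cpu *))
{
	assert(c && wants_kpti);
	for (int cap = 0; cap < CAP_COUNT; cap++) {
		if (cap == CAP_UNMAP_KERNEL_AT_EL0)
			c->detected[cap] = wants_kpti(c);
		else
			c->detected[cap] = c->has_cavium_erratum;
	}
}
```

With `wants_kpti_by_cap`, a CPU that has the erratum still ends up with KPTI enabled because the erratum capability is evaluated second. With `wants_kpti_by_cpu`, the result no longer depends on evaluation order, which is safe only under the homogeneous-CPU assumption stated in the message.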
      
      Fixes: 0c6c2d36 ("arm64: Generate cpucaps.h")
      Cc: <stable@vger.kernel.org> # 5.13.x
      Signed-off-by: dann frazier <dann.frazier@canonical.com>
      Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Reviewed-by: Mark Brown <broonie@kernel.org>
      Acked-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20210923145002.3394558-1-dann.frazier@canonical.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      22b70e6f