1. 17 Oct, 2021 5 commits
      KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode · 9d449c71
      Marc Zyngier authored
      Having realised that a virtual LPI does transition through an active
      state that does not exist on bare metal, align the CPU interface
      emulation with the behaviour specified in the architecture pseudocode.
      
      The LPIs now transition to active on IAR read, and to inactive on
      EOI write. Special care is taken not to increment the EOIcount for
      an LPI that isn't present in the LRs.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-6-maz@kernel.org
      9d449c71
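The transitions described in this commit message can be sketched as a toy model. This is only an illustration, not the kernel's emulation code: the `CpuIf` class, its `lrs` dictionary standing in for the list registers, and the fact that `read_iar` takes the INTID as an argument (real hardware picks the highest-priority pending interrupt) are all simplifying assumptions. The value 8192 is the first LPI INTID in GICv3, and 1023 is the spurious INTID.

```python
from dataclasses import dataclass, field
from enum import Enum

VGIC_MIN_LPI = 8192   # INTIDs at or above this value are LPIs
SPURIOUS_INTID = 1023


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    ACTIVE = "active"


@dataclass
class CpuIf:
    """Toy CPU interface; `lrs` is a hypothetical stand-in for the list registers."""
    lrs: dict = field(default_factory=dict)  # INTID -> State
    eoicount: int = 0

    def read_iar(self, intid):
        """Acknowledge: a pending interrupt, LPI included, becomes active."""
        if self.lrs.get(intid) is State.PENDING:
            self.lrs[intid] = State.ACTIVE
            return intid
        return SPURIOUS_INTID

    def write_eoi(self, intid):
        """End of interrupt: an active interrupt becomes inactive."""
        if self.lrs.get(intid) is State.ACTIVE:
            self.lrs[intid] = State.INACTIVE
            return
        # No matching active LR: bump EOIcount, except for LPIs.
        if intid < VGIC_MIN_LPI:
            self.eoicount += 1
```

In this model, an EOI for LPI 9000 that was never in `lrs` leaves `eoicount` untouched, while the same write for SPI 40 increments it.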
      KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS · f87ab682
      Marc Zyngier authored
      Since we are trapping all sysreg accesses when ICH_VTR_EL2.SEIS
      is set, and that we never deliver an SError when emulating
      any of the GICv3 sysregs, don't advertise ICC_CTLR_EL1.SEIS.
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-5-maz@kernel.org
      f87ab682
      KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible · 0924729b
      Marc Zyngier authored
      On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg
      accesses from the guest. From a performance perspective, this is OK
      as long as the guest doesn't hammer the GICv3 CPU interface.
      
      In most cases, this is fine, unless the guest actively uses
      priorities and switches PMR_EL1 very often. Which is exactly what
      happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1.
      In these conditions, the performance plummets as we hit PMR each time
      we mask/unmask interrupts. Not good.
      
      There is however an opportunity for improvement. Careful reading
      of the architecture specification indicates that the only GICv3
      sysreg belonging to the common group (which contains the SGI
      registers, PMR, DIR, CTLR and RPR) that is allowed to generate
      a SError is DIR. Everything else is safe.
      
      It is thus possible to substitute the trapping of all the common
      group with just that of DIR if it is supported by the implementation.
      Yes, that's yet another optional bit of the architecture.
      So let's just do that, as it leads to some impressive results on
      the M1:
      
      Without this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 56.596
      
      With this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 8.649
      
      which is a pretty convincing result.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
      0924729b
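The trapping decision described in this commit message can be sketched as a small decision function. This is a hedged illustration rather than the kernel's code: the function name and the returned dictionary are hypothetical, and the bit positions (SEIS at bit 22, and a TDS bit at 19 advertising that ICV_DIR_EL1 can be trapped on its own) are assumptions based on a reading of the GICv3 specification that should be checked against it before reuse.

```python
# Assumed ICH_VTR_EL2 bit positions; verify against the GICv3 specification.
ICH_VTR_EL2_SEIS = 1 << 22  # locally generated SErrors are implemented
ICH_VTR_EL2_TDS = 1 << 19   # separate trapping of ICV_DIR_EL1 is implemented


def trap_policy(ich_vtr_el2):
    """Return which GICv3 sysreg groups a hypervisor would trap."""
    seis = bool(ich_vtr_el2 & ICH_VTR_EL2_SEIS)
    tds = bool(ich_vtr_el2 & ICH_VTR_EL2_TDS)
    return {
        "group0": seis,               # trapped whenever SEIS is advertised
        "group1": seis,
        "common": seis and not tds,   # fall back to trapping the whole common group
        "dir_only": seis and tds,     # cheaper: trap only DIR, leave PMR untrapped
    }
```

With both bits set, `trap_policy` reports `dir_only` instead of `common`, which is the case where PMR accesses stop trapping.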
      KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors · df652bcf
      Marc Zyngier authored
      The infamous M1 has a feature nobody else ever implemented,
      in the form of the "GIC locally generated SError interrupts",
      also known as SEIS for short.
      
      These SErrors are generated when a guest does something that violates
      the GIC state machine. It would have been simpler to just *ignore*
      the damned thing, but that's not what this HW does. Oh well.
      
      This part of the architecture is also amazingly under-specified.
      There is a whole 10 lines that describe the feature in a spec that
      is 930 pages long, and some of these lines are factually wrong.
      Oh, and it is deprecated, so the incentive to clarify it is low.
      
      Now, the spec says that this should be a *virtual* SError when
      HCR_EL2.AMO is set. As it turns out, that's not always the case
      on this CPU, and the SError sometimes fires on the host as a
      physical SError. Goodbye, cruel world. This clearly is a HW bug,
      and it means that a guest can easily take the host down, on demand.
      
      Thankfully, we have seen systems that were just as broken in the
      past, and we have the perfect vaccine for it.
      
      Apple M1, please meet the Cavium ThunderX workaround. All your
      GIC accesses will be trapped, sanitised, and emulated. Only the
      signalling aspect of the HW will be used. It won't be super speedy,
      but it will at least be safe. You're most welcome.
      
      Given that this has only ever been seen on this single implementation,
      that the spec is unclear at best and that we cannot trust it to ever
      be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
      being set.
      Tested-by: Joey Gouly <joey.gouly@arm.com>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
      df652bcf
      KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3 · 562e530f
      Marc Zyngier authored
      Until now, we always let ID_AA64PFR0_EL1.GIC reflect the value
      visible on the host, even if we were running a GICv2-enabled VM
      on a GICv3+compat host.
      
      That's fine, but we also now have the case of a host that does not
      expose ID_AA64PFR0_EL1.GIC==1 despite having a vGIC. Yes, this is
      confusing. Thank you M1.
      
      Let's go back to first principles and expose ID_AA64PFR0_EL1.GIC=1
      when a GICv3 is exposed to the guest. This also hides a GICv4.1
      CPU interface from the guest which has no business knowing about
      the v4.1 extension.
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-2-maz@kernel.org
      562e530f
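The field override described in this commit message amounts to a mask-and-set on the ID register value. The following is a minimal sketch, not the kernel's implementation: the function name and its `guest_has_vgic_v3` parameter are hypothetical, and leaving the host value untouched when no virtual GICv3 is exposed is an assumption made for illustration. The GIC field sits in bits [27:24] of ID_AA64PFR0_EL1, where 1 means a GICv3/v4.0 system register interface and 3 means v4.1.

```python
ID_AA64PFR0_EL1_GIC_SHIFT = 24  # GIC field occupies bits [27:24]
ID_AA64PFR0_EL1_GIC_MASK = 0xF << ID_AA64PFR0_EL1_GIC_SHIFT


def guest_id_aa64pfr0(host_value, guest_has_vgic_v3):
    """Return the ID_AA64PFR0_EL1 value a guest would observe."""
    if not guest_has_vgic_v3:
        # Assumption for this sketch: no override without a virtual GICv3.
        return host_value
    # Clear whatever the host reports, then force GIC=1.
    value = host_value & ~ID_AA64PFR0_EL1_GIC_MASK
    return value | (1 << ID_AA64PFR0_EL1_GIC_SHIFT)
```

A host reporting GIC=3 (v4.1) and a host reporting GIC=0 both yield GIC=1 for a guest with a virtual GICv3, while the other fields pass through unchanged.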
  2. 03 Oct, 2021 12 commits
      Linux 5.15-rc4 · 9e1ff307
      Linus Torvalds authored
      9e1ff307
      elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings · 9b2f72cc
      Chen Jingwen authored
      In commit b212921b ("elf: don't use MAP_FIXED_NOREPLACE for elf
      executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
      load_elf_interp.
      
      Unfortunately, this will cause the kernel to fail to start with:
      
          1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
          Failed to execute /init (error -17)
      
      The reason is that the elf interpreter (ld.so) has overlapping segments.
      
        readelf -l ld-2.31.so
        Program Headers:
          Type           Offset             VirtAddr           PhysAddr
                         FileSiz            MemSiz              Flags  Align
          LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                         0x000000000002c94c 0x000000000002c94c  R E    0x10000
          LOAD           0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
                         0x00000000000021e8 0x0000000000002320  RW     0x10000
          LOAD           0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
                         0x00000000000011ac 0x0000000000001328  RW     0x10000
      
      The reason for this problem is the same as described in commit
      ad55eac7 ("elf: enforce MAP_FIXED on overlaying elf segments").
      
      Not only executable binaries but also elf interpreters (e.g. ld.so) can
      have overlapping elf segments, so we had better drop MAP_FIXED_NOREPLACE
      and go back to MAP_FIXED in load_elf_interp.
      
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Cc: <stable@vger.kernel.org> # v4.19
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b2f72cc
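The overlap in the readelf output above can be checked with a little arithmetic: once each LOAD segment is rounded out to page boundaries, the second and third segments share a page, so a second MAP_FIXED_NOREPLACE mapping would hit an address that is already mapped. The sketch below is an illustration under assumptions: the helper names are hypothetical, it uses MemSiz rather than the exact lengths the kernel maps, and the page sizes are chosen for the example.

```python
def page_span(vaddr, memsz, page_size):
    """Round a segment out to page boundaries; returns (start, end)."""
    start = vaddr & ~(page_size - 1)
    end = (vaddr + memsz + page_size - 1) & ~(page_size - 1)
    return start, end


def overlapping_pairs(segments, page_size):
    """Return index pairs of segments whose page-aligned spans intersect."""
    spans = [page_span(vaddr, memsz, page_size) for vaddr, memsz in segments]
    pairs = []
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            if spans[i][0] < spans[j][1] and spans[j][0] < spans[i][1]:
                pairs.append((i, j))
    return pairs


# (VirtAddr, MemSiz) of the three LOAD segments listed for ld-2.31.so
LD_SO_SEGMENTS = [
    (0x0, 0x2C94C),
    (0x3DAE0, 0x2320),
    (0x3FE00, 0x1328),
]
```

With 4 KiB pages the second segment spans 0x3d000 to 0x40000 and the third spans 0x3f000 to 0x42000, so `overlapping_pairs(LD_SO_SEGMENTS, 0x1000)` reports the pair `(1, 2)`. The same pair is reported with 64 KiB pages.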
      Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · ca3cef46
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Fix a number of ext4 bugs in fast_commit, inline data, and delayed
        allocation.
      
        Also fix error handling code paths in ext4_dx_readdir() and
        ext4_fill_super().
      
        Finally, avoid grabbing a journal head in the delayed allocation
        write in the common cases where we are overwriting a pre-existing
        block or appending to an inode"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: recheck buffer uptodate bit under buffer lock
        ext4: fix potential infinite loop in ext4_dx_readdir()
        ext4: flush s_error_work before journal destroy in ext4_fill_super
        ext4: fix loff_t overflow in ext4_max_bitmap_size()
        ext4: fix reserved space counter leakage
        ext4: limit the number of blocks in one ADD_RANGE TLV
        ext4: enforce buffer head state assertion in ext4_da_map_blocks
        ext4: remove extent cache entries when truncating inline data
        ext4: drop unnecessary journal handle in delalloc write
        ext4: factor out write end code of inline file
        ext4: correct the error path of ext4_write_inline_data_end()
        ext4: check and update i_disksize properly
        ext4: add error checking to ext4_ext_replay_set_iblocks()
      ca3cef46
      objtool: print out the symbol type when complaining about it · 7fab1c12
      Linus Torvalds authored
      The objtool warning that the kvm instruction emulation code triggered
      wasn't very useful:
      
          arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception
      
      in that it helpfully tells you which symbol name it had trouble figuring
      out the relocation for, but it doesn't actually say what the unknown
      symbol type was that triggered it all.
      
      In this case it was because of missing type information (type 0, aka
      STT_NOTYPE), but on the whole it really should just have printed that
      out as part of the message.
      
      Because if this warning triggers, that's very much the first thing you
      want to know - why did reloc2sec_off() return failure for that symbol?
      
      So rather than just saying you can't handle some type of symbol without
      saying what the type _was_, just print out the type number too.
      
      Fixes: 24ff6525 ("objtool: Teach get_alt_entry() about more relocation types")
      Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7fab1c12
      kvm: fix objtool relocation warning · 291073a5
      Linus Torvalds authored
      The recent change to make objtool aware of more symbol relocation types
      (commit 24ff6525: "objtool: Teach get_alt_entry() about more
      relocation types") also added another check, and resulted in this
      objtool warning when building kvm on x86:
      
          arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception
      
      The reason seems to be that kvm_fastop_exception() is marked as a global
      symbol, which causes the relocation to be kept around for objtool.  And
      at the same time, the kvm_fastop_exception definition (which is done as
      an inline asm statement) doesn't actually set the type of the global,
      which then makes objtool unhappy.
      
      The minimal fix is to just not mark kvm_fastop_exception as being a
      global symbol.  It's only used in that one compilation unit anyway, so
      it was always pointless.  That's how all the other local exception table
      labels are done.
      
      I'm not entirely happy about the kinds of games that the kvm code plays
      with doing its own exception handling, and the fact that it confused
      objtool is most definitely a symptom of the code being a bit too subtle
      and ad-hoc.  But at least this trivial one-liner makes objtool no longer
      upset about what is going on.
      
      Fixes: 24ff6525 ("objtool: Teach get_alt_entry() about more relocation types")
      Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      291073a5
      Merge tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 6761a0ae
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are some small misc driver fixes for 5.15-rc4. They are in two
        "groups":
      
         - ipack driver fixes for issues found by Johan Hovold
      
         - interconnect driver fixes for reported problems
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'char-misc-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        ipack: ipoctal: fix module reference leak
        ipack: ipoctal: fix missing allocation-failure check
        ipack: ipoctal: fix tty-registration error handling
        ipack: ipoctal: fix tty registration race
        ipack: ipoctal: fix stack information leak
        interconnect: qcom: sdm660: Add missing a2noc qos clocks
        dt-bindings: interconnect: sdm660: Add missing a2noc qos clocks
        interconnect: qcom: sdm660: Correct NOC_QOS_PRIORITY shift and mask
        interconnect: qcom: sdm660: Fix id of slv_cnoc_mnoc_cfg
      6761a0ae
      Merge tag 'driver-core-5.15-rc4' of... · 84928ce3
      Linus Torvalds authored
      Merge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
      
      Pull driver core fixes from Greg KH:
       "Here are some driver core and kernfs fixes for reported issues for
        5.15-rc4. These fixes include:
      
         - kernfs positive dentry bugfix
      
         - debugfs_create_file_size error path fix
      
         - cpumask sysfs file bugfix to preserve the user/kernel abi (has been
           reported multiple times.)
      
         - devlink fixes for mdiobus devices as reported by the subsystem
           maintainers.
      
        Also included in here are some devlink debugging changes to make it
        easier for people to report problems when asked. They have already
        helped with the mdiobus and other subsystems reporting issues.
      
        All of these have been in linux-next for a while with no reported issues"
      
      * tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
        kernfs: also call kernfs_set_rev() for positive dentry
        driver core: Add debug logs when fwnode links are added/deleted
        driver core: Create __fwnode_link_del() helper function
        driver core: Set deferred probe reason when deferred by driver core
        net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
        driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
        driver core: fw_devlink: Improve handling of cyclic dependencies
        cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
        debugfs: debugfs_create_file_size(): use IS_ERR to check for error
      84928ce3
      Merge tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 777feaba
      Linus Torvalds authored
      Pull scheduler fixes from Borislav Petkov:
      
       - Tell the compiler to always inline is_percpu_thread()
      
       - Make sure tunable_scaling buffer is null-terminated after an update
         in sysfs
      
       - Fix LTP named regression due to cgroup list ordering
      
      * tag 'sched_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Always inline is_percpu_thread()
        sched/fair: Null terminate buffer when updating tunable_scaling
        sched/fair: Add ancestors of unthrottled undecayed cfs_rq
      777feaba
      Merge tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3a399a2b
      Linus Torvalds authored
      Pull perf fixes from Borislav Petkov:
      
       - Make sure the destroy callback is reset when an event initialization
         fails
      
       - Update the event constraints for Icelake
      
       - Make sure the active time of an event is updated even for inactive
         events
      
      * tag 'perf_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: fix userpage->time_enabled of inactive events
        perf/x86/intel: Update event constraints for ICX
        perf/x86: Reset destroy callback on event init failure
      3a399a2b
      Merge tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 52c3c170
      Linus Torvalds authored
      Pull objtool fix from Borislav Petkov:
      
       - Handle symbol relocations properly due to changes in the toolchains
         which remove section symbols now
      
      * tag 'objtool_urgent_for_v5.15_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Teach get_alt_entry() about more relocation types
      52c3c170
      Merge tag 'hwmon-for-v5.15-rc4' of... · 7b66f439
      Linus Torvalds authored
      Merge tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
      
       - Fixed various potential NULL pointer accesses in w8379* drivers
      
       - Improved error handling, fault reporting, and fixed rounding in
         tmp421 driver
      
       - Fixed error handling in ltc2947 driver
      
       - Added missing attribute to pmbus/mp2975 driver
      
       - Fixed attribute values in pmbus/ibm-cffps, occ, and mlxreg-fan
         drivers
      
       - Removed unused residual code from k10temp driver
      
      * tag 'hwmon-for-v5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (w83793) Fix NULL pointer dereference by removing unnecessary structure field
        hwmon: (w83792d) Fix NULL pointer dereference by removing unnecessary structure field
        hwmon: (w83791d) Fix NULL pointer dereference by removing unnecessary structure field
        hwmon: (pmbus/mp2975) Add missed POUT attribute for page 1 mp2975 controller
        hwmon: (pmbus/ibm-cffps) max_power_out swap changes
        hwmon: (occ) Fix P10 VRM temp sensors
        hwmon: (ltc2947) Properly handle errors when looking for the external clock
        hwmon: (tmp421) fix rounding for negative values
        hwmon: (tmp421) report /PVLD condition as fault
        hwmon: (tmp421) handle I2C errors
        hwmon: (mlxreg-fan) Return non-zero value when fan current state is enforced from sysfs
        hwmon: (k10temp) Remove residues of current and voltage
      7b66f439
      Merge tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd · e25ca045
      Linus Torvalds authored
      Pull ksmbd server fixes from Steve French:
       "Eleven fixes for the ksmbd kernel server, mostly security related:
      
         - an important fix for disabling weak NTLMv1 authentication
      
         - seven security (improved buffer overflow checks) fixes
      
         - fix for wrong infolevel struct used in some getattr/setattr paths
      
         - two small documentation fixes"
      
      * tag '5.15-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd:
        ksmbd: missing check for NULL in convert_to_nt_pathname()
        ksmbd: fix transform header validation
        ksmbd: add buffer validation for SMB2_CREATE_CONTEXT
        ksmbd: add validation in smb2 negotiate
        ksmbd: add request buffer validation in smb2_set_info
        ksmbd: use correct basic info level in set_file_basic_info()
        ksmbd: remove NTLMv1 authentication
        ksmbd: fix documentation for 2 functions
        MAINTAINERS: rename cifs_common to smbfs_common in cifs and ksmbd entry
        ksmbd: fix invalid request buffer access in compound
        ksmbd: remove RFC1002 check in smb2 request
      e25ca045
  3. 02 Oct, 2021 12 commits
  4. 01 Oct, 2021 11 commits