1. 17 Oct, 2021 6 commits
    • Merge branch kvm-arm64/vgic-fixes-5.16 into kvmarm-master/next · 20a30430
      Marc Zyngier authored
      * kvm-arm64/vgic-fixes-5.16:
        : .
        : Multiple updates to the GICv3 emulation in order to better support
        : the dreadful Apple M1 that only implements half of it, and in a
        : broken way...
        : .
        KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode
        KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS
        KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible
        KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors
        KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3
      Signed-off-by: Marc Zyngier <maz@kernel.org>
    • KVM: arm64: vgic-v3: Align emulated cpuif LPI state machine with the pseudocode · 9d449c71
      Marc Zyngier authored
      Having realised that a virtual LPI does transition through an active
      state that does not exist on bare metal, align the CPU interface
      emulation with the behaviour specified in the architecture pseudocode.
      
      The LPIs now transition to active on IAR read, and to inactive on
      EOI write. Special care is taken not to increment the EOIcount for
      an LPI that isn't present in the LRs.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-6-maz@kernel.org
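The LPI transitions described in this commit can be illustrated with a toy model. This is a minimal sketch, not the kernel's emulation: the `toy_*` names, the number of list registers and the data layout are all hypothetical, and priorities, interrupt groups and EOImode are ignored. It only shows an interrupt becoming active on an IAR read, becoming inactive on an EOI write, and EOIcount being left alone when the EOI targets an LPI that is not in the list registers.

```c
#include <stdint.h>

#define TOY_NR_LRS      4     /* hypothetical number of list registers */
#define TOY_LPI_BASE    8192  /* LPI INTIDs start at 8192 in GICv3 */
#define TOY_SPURIOUS    1023  /* INTID returned when nothing is pending */

enum toy_lr_state { TOY_LR_INVALID, TOY_LR_PENDING, TOY_LR_ACTIVE };

struct toy_lr {
    uint32_t intid;
    enum toy_lr_state state;
};

struct toy_cpuif {
    struct toy_lr lr[TOY_NR_LRS];
    unsigned int eoicount;
};

/* IAR read: the first pending interrupt found (LPI or not) becomes active. */
static uint32_t toy_read_iar(struct toy_cpuif *c)
{
    for (int i = 0; i < TOY_NR_LRS; i++) {
        if (c->lr[i].state == TOY_LR_PENDING) {
            c->lr[i].state = TOY_LR_ACTIVE;
            return c->lr[i].intid;
        }
    }
    return TOY_SPURIOUS;
}

/* EOI write: deactivate the matching list register if there is one. */
static void toy_write_eoi(struct toy_cpuif *c, uint32_t intid)
{
    for (int i = 0; i < TOY_NR_LRS; i++) {
        if (c->lr[i].state == TOY_LR_ACTIVE && c->lr[i].intid == intid) {
            c->lr[i].state = TOY_LR_INVALID;
            return;
        }
    }
    /* Not in the LRs: only non-LPI interrupts bump the EOI count. */
    if (intid < TOY_LPI_BASE)
        c->eoicount++;
}
```

Loading a pending LPI into `lr[0]`, calling `toy_read_iar()` and then `toy_write_eoi()` with the returned INTID walks the entry through pending, active and invalid, while an EOI for an unknown LPI leaves `eoicount` untouched.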
    • KVM: arm64: vgic-v3: Don't advertise ICC_CTLR_EL1.SEIS · f87ab682
      Marc Zyngier authored
      Since we trap all sysreg accesses when ICH_VTR_EL2.SEIS is set,
      and never deliver an SError when emulating any of the GICv3
      sysregs, don't advertise ICC_CTLR_EL1.SEIS.
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-5-maz@kernel.org
    • KVM: arm64: vgic-v3: Reduce common group trapping to ICV_DIR_EL1 when possible · 0924729b
      Marc Zyngier authored
      On systems that advertise ICH_VTR_EL2.SEIS, we trap all GICv3 sysreg
      accesses from the guest. From a performance perspective, this is OK
      as long as the guest doesn't hammer the GICv3 CPU interface.
      
      In most cases, this is fine, unless the guest actively uses
      priorities and switches PMR_EL1 very often. Which is exactly what
      happens when a Linux guest runs with irqchip.gicv3_pseudo_nmi=1.
      In these conditions, the performance plummets as we hit PMR each time
      we mask/unmask interrupts. Not good.
      
      There is however an opportunity for improvement. Careful reading
      of the architecture specification indicates that the only GICv3
      sysreg belonging to the common group (which contains the SGI
      registers, PMR, DIR, CTLR and RPR) that is allowed to generate
      a SError is DIR. Everything else is safe.
      
      It is thus possible to substitute the trapping of all the common
      group with just that of DIR if it is supported by the implementation.
      Yes, that's yet another optional bit of the architecture.
      So let's just do that, as it leads to some impressive results on
      the M1:
      
      Without this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 56.596
      
      With this change:
      	bash-5.1# /host/home/maz/hackbench 100 process 1000
      	Running with 100*40 (== 4000) tasks.
      	Time: 8.649
      
      which is a pretty convincing result.
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Link: https://lore.kernel.org/r/20211010150910.2911495-4-maz@kernel.org
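The trap selection described in this commit can be sketched as a small pure function. This is an illustration under assumptions, not the kernel code: `toy_trap_bits()` is a hypothetical helper, and the bit positions for ICH_VTR_EL2 (TDS, SEIS) and ICH_HCR_EL2 (TC, TALL0, TALL1, TDIR) are given as I understand them from the GICv3 architecture and should be checked against the specification before reuse.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the GIC architecture specification. */
#define TOY_ICH_VTR_TDS    (UINT64_C(1) << 19)  /* TDIR trapping is implemented */
#define TOY_ICH_VTR_SEIS   (UINT64_C(1) << 22)  /* locally generated SErrors */
#define TOY_ICH_HCR_TC     (UINT64_C(1) << 10)  /* trap the whole common group */
#define TOY_ICH_HCR_TALL0  (UINT64_C(1) << 11)  /* trap all Group 0 accesses */
#define TOY_ICH_HCR_TALL1  (UINT64_C(1) << 12)  /* trap all Group 1 accesses */
#define TOY_ICH_HCR_TDIR   (UINT64_C(1) << 14)  /* trap DIR only */

/*
 * Hypothetical helper: pick the ICH_HCR_EL2 trap bits for a given
 * ICH_VTR_EL2 value, following the policy in the commit message.
 */
static uint64_t toy_trap_bits(uint64_t ich_vtr)
{
    uint64_t hcr = 0;

    /* Without SEIS there is nothing to work around in this sketch. */
    if (!(ich_vtr & TOY_ICH_VTR_SEIS))
        return 0;

    /* Group 0 and Group 1 registers are always trapped. */
    hcr |= TOY_ICH_HCR_TALL0 | TOY_ICH_HCR_TALL1;

    /* Prefer the narrow DIR trap; fall back to the full common group. */
    if (ich_vtr & TOY_ICH_VTR_TDS)
        hcr |= TOY_ICH_HCR_TDIR;
    else
        hcr |= TOY_ICH_HCR_TC;

    return hcr;
}
```

With the narrow trap selected, PMR accesses from the guest no longer exit to the hypervisor, which is where the hackbench improvement quoted above comes from.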
    • KVM: arm64: vgic-v3: Work around GICv3 locally generated SErrors · df652bcf
      Marc Zyngier authored
      The infamous M1 has a feature nobody else ever implemented,
      in the form of the "GIC locally generated SError interrupts",
      also known as SEIS for short.
      
      These SErrors are generated when a guest does something that violates
      the GIC state machine. It would have been simpler to just *ignore*
      the damned thing, but that's not what this HW does. Oh well.
      
      This part of the architecture is also amazingly under-specified.
      There is a whole 10 lines that describe the feature in a spec that
      is 930 pages long, and some of these lines are factually wrong.
      Oh, and it is deprecated, so the incentive to clarify it is low.
      
      Now, the spec says that this should be a *virtual* SError when
      HCR_EL2.AMO is set. As it turns out, that's not always the case
      on this CPU, and the SError sometimes fires on the host as a
      physical SError. Goodbye, cruel world. This clearly is a HW bug,
      and it means that a guest can easily take the host down, on demand.
      
      Thankfully, we have seen systems that were just as broken in the
      past, and we have the perfect vaccine for it.
      
      Apple M1, please meet the Cavium ThunderX workaround. All your
      GIC accesses will be trapped, sanitised, and emulated. Only the
      signalling aspect of the HW will be used. It won't be super speedy,
      but it will at least be safe. You're most welcome.
      
      Given that this has only ever been seen on this single implementation,
      that the spec is unclear at best and that we cannot trust it to ever
      be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
      being set.
      Tested-by: Joey Gouly <joey.gouly@arm.com>
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-3-maz@kernel.org
    • KVM: arm64: Force ID_AA64PFR0_EL1.GIC=1 when exposing a virtual GICv3 · 562e530f
      Marc Zyngier authored
      Until now, we always let ID_AA64PFR0_EL1.GIC reflect the value
      visible on the host, even if we were running a GICv2-enabled VM
      on a GICv3+compat host.
      
      That's fine, but we also now have the case of a host that does not
      expose ID_AA64PFR0_EL1.GIC==1 despite having a vGIC. Yes, this is
      confusing. Thank you M1.
      
      Let's go back to first principles and expose ID_AA64PFR0_EL1.GIC=1
      when a GICv3 is exposed to the guest. This also hides a GICv4.1
      CPU interface from the guest which has no business knowing about
      the v4.1 extension.
      Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/20211010150910.2911495-2-maz@kernel.org
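The ID register sanitisation described in this commit amounts to overwriting one 4-bit field. The following is a rough sketch rather than the kernel's implementation: `toy_guest_pfr0()` and the `TOY_*` macros are hypothetical names, the GIC field is assumed to sit in bits [27:24] of ID_AA64PFR0_EL1, and leaving the host value untouched when no GICv3 is exposed is an assumption of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: ID_AA64PFR0_EL1.GIC occupies bits [27:24]. */
#define TOY_PFR0_GIC_SHIFT  24
#define TOY_PFR0_GIC_MASK   (UINT64_C(0xf) << TOY_PFR0_GIC_SHIFT)

/*
 * Hypothetical helper: compute the ID_AA64PFR0_EL1 value shown to the
 * guest from the host value and the kind of vGIC the guest was given.
 */
static uint64_t toy_guest_pfr0(uint64_t host_pfr0, bool guest_has_gicv3)
{
    /* Assumption: without a virtual GICv3 the host value is left alone. */
    if (!guest_has_gicv3)
        return host_pfr0;

    /* Force GIC=1, whatever the host reports (0 on M1, 3 for GICv4.1). */
    host_pfr0 &= ~TOY_PFR0_GIC_MASK;
    host_pfr0 |= UINT64_C(1) << TOY_PFR0_GIC_SHIFT;
    return host_pfr0;
}
```

Both cases from the commit message fall out of the same overwrite: a host field of 0 is raised to 1, and a host field of 3 (GICv4.1) is lowered to 1.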
  2. 12 Oct, 2021 2 commits
  3. 11 Oct, 2021 29 commits
  4. 03 Oct, 2021 3 commits
    • Linux 5.15-rc4 · 9e1ff307
      Linus Torvalds authored
    • elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings · 9b2f72cc
      Chen Jingwen authored
      In commit b212921b ("elf: don't use MAP_FIXED_NOREPLACE for elf
      executable mappings") we still leave MAP_FIXED_NOREPLACE in place for
      load_elf_interp.
      
      Unfortunately, this will cause the kernel to fail to start with:
      
          1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
          Failed to execute /init (error -17)
      
      The reason is that the elf interpreter (ld.so) has overlapping segments.
      
        readelf -l ld-2.31.so
        Program Headers:
          Type           Offset             VirtAddr           PhysAddr
                         FileSiz            MemSiz              Flags  Align
          LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                         0x000000000002c94c 0x000000000002c94c  R E    0x10000
          LOAD           0x000000000002dae0 0x000000000003dae0 0x000000000003dae0
                         0x00000000000021e8 0x0000000000002320  RW     0x10000
          LOAD           0x000000000002fe00 0x000000000003fe00 0x000000000003fe00
                         0x00000000000011ac 0x0000000000001328  RW     0x10000
      
      The reason for this problem is the same as described in commit
      ad55eac7 ("elf: enforce MAP_FIXED on overlaying elf segments").
      
      Not only executable binaries but also elf interpreters (e.g. ld.so)
      can have overlapping elf segments, so we had better drop
      MAP_FIXED_NOREPLACE and go back to MAP_FIXED in load_elf_interp.
      
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Cc: <stable@vger.kernel.org> # v4.19
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
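The overlap behind this failure can be checked by hand from the readelf output in the commit message. The sketch below is only an illustration: the `toy_*` names are hypothetical, it uses VirtAddr and MemSiz as the extent of each segment (the kernel's exact computation may differ), and the page size is a parameter because the commit does not state it. Once each LOAD segment is rounded out to page boundaries, the two RW segments share a page, so a second MAP_FIXED_NOREPLACE mapping would fail with EEXIST, which is the error -17 in the log.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_load {
    uint64_t vaddr;  /* VirtAddr from readelf */
    uint64_t memsz;  /* MemSiz from readelf */
};

/* Round an address down to the start of its page (page_size is a power of two). */
static uint64_t toy_page_start(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Round an address up to the next page boundary. */
static uint64_t toy_page_align(uint64_t addr, uint64_t page_size)
{
    return (addr + page_size - 1) & ~(page_size - 1);
}

/* Do the page-granular ranges covered by two segments intersect? */
static bool toy_mappings_overlap(struct toy_load a, struct toy_load b,
                                 uint64_t page_size)
{
    uint64_t a_lo = toy_page_start(a.vaddr, page_size);
    uint64_t a_hi = toy_page_align(a.vaddr + a.memsz, page_size);
    uint64_t b_lo = toy_page_start(b.vaddr, page_size);
    uint64_t b_hi = toy_page_align(b.vaddr + b.memsz, page_size);

    return a_lo < b_hi && b_lo < a_hi;
}

/* The three LOAD segments of ld-2.31.so quoted in the commit message. */
static const struct toy_load toy_ld_rx  = { 0x00000, 0x2c94c };
static const struct toy_load toy_ld_rw1 = { 0x3dae0, 0x02320 };
static const struct toy_load toy_ld_rw2 = { 0x3fe00, 0x01328 };
```

Calling `toy_mappings_overlap(toy_ld_rw1, toy_ld_rw2, 0x10000)` reports an overlap, and so does the same call with a 4 KiB page size, while the R E segment does not collide with either RW segment.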
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · ca3cef46
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Fix a number of ext4 bugs in fast_commit, inline data, and delayed
        allocation.
      
        Also fix error handling code paths in ext4_dx_readdir() and
        ext4_fill_super().
      
        Finally, avoid grabbing a journal head in the delayed allocation
        write in the common cases where we are overwriting a pre-existing
        block or appending to an inode"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: recheck buffer uptodate bit under buffer lock
        ext4: fix potential infinite loop in ext4_dx_readdir()
        ext4: flush s_error_work before journal destroy in ext4_fill_super
        ext4: fix loff_t overflow in ext4_max_bitmap_size()
        ext4: fix reserved space counter leakage
        ext4: limit the number of blocks in one ADD_RANGE TLV
        ext4: enforce buffer head state assertion in ext4_da_map_blocks
        ext4: remove extent cache entries when truncating inline data
        ext4: drop unnecessary journal handle in delalloc write
        ext4: factor out write end code of inline file
        ext4: correct the error path of ext4_write_inline_data_end()
        ext4: check and update i_disksize properly
        ext4: add error checking to ext4_ext_replay_set_iblocks()