1. 16 Mar, 2016 9 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit · 1c463a39
      Paul Mackerras authored
      commit ccec4456 upstream.
      
      Thomas Huth discovered that a guest could cause a hard hang of a
      host CPU by setting the Instruction Authority Mask Register (IAMR)
      to a suitable value.  It turns out that this is because when the
      code was added to context-switch the new special-purpose registers
      (SPRs) that were added in POWER8, we forgot to add code to ensure
      that they were restored to a sane value on guest exit.
      
      This adds code to set those registers where a bad value could
      compromise the execution of the host kernel to a suitable neutral
      value on guest exit.
      
      Fixes: b005255eReported-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c463a39
    • David Hildenbrand's avatar
      KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS · 78939530
      David Hildenbrand authored
      commit 9522b37f upstream.
      
      With MACHINE_HAS_VX, we convert the floating point registers from the
      vector registeres when storing the status. For other VCPUs, these are
      stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs,
      which resolves to the currently loaded VCPU.
      
      So kvm_s390_store_status_unloaded() currently writes the wrong floating
      point registers (converted from the vector registers) when called from
      another VCPU on a z13.
      
      This is only the case for old user space not handling SIGP STORE STATUS and
      SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All
      other calls come from the loaded VCPU via kvm_s390_store_status().
      
      Fixes: 9abc2a08 (KVM: s390: fix memory overwrites when vx is disabled)
      Reviewed-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78939530
    • Radim Krčmář's avatar
      KVM: VMX: disable PEBS before a guest entry · 0bbe5fa4
      Radim Krčmář authored
      commit 7099e2e1 upstream.
      
      Linux guests on Haswell (and also SandyBridge and Broadwell, at least)
      would crash if you decided to run a host command that uses PEBS, like
        perf record -e 'cpu/mem-stores/pp' -a
      
      This happens because KVM is using VMX MSR switching to disable PEBS, but
      SDM [2015-12] 18.4.4.4 Re-configuring PEBS Facilities explains why it
      isn't safe:
        When software needs to reconfigure PEBS facilities, it should allow a
        quiescent period between stopping the prior event counting and setting
        up a new PEBS event. The quiescent period is to allow any latent
        residual PEBS records to complete its capture at their previously
        specified buffer address (provided by IA32_DS_AREA).
      
      There might not be a quiescent period after the MSR switch, so a CPU
      ends up using host's MSR_IA32_DS_AREA to access an area in guest's
      memory.  (Or MSR switching is just buggy on some models.)
      
      The guest can learn something about the host this way:
      If the guest doesn't map address pointed by MSR_IA32_DS_AREA, it results
      in #PF where we leak host's MSR_IA32_DS_AREA through CR2.
      
      After that, a malicious guest can map and configure memory where
      MSR_IA32_DS_AREA is pointing and can therefore get an output from
      host's tracing.
      
      This is not a critical leak as the host must initiate with PEBS tracing
      and I have not been able to get a record from more than one instruction
      before vmentry in vmx_vcpu_run() (that place has most registers already
      overwritten with guest's).
      
      We could disable PEBS just few instructions before vmentry, but
      disabling it earlier shouldn't affect host tracing too much.
      We also don't need to switch MSR_IA32_PEBS_ENABLE on VMENTRY, but that
      optimization isn't worth its code, IMO.
      
      (If you are implementing PEBS for guests, be sure to handle the case
       where both host and guest enable PEBS, because this patch doesn't.)
      
      Fixes: 26a4f3c0 ("perf/x86: disable PEBS on a guest entry.")
      Reported-by: default avatarJiří Olša <jolsa@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0bbe5fa4
    • David Matlack's avatar
      kvm: cap halt polling at exactly halt_poll_ns · c9e1bbef
      David Matlack authored
      commit 313f636d upstream.
      
      When growing halt-polling, there is no check that the poll time exceeds
      the limit. It's possible for vcpu->halt_poll_ns grow once past
      halt_poll_ns, and stay there until a halt which takes longer than
      vcpu->halt_poll_ns. For example, booting a Linux guest with
      halt_poll_ns=11000:
      
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 0 (shrink 10000)
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 10000 (grow 0)
       ... kvm:kvm_halt_poll_ns: vcpu 0: halt_poll_ns 20000 (grow 10000)
      Signed-off-by: default avatarDavid Matlack <dmatlack@google.com>
      Fixes: aca6ff29Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9e1bbef
    • Krzysztof Hałasa's avatar
      PCI: Allow a NULL "parent" pointer in pci_bus_assign_domain_nr() · 431c9f01
      Krzysztof Hałasa authored
      commit 54c6e2dd upstream.
      
      pci_create_root_bus() passes a "parent" pointer to
      pci_bus_assign_domain_nr().  When CONFIG_PCI_DOMAINS_GENERIC is defined,
      pci_bus_assign_domain_nr() dereferences that pointer.  Many callers of
      pci_create_root_bus() supply a NULL "parent" pointer, which leads to a NULL
      pointer dereference error.
      
      7c674700 ("PCI: Move domain assignment from arm64 to generic code")
      moved the "parent" dereference from arm64 to generic code.  Only arm64 used
      that code (because only arm64 defined CONFIG_PCI_DOMAINS_GENERIC), and it
      always supplied a valid "parent" pointer.  Other arches supplied NULL
      "parent" pointers but didn't defined CONFIG_PCI_DOMAINS_GENERIC, so they
      used a no-op version of pci_bus_assign_domain_nr().
      
      8c7d1474 ("ARM/PCI: Move to generic PCI domains") defined
      CONFIG_PCI_DOMAINS_GENERIC on ARM, and many ARM platforms use
      pci_common_init(), which supplies a NULL "parent" pointer.
      These platforms (cns3xxx, dove, footbridge, iop13xx, etc.) crash
      with a NULL pointer dereference like this while probing PCI:
      
        Unable to handle kernel NULL pointer dereference at virtual address 000000a4
        PC is at pci_bus_assign_domain_nr+0x10/0x84
        LR is at pci_create_root_bus+0x48/0x2e4
        Kernel panic - not syncing: Attempted to kill init!
      
      [bhelgaas: changelog, add "Reported:" and "Fixes:" tags]
      Reported: http://forum.doozan.com/read.php?2,17868,22070,quote=1
      Fixes: 8c7d1474 ("ARM/PCI: Move to generic PCI domains")
      Fixes: 7c674700 ("PCI: Move domain assignment from arm64 to generic code")
      Signed-off-by: default avatarKrzysztof Hałasa <khalasa@piap.pl>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Acked-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      431c9f01
    • Lokesh Vutla's avatar
      ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property · 6327a31a
      Lokesh Vutla authored
      commit 2e18f5a1 upstream.
      
      Introduce a dt property, ti,no-idle, that prevents an IP to idle at any
      point. This is to handle Errata i877, which tells that GMAC clocks
      cannot be disabled.
      Acked-by: default avatarRoger Quadros <rogerq@ti.com>
      Tested-by: default avatarMugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: default avatarLokesh Vutla <lokeshvutla@ti.com>
      Signed-off-by: default avatarSekhar Nori <nsekhar@ti.com>
      Signed-off-by: default avatarDave Gerlach <d-gerlach@ti.com>
      Acked-by: default avatarRob Herring <robh@kernel.org>
      Signed-off-by: default avatarPaul Walmsley <paul@pwsan.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6327a31a
    • Mugunthan V N's avatar
      ARM: dts: dra7: do not gate cpsw clock due to errata i877 · 958df498
      Mugunthan V N authored
      commit 0f514e69 upstream.
      
      Errata id: i877
      
      Description:
      ------------
      The RGMII 1000 Mbps Transmit timing is based on the output clock
      (rgmiin_txc) being driven relative to the rising edge of an internal
      clock and the output control/data (rgmiin_txctl/txd) being driven relative
      to the falling edge of an internal clock source. If the internal clock
      source is allowed to be static low (i.e., disabled) for an extended period
      of time then when the clock is actually enabled the timing delta between
      the rising edge and falling edge can change over the lifetime of the
      device. This can result in the device switching characteristics degrading
      over time, and eventually failing to meet the Data Manual Delay Time/Skew
      specs.
      To maintain RGMII 1000 Mbps IO Timings, SW should minimize the
      duration that the Ethernet internal clock source is disabled. Note that
      the device reset state for the Ethernet clock is "disabled".
      Other RGMII modes (10 Mbps, 100Mbps) are not affected
      
      Workaround:
      -----------
      If the SoC Ethernet interface(s) are used in RGMII mode at 1000 Mbps,
      SW should minimize the time the Ethernet internal clock source is disabled
      to a maximum of 200 hours in a device life cycle. This is done by enabling
      the clock as early as possible in IPL (QNX) or SPL/u-boot (Linux/Android)
      by setting the register CM_GMAC_CLKSTCTRL[1:0]CLKTRCTRL = 0x2:SW_WKUP.
      
      So, do not allow to gate the cpsw clocks using ti,no-idle property in
      cpsw node assuming 1000 Mbps is being used all the time. If someone does
      not need 1000 Mbps and wants to gate clocks to cpsw, this property needs
      to be deleted in their respective board files.
      Signed-off-by: default avatarMugunthan V N <mugunthanvnm@ti.com>
      Signed-off-by: default avatarGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: default avatarLokesh Vutla <lokeshvutla@ti.com>
      Signed-off-by: default avatarPaul Walmsley <paul@pwsan.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      958df498
    • Thomas Petazzoni's avatar
      ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window · 744744e2
      Thomas Petazzoni authored
      commit d7d5a43c upstream.
      
      When the Crypto SRAM mappings were added to the Device Tree files
      describing the Armada XP boards in commit c466d997 ("ARM: mvebu:
      define crypto SRAM ranges for all armada-xp boards"), the fact that
      those mappings were overlaping with the PCIe memory aperture was
      overlooked. Due to this, we currently have for all Armada XP platforms
      a situation that looks like this:
      
      Memory mapping on Armada XP boards with internal registers at
      0xf1000000:
      
       - 0x00000000 -> 0xf0000000	3.75G 	RAM
       - 0xf0000000 -> 0xf1000000	16M	NOR flashes (AXP GP / AXP DB)
       - 0xf1000000 -> 0xf1100000	1M	internal registers
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory aperture
       - 0xf8100000 -> 0xf8110000	64KB	Crypto SRAM #0	=> OVERLAPS WITH PCIE !
       - 0xf8110000 -> 0xf8120000	64KB	Crypto SRAM #1	=> OVERLAPS WITH PCIE !
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O aperture
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      The overlap means that when PCIe devices are added, depending on their
      memory window needs, they might or might not be mapped into the
      physical address space. Indeed, they will not be mapped if the area
      allocated in the PCIe memory aperture by the PCI core overlaps with
      one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB
      of PCIe memory will see its PCIe memory window allocated from
      0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due
      to this, the PCIe window is not created, and any attempt to access the
      PCIe window makes the kernel explode:
      
      [    3.302213] igb: Copyright (c) 2007-2014 Intel Corporation.
      [    3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143)
      [    3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window
      [    3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22
      [    3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018
      
      This problem does not occur on Armada 370 boards, because we use the
      following memory mapping (for boards that have internal registers at
      0xf1000000):
      
       - 0x00000000 -> 0xf0000000	3.75G 	RAM
       - 0xf0000000 -> 0xf1000000	16M	NOR flashes (AXP GP / AXP DB)
       - 0xf1000000 -> 0xf1100000	1M	internal registers
       - 0xf1100000 -> 0xf1110000	64KB	Crypto SRAM #0 => OK !
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      Obviously, the solution is to align the location of the Crypto SRAM
      mappings of Armada XP to be similar with the ones on Armada 370, i.e
      have them between the "internal registers" area and the beginning of
      the PCIe aperture.
      
      However, we have a special case with the OpenBlocks AX3-4 platform,
      which has a 128 MB NOR flash. Currently, this NOR flash is mapped from
      0xf0000000 to 0xf8000000. This is possible because on OpenBlocks
      AX3-4, the internal registers are not at 0xf1000000. And this explains
      why the Crypto SRAM mappings were not configured at the same place on
      Armada XP.
      
      Hence, the solution is two-fold:
      
       (1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from
           0xe8000000 to 0xf0000000. This frees the 0xf0000000 ->
           0xf80000000 space.
      
       (2) Move the Crypto SRAM mappings on Armada XP to be similar to
           Armada 370 (except of course that Armada XP has two Crypto SRAM
           and not one).
      
      After this patch, the memory mapping on Armada XP boards with
      registers at 0xf1 is:
      
       - 0x00000000 -> 0xf0000000	3.75G 	RAM
       - 0xf0000000 -> 0xf1000000	16M	NOR flashes (AXP GP / AXP DB)
       - 0xf1000000 -> 0xf1100000	1M	internal registers
       - 0xf1100000 -> 0xf1110000	64KB	Crypto SRAM #0
       - 0xf1110000 -> 0xf1120000	64KB	Crypto SRAM #1
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      And the memory mapping for the special case of the OpenBlocks AX3-4
      (internal registers at 0xd0000000, NOR of 128 MB):
      
       - 0x00000000 -> 0xc0000000	3G 	RAM
       - 0xd0000000 -> 0xd1000000	1M	internal registers
       - 0xe800000  -> 0xf0000000	128M	NOR flash
       - 0xf1100000 -> 0xf1110000	64KB	Crypto SRAM #0
       - 0xf1110000 -> 0xf1120000	64KB	Crypto SRAM #1
       - 0xf8000000 -> 0xffe0000	126M	PCIe memory
       - 0xffe00000 -> 0xfff00000	1M	PCIe I/O
       - 0xfff0000  -> 0xffffffff	1M	BootROM
      
      Fixes: c466d997 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards")
      Reported-by: default avatarPhil Sutter <phil@nwl.cc>
      Cc: Phil Sutter <phil@nwl.cc>
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Acked-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarOlof Johansson <olof@lixom.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      744744e2
    • Ard Biesheuvel's avatar
      arm64: account for sparsemem section alignment when choosing vmemmap offset · 97142f30
      Ard Biesheuvel authored
      commit 36e5cd6b upstream.
      
      Commit dfd55ad8 ("arm64: vmemmap: use virtual projection of linear
      region") fixed an issue where the struct page array would overflow into the
      adjacent virtual memory region if system RAM was placed so high up in
      physical memory that its addresses were not representable in the build time
      configured virtual address size.
      
      However, the fix failed to take into account that the vmemmap region needs
      to be relatively aligned with respect to the sparsemem section size, so that
      a sequence of page structs corresponding with a sparsemem section in the
      linear region appears naturally aligned in the vmemmap region.
      
      So round up vmemmap to sparsemem section size. Since this essentially moves
      the projection of the linear region up in memory, also revert the reduction
      of the size of the vmemmap region.
      
      Fixes: dfd55ad8 ("arm64: vmemmap: use virtual projection of linear region")
      Tested-by: default avatarMark Langsdorf <mlangsdo@redhat.com>
      Tested-by: default avatarDavid Daney <david.daney@cavium.com>
      Tested-by: default avatarRobert Richter <rrichter@cavium.com>
      Acked-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97142f30
  2. 09 Mar, 2016 31 commits