Commit 645d9f8c authored by Stefano Stabellini's avatar Stefano Stabellini Committed by Sasha Levin

Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"

This reverts commit 2c3fc8d2.

This commit broke on x86 PV because entries in the generic SWIOTLB are
indexed using (pseudo-)physical address not DMA address and these are
not the same in a x86 PV guest.
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>

Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"

This reverts commit 2c3fc8d2.

This commit broke on x86 PV because entries in the generic SWIOTLB are
indexed using (pseudo-)physical address not DMA address and these are
not the same in a x86 PV guest.
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>

xen: annotate xen_set_identity_and_remap_chunk() with __init

Commit 5b8e7d80 removed the __init
annotation from xen_set_identity_and_remap_chunk(). Add it again.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: introduce helper functions to do safe read and write accesses

Introduce two helper functions to safely read and write unsigned long
values from or to memory when the access may fault because the mapping
is non-present or read-only.

These helpers can be used instead of open coded uses of __get_user()
and __put_user() avoiding the need to do casts to fix sparse warnings.

Use the helpers in page.h and p2m.c. This will fix the sparse
warnings when doing "make C=1".
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: Speed up set_phys_to_machine() by using read-only mappings

Instead of checking at each call of set_phys_to_machine() whether a
new p2m page has to be allocated due to writing an entry in a large
invalid or identity area, just map those areas read only and react
to a page fault on write by allocating the new page.

This change will make the common path with no allocation much
faster as it only requires a single write of the new mfn instead
of walking the address translation tables and checking for the
special cases.
Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: switch to linear virtual mapped sparse p2m list

At start of the day the Xen hypervisor presents a contiguous mfn list
to a pv-domain. In order to support sparse memory this mfn list is
accessed via a three level p2m tree built early in the boot process.
Whenever the system needs the mfn associated with a pfn this tree is
used to find the mfn.

Instead of using a software walked tree for accessing a specific mfn
list entry this patch is creating a virtual address area for the
entire possible mfn list including memory holes. The holes are
covered by mapping a pre-defined  page consisting only of "invalid
mfn" entries. Access to a mfn entry is possible by just using the
virtual base address of the mfn list and the pfn as index into that
list. This speeds up the (hot) path of determining the mfn of a
pfn.

Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0
showed following improvements:

Elapsed time: 32:50 ->  32:35
System:       18:07 ->  17:47
User:        104:00 -> 103:30

Tested with following configurations:
- 64 bit dom0, 8GB RAM
- 64 bit dom0, 128 GB RAM, PCI-area above 4 GB
- 32 bit domU, 512 MB, 8 GB, 43 GB (more wouldn't work even without
                                    the patch)
- 32 bit domU, ballooning up and down
- 32 bit domU, save and restore
- 32 bit domU with PCI passthrough
- 64 bit domU, 8 GB, 2049 MB, 5000 MB
- 64 bit domU, ballooning up and down
- 64 bit domU, save and restore
- 64 bit domU with PCI passthrough
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: Hide get_phys_to_machine() to be able to tune common path

Today get_phys_to_machine() is always called when the mfn for a pfn
is to be obtained. Add a wrapper __pfn_to_mfn() as inline function
to be able to avoid calling get_phys_to_machine() when possible as
soon as the switch to a linear mapped p2m list has been done.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

x86: Introduce function to get pmd entry pointer

Introduces lookup_pmd_address() to get the address of the pmd entry
related to a virtual address in the current address space. This
function is needed for support of a virtual mapped sparse p2m list
in xen pv domains, as we need the address of the pmd entry, not the
one of the pte in that case.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: Delay invalidating extra memory

When the physical memory configuration is initialized the p2m entries
for not pouplated memory pages are set to "invalid". As those pages
are beyond the hypervisor built p2m list the p2m tree has to be
extended.

This patch delays processing the extra memory related p2m entries
during the boot process until some more basic memory management
functions are callable. This removes the need to create new p2m
entries until virtual memory management is available.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: Delay m2p_override initialization

The m2p overrides are used to be able to find the local pfn for a
foreign mfn mapped into the domain. They are used by driver backends
having to access frontend data.

As this functionality isn't used in early boot it makes no sense to
initialize the m2p override functions very early. It can be done
later without doing any harm, removing the need for allocating memory
via extend_brk().

While at it make some m2p override functions static as they are only
used internally.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: Delay remapping memory of pv-domain

Early in the boot process the memory layout of a pv-domain is changed
to match the E820 map (either the host one for Dom0 or the Xen one)
regarding placement of RAM and PCI holes. This requires removing memory
pages initially located at positions not suitable for RAM and adding
them later at higher addresses where no restrictions apply.

To be able to operate on the hypervisor supported p2m list until a
virtual mapped linear p2m list can be constructed, remapping must
be delayed until virtual memory management is initialized, as the
initial p2m list can't be extended unlimited at physical memory
initialization time due to it's fixed structure.

A further advantage is the reduction in complexity and code volume as
we don't have to be careful regarding memory restrictions during p2m
updates.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: use common page allocation function in p2m.c

In arch/x86/xen/p2m.c three different allocation functions for
obtaining a memory page are used: extend_brk(), alloc_bootmem_align()
or __get_free_page().  Which of those functions is used depends on the
progress of the boot process of the system.

Introduce a common allocation routine selecting the to be called
allocation routine dynamically based on the boot progress. This allows
moving initialization steps without having to care about changing
allocation calls.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: Make functions static

Some functions in arch/x86/xen/p2m.c are used locally only. Make them
static. Rearrange the functions in p2m.c to avoid forward declarations.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen: fix some style issues in p2m.c

The source arch/x86/xen/p2m.c has some coding style issues. Fix them.
Signed-off-by: default avatarJuergen Gross <jgross@suse.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen/pci: Use APIC directly when APIC virtualization hardware is available

When hardware supports APIC/x2APIC virtualization we don't need to use
pirqs for MSI handling and instead use APIC since most APIC accesses
(MMIO or MSR) will now be processed without VMEXITs.

As an example, netperf on the original code produces this profile
(collected wih 'xentrace -e 0x0008ffff -T 5'):

    342 cpu_change
    260 CPUID
  34638 HLT
  64067 INJ_VIRQ
  28374 INTR
  82733 INTR_WINDOW
     10 NPF
  24337 TRAP
 370610 vlapic_accept_pic_intr
 307528 VMENTRY
 307527 VMEXIT
 140998 VMMCALL
    127 wrap_buffer

After applying this patch the same test shows

    230 cpu_change
    260 CPUID
  36542 HLT
    174 INJ_VIRQ
  27250 INTR
    222 INTR_WINDOW
     20 NPF
  24999 TRAP
 381812 vlapic_accept_pic_intr
 166480 VMENTRY
 166479 VMEXIT
  77208 VMMCALL
     81 wrap_buffer

ApacheBench results (ab -n 10000 -c 200) improve by about 10%
Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen/pci: Defer initialization of MSI ops on HVM guests

If the hardware supports APIC virtualization we may decide not to use
pirqs and instead use APIC/x2APIC directly, meaning that we don't want
to set x86_msi.setup_msi_irqs and x86_msi.teardown_msi_irq to
Xen-specific routines.  However, x2APIC is not set up by the time
pci_xen_hvm_init() is called so we need to postpone setting these ops
until later, when we know which APIC mode is used.

(Note that currently x2APIC is never initialized on HVM guests. This
may change in the future)
Signed-off-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen-pciback: drop SR-IOV VFs when PF driver unloads

When a PF driver unloads, it may find it necessary to leave the VFs
around simply because of pciback having marked them as assigned to a
guest. Utilize a suitable notification to let go of the VFs, thus
allowing the PF to go back into the state it was before its driver
loaded (which in particular allows the driver to be loaded again with
it being able to create the VFs anew, but which also allows to then
pass through the PF instead of the VFs).

Don't do this however for any VFs currently in active use by a guest.
Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
[v2: Removed the switch statement, moved it about]
[v3: Redid it a bit differently]
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen/pciback: Restore configuration space when detaching from a guest.

The commit "xen/pciback: Don't deadlock when unbinding." was using
the version of pci_reset_function which would lock the device lock.
That is no good as we can dead-lock. As such we swapped to using
the lock-less version and requiring that the callers
of 'pcistub_put_pci_dev' take the device lock. And as such
this bug got exposed.

Using the lock-less version is  OK, except that we tried to
use 'pci_restore_state' after the lock-less version of
__pci_reset_function_locked - which won't work as 'state_saved'
is set to false. Said 'state_saved' is a toggle boolean that
is to be used by the sequence of a) pci_save_state/pci_restore_state
or b) pci_load_and_free_saved_state/pci_restore_state. We don't
want to use a) as the guest might have messed up the PCI
configuration space and we want it to revert to the state
when the PCI device was binded to us. Therefore we pick
b) to restore the configuration space.

We restore from our 'golden' version of PCI configuration space, when an:
 - Device is unbinded from pciback
 - Device is detached from a guest.
Reported-by: default avatarSander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

PCI: Expose pci_load_saved_state for public consumption.

We have the pci_load_and_free_saved_state, and pci_store_saved_state
but are missing the functionality to just load the state
multiple times in the PCI device without having to free/save
the state.

This patch makes it possible to use this function.
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: default avatarBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen/pciback: Remove tons of dereferences

A little cleanup. No functional difference.
Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen/pciback: Print out the domain owning the device.

We had been printing it only if the device was built with
debug enabled. But this information is useful in the field
to troubleshoot.
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen/pciback: Include the domain id if removing the device whilst still in use

Cleanup the function a bit - also include the id of the
domain that is using the device.
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

driver core: Provide an wrapper around the mutex to do lockdep warnings

Instead of open-coding it in drivers that want to double check
that their functions are indeed holding the device lock.
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

xen/pciback: Don't deadlock when unbinding.

As commit 0a9fd015
'xen/pciback: Document the entry points for 'pcistub_put_pci_dev''
explained there are four entry points in this function.
Two of them are when the user fiddles in the SysFS to
unbind a device which might be in use by a guest or not.

Both 'unbind' states will cause a deadlock as the the PCI lock has
already been taken, which then pci_device_reset tries to take.

We can simplify this by requiring that all callers of
pcistub_put_pci_dev MUST hold the device lock. And then
we can just call the lockless version of pci_device_reset.

To make it even simpler we will modify xen_pcibk_release_pci_dev
to quality whether it should take a lock or not - as it ends
up calling xen_pcibk_release_pci_dev and needs to hold the lock.
Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>

swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single

Need to pass the pointer within the swiotlb internal buffer to the
swiotlb library, that in the case of xen_unmap_single is dev_addr, not
paddr.
Signed-off-by: default avatarStefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org

(cherry picked from commit dbdd7476
4ef8e3f3
2c3fc8d2)
Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
parent 17099e5f
......@@ -388,7 +388,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr)) {
swiotlb_tbl_unmap_single(hwdev, paddr, size, dir);
swiotlb_tbl_unmap_single(hwdev, dev_addr, size, dir);
return;
}
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment