1. 16 Jul, 2008 40 commits
    • x86, paravirt-spinlocks: fix boot hang · 34646bca
      Ingo Molnar authored
      the paravirt-spinlock patches caused a boot hang with this config:
      
       http://redhat.com/~mingo/misc/config-Wed_Jul__9_14_47_04_CEST_2008.bad
      
      i have bisected it down to:
      
      |  commit e17b58c2e85bc2ad2afc07fb8d898017c2b75ed1
      |  Author: Jeremy Fitzhardinge <jeremy@goop.org>
      |  Date:   Mon Jul 7 12:07:53 2008 -0700
      |
      |      xen: implement Xen-specific spinlocks
      
      i.e. applying that patch alone causes the hang. The hang happens in the
      ftrace self-test:
      
        initcall utsname_sysctl_init+0x0/0x19 returned 0 after 0 msecs
        calling  init_sched_switch_trace+0x0/0x4c
        Testing tracer sched_switch: PASSED
        initcall init_sched_switch_trace+0x0/0x4c returned 0 after 167 msecs
        calling  init_function_trace+0x0/0x12
        Testing tracer ftrace:
        [hard hang]
      
      it should have continued like this:
      
        Testing tracer ftrace: PASSED
        initcall init_function_trace+0x0/0x12 returned 0 after 198 msecs
        calling  init_irqsoff_tracer+0x0/0x14
        Testing tracer irqsoff: PASSED
        initcall init_irqsoff_tracer+0x0/0x14 returned 0 after 3 msecs
        calling  init_mmio_trace+0x0/0x12
        initcall init_mmio_trace+0x0/0x12 returned 0 after 0 msecs
      
      the problem is that such low-level primitives as spinlocks should never
      be built with -pg (which ftrace enables). Marking paravirt.o as non-pg and
      marking all spinlock ops as always-inline solves the hang.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: paravirt spinlocks, modular build fix · 9af98578
      Ingo Molnar authored
      fix:
      
        MODPOST 408 modules
      ERROR: "pv_lock_ops" [net/dccp/dccp.ko] undefined!
      ERROR: "pv_lock_ops" [fs/jbd2/jbd2.ko] undefined!
      ERROR: "pv_lock_ops" [drivers/media/common/saa7146_vv.ko] undefined!
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: paravirt spinlocks, !CONFIG_SMP build fixes · 4bb689ee
      Ingo Molnar authored
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: implement Xen-specific spinlocks · 2d9e1e2f
      Jeremy Fitzhardinge authored
      The standard ticket spinlocks are very expensive in a virtual
      environment, because their performance depends on Xen's scheduler
      giving vcpus time in the order that they're supposed to take the
      spinlock.
      
      This implements a Xen-specific spinlock, which should be much more
      efficient.
      
      The fast-path is essentially the old Linux-x86 locks, using a single
      lock byte.  The locker decrements the byte; if the result is 0, then
      they have the lock.  If the lock is negative, then locker must spin
      until the lock is positive again.
      
      When there's contention, the locker spins for 2^16[*] iterations waiting
      to get the lock.  If it fails to get the lock in that time, it adds
      itself to the contention count in the lock and blocks on a per-cpu
      event channel.
      
      When unlocking the spinlock, the locker looks to see if there's anyone
      blocked waiting for the lock by checking for a non-zero waiter count.
      If there's a waiter, it traverses the per-cpu "lock_spinners"
      variable, which contains which lock each CPU is waiting on.  It picks
      one CPU waiting on the lock and sends it an event to wake it up.
      
      This allows efficient fast-path spinlock operation, while allowing
      spinning vcpus to give up their processor time while waiting for a
      contended lock.
      
      [*] 2^16 iterations is the threshold at which 98% of locks have been
      taken, according to Thomas Friebel's Xen Summit talk "Preventing Guests
      from Spinning Around".  Therefore, we'd expect the lock and unlock slow
      paths to be entered only 2% of the time.
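
      A minimal user-space sketch of the spin-then-block idea described
      above, not the code from this patch.  It assumes an exchange-based
      lock byte (0 unlocked, 1 locked) instead of the decrement scheme, C11
      atomics in place of kernel primitives, and hypothetical names
      (struct xlock, xlock_*, NR_CPUS).  Actually blocking on, and kicking,
      an event channel is left to the caller:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS     4            /* hypothetical CPU count for the sketch */
#define SPIN_LIMIT  (1u << 16)   /* 2^16 spins before taking the slow path */

struct xlock {
    atomic_uchar lock;      /* 0 = unlocked, 1 = locked (assumed convention) */
    atomic_uint  spinners;  /* number of CPUs blocked in the slow path */
};

/* Which lock each CPU is blocked on; stands in for per-cpu "lock_spinners". */
static struct xlock *_Atomic lock_spinners[NR_CPUS];

/* One attempt: we own the lock only if the previous value was 0. */
bool xlock_trylock(struct xlock *l)
{
    return atomic_exchange(&l->lock, 1) == 0;
}

/* Fast path: try up to SPIN_LIMIT times; true if the lock was acquired. */
bool xlock_spin(struct xlock *l)
{
    for (unsigned int i = 0; i < SPIN_LIMIT; i++)
        if (xlock_trylock(l))
            return true;
    return false;
}

/* Slow-path entry: record that `cpu` is about to block waiting for `l`. */
void xlock_announce_wait(struct xlock *l, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_store(&lock_spinners[cpu], l);
    atomic_fetch_add(&l->spinners, 1);
}

/* Slow-path exit: `cpu` was woken and no longer waits for `l`. */
void xlock_end_wait(struct xlock *l, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_fetch_sub(&l->spinners, 1);
    atomic_store(&lock_spinners[cpu], NULL);
}

/*
 * Release the lock.  Returns the index of one CPU that should be sent a
 * wakeup event, or -1 if nobody is blocked on this lock.
 */
int xlock_unlock(struct xlock *l)
{
    atomic_store(&l->lock, 0);
    if (atomic_load(&l->spinners) == 0)
        return -1;                       /* common case: no waiters */
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (atomic_load(&lock_spinners[cpu]) == l)
            return cpu;
    return -1;
}
```

      A caller would try xlock_spin() first and, only if it returns false,
      call xlock_announce_wait(), block, and then xlock_end_wait() before
      retrying.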
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <clameter@linux-foundation.org>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Virtualization <virtualization@lists.linux-foundation.org>
      Cc: Xen devel <xen-devel@lists.xensource.com>
      Cc: Thomas Friebel <thomas.friebel@amd.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: use lock-byte spinlock implementation · 56397f8d
      Jeremy Fitzhardinge authored
      Switch to using the lock-byte spinlock implementation, to avoid the
      worst of the performance hit from ticket locks.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <clameter@linux-foundation.org>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Virtualization <virtualization@lists.linux-foundation.org>
      Cc: Xen devel <xen-devel@lists.xensource.com>
      Cc: Thomas Friebel <thomas.friebel@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • paravirt: introduce a "lock-byte" spinlock implementation · 8efcbab6
      Jeremy Fitzhardinge authored
      Implement a version of the old spinlock algorithm, in which everyone
      spins waiting for a lock byte.  In order to be compatible with the
      ticket-lock's use of a zero initializer, this uses the convention of
      '0' for unlocked and '1' for locked.
      
      This algorithm is much better than ticket locks in a virtual
      environment, because it doesn't interact badly with the vcpu scheduler.
      If there are multiple vcpus spinning on a lock and the lock is
      released, the next vcpu to be scheduled will take the lock, rather
      than cycling around until the next ticketed vcpu gets it.
      
      To use this, you must call paravirt_use_bytelocks() very early, before
      any spinlocks have been taken.
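
      As an illustration of the '0' unlocked / '1' locked convention, here
      is a minimal sketch of a byte lock using C11 atomics.  The names
      (struct byte_lock, byte_*) are hypothetical and the kernel version
      uses its own primitives, so treat this as a model of the idea rather
      than the patch's code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A zero-initialised lock is unlocked, matching the ticket-lock initializer. */
struct byte_lock {
    atomic_uchar lock;   /* 0 = unlocked, 1 = locked */
};

bool byte_is_locked(struct byte_lock *l)
{
    return atomic_load_explicit(&l->lock, memory_order_relaxed) != 0;
}

bool byte_trylock(struct byte_lock *l)
{
    /* Atomically write 1; we own the lock only if the old value was 0. */
    return atomic_exchange_explicit(&l->lock, 1, memory_order_acquire) == 0;
}

void byte_lock_acquire(struct byte_lock *l)
{
    while (!byte_trylock(l)) {
        /* Spin read-only until the byte looks free, then retry the exchange. */
        while (byte_is_locked(l))
            ;
    }
}

void byte_unlock(struct byte_lock *l)
{
    assert(byte_is_locked(l));   /* releasing a lock nobody holds is a bug */
    atomic_store_explicit(&l->lock, 0, memory_order_release);
}
```

      Whichever spinner runs first after the release wins the exchange, so
      no waiter depends on another vcpu being scheduled before it.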
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <clameter@linux-foundation.org>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Virtualization <virtualization@lists.linux-foundation.org>
      Cc: Xen devel <xen-devel@lists.xensource.com>
      Cc: Thomas Friebel <thomas.friebel@amd.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86/paravirt: add hooks for spinlock operations · 74d4affd
      Jeremy Fitzhardinge authored
      Ticket spinlocks have absolutely ghastly worst-case performance
      characteristics in a virtual environment.  If there is any contention
      for physical CPUs (ie, there are more runnable vcpus than cpus), then
      ticket locks can cause the system to end up spending 90+% of its time
      spinning.
      
      The problem is that (v)cpus waiting on a ticket spinlock will be granted
      access to the lock in the strict order in which they got their tickets.  If
      the hypervisor scheduler doesn't give the vcpus time in that order,
      they will burn timeslices waiting for the scheduler to give the right
      vcpu some time.  In the worst case it could take O(n^2) vcpu scheduler
      timeslices for everyone waiting on the lock to get it, not counting
      new cpus trying to take the lock while the log-jam is sorted out.
      
      These hooks allow a paravirt backend to replace the spinlock
      implementation.
      
      At the very least, this could revert the implementation back to the
      old lock algorithm, which allows the next scheduled vcpu to take the
      lock, and has reasonably good performance.
      
      It also allows the spinlocks to take advantage of hypervisor
      features to make locks more efficient (spin and block, for example).
      
      The cost to native execution is an extra direct call when using a
      spinlock function.  There's no overhead if CONFIG_PARAVIRT is turned
      off.
      
      The lock structure is fixed at a single "unsigned int", initialized to
      zero, but the spinlock implementation can use it as it wishes.
      
      Thanks to Thomas Friebel's Xen Summit talk "Preventing Guests from
      Spinning Around" for pointing out this problem.
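
      The hook idea can be sketched as a table of function pointers behind
      thin wrappers.  Everything below is a hypothetical user-space model
      (struct lock_ops, install_lock_ops, the counting backend); it uses an
      indirect call through the table, whereas the text above describes a
      direct call on native hardware:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* The lock word: one unsigned int, zero when unlocked. */
struct raw_lock {
    atomic_uint slock;
};

/* Hypothetical hook table; a backend fills in every member. */
struct lock_ops {
    bool (*trylock)(struct raw_lock *l);
    void (*lock)(struct raw_lock *l);
    void (*unlock)(struct raw_lock *l);
};

/* Default backend: a plain exchange-based spinlock. */
static bool default_trylock(struct raw_lock *l)
{
    return atomic_exchange(&l->slock, 1u) == 0;
}

static void default_lock(struct raw_lock *l)
{
    while (!default_trylock(l))
        ;
}

static void default_unlock(struct raw_lock *l)
{
    atomic_store(&l->slock, 0u);
}

static struct lock_ops active_ops = {
    .trylock = default_trylock,
    .lock    = default_lock,
    .unlock  = default_unlock,
};

/* Alternative backend that counts acquisitions before delegating. */
static unsigned long counted_locks;

static void counting_lock(struct raw_lock *l)
{
    counted_locks++;
    default_lock(l);
}

static const struct lock_ops counting_ops = {
    .trylock = default_trylock,
    .lock    = counting_lock,
    .unlock  = default_unlock,
};

/* Replace the whole implementation in one step. */
void install_lock_ops(const struct lock_ops *ops)
{
    assert(ops != NULL && ops->trylock && ops->lock && ops->unlock);
    active_ops = *ops;
}

/* Callers only ever see these wrappers. */
bool spin_trylock(struct raw_lock *l) { return active_ops.trylock(l); }
void spin_lock(struct raw_lock *l)    { active_ops.lock(l); }
void spin_unlock(struct raw_lock *l)  { active_ops.unlock(l); }
```

      Because the lock word itself never changes shape, a backend can be
      swapped without touching any code that declares or initialises locks.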
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <clameter@linux-foundation.org>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Virtualization <virtualization@lists.linux-foundation.org>
      Cc: Xen devel <xen-devel@lists.xensource.com>
      Cc: Thomas Friebel <thomas.friebel@amd.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86_64: adjust exception frame on paranoid exceptions · 09402947
      Jeremy Fitzhardinge authored
      Exceptions using paranoidentry need to have their exception frames
      adjusted explicitly.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: xen: no need to disable vdso32 · d5303b81
      Jeremy Fitzhardinge authored
      Now that the vdso32 code can cope with both syscall and sysenter
      missing for 32-bit compat processes, just disable the features without
      disabling vdso altogether.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86_64: further cleanup of 32-bit compat syscall mechanisms · 6a52e4b1
      Jeremy Fitzhardinge authored
      AMD only supports "syscall" from 32-bit compat usermode.
      Intel and Centaur(?) only support "sysenter" from 32-bit compat usermode.
      
      Set the X86 feature bits accordingly, and set up the vdso in
      accordance with those bits.  On the off chance that we run in a 64-bit
      environment which supports neither syscall nor sysenter from 32-bit
      mode, fall back to the int $0x80 vdso.
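
      The selection logic can be modelled as below.  The type and function
      names are hypothetical, and the order in which the two features are
      tested is an assumption; the message only implies that at most one
      of them is normally present:

```c
#include <stdbool.h>

/* Which 32-bit vdso image to map for compat processes. */
enum vdso32_kind {
    VDSO32_INT80,      /* fallback: int $0x80 */
    VDSO32_SYSENTER,
    VDSO32_SYSCALL,
};

/* Hypothetical stand-in for the two X86 feature bits discussed above. */
struct cpu_caps {
    bool syscall32;    /* "syscall" usable from 32-bit compat usermode */
    bool sysenter32;   /* "sysenter" usable from 32-bit compat usermode */
};

enum vdso32_kind pick_vdso32(struct cpu_caps caps)
{
    if (caps.syscall32)
        return VDSO32_SYSCALL;
    if (caps.sysenter32)
        return VDSO32_SYSENTER;
    return VDSO32_INT80;   /* neither fast path is available */
}
```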
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, xen, vdso: fix build error · 71415c6a
      Ingo Molnar authored
      fix:
      
         arch/x86/xen/built-in.o: In function `xen_enable_syscall':
         (.cpuinit.text+0xdb): undefined reference to `sysctl_vsyscall32'
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: disable 32-bit syscall/sysenter if not supported. · 62541c37
      Jeremy Fitzhardinge authored
      Old versions of Xen (3.1 and before) don't support sysenter or syscall
      from 32-bit compat userspaces.  If we can't set the appropriate
      syscall callback, then disable the corresponding feature bit, which
      will cause the vdso32 setup to fall back appropriately.
      
      Linux assumes that syscall is always available to 32-bit userspace,
      and installs it by default if sysenter isn't available.  In that case,
      we just disable vdso altogether, forcing userspace libc to fall back
      to int $0x80.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Revert "x86_64: there's no need to preallocate level1_fixmap_pgt" · 6596f242
      Ingo Molnar authored
      This reverts commit 033786969d1d1b5af12a32a19d3a760314d05329.
      
      Suresh Siddha reported that this broke booting on his 2GB testbox.
      Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Revert "suspend, xen: enable PM_SLEEP for CONFIG_XEN" · 6717ef1a
      Ingo Molnar authored
      This reverts commit 6fbbec428c8e7bb617da2e8a589af2e97bcf3bc4.
      
      Rafael doesn't like it - it breaks various assumptions.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: fix build error on 32-bit + !HIGHMEM · b3fe1243
      Ingo Molnar authored
      fix:
      
      arch/x86/xen/enlighten.c: In function 'xen_set_fixmap':
      arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_BEGIN' undeclared (first use in this function)
      arch/x86/xen/enlighten.c:1127: error: (Each undeclared identifier is reported only once
      arch/x86/xen/enlighten.c:1127: error: for each function it appears in.)
      arch/x86/xen/enlighten.c:1127: error: 'FIX_KMAP_END' undeclared (first use in this function)
      make[1]: *** [arch/x86/xen/enlighten.o] Error 1
      make: *** [arch/x86/xen/enlighten.o] Error 2
      
      FIX_KMAP_BEGIN is only available on HIGHMEM.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: fix !HVC_XEN build dependency · 9c8a4420
      Ingo Molnar authored
      fix:
      
      arch/x86/xen/built-in.o: In function `set_page_prot':
      enlighten.c:(.text+0x111d): undefined reference to `xen_raw_printk'
      arch/x86/xen/built-in.o: In function `xen_start_kernel':
      : undefined reference to `xen_raw_console_write'
      arch/x86/xen/built-in.o: In function `xen_start_kernel':
      : undefined reference to `xen_raw_console_write'
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: update Kconfig to allow 64-bit Xen · 51dd660a
      Jeremy Fitzhardinge authored
      Allow Xen to be enabled on 64-bit.
      
      Also extend domain size limit from 8 GB (on 32-bit) to 32 GB on 64-bit.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: implement Xen write_msr operation · 1153968a
      Jeremy Fitzhardinge authored
      64-bit uses MSRs for important things like the base for fs and
      gs-prefixed addresses.  It's more efficient to use a hypercall to
      update these, rather than go via the trap and emulate path.
      
      Other MSR writes are just passed through; in an unprivileged domain
      they do nothing, but it might be useful later.
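
      A rough model of the routing decision described above.  The MSR
      numbers are the architectural ones for the fs/gs bases; the enum and
      function name are hypothetical, and the real operation would issue
      the hypercall or the MSR write rather than return a tag:

```c
#include <stdint.h>

/* Architectural MSR numbers for the segment bases. */
#define MSR_FS_BASE         0xc0000100u
#define MSR_GS_BASE         0xc0000101u
#define MSR_KERNEL_GS_BASE  0xc0000102u

enum msr_route {
    ROUTE_HYPERCALL,     /* update the segment base via a hypercall */
    ROUTE_PASSTHROUGH,   /* perform the ordinary MSR write */
};

/* Decide how a write to `msr` should be carried out. */
enum msr_route route_msr_write(uint32_t msr)
{
    switch (msr) {
    case MSR_FS_BASE:
    case MSR_GS_BASE:
    case MSR_KERNEL_GS_BASE:
        return ROUTE_HYPERCALL;    /* avoids the trap-and-emulate path */
    default:
        return ROUTE_PASSTHROUGH;
    }
}
```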
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: set up userspace syscall patch · bf18bf94
      Jeremy Fitzhardinge authored
      64-bit userspace expects the vdso to be mapped at a specific fixed
      address, which happens to be in the middle of the kernel address
      space.  Because we have split user and kernel pagetables, we need to
      make special arrangements for the vsyscall mapping to appear in the
      kernel part of the user pagetable.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: set up syscall and sysenter entrypoints for 64-bit · 6fcac6d3
      Jeremy Fitzhardinge authored
      We set up entrypoints for syscall and sysenter.  sysenter is only used
      for 32-bit compat processes, whereas syscall can be used by both 32-bit
      and 64-bit processes.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: allocate and manage user pagetables · d6182fbf
      Jeremy Fitzhardinge authored
      Because the x86_64 architecture does not enforce segment limits, Xen
      cannot protect itself with them as it does in 32-bit mode.  Therefore,
      to protect itself, it runs the guest kernel in ring 3.  Since it also
      runs the guest userspace in ring 3, the guest kernel must maintain a
      second pagetable for its userspace, which does not map kernel space.
      Naturally, the guest kernel pagetables map both kernel and userspace.
      
      The userspace pagetable is attached to the corresponding kernel
      pagetable via the pgd's page->private field.  It is allocated and
      freed at the same time as the kernel pgd via the
      paravirt_pgd_alloc/free hooks.
      
      Fortunately, the user pagetable is almost entirely shared with the
      kernel pagetable; the only difference is the pgd page itself.  set_pgd
      will populate all entries in the kernel pagetable, and also set the
      corresponding user pgd entry if the address is less than
      STACK_TOP_MAX.
      
      The user pagetable must be pinned and unpinned with the kernel one,
      but because the pagetables are aliased, pgd_walk() only needs to be
      called on the kernel pagetable.  The user pgd page is then
      pinned/unpinned along with the kernel pgd page.
      
      xen_write_cr3 must write both the kernel and user cr3s.
      
      The init_mm.pgd pagetable never has a user pagetable allocated for it,
      because it can never be used while running usermode.
      
      One awkward area is that early in boot the page structures are not
      available.  No user pagetable can exist at that point, but it
      complicates the logic to avoid looking at the page structure.
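
      The set_pgd behaviour described above can be sketched as follows.
      This is a simplified model with hypothetical names: the user pgd is
      a plain pointer here rather than being found through page->private,
      and the STACK_TOP_MAX value is an assumed top of user space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PGD   512
#define PGDIR_SHIFT    39
#define STACK_TOP_MAX  0x00007ffffffff000ull   /* assumed top of user space */

struct mm_pgds {
    uint64_t  kernel_pgd[PTRS_PER_PGD];  /* maps kernel and userspace */
    uint64_t *user_pgd;                  /* maps userspace only; NULL for init_mm */
};

static unsigned int pgd_index(uint64_t addr)
{
    return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Write a top-level entry, mirroring it into the user pgd when appropriate. */
void sketch_set_pgd(struct mm_pgds *mm, uint64_t addr, uint64_t val)
{
    assert(mm != NULL);
    unsigned int idx = pgd_index(addr);

    mm->kernel_pgd[idx] = val;                 /* always populated */
    if (mm->user_pgd != NULL && addr < STACK_TOP_MAX)
        mm->user_pgd[idx] = val;               /* user half only */
}
```

      Only the top-level page differs between the two views; everything
      below an entry is shared, which is why a single value can be written
      into both.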
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: save lots of registers · c24481e9
      Jeremy Fitzhardinge authored
      The Xen hypercall interface is allowed to trash any or all of the
      argument registers, so we need to be careful that the kernel state
      isn't damaged.  On 32-bit kernels, the hypercall parameter registers are
      the same as those of a regparm function call, so we've got away without
      explicit clobbering so far.  The 64-bit ABI defines lots of caller-save
      registers, so save them all for safety.  We can trim this set later by
      re-distributing the responsibility for saving all these registers.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: implement 64-bit update_descriptor · c05f1cfa
      Jeremy Fitzhardinge authored
      64-bit hypercall interface can pass a maddr in one argument rather
      than splitting it.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: Clear %fs on xen_load_tls() · 8a95408e
      Eduardo Habkost authored
      We need to do this, otherwise we can get a GPF on hypercall return
      after TLS descriptor is cleared but %fs is still pointing to it.
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: implement failsafe callback · 4a5c3e77
      Jeremy Fitzhardinge authored
      Implement the failsafe callback, so that iret and segment register
      load exceptions are reported to the kernel.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • suspend, xen: enable PM_SLEEP for CONFIG_XEN · 0775b3db
      Jeremy Fitzhardinge authored
      Xen save/restore requires PM_SLEEP to be set without requiring
      SUSPEND or HIBERNATION.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: make sure the kernel command line is right · b7c3c5c1
      Jeremy Fitzhardinge authored
      Point the boot params cmd_line_ptr to the domain-builder-provided
      command line.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: rework pgd_walk to deal with 32/64 bit · 5deb30d1
      Jeremy Fitzhardinge authored
      Rewrite pgd_walk to deal with 64-bit address spaces.  There are two
      notable features of 64-bit address spaces:
      
       1. The virtual address is only 48 bits wide, with the upper 16 bits
          being sign extension; kernel addresses are negative, and userspace is
          positive.
      
       2. The Xen hypervisor mapping is at the negative-most address, just above
          the sign-extension hole.
      
      1. means that we can't easily use addresses when traversing the space,
      since we must deal with sign extension.  This rewrite expresses
      everything in terms of pgd/pud/pmd indices, which means we don't need
      to worry about the exact configuration of the virtual memory space.
      This approach works equally well in 32-bit.
      
      To deal with 2, assume the hole is between the uppermost userspace
      address and PAGE_OFFSET.  For 64-bit this skips the Xen mapping hole.
      For 32-bit, the hole is zero-sized.
      
      In all cases, the uppermost kernel address is FIXADDR_TOP.
      
      A side-effect of this patch is that the upper boundary is actually
      handled properly, exposing a long-standing bug in 32-bit, which failed
      to pin the kernel pmd page.  The kernel pmd is not shared, and so must be
      explicitly pinned, even though the kernel ptes are shared and don't
      need pinning.
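
      To make the sign-extension point and the index-based approach
      concrete, here is a small sketch.  The boundary values (USER_LIMIT,
      KERNEL_START) and all function names are assumptions chosen for
      illustration, not values taken from this patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS       48
#define PGDIR_SHIFT   39
#define PTRS_PER_PGD  512u
#define USER_LIMIT    0x00007ffffffff000ull  /* assumed uppermost userspace address */
#define KERNEL_START  0xffff880000000000ull  /* assumed start of kernel mappings */

/* Copy bit 47 into the upper 16 bits to form a canonical address. */
uint64_t canonical(uint64_t addr)
{
    const uint64_t low_mask = (1ull << VA_BITS) - 1;

    if (addr & (1ull << (VA_BITS - 1)))
        return addr | ~low_mask;    /* "negative" (kernel) half */
    return addr & low_mask;         /* "positive" (user) half */
}

/* Top-level slot of an address; independent of the sign-extension bits. */
unsigned int pgd_slot_of(uint64_t addr)
{
    return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* True if a walk should visit this slot, i.e. it lies outside the hole. */
bool pgd_slot_walked(unsigned int slot)
{
    assert(slot < PTRS_PER_PGD);
    return slot <= pgd_slot_of(USER_LIMIT) ||
           slot >= pgd_slot_of(KERNEL_START);
}
```

      Iterating over slot numbers 0..511 and skipping the hole never needs
      to construct an address, so the sign extension drops out entirely.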
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: implement xen_load_gs_index() · a8fc1089
      Eduardo Habkost authored
      xen-64: implement xen_load_gs_index()
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Xen64: HYPERVISOR_set_segment_base() implementation · 45eb0d88
      Eduardo Habkost authored
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: add identity irq->vector map · 0725cbb9
      Jeremy Fitzhardinge authored
      The x86_64 interrupt subsystem is oriented towards vectors, as opposed
      to a flat irq space as it is in x86-32.  This patch adds a simple
      identity irq->vector mapping so that we can continue to feed irqs into
      do_IRQ() and get a good result.
      
      Ideally x86_32 will unify with the 64-bit code and use vectors too.
      At that point we can move to mapping event channels to vectors, which
      will allow us to economise on irqs (so per-cpu event channels can
      share irqs, rather than having to allocate one per cpu, for example).
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: register callbacks in arch-independent way · 88459d4c
      Jeremy Fitzhardinge authored
      Use callback_op hypercall to register callbacks in a 32/64-bit
      independent way (64-bit doesn't need a code segment, but that detail
      is hidden in XEN_CALLBACK).
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: add pvop for swapgs · 952d1d70
      Jeremy Fitzhardinge authored
      swapgs is a no-op under Xen, because the hypervisor makes sure the
      right version of %gs is current when switching between user and kernel
      modes.  This means that the swapgs "implementation" can be inlined and
      used when the stack is unsafe (usermode).  Unfortunately, it means
      that disabling patching will result in a non-booting kernel...
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: deal with extra words Xen pushes onto exception frames · 997409d3
      Jeremy Fitzhardinge authored
      Xen pushes two extra words containing the values of rcx and r11.  This
      pvop hook copies the words back into their appropriate registers, and
      cleans them off the stack.  This leaves the stack in native form, so
      the normal handler can run unchanged.
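
      The frame adjustment can be modelled in C as below.  The real hook
      is a few instructions of assembly; this sketch assumes rcx sits at
      the lower address with r11 directly above it, and uses hypothetical
      names throughout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two register values Xen saved on the stack. */
struct saved_regs {
    uint64_t rcx;
    uint64_t r11;
};

/*
 * `sp` points at the frame as delivered: two extra words followed by the
 * native exception frame.  Recover the register values and return the
 * address of the native frame, i.e. the stack pointer after popping them.
 */
const uint64_t *adjust_exception_frame(const uint64_t *sp,
                                       struct saved_regs *regs)
{
    assert(sp != NULL && regs != NULL);
    regs->rcx = sp[0];   /* assumed order: rcx first ... */
    regs->r11 = sp[1];   /* ... then r11 */
    return sp + 2;       /* drop both words */
}
```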
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: xen_write_idt_entry() and cvt_gate_to_trap() · e176d367
      Eduardo Habkost authored
      Changed to use the (to-be-)unified descriptor structs.
      Signed-off-by: Eduardo Habkost <ehabkost@Rawhide-64.localdomain>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: use set_pte_vaddr · 836fe2f2
      Jeremy Fitzhardinge authored
      Make Xen's set_pte_mfn() use set_pte_vaddr rather than copying it.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: defer setting pagetable alloc/release ops · 8745f8b0
      Jeremy Fitzhardinge authored
      We need to wait until the page structure is available to use the
      proper pagetable page alloc/release operations, since they use struct
      page to determine if a pagetable is pinned.
      
      This happened to work in 32-bit because nobody allocated new pagetable
      pages in the interim between xen_pagetable_setup_done and
      xen_post_allocator_init, but the 64-bit kernel needs to allocate more
      pagetable levels.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: set num_processors · 4560a294
      Jeremy Fitzhardinge authored
      Someone's got to do it.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen64: use arbitrary_virt_to_machine for xen_set_pmd · ce803e70
      Jeremy Fitzhardinge authored
      When building initial pagetables in 64-bit kernel the pud/pmd pointer may
      be in ioremap/fixmap space, so we need to walk the pagetable to look up the
      physical address.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: fix truncation of machine address · ebd879e3
      Jeremy Fitzhardinge authored
      arbitrary_virt_to_machine can truncate a machine address if it's above
      4G.  Cast the problem away.
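
      The class of bug can be reproduced in isolation.  In this sketch a
      32-bit frame number is shifted into a byte address; the function
      names are hypothetical and uint32_t stands in for a 32-bit
      "unsigned long":

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/*
 * Truncating version: the shift happens in 32-bit arithmetic, so any bits
 * that would land at or above bit 32 are lost before the widening.
 */
uint64_t mfn_to_maddr_truncating(uint32_t mfn, uint32_t offset)
{
    assert(offset < (1u << PAGE_SHIFT));
    return (mfn << PAGE_SHIFT) | offset;
}

/* Fixed version: widen first, then shift. */
uint64_t mfn_to_maddr(uint32_t mfn, uint32_t offset)
{
    assert(offset < (1u << PAGE_SHIFT));
    return ((uint64_t)mfn << PAGE_SHIFT) | offset;
}
```

      With frame number 0x100000 (the first frame at 4G) the truncating
      version yields only the page offset, while the cast version yields
      0x100000000 plus the offset.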
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>