1. 17 Feb, 2009 1 commit
    • Paul E. McKenney's avatar
      x86, rcu: fix strange load average and ksoftirqd behavior · bf51935f
      Paul E. McKenney authored
      Damien Wyart reported high ksoftirqd CPU usage (20%) on an
      otherwise idle system.
      
      The function-graph trace Damien provided:
      
      >   799.521187 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521371 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521555 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521738 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521934 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522068 |   1)  ksoftir-2324  |               |                rcu_check_callbacks() {
      >   799.522208 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522392 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522575 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522759 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522956 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523074 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.523214 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523397 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523579 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523762 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523960 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524079 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.524220 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524403 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524587 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524770 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      > [ . . . ]
      
      Shows rcu_check_callbacks() being invoked way too often. It should be called
      once per jiffy, and here it is called no less than 22 times in about
      3.5 milliseconds, meaning one call every 160 microseconds or so.
      
      Why do we need to call rcu_pending() and rcu_check_callbacks() from the
      idle loop of 32-bit x86, especially given that no other architecture does
      this?
      
      The following patch removes the call to rcu_pending() and
      rcu_check_callbacks() from the x86 32-bit idle loop in order to
      reduce the softirq load on idle systems.
      Reported-by: default avatarDamien Wyart <damien.wyart@free.fr>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      bf51935f
  2. 15 Feb, 2009 1 commit
    • Thomas Gleixner's avatar
      x86, vm86: fix preemption bug · be716615
      Thomas Gleixner authored
      Commit 3d2a71a5 ("x86, traps: converge
      do_debug handlers") changed the preemption disable logic of do_debug()
      so vm86_handle_trap() is called with preemption disabled resulting in:
      
       BUG: sleeping function called from invalid context at include/linux/kernel.h:155
       in_atomic(): 1, irqs_disabled(): 0, pid: 3005, name: dosemu.bin
       Pid: 3005, comm: dosemu.bin Tainted: G        W  2.6.29-rc1 #51
       Call Trace:
        [<c050d669>] copy_to_user+0x33/0x108
        [<c04181f4>] save_v86_state+0x65/0x149
        [<c0418531>] handle_vm86_trap+0x20/0x8f
        [<c064e345>] do_debug+0x15b/0x1a4
        [<c064df1f>] debug_stack_correct+0x27/0x2c
        [<c040365b>] sysenter_do_call+0x12/0x2f
       BUG: scheduling while atomic: dosemu.bin/3005/0x10000001
      
      Restore the original calling convention and reenable preemption before
      calling handle_vm86_trap().
      Reported-by: default avatarMichal Suchanek <hramrach@centrum.cz>
      Cc: stable@kernel.org
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      be716615
  3. 14 Feb, 2009 1 commit
  4. 13 Feb, 2009 1 commit
    • john stultz's avatar
      x86, hpet: fix for LS21 + HPET = boot hang · b13e2464
      john stultz authored
      Between 2.6.23 and 2.6.24-rc1 a change was made that broke IBM LS21
      systems that had the HPET enabled in the BIOS, resulting in boot hangs
      for x86_64.
      
      Specifically commit b8ce3359, which
      merges the i386 and x86_64 HPET code.
      
      Prior to this commit, when we setup the HPET timers in x86_64, we did
      the following:
      
      	hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
                          HPET_TN_32BIT, HPET_T0_CFG);
      
      However after the i386/x86_64 HPET merge, we do the following:
      
      	cfg = hpet_readl(HPET_Tn_CFG(timer));
      	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
      			HPET_TN_SETVAL | HPET_TN_32BIT;
      	hpet_writel(cfg, HPET_Tn_CFG(timer));
      
      However on LS21s with HPET enabled in the BIOS, the HPET_T0_CFG register
      boots with Level triggered interrupts (HPET_TN_LEVEL) enabled. This
      causes the periodic interrupt to be not so periodic, and that results in
      the boot time hang I reported earlier in the delay calibration.
      
      My fix: Always disable HPET_TN_LEVEL when setting up periodic mode.
      Signed-off-by: default avatarJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b13e2464
  5. 12 Feb, 2009 5 commits
  6. 11 Feb, 2009 2 commits
    • Markus Metzger's avatar
      x86, ptrace, mm: fix double-free on race · 9f339e70
      Markus Metzger authored
      Ptrace_detach() races with __ptrace_unlink() if the traced task is
      reaped while detaching. This might cause a double-free of the BTS
      buffer.
      
      Change the ptrace_detach() path to only do the memory accounting in
      ptrace_bts_detach() and leave the buffer free to ptrace_bts_untrace()
      which will be called from __ptrace_unlink().
      
      The fix follows a proposal from Oleg Nesterov.
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9f339e70
    • Oleg Nesterov's avatar
      ptrace, x86: fix the usage of ptrace_fork() · 06eb23b1
      Oleg Nesterov authored
      I noticed by pure accident we have ptrace_fork() and friends. This was
      added by "x86, bts: add fork and exit handling", commit
      bf53de90.
      
      I can't test this, ds_request_bts() returns -EOPNOTSUPP, but I strongly
      believe this needs the fix. I think something like this program
      
      	int main(void)
      	{
      		int pid = fork();
      
      		if (!pid) {
      			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
      			kill(getpid(), SIGSTOP);
      			fork();
      		} else {
      			struct ptrace_bts_config bts = {
      				.flags = PTRACE_BTS_O_ALLOC,
      				.size  = 4 * 4096,
      			};
      
      			wait(NULL);
      
      			ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);
      			ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts));
      			ptrace(PTRACE_CONT, pid, NULL, NULL);
      
      			sleep(1);
      		}
      
      		return 0;
      	}
      
      should crash the kernel.
      
      If the task is traced by its natural parent ptrace_reparented() returns 0
      but we should clear ->btsxxx anyway.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      06eb23b1
  7. 10 Feb, 2009 1 commit
  8. 09 Feb, 2009 6 commits
    • Tejun Heo's avatar
      x86: fix math_emu register frame access · d315760f
      Tejun Heo authored
      do_device_not_available() is the handler for #NM and it declares that
      it takes a unsigned long and calls math_emu(), which takes a long
      argument and surprisingly expects the stack frame starting at the zero
      argument would match struct math_emu_info, which isn't true regardless
      of configuration in the current code.
      
      This patch makes do_device_not_available() take struct pt_regs like
      other exception handlers and initialize struct math_emu_info with
      pointer to it and pass pointer to the math_emu_info to math_emulate()
      like normal C functions do.  This way, unless gcc makes a copy of
      struct pt_regs in do_device_not_available(), the register frame is
      correctly accessed regardless of kernel configuration or compiler
      used.
      
      This doesn't fix all math_emu problems but it at least gets it
      somewhat working.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d315760f
    • Tejun Heo's avatar
      x86: math_emu info cleanup · ae6af41f
      Tejun Heo authored
      Impact: cleanup
      
      * Come on, struct info?  s/struct info/struct math_emu_info/
      
      * Use struct pt_regs and kernel_vm86_regs instead of defining its own
        register frame structure.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ae6af41f
    • Tejun Heo's avatar
      x86: include correct %gs in a.out core dump · 914c3d63
      Tejun Heo authored
      Impact: dump the correct %gs into a.out core dump
      
      aout_dump_thread() read %gs but didn't include it in core dump.  Fix
      it.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      914c3d63
    • Alok Kataria's avatar
      x86, vmi: put a missing paravirt_release_pmd in pgd_dtor · 55a8ba4b
      Alok Kataria authored
      Commit 6194ba6f ("x86: don't special-case
      pmd allocations as much") made changes to the way we handle pmd allocations,
      and while doing that it dropped a call to  paravirt_release_pd on the
      pgd page from the pgd_dtor code path.
      
      As a result of this missing release, the hypervisor is now unaware of the
      pgd page being freed, and as a result it ends up tracking this page as a
      page table page.
      
      After this the guest may start using the same page for other purposes, and
      depending on what use the page is put to, it may result in various performance
      and/or functional issues ( hangs, reboots).
      
      Since this release is only required for VMI, I now release the pgd page from
      the (vmi)_pgd_free hook.
      Signed-off-by: default avatarAlok N Kataria <akataria@vmware.com>
      Acked-by: default avatarJeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      55a8ba4b
    • Yinghai Lu's avatar
      x86: find nr_irqs_gsi with mp_ioapic_routing · 3f4a739c
      Yinghai Lu authored
      Impact: find right nr_irqs_gsi on some systems.
      
      One test-system has gap between gsi's:
      
      [    0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
      [    0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
      [    0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
      [    0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
      [    0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
      [    0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
      ...
      [    0.000000] nr_irqs_gsi: 38
      
      So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
      
      need to get that with acpi_probe_gsi when acpi io_apic is used
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3f4a739c
    • Pallipadi, Venkatesh's avatar
      x86: add clflush before monitor for Intel 7400 series · e736ad54
      Pallipadi, Venkatesh authored
      For Intel 7400 series CPUs, the recommendation is to use a clflush on the
      monitored address just before monitor and mwait pair [1].
      
      This clflush makes sure that there are no false wakeups from mwait when the
      monitored address was recently written to.
      
      [1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
          section in specification update document of 7400 series
          http://download.intel.com/design/xeon/specupdt/32033601.pdfSigned-off-by: default avatarVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      e736ad54
  9. 05 Feb, 2009 3 commits
  10. 04 Feb, 2009 5 commits
  11. 03 Feb, 2009 13 commits
  12. 02 Feb, 2009 1 commit