  1. 11 Dec, 2011 3 commits
      nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7
      Frederic Weisbecker authored
      Those two APIs were provided to combine the calls to
      tick_nohz_idle_enter() and rcu_idle_enter() into a single
      irq-disabled section, so that no interrupt happening in between
      would needlessly process any RCU work.
      
      This is an optimization whose benefits have yet to be
      measured. Let's start simple and completely decouple the
      idle RCU and dyntick idle logic.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      nohz: Allow rcu extended quiescent state handling separately from tick stop · 2bbb6817
      Frederic Weisbecker authored
      It is assumed that rcu won't be used once we switch to tickless
      mode and until we restart the tick. However this is not always
      true, as in x86-64 where we dereference the idle notifiers after
      the tick is stopped.
      
      To prepare for fixing this, add two new APIs:
      tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
      
      If no use of RCU is made in the idle loop between the
      tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, the arch
      must instead call the new *_norcu() versions such that it doesn't
      need to call rcu_idle_enter() and rcu_idle_exit().
      
      Otherwise the arch must call tick_nohz_idle_enter() and
      tick_nohz_idle_exit() and also explicitly call:
      
      - rcu_idle_enter() after its last use of RCU before the CPU is put
      to sleep.
      - rcu_idle_exit() before the first use of RCU after the CPU is woken
      up.
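
      A minimal sketch of the two resulting patterns, assuming a
      hypothetical arch idle loop (arch_sleep() and do_idle_notifiers()
      are illustrative placeholders, not real kernel functions):

        /* Pattern 1: no RCU use while the tick is stopped. */
        static void arch_idle_norcu(void)
        {
                tick_nohz_idle_enter_norcu();   /* also enters RCU extended QS */
                arch_sleep();                   /* e.g. halt; no RCU use allowed */
                tick_nohz_idle_exit_norcu();
        }

        /* Pattern 2: RCU is still needed after the tick is stopped. */
        static void arch_idle_with_rcu(void)
        {
                tick_nohz_idle_enter();
                do_idle_notifiers();            /* may legitimately use RCU */
                rcu_idle_enter();               /* after the last use of RCU */
                arch_sleep();
                rcu_idle_exit();                /* before the first use of RCU */
                tick_nohz_idle_exit();
        }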
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      nohz: Separate out irq exit and idle loop dyntick logic · 280f0677
      Frederic Weisbecker authored
      The tick_nohz_stop_sched_tick() function, which tries to delay
      the next timer tick as long as possible, can be called from two
      places:
      
      - From the idle loop to start the dyntick idle mode
      - From interrupt exit if we have interrupted the dyntick
      idle mode, so that we reprogram the next tick event in
      case the irq changed some internal state that requires this
      action.
      
      There are only a few minor differences between the two cases,
      handled by that function and driven by the per-cpu ts->inidle
      variable and the inidle parameter.  Together these guarantee
      that we only update the dyntick mode on irq exit if we actually
      interrupted the dyntick idle mode, and that we enter the RCU
      extended quiescent state from idle loop entry only.
      
      Split this function into:
      
      - tick_nohz_idle_enter(), which sets ts->inidle to 1, enters
      dynticks idle mode unconditionally if it can, and enters into RCU
      extended quiescent state.
      
      - tick_nohz_irq_exit(), which only updates the dynticks idle mode
      when ts->inidle is set (i.e. if tick_nohz_idle_enter() has been called).
      
      To maintain symmetry, tick_nohz_restart_sched_tick() has been renamed
      to tick_nohz_idle_exit().
      
      This simplifies the code and micro-optimizes the irq exit path (no need
      for local_irq_save there). This also prepares for the split between
      dynticks and rcu extended quiescent state logics. We'll need this split to
      further fix illegal uses of RCU in extended quiescent states in the idle
      loop.
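
      A sketch of the resulting call sites; the loop body and the
      irq-exit condition are condensed approximations, only the
      tick_nohz_*() names come from this patch:

        /* idle loop */
        tick_nohz_idle_enter();         /* sets ts->inidle, stops the tick if
                                           possible, enters RCU extended QS */
        while (!need_resched())
                arch_sleep();           /* hypothetical low-power wait */
        tick_nohz_idle_exit();          /* was tick_nohz_restart_sched_tick() */

        /* irq_exit() path, roughly: */
        if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
                tick_nohz_irq_exit();   /* no-op unless ts->inidle is set */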
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
  2. 10 Oct, 2011 1 commit
      x86, nmi: Add in logic to handle multiple events and unknown NMIs · b227e233
      Don Zickus authored
      Previous patches allow the NMI subsystem to process multiple NMI events
      in one NMI.  As previously discussed, this can cause issues when an event
      triggers another NMI but is processed in the current NMI.  This causes the
      next NMI to go unprocessed and become an 'unknown' NMI.
      
      To handle this, we first flag whether or not the NMI handler handled
      more than one event.  If it did, then there exists a chance that
      the next NMI might already have been processed.  Once the NMI is
      flagged as a candidate to be swallowed, we next look for a
      back-to-back NMI condition.
      
      This is determined by looking at the %rip from pt_regs.  If it is the same
      as in the previous NMI, it is assumed the CPU did not have a chance to jump
      back into a non-NMI context and execute code, and instead handled another NMI.
      
      If both of those conditions are true then we will swallow any unknown NMI.
      
      There still exists a chance that we accidentally swallow a real unknown NMI,
      but for now things seem better.
      
      An optimization has also been added to the nmi notifier routine.  Because x86
      can latch up to one NMI while currently processing an NMI, we don't have to
      worry about executing _all_ the handlers in a standalone NMI.  The idea is
      that if multiple NMIs come in, the second NMI will represent them.  For those
      back-to-back NMI cases, we have the potential to drop NMIs.  Therefore only
      execute all the handlers in the second half of a detected back-to-back NMI.
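
      A hedged reconstruction of the two-part test; the variable names
      below are descriptive stand-ins rather than the exact ones in the
      patch:

        static bool swallow_candidate;     /* last NMI ran more than one handler */
        static unsigned long last_nmi_rip; /* %rip observed by the previous NMI */

        /* in the unknown-NMI path: */
        if (swallow_candidate && regs->ip == last_nmi_rip)
                return;  /* back-to-back NMI: almost certainly already handled */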
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1317409584-23662-5-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 15 Sep, 2011 1 commit
  4. 03 Aug, 2011 1 commit
      cpuidle: stop depending on pm_idle · a0bfa137
      Len Brown authored
      cpuidle users should call cpuidle_call_idle() directly
      rather than via the (pm_idle)() function pointer.
      
      Architecture may choose to continue using (pm_idle)(),
      but cpuidle need not depend on it:
      
        my_arch_cpu_idle()
      	...
      	if (cpuidle_call_idle())
      		pm_idle();
      
      cc: Kevin Hilman <khilman@deeprootsystems.com>
      cc: Paul Mundt <lethal@linux-sh.org>
      cc: x86@kernel.org
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
  5. 09 Jun, 2011 1 commit
      exec: delay address limit change until point of no return · dac853ae
      Mathias Krause authored
      Unconditionally changing the address limit to USER_DS and not restoring
      it to its old value in the error path is wrong because it prevents us
      from using kernel memory on repeated calls to this function.  This, in
      fact, breaks the fallback through the hard-coded paths to the init
      program: it can never succeed if the first candidate fails to load.
      
      With this patch applied, switching to USER_DS is delayed until the point
      of no return is reached, which makes it possible to have a multi-arch
      rootfs with one arch-specific init binary for each of the (hard-coded)
      probed paths.
      
      Since the address limit is already set to USER_DS when start_thread()
      will be invoked, this redundancy can be safely removed.
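
      A condensed sketch of the resulting shape of fs/exec.c (not the
      literal diff): the address limit is switched inside flush_old_exec(),
      which only runs once the exec can no longer fail back to the caller:

        int flush_old_exec(struct linux_binprm *bprm)
        {
                /* ... de_thread() etc.; past this point the exec cannot
                   fall back to the caller or to another probed path ... */
                set_fs(USER_DS);        /* only now change the address limit */
                /* ... */
                return 0;
        }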
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 12 Jan, 2011 1 commit
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle... · f77cfe4e
      Thomas Renninger authored
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
      
      Currently the intel_idle and acpi_idle drivers show double cpu_idle "exit idle"
      events -> this patch fixes that and makes the throwing of cpu_idle events less complex.
      
      It also introduces cpu_idle events for all architectures which use
      the cpuidle subsystem, namely:
        - arch/arm/mach-at91/cpuidle.c
        - arch/arm/mach-davinci/cpuidle.c
        - arch/arm/mach-kirkwood/cpuidle.c
        - arch/arm/mach-omap2/cpuidle34xx.c
        - drivers/acpi/processor_idle.c (for all cases, not only mwait)
        - arch/x86/kernel/process.c (did throw events before, but was a mess)
        - drivers/idle/intel_idle.c (did throw events before)
      
      Convention should be:
      Fire cpu_idle events inside the current pm_idle function (not somewhere
      down the callee tree) to keep things easy.
      
      Current possible pm_idle functions in X86:
      c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
      -> this is really easy now.
      
      This affects userspace:
      The type field of the cpu_idle power event can now directly get
      mapped to:
      /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
      instead of throwing very CPU/mwait-specific values.
      This change is not visible for the intel_idle driver.
      For the acpi_idle driver it should only be visible if the vendor
      leaves C-states out of the BIOS.
      Another (perf timechart) patch reads out cpuidle info for cpu_idle
      events from:
      /sys/.../cpuidle/stateX/*; the cpuidle events are then mapped
      back to the correct C-/cpuidle state, even if e.g. vendors leave
      C-states out of their BIOS and for example only export C1 and C3.
      -> everything is fine.
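
      A sketch of the convention with a hypothetical pm_idle
      implementation; trace_cpu_idle() and PWR_EVENT_EXIT are the markers
      from include/trace/events/power.h of this era:

        #include <trace/events/power.h>

        static void my_pm_idle(void)    /* hypothetical pm_idle function */
        {
                trace_cpu_idle(1, smp_processor_id());  /* entering state 1 */
                safe_halt();                            /* the actual idle wait */
                trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
        }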
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      CC: Robert Schoene <robert.schoene@tu-dresden.de>
      CC: Jean Pihet <j-pihet@ti.com>
      CC: Arjan van de Ven <arjan@linux.intel.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: linux-pm@lists.linux-foundation.org
      CC: linux-acpi@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: linux-perf-users@vger.kernel.org
      CC: linux-omap@vger.kernel.org
      Signed-off-by: Len Brown <len.brown@intel.com>
  7. 04 Jan, 2011 1 commit
      perf: Clean up power events by introducing new, more generic ones · 25e41933
      Thomas Renninger authored
      Add these new power trace events:
      
       power:cpu_idle
       power:cpu_frequency
       power:machine_suspend
      
      The old C-state/idle accounting events:
        power:power_start
        power:power_end
      
      now have a replacement (but we are still keeping the old
      tracepoints for compatibility):
      
        power:cpu_idle
      
      and
        power:power_frequency
      
      is replaced with:
        power:cpu_frequency
      
      power:machine_suspend is newly introduced.
      
      Jean Pihet has a patch integrated into the generic layer
      (kernel/power/suspend.c) which will make use of it.
      
      The type= field got removed from both; it was never
      used, and the type is distinguished by the event type itself.
      
      perf timechart userspace tool gets adjusted in a separate patch.
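
      For illustration, the anticipated machine_suspend usage could look
      like this (placement around the existing suspend path is assumed,
      pending Jean Pihet's patch):

        trace_machine_suspend(state);              /* e.g. PM_SUSPEND_MEM */
        error = suspend_devices_and_enter(state);  /* existing suspend entry */
        trace_machine_suspend(PWR_EVENT_EXIT);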
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rjw@sisk.pl
      LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
  8. 18 Jun, 2010 1 commit
  9. 10 May, 2010 1 commit
  10. 26 Mar, 2010 1 commit
      x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Peter Zijlstra authored
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its only users are ptrace-block-step and ptrace-bts; ptrace-bts
      was never used, and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 13 Jan, 2010 1 commit
  12. 28 Dec, 2009 1 commit
      x86: Use KERN_DEFAULT log-level in __show_regs() · d015a092
      Pekka Enberg authored
      Andrew Morton reported a strange looking kmemcheck warning:
      
        WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff88004fba6c20)
        0000000000000000310000000000000000000000000000002413000000c9ffff
         u u u u u u u u u u u u u u u u i i i i i i i i u u u u u u u u
      
         [<ffffffff810af3aa>] kmemleak_scan+0x25a/0x540
         [<ffffffff810afbcb>] kmemleak_scan_thread+0x5b/0xe0
         [<ffffffff8104d0fe>] kthread+0x9e/0xb0
         [<ffffffff81003074>] kernel_thread_helper+0x4/0x10
         [<ffffffffffffffff>] 0xffffffffffffffff
      
      The above printout is missing the register dump completely. The
      problem here is that the output comes from syslog, which doesn't
      show KERN_INFO log-level messages. We didn't see this before
      because both of us were testing on 32-bit kernels, which use the
      _default_ log-level.
      
      Fix that up by explicitly using KERN_DEFAULT log-level for
      __show_regs() printks.
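
      The pattern of the fix, with an illustrative format string rather
      than the exact __show_regs() lines:

        printk(KERN_DEFAULT "EIP: %04x:[<%08lx>] EFLAGS: %08lx CPU: %d\n",
               (u16)regs->cs, regs->ip, regs->flags, smp_processor_id());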
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1261988819.4641.2.camel@penberg-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 11 Dec, 2009 1 commit
  14. 10 Dec, 2009 4 commits
  15. 09 Dec, 2009 1 commit
  16. 08 Nov, 2009 1 commit
      hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Frederic Weisbecker authored
      This patch rebases the implementation of the breakpoints API on top of
      perf event instances.
      
      Each breakpoint is now a perf event that handles the
      register scheduling, thread/cpu attachment, etc.
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons for this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Setting ptrace breakpoints remains tricky and still needs some
        per-thread breakpoint references.
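
      To make the new ABI concrete, a kernel-side usage sketch modeled on
      the later samples/hw_breakpoint/data_breakpoint.c; note that
      register_wide_hw_breakpoint() grew an extra context argument in
      later kernels, so the exact signature depends on the version:

        struct perf_event_attr attr;
        struct perf_event **hbp;

        hw_breakpoint_init(&attr);
        attr.bp_addr = kallsyms_lookup_name("pid_max");  /* address to watch */
        attr.bp_len  = HW_BREAKPOINT_LEN_4;
        attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;

        /* hbp_handler is a caller-supplied perf overflow callback */
        hbp = register_wide_hw_breakpoint(&attr, hbp_handler);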
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event" rename
      - The ptrace regression has been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD no longer covers every case of running
        breakpoints, and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
  17. 03 Nov, 2009 1 commit
  18. 12 Oct, 2009 1 commit
      x86: use kernel_stack_pointer() in process_32.c · def3c5d0
      H. Peter Anvin authored
      The way to obtain a kernel-mode stack pointer from a struct pt_regs in
      32-bit mode is "subtle": the stack doesn't actually contain the stack
      pointer, but rather the location where it would have been marks the
      actual previous stack frame.  For clarity, use kernel_stack_pointer()
      instead of coding this weirdness explicitly.
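
      The 32-bit definition at the time was essentially the following;
      the point is that the *address* of the sp slot is itself the
      previous stack pointer:

        static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
        {
                return (unsigned long)(&regs->sp);  /* slot address == old %esp */
        }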
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  19. 03 Aug, 2009 1 commit
      x86, percpu: Collect hot percpu variables into one cacheline · bdf977b3
      Tejun Heo authored
      On x86_64, percpu variables current_task and kernel_stack are used for
      get_current() and current_thread_info() respectively and thus are
      often used close to each other.  Move definition of current_task to
      kernel/cpu/common.c right above kernel_stack definition and align it
      to cacheline so that they always fall into the same cacheline.  Two
      percpu variables defined there together - irq_stack_ptr and irq_count
      - are also pretty hot and will benefit from sharing the cacheline.
      
      For consistency, current_task definition for x86_32 is also moved to
      kernel/cpu/common.c.
      
      Putting current_task and kernel_stack into the same cacheline was
      suggested by Linus Torvalds.
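
      A condensed sketch of the resulting definitions in
      arch/x86/kernel/cpu/common.c (initializers abbreviated):

        DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
                &init_task;
        /* kernel_stack, irq_stack_ptr and irq_count are defined right
           after, so they share the cacheline the alignment starts. */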
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  20. 17 Jun, 2009 1 commit
      x86-32: make sure clts is batched during context switch · 2fcddce1
      Jeremy Fitzhardinge authored
      If we're preloading the FPU state during context switch, make sure the clts
      happens while we're batching the CPU context update, then do the actual
      __math_state_restore() once the updates are flushed.
      
      This allows more efficient context switches when running paravirtualized,
      as all the hypercalls can be folded together into one.
      
      [ Impact: optimise paravirtual FPU context switch ]
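
      A condensed sketch of the reordered switch path, using the existing
      paravirt batching hooks (flow per this description, not the literal
      process_32.c diff):

        arch_start_context_switch(prev_p);      /* begin batching CPU updates */
        if (preload_fpu)
                clts();                         /* queued with the rest */
        /* ... segment, TLS and stack updates, all batched ... */
        arch_end_context_switch(next_p);        /* one hypercall flushes all */
        if (preload_fpu)
                __math_state_restore();         /* safe: TS already cleared */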
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
  21. 02 Jun, 2009 1 commit
  22. 12 May, 2009 2 commits
  23. 07 Apr, 2009 1 commit
      x86, ds: add leakage warning · 2311f0de
      Markus Metzger authored
      Add a warning in case a debug store context is not removed before
      the task it is attached to is freed.
      
      Remove the old warning at thread exit. It is too early.
      
      Declare the debug store context field in thread_struct unconditionally.
      
      Remove ds_copy_thread() and ds_exit_thread() and do the work directly
      in process*.c.
      Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
      Cc: roland@redhat.com
      Cc: eranian@googlemail.com
      Cc: oleg@redhat.com
      Cc: juan.villacis@intel.com
      Cc: ak@linux.jf.intel.com
      LKML-Reference: <20090403144601.254472000@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  24. 03 Apr, 2009 1 commit
  25. 30 Mar, 2009 2 commits
  26. 02 Mar, 2009 2 commits
      x86: unify chunks of kernel/process*.c · 389d1fb1
      Jeremy Fitzhardinge authored
      With x86-32 and -64 using the same mechanism for managing the
      TSS io permissions bitmap, large chunks of process*.c are
      trivially unifiable, including:
      
       - exit_thread
       - flush_thread
       - __switch_to_xtra (along with tsc enable/disable)
      
      and as bonus pickups:
      
       - sys_fork
       - sys_vfork
      
      (Note: asmlinkage expands to empty on x86-64)
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      x86-32: use non-lazy io bitmap context switching · db949bba
      Jeremy Fitzhardinge authored
      Impact: remove 32-bit optimization to prepare unification
      
      x86-32 and -64 differ in the way they context-switch tasks
      with io permission bitmaps.  x86-64 simply copies the next
      tasks io bitmap into place (if any) on context switch.  x86-32
      invalidates the bitmap on context switch, so that the next
      IO instruction will fault; at that point it installs the
      appropriate IO bitmap.
      
      This makes context switching IO-bitmap-using tasks a bit
      less expensive, at the cost of making the next IO instruction
      slower due to the extra fault.  This tradeoff only makes sense
      if IO-bitmap-using processes are relatively common, but they
      don't actually use IO instructions very often.
      
      However, in a typical desktop system, the only process likely
      to be using IO bitmaps is the X server, and nothing at all on
      a server.  Therefore the lazy context switch doesn't really win
      all that much, and it's just a gratuitous difference from the
      64-bit code.
      
      This patch removes the lazy context switch, with a view to
      unifying this code in a later change.
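
      The x86-64-style eager copy that 32-bit now shares looks roughly
      like this (shape of the __switch_to_xtra() logic, simplified):

        if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
                /* Copy up to the longer of the two extents so stale
                   bits from the previous task are overwritten. */
                memcpy(tss->io_bitmap, next->io_bitmap_ptr,
                       max(prev->io_bitmap_max, next->io_bitmap_max));
        }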
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  27. 17 Feb, 2009 1 commit
      x86, rcu: fix strange load average and ksoftirqd behavior · bf51935f
      Paul E. McKenney authored
      Damien Wyart reported high ksoftirqd CPU usage (20%) on an
      otherwise idle system.
      
      The function-graph trace Damien provided:
      
      >   799.521187 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521371 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521555 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521738 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521934 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522068 |   1)  ksoftir-2324  |               |                rcu_check_callbacks() {
      >   799.522208 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522392 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522575 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522759 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522956 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523074 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.523214 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523397 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523579 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523762 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523960 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524079 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.524220 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524403 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524587 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524770 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      > [ . . . ]
      
      This shows rcu_check_callbacks() being invoked way too often. It should be
      called once per jiffy, but here it is called no less than 22 times in about
      3.5 milliseconds, meaning one call every 160 microseconds or so.
      
      Why do we need to call rcu_pending() and rcu_check_callbacks() from the
      idle loop of 32-bit x86, especially given that no other architecture does
      this?
      
      The following patch removes the call to rcu_pending() and
      rcu_check_callbacks() from the x86 32-bit idle loop in order to
      reduce the softirq load on idle systems.
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  28. 11 Feb, 2009 3 commits
      x86: use regparm(3) for passed-in pt_regs pointer · b12bdaf1
      Brian Gerst authored
      Some syscalls need to access the pt_regs structure, either to copy
      user register state or to modify it.  This patch adds stubs to load
      the address of the pt_regs struct into the %eax register, and changes
      the syscalls to take the pointer as an argument instead of relying on
      the assumption that the pt_regs structure overlaps the function
      arguments.
      
      Drop the use of regparm(1) due to concern about gcc bugs, and to move
      in the direction of the eventual removal of regparm(0) for asmlinkage.
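
      A condensed sketch of the stub scheme (shapes per this patch, with
      the entry_32.S fragment abbreviated):

        /* entry_32.S stub: load &pt_regs into %eax, then tail-call:
         *
         *   ptregs_vfork:
         *           leal 4(%esp), %eax      # skip the return address
         *           jmp sys_vfork
         */
        int sys_vfork(struct pt_regs *regs)
        {
                return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD,
                               regs->sp, regs, 0, NULL, NULL);
        }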
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      x86: pass in pt_regs pointer for syscalls that need it · 253f29a4
      Brian Gerst authored
      Some syscalls need to access the pt_regs structure, either to copy
      user register state or to modify it.  This patch adds stubs to load
      the address of the pt_regs struct into the %eax register, and changes
      the syscalls to regparm(1) to receive the pt_regs pointer as the
      first argument.
      Signed-off-by: Brian Gerst <brgerst@gmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      x86: fix x86_32 stack protector bugs · 5c79d2a5
      Tejun Heo authored
      Impact: fix x86_32 stack protector
      
      Brian Gerst found out that %gs was being initialized to stack_canary
      instead of stack_canary - 20, which basically gave the same canary
      value for all threads.  Fixing this also exposed the following bugs.
      
      * cpu_idle() didn't call boot_init_stack_canary()
      
      * stack canary switching in switch_to() was being done too late, making
        the initial run of a new thread use the old stack canary value.
      
      Fix all of them and, while at it, update the comment in cpu_idle() about
      calling boot_init_stack_canary().
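
      The cpu_idle() half of the fix, sketched (placement per this
      description):

        void cpu_idle(void)
        {
                /* The boot task never leaves cpu_idle(), so seed its
                   canary here, before any -fstack-protector code runs
                   deeper in the idle path. */
                boot_init_stack_canary();

                while (1) {
                        /* ... idle loop ... */
                }
        }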
      Reported-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 09 Feb, 2009 2 commits
      x86: implement x86_32 stack protector · 60a5317f
      Tejun Heo authored
      Impact: stack protector for x86_32
      
      Implement stack protector for x86_32.  GDT entry 28 is used for it.
      It's set to point to stack_canary-20 and has a length of 24 bytes.
      CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
      to the stack canary segment on entry.  As %gs is otherwise unused by
      the kernel, the canary can be anywhere.  It's defined as a percpu
      variable.
      
      x86_32 exception handlers take the register frame on the stack directly
      as struct pt_regs.  With -fstack-protector turned on, gcc copies the
      whole structure after the stack canary and (of course) doesn't copy it
      back on return, thus losing all changes.  For now, -fno-stack-protector
      is added to all files which contain those functions.  We definitely
      need something better.
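
      Why stack_canary-20 and 24 bytes: gcc's i386 stack protector reads
      the canary at the fixed address %gs:20.  A hedged sketch of the
      per-cpu segment setup:

        static inline void setup_stack_canary_segment(int cpu)
        {
                unsigned long canary =
                        (unsigned long)&per_cpu(stack_canary, cpu) - 20;

                /* ... encode base = canary, limit = 24 bytes into GDT
                   entry 28 (GDT_ENTRY_STACK_CANARY) for this cpu ... */
        }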
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      x86: make lazy %gs optional on x86_32 · ccbeed3a
      Tejun Heo authored
      Impact: pt_regs changed, lazy gs handling made optional, add slight
              overhead to SAVE_ALL, simplifies error_code path a bit
      
      On x86_32, %gs hasn't been used by the kernel and has been handled
      lazily.  pt_regs doesn't have a slot for it and gs is saved/loaded
      only when necessary.  In preparation for stack protector support,
      this patch makes lazy %gs handling optional by doing the following.
      
      * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
      
      * Save and restore %gs along with other registers in entry_32.S unless
        LAZY_GS.  Note that this unfortunately adds "pushl $0" to SAVE_ALL
        even when LAZY_GS.  However, it adds no overhead to the common exit
        path and simplifies the entry path with error code.
      
      * Define different user_gs accessors depending on LAZY_GS and add
        lazy_save_gs() and lazy_load_gs(), which are no-ops if !LAZY_GS.  The
        lazy_*_gs() ops are used to save, load and clear %gs lazily.
      
      * Define ELF_CORE_COPY_KERNEL_REGS(), which always reads %gs directly.
      
      xen and lguest changes need to be verified.
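
      A condensed version of the accessor split (per this patch;
      set_user_gs()/task_user_gs() omitted):

        #ifdef CONFIG_X86_32_LAZY_GS
        #define get_user_gs(regs)  (u16)({ unsigned long v; savesegment(gs, v); v; })
        #define lazy_save_gs(v)    savesegment(gs, (v))
        #define lazy_load_gs(v)    loadsegment(gs, (v))
        #else   /* %gs is saved in pt_regs like the other segment registers */
        #define get_user_gs(regs)  (u16)((regs)->gs)
        #define lazy_save_gs(v)    do { } while (0)
        #define lazy_load_gs(v)    do { } while (0)
        #endif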
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>