1. 08 Apr, 2015 1 commit
    • Denys Vlasenko's avatar
      x86/asm/entry/irq: Simplify interrupt dispatch table (IDT) layout · 3304c9c3
      Denys Vlasenko authored
      
      Interrupt entry points are handled with the following code,
      each 32-byte code block contains seven entry points:
      
      		...
      		[push][jump 22] // 4 bytes
      		[push][jump 18] // 4 bytes
      		[push][jump 14] // 4 bytes
      		[push][jump 10] // 4 bytes
      		[push][jump  6] // 4 bytes
      		[push][jump  2] // 4 bytes
      		[push][jump common_interrupt][padding] // 8 bytes
      
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump]
      		[push][jump common_interrupt][padding]
      
      		[padding_2]
      	common_interrupt:
      
      And there is a table which holds pointers to every entry point,
      IOW: to every push.
      
      In cold cache, two jumps are still costlier than one, even
      though we get the benefit of them residing in the same
      cacheline.
      
      This change replaces short jumps with near ones to
      'common_interrupt', and pads every push+jump pair to 8 bytes. This
      way, each interrupt takes only one jump.
      
      This change replaces ".p2align CONFIG_X86_L1_CACHE_SHIFT" before
      dispatch table with ".align 8" - we do not need anything
      stronger than that.
      
      The table of entry addresses (the interrupt[] array) is no
      longer necessary, the address of entries can be easily
      calculated as (irq_entries_start + i*8).
      
         text	   data	    bss	    dec	    hex	filename
        12546	      0	      0	  12546	   3102	entry_64.o.before
        11626	      0	      0	  11626	   2d6a	entry_64.o
      
      The size decrease is because 1656 bytes of .init.rodata are
      gone. That's initdata, though. The resident size does go up a
      bit.
      
      Run-tested (32 and 64 bits).
      Acked-and-Tested-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1428090553-7283-1-git-send-email-dvlasenk@redhat.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3304c9c3
  2. 16 Dec, 2014 2 commits
    • Jiang Liu's avatar
      x86, irq: Move local APIC related code from io_apic.c into vector.c · 74afab7a
      Jiang Liu authored
      
      Create arch/x86/kernel/apic/vector.c to host local APIC related code,
      prepare for making MSI/HT_IRQ independent of IOAPIC.
      Signed-off-by: default avatarJiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Link: http://lkml.kernel.org/r/1414397531-28254-10-git-send-email-jiang.liu@linux.intel.com
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      74afab7a
    • Jan Beulich's avatar
      x86: Avoid building unused IRQ entry stubs · 2414e021
      Jan Beulich authored
      
      When X86_LOCAL_APIC (i.e. unconditionally on x86-64),
      first_system_vector will never end up being higher than
      LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
      0xef...0xff is pointlessly reducing code density. Deal with this at
      build time already.
      
      Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
      simplify (and hence make easier to read and more consistent with the
      change done here) some #if-s in arch/x86/kernel/irqinit.c.
      
      While we could further improve the packing of the IRQ entry stubs (the
      four ones now left in the last set could be fit into the four padding
      bytes each of the final four sets have) this doesn't seem to provide
      any real benefit: Both irq_entries_start and common_interrupt getting
      cache line aligned, eliminating the 30th set would just produce 32
      bytes of padding between the 29th and common_interrupt.
      
      [ tglx: Folded lguest fix from Dan Carpenter ]
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: lguest@lists.ozlabs.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
      Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      2414e021
  3. 28 Oct, 2014 1 commit
    • Maciej W. Rozycki's avatar
      x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts · 60e684f0
      Maciej W. Rozycki authored
      
      Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems
      that make  use of 8259A Programmable Interrupt Controllers.
      Specifically convert  output like this:
      
                 CPU0
        0:      76573    XT-PIC-XT-PIC    timer
        1:         11    XT-PIC-XT-PIC    i8042
        2:          0    XT-PIC-XT-PIC    cascade
        4:          8    XT-PIC-XT-PIC    serial
        6:          3    XT-PIC-XT-PIC    floppy
        7:          0    XT-PIC-XT-PIC    parport0
        8:          1    XT-PIC-XT-PIC    rtc0
       10:        448    XT-PIC-XT-PIC    fddi0
       12:         23    XT-PIC-XT-PIC    eth0
       14:       2464    XT-PIC-XT-PIC    ide0
      NMI:          0   Non-maskable interrupts
      ERR:          0
      
      to one like this:
      
                 CPU0
        0:     122033    XT-PIC  timer
        1:         11    XT-PIC  i8042
        2:          0    XT-PIC  cascade
        4:          8    XT-PIC  serial
        6:          3    XT-PIC  floppy
        7:          0    XT-PIC  parport0
        8:          1    XT-PIC  rtc0
       10:        145    XT-PIC  fddi0
       12:         31    XT-PIC  eth0
       14:       2245    XT-PIC  ide0
      NMI:          0   Non-maskable interrupts
      ERR:          0
      
      that is one like we used to have from ~2.2 till it was changed
      sometime.
      
      The rationale is there is no value in this duplicate
      information, it  merely clutters output and looks ugly.  We only
      have one handler for  8259A interrupts so there is no need to
      give it a name separate from the  name already given to
      irq_chip.
      
      We could define meaningful names for handlers based on bits in
      the ELCR  register on systems that have it or the value of the
      LTIM bit we use in  ICW1 otherwise (hardcoded to 0 though with
      MCA support gone), to tell  edge-triggered and level-triggered
      inputs apart.  While that information  does not affect 8259A
      interrupt handlers it could help people determine  which lines
      are shareable and which are not.  That is material for a
      separate change though.
      
      Any tools that parse /proc/interrupts are supposed not to be
      affected  since it was many years we used the format this change
      converts back to.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@linux-mips.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      60e684f0
  4. 25 Aug, 2014 1 commit
  5. 21 Jun, 2014 2 commits
  6. 12 Jan, 2014 1 commit
    • Prarit Bhargava's avatar
      x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs · 9345005f
      Prarit Bhargava authored
      During heavy CPU-hotplug operations the following spurious kernel warnings
      can trigger:
      
        do_IRQ: No ... irq handler for vector (irq -1)
      
        [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]
      
      When downing a cpu it is possible that there are unhandled irqs
      left in the APIC IRR register.  The following code path shows
      how the problem can occur:
      
       1. CPU 5 is to go down.
      
       2. cpu_disable() on CPU 5 executes with interrupt flag cleared
          by local_irq_save() via stop_machine().
      
       3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
          interrupt flag is cleared (CPU unabled to handle the irq)
      
       4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
          to -1. 5. stop_machine() finishes cpu_disable()
      
       6. cpu_die() for CPU 5 executes in normal context.
      
       7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
          IRQ 12.  The code attempts to find the vector's IRQ and cannot
          because it has been set to -1. 8. do_IRQ() warning displays
          warning about CPU 5 IRQ 12.
      
      I added a debug printk to output which CPU & vector was
      retriggered and discovered that that we are getting bogus
      events.  I see a 100% correlation between this debug printk in
      fixup_irqs() and the do_IRQ() warning.
      
      This patchset resolves this by adding definitions for
      VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
      the code to use them.
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831
      
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Reviewed-by: default avatarRui Wang <rui.y.wang@intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: janet.morgan@Intel.com
      Cc: tony.luck@Intel.com
      Cc: ruiv.wang@gmail.com
      Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
      
      
      [ Cleaned up the code a bit. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9345005f
  7. 16 Apr, 2013 1 commit
  8. 17 Dec, 2012 1 commit
  9. 28 Jun, 2012 1 commit
  10. 28 Mar, 2012 1 commit
  11. 23 Mar, 2012 1 commit
  12. 10 Mar, 2012 1 commit
  13. 22 Dec, 2011 1 commit
    • Kay Sievers's avatar
      driver-core: remove sysdev.h usage. · edbaa603
      Kay Sievers authored
      
      The sysdev.h file should not be needed by any in-kernel code, so remove
      the .h file from these random files that seem to still want to include
      it.
      
      The sysdev code will be going away soon, so this include needs to be
      removed no matter what.
      
      Cc: Jiandong Zheng <jdzheng@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Kukjin Kim <kgene.kim@samsung.com>
      Cc: David Brown <davidb@codeaurora.org>
      Cc: Daniel Walker <dwalker@fifo99.com>
      Cc: Bryan Huntsman <bryanh@codeaurora.org>
      Cc: Ben Dooks <ben-linux@fluff.org>
      Cc: Wan ZongShun <mcuos.com@gmail.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "Venkatesh Pallipadi
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Matthew Garrett <mjg@redhat.com>
      Signed-off-by: default avatarKay Sievers <kay.sievers@vrfy.org>
      edbaa603
  14. 26 Jul, 2011 1 commit
  15. 16 Jun, 2011 1 commit
  16. 12 Mar, 2011 2 commits
  17. 23 Feb, 2011 2 commits
  18. 14 Feb, 2011 1 commit
  19. 18 Oct, 2010 1 commit
    • Peter Zijlstra's avatar
      irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra authored
      
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like wakeup a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this get to do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarKyle McMartin <kyle@mcmartin.ca>
      Acked-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ...
      e360adbe
  20. 12 Oct, 2010 1 commit
  21. 03 May, 2010 1 commit
  22. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        bloc...
      5a0e3ad6
  23. 16 Mar, 2010 1 commit
    • Suresh Siddha's avatar
      x86: Handle legacy PIC interrupts on all the cpu's · 36e9e1ea
      Suresh Siddha authored
      
      Ingo Molnar reported that with the recent changes of not
      statically blocking IRQ0_VECTOR..IRQ15_VECTOR's on all the
      cpu's, broke an AMD platform (with Nvidia chipset) boot when
      "noapic" boot option is used.
      
      On this platform, legacy PIC interrupts are getting delivered to
      all the cpu's instead of just the boot cpu. Thus not
      initializing the vector to irq mapping for the legacy irq's
      resulted in not handling certain interrupts causing boot hang.
      
      Fix this by initializing the vector to irq mapping on all the
      logical cpu's, if the legacy IRQ is handled by the legacy PIC.
      Reported-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      [ -v2: io-apic-enabled improvement ]
      Acked-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1268692386.3296.43.camel@sbs-t61.sc.intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      36e9e1ea
  24. 24 Feb, 2010 1 commit
  25. 20 Feb, 2010 1 commit
  26. 19 Jan, 2010 1 commit
    • Suresh Siddha's avatar
      x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's · 97943390
      Suresh Siddha authored
      
      Currently IRQ0..IRQ15 are assigned to IRQ0_VECTOR..IRQ15_VECTOR's on
      all the cpu's.
      
      If these IRQ's are handled by legacy pic controller, then the kernel
      handles them only on cpu 0. So there is no need to block this vector
      space on all cpu's.
      
      Similarly if these IRQ's are handled by IO-APIC, then the IRQ affinity
      will determine on which cpu's we need allocate the vector resource for
      that particular IRQ. This can be done dynamically and here also there
      is no need to block 16 vectors for IRQ0..IRQ15 on all cpu's.
      
      Fix this by initially assigning IRQ0..IRQ15 to IRQ0_VECTOR..IRQ15_VECTOR's only
      on cpu 0. If the legacy controllers like pic handles these irq's, then
      this configuration will be fixed. If more modern controllers like IO-APIC
      handle these IRQ's, then we start with this configuration and as IRQ's
      migrate, vectors (/and cpu's) associated with these IRQ's change dynamically.
      
      This will freeup the block of 16 vectors on other cpu's which don't handle
      IRQ0..IRQ15, which can now be used for other IRQ's that the particular cpu
      handle.
      
      [ hpa: this also an architectural cleanup for future legacy-PIC-free
        configurations. ]
      [ hpa: fixed typo NR_LEGACY_IRQS -> NR_IRQS_LEGACY ]
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1263932453.2814.52.camel@sbs-t61.sc.intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      97943390
  27. 14 Oct, 2009 1 commit
  28. 21 Sep, 2009 1 commit
    • Ingo Molnar's avatar
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: default avatarStephane Eranian <eranian@google.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Reviewed-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cdd6c482
  29. 16 Sep, 2009 1 commit
  30. 31 Aug, 2009 2 commits
  31. 22 Jul, 2009 1 commit
  32. 10 Jul, 2009 1 commit
  33. 03 Jun, 2009 2 commits
    • Andi Kleen's avatar
      x86: fix panic with interrupts off (needed for MCE) · 4ef702c1
      Andi Kleen authored
      
      For some time each panic() called with interrupts disabled
      triggered the !irqs_disabled() WARN_ON in smp_call_function(),
      producing ugly backtraces and confusing users.
      
      This is a common situation with machine checks for example which
      tend to call panic with interrupts disabled, but will also hit
      in other situations e.g. panic during early boot.  In fact it
      means that panic cannot be called in many circumstances, which
      would be bad.
      
      This all started with the new fancy queued smp_call_function,
      which is then used by the shutdown path to shut down the other
      CPUs.
      
      On closer examination it turned out that the fancy RCU
      smp_call_function() does lots of things not suitable in a panic
      situation anyways, like allocating memory and relying on complex
      system state.
      
      I originally tried to patch this over by checking for panic
      there, but it was quite complicated and the original patch
      was also not very popular.  This also didn't fix some of the
      underlying complexity problems.
      
      The new code in post 2.6.29 tries to patch around this by
      checking for oops_in_progress, but that is not enough to make
      this fully safe and I don't think that's a real solution
      because panic has to be reliable.
      
      So instead use an own vector to reboot.  This makes the reboot
      code extremly straight forward, which is definitely a big plus
      in a panic situation where it is important to avoid relying on
      too much kernel state.  The new simple code is also safe to be
      called from interupts off region because it is very very simple.
      
      There can be situations where it is important that panic
      is reliable.  For example on a fatal machine check the panic
      is needed to get the system up again and running as quickly
      as possible.  So it's important that panic is reliable and
      all function it calls simple.
      
      This is why I came up with this simple vector scheme.
      It's very hard to beat in simplicity.  Vectors are not
      particularly precious anymore since all big systems are
      using per CPU vectors.
      
      Another possibility would have been to use an NMI similar
      to kdump, but there is still the problem that NMIs don't
      work reliably on some systems due to BIOS issues.  NMIs
      would have been able to stop CPUs running with interrupts
      off too.  In the sake of universal reliability I opted for
      using a non NMI vector for now.
      
      I put the reboot vector into the highest priority bucket of
      the APIC vectors and moved the 64bit UV_BAU message down
      instead into the next lower priority.
      
      [ Impact: bug fix, fixes an old regression ]
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      4ef702c1
    • Andi Kleen's avatar
      x86, mce: implement bootstrapping for machine check wakeups · ccc3c319
      Andi Kleen authored
      
      Machine checks support waking up the mcelog daemon quickly.
      
      The original wake up code for this was pretty ugly, relying on
      a idle notifier and a special process flag. The reason it did
      it this way is that the machine check handler is not subject
      to normal interrupt locking rules so it's not safe
      to call wake_up().  Instead it set a process flag
      and then either did the wakeup in the syscall return
      or in the idle notifier.
      
      This patch adds a new "bootstraping" method as replacement.
      
      The idea is that the handler checks if it's in a state where
      it is unsafe to call wake_up(). If it's safe it calls it directly.
      When it's not safe -- that is it interrupted in a critical
      section with interrupts disables -- it uses a new "self IPI" to trigger
      an IPI to its own CPU. This can be done safely because IPI
      triggers are atomic with some care. The IPI is raised
      once the interrupts are reenabled and can then safely call
      wake_up().
      
      When APICs are disabled the event is just queued and will be picked up
      eventually by the next polling timer. I think that's a reasonable
      compromise, since it should only happen quite rarely.
      
      Contains fixes from Ying Huang.
      
      [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]
      Signed-off-by: default avatarAndi Kleen <ak@linux.intel.com>
      Signed-off-by: default avatarHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      ccc3c319
  34. 15 Apr, 2009 1 commit
    • Yinghai Lu's avatar
      x86: use used_vectors in init_IRQ() · 77857dc0
      Yinghai Lu authored
      
      Impact: fix crash with many devices
      
      I found this crash:
      
      [  552.616646] general protection fault: 0403 [#1] SMP
      [  552.620013] last sysfs file:
      /sys/devices/pci0000:00/0000:00:02.0/usb1/1-1/1-1:1.0/host13/target13:0:0/13:0:0:0/block/sr0/size
      [  552.620013] CPU 0
      [  552.620013] Modules linked in:
      [  552.620013] Pid: 0, comm: swapper Not tainted 2.6.30-rc1-tip-01931-g8fcafd8-dirty #28 Sun Fire X4440
      [  552.620013] RIP: 0010:[<ffffffff8023bada>]  [<ffffffff8023bada>] default_idle+0x7d/0xda
      [  552.620013] RSP: 0018:ffffffff81345e68  EFLAGS: 00010246
      [  552.620013] RAX: 0000000000000000 RBX: ffffffff8133d870 RCX: ffffc20000000000
      [  552.620013] RDX: 00000000001d0620 RSI: ffffffff8023bad8 RDI: ffffffff802a3169
      [  552.620013] RBP: ffffffff81345e98 R08: 0000000000000000 R09: ffffffff812244a0
      [  552.620013] R10: ffffffff81345dc8 R11: 7ebe1b6fa0bcac50 R12: 4ec4ec4ec4ec4ec5
      [  552.620013] R13: ffffffff813a54d0 R14: ffffffff813a7a40 R15: 0000000000000000
      [  552.620013] FS:  00000000006d1880(0000) GS:ffffc20000000000(0000) knlGS:0000000000000000
      [  552.620013] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      [  552.620013] CR2: 00007fec9d936a50 CR3: 000000007d1a9000 CR4: 00000000000006e0
      [  552.620013] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  552.620013] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  552.620013] Process swapper (pid: 0, threadinfo ffffffff81344000,task ffffffff812244a0)
      [  552.620013] Stack:
      [  552.620013]  0000000000000000 ffffc20000000000 00000000001d0620 7ebe1b6fa0bcac50
      [  552.620013]  ffffffff8133d870 4ec4ec4ec4ec4ec5 ffffffff81345ec8 ffffffff8023bd84
      [  552.620013]  4ec4ec4ec4ec4ec5 ffffffff813a54d0 7ebe1b6fa0bcac50 ffffffff8133d870
      [  552.620013] Call Trace:
      [  552.620013]  [<ffffffff8023bd84>] c1e_idle+0x109/0x124
      [  552.620013]  [<ffffffff8023314b>] cpu_idle+0xb8/0x101
      [  552.620013]  [<ffffffff80c16d6a>] rest_init+0x7e/0x94
      [  552.620013]  [<ffffffff81357efc>] start_kernel+0x3dc/0x3fd
      [  552.620013]  [<ffffffff813572a9>] x86_64_start_reservations+0xb9/0xd4
      [  552.620013]  [<ffffffff813573b2>] x86_64_start_kernel+0xee/0x109
      [  552.620013] Code: 48 8b 04 25 f8 b4 00 00 83 a0 3c e0 ff ff fb 0f ae f0 65 48 8b 04 25 f8 b4 00 00 f6 80 38 e0 ff ff 08 75 09 e8 71 76 06 00 fb f4 <eb> 06 e8 68 76 06 00 fb 65 48 8b 04 25 f8 b4 00 00 83 88 3c e0
      [  552.620013] RIP  [<ffffffff8023bada>] default_idle+0x7d/0xda
      [  552.620013]  RSP <ffffffff81345e68>
      [  552.828646] ---[ end trace 4cbfc5c01382af7f ]---
      
      Joerg Roedel said
      	"The 0403 error code means that there was an external interrupt with vector
      	0x80. Yinghai, my theory is that the kernel on this machine has no 32bit
      	emulation compiled in, right? In this case the selector points to a zero entry
      	which may cause the #gpf right after the hlt.
      	But I have no idea where the external int 0x80 comes from"
      
      it turns out that we could use 0x80 for external device on 64-bit
      when 32-bit emulation is disabled.
      
      But we forgot to set the gate for it.
      
      try to set gate for it by checking used_vectors.
      
      Also move apic_intr_init() early to avoid setting
      that gate two times.
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Joerg Roedel <joerg.roedel@amd.com>
      LKML-Reference: <49E62DFD.6010904@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      77857dc0