1. 20 Jun, 2008 1 commit
    • Suresh Siddha's avatar
      x86: fix NULL pointer deref in __switch_to · 54481cf8
      Suresh Siddha authored
      I am able to reproduce the oops reported by Simon in __switch_to() with
      lguest.
      
      My debug showed that there is at least one lguest specific
      issue (which should be present in 2.6.25 and before aswell) and it got
      exposed with a kernel oops with the recent fpu dynamic allocation patches.
      
      In addition to the previous possible scenario (with fpu_counter), in the
      presence of lguest, it is possible that the cpu's TS bit it still set and the
      lguest launcher task's thread_info has TS_USEDFPU still set.
      
      This is because of the way the lguest launcher handling the guest's TS bit.
      (look at lguest_set_ts() in lguest_arch_run_guest()). This can result
      in a DNA fault while doing unlazy_fpu() in __switch_to(). This will
      end up causing a DNA fault in the context of new process thats
      getting context switched in (as opossed to handling DNA fault in the context
      of lguest launcher/helper process).
      
      This is wrong in both pre and post 2.6.25 kernels. In the recent
      2.6.26-rc series, this is showing up as NULL pointer dereferences or
      sleeping function called from atomic context(__switch_to()), as
      we free and dynamically allocate the FPU context for the newly
      created threads. Older kernels might show some FPU corruption for processes
      running inside of lguest.
      
      With the appended patch, my test system is running for more than 50 mins
      now. So atleast some of your oops (hopefully all!) should get fixed.
      Please give it a try. I will spend more time with this fix tomorrow.
      Reported-by: default avatarSimon Holm Thøgersen <odie@cs.aau.dk>
      Reported-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      54481cf8
  2. 19 Jun, 2008 12 commits
    • Jordan Crouse's avatar
      x86, geode: add a VSA2 ID for General Software · ffe6e1da
      Jordan Crouse authored
      General Software writes their own VSA2 module for their version
      of the Geode BIOS, which returns a different ID then the standard
      VSA2.  This was causing the framebuffer driver to break for most
      GSW boards.
      Signed-off-by: default avatarJordan Crouse <jordan.crouse@amd.com>
      Cc: tglx@linutronix.de
      Cc: linux-geode@lists.infradead.org
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ffe6e1da
    • Bernhard Walle's avatar
      x86: use BOOTMEM_EXCLUSIVE on 32-bit · d3942cff
      Bernhard Walle authored
      This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for
      i386 and prints a error message on failure.
      
      The patch is still for 2.6.26 since it is only bug fixing. The unification
      of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.
      Signed-off-by: default avatarBernhard Walle <bwalle@suse.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      d3942cff
    • Mikael Pettersson's avatar
      x86, 32-bit: fix boot failure on TSC-less processors · df17b1d9
      Mikael Pettersson authored
      Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6"
      (invalid opcode) and a kernel halt immediately after the
      kernel has been uncompressed. The BUG shows EIP pointing
      to an rdtsc instruction in native_read_tsc(), invoked from
      native_sched_clock().
      
      (This error occurs so early that not even the serial console
      can capture it.)
      
      A bisection showed that this bug first occurs in 2.6.26-rc3-git7,
      via commit 9ccc906c:
      
      >x86: distangle user disabled TSC from unstable
      >
      >tsc_enabled is set to 0 from the command line switch "notsc" and from
      >the mark_tsc_unstable code. Seperate those functionalities and replace
      >tsc_enable with tsc_disable. This makes also the native_sched_clock()
      >decision when to use TSC understandable.
      >
      >Preparatory patch to solve the sched_clock() issue on 32 bit.
      >
      >Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      The core reason for this bug is that native_sched_clock() gets
      called before tsc_init().
      
      Before the commit above, tsc_32.c used a "tsc_enabled" variable
      which defaulted to 0 == disabled, and which only got enabled late
      in tsc_init(). Thus early calls to native_sched_clock() would skip
      the TSC and use jiffies instead.
      
      After the commit above, tsc_32.c uses a "tsc_disabled" variable
      which defaults to 0, meaning that the TSC is Ok to use. Early calls
      to native_sched_clock() now erroneously try to use the TSC on
      !cpu_has_tsc processors, leading to invalid opcode exceptions.
      
      My proposed fix is to initialise tsc_disabled to a "soft disabled"
      state distinct from the hard disabled state set up by the "notsc"
      kernel option. This fixes the native_sched_clock() problem. It also
      allows tsc_init() to be simplified: instead of setting tsc_disabled = 1
      on every error return, we just set tsc_disabled = 0 once when all
      checks have succeeded.
      
      I've verified that this lets my 486 boot again. I've also verified
      that a Core2 machine still uses the TSC as clocksource after the patch.
      Signed-off-by: default avatarMikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      df17b1d9
    • Suresh Siddha's avatar
      x86: fix NULL pointer deref in __switch_to · 75118a82
      Suresh Siddha authored
      Patrick McHardy reported a crash:
      
      > > I get this oops once a day, its apparently triggered by something
      > > run by cron, but the process is a different one each time.
      > >
      > > Kernel is -git from yesterday shortly before the -rc6 release
      > > (last commit is the usb-2.6 merge, the x86 patches are missing),
      > > .config is attached.
      > >
      > > I'll retry with current -git, but the patches that have gone in
      > > since I last updated don't look related.
      > >
      > > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at
      > > 000001ff
      > > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118
      > > [62060.043009] *pde = 00000000
      > > [62060.043009] Oops: 0002 [#1] PREEMPT
      
      Vegard Nossum analyzed it:
      
      > This decodes to
      >
      >    0:   0f ae 00                fxsave (%eax)
      >
      > so it's related to the floating-point context. This is the exact
      > location of the crash:
      >
      > $ addr2line -e arch/x86/kernel/process_32.o -i ab0
      > include/asm/i387.h:232
      > include/asm/i387.h:262
      > arch/x86/kernel/process_32.c:595
      >
      > ...so it looks like prev_task->thread.xstate->fxsave has become NULL.
      > Or maybe it never had any other value.
      
      Somehow (as described below) TS_USEDFPU is set but the fpu is not
      allocated or freed.
      
      Another possible FPU pre-emption issue with the sleazy FPU optimization
      which was benign before but not so anymore, with the dynamic FPU allocation
      patch.
      
      New task is getting exec'd and it is prempted at the below point.
      
      flush_thread() {
      	...
      	/*
      	* Forget coprocessor state..
      	*/
      	clear_fpu(tsk);
      		<----- Preemption point
      	clear_used_math();
      	...
      }
      
      Now when it context switches in again, as the used_math() is still set
      and fpu_counter can be > 5, we will do a math_state_restore() which sets
      the task's TS_USEDFPU. After it continues from the above preemption point
      it does clear_used_math() and much later free_thread_xstate().
      
      Now, at the next context switch, it is quite possible that xstate is
      null, used_math() is not set and TS_USEDFPU is still set. This will
      trigger unlazy_fpu() causing kernel oops.
      
      Fix this  by clearing tsk's fpu_counter before clearing task's fpu.
      Reported-by: default avatarPatrick McHardy <kaber@trash.net>
      Signed-off-by: default avatarSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      75118a82
    • Jeremy Fitzhardinge's avatar
      x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits. · ad524d46
      Jeremy Fitzhardinge authored
      When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
      potentially have the same number of physical address bits as the
      64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
      we could have up to 52 bits of physical address in a pte.
      
      The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
      This means that it can only represent physical addresses up to 32+12=44
      bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
      Linux x86_32-PAE architectural limit for physical address size.
      
      This is a bugfix for two cases:
      1. running a 32-bit PAE kernel on a machine with
        more than 64GB RAM.
      2. running a 32-bit PAE Xen guest on a host machine with
        more than 64GB RAM
      
      In both cases, a pte could need to have more than 36 bits of physical,
      and masking it to 36-bits will cause fairly severe havoc.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: <stable@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ad524d46
    • Dave Airlie's avatar
      agp: brown paper bag patch - put back two lines that got lost · 9bedbcb2
      Dave Airlie authored
      Commit 62c96b9d ("agp/intel: cleanup
      some serious whitespace badness") didn't just fix whitespace.  It also
      lost two lines.
      
      Noticed by Linus. No more whitespace diffs for me.
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9bedbcb2
    • Linus Torvalds's avatar
      Merge branch 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6 · 3506ba7b
      Linus Torvalds authored
      * 'agp-patches' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/agp-2.6:
        agp/intel: cleanup some serious whitespace badness
        [AGP] intel_agp: Add support for Intel 4 series chipsets
        [AGP] intel_agp: extra stolen mem size available for IGD_GM chipset
        agp: more boolean conversions.
        drivers/char/agp - use bool
        agp: two-stage page destruction issue
        agp/via: fixup pci ids
      3506ba7b
    • Dave Airlie's avatar
      62c96b9d
    • Zhenyu Wang's avatar
    • Zhenyu Wang's avatar
      [AGP] intel_agp: extra stolen mem size available for IGD_GM chipset · 598d1448
      Zhenyu Wang authored
      This adds missing stolen memory size detect for IGD_GM, be sure to
      detect right size as current X intel driver (2.3.2) which has already
      worked out.
      Signed-off-by: default avatarZhenyu Wang <zhenyu.z.wang@intel.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      598d1448
    • Dave Airlie's avatar
      agp: more boolean conversions. · 9516b030
      Dave Airlie authored
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      9516b030
    • Joe Perches's avatar
      drivers/char/agp - use bool · c7258012
      Joe Perches authored
      Use boolean in AGP instead of having own TRUE/FALSE
      
      --
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      c7258012
  3. 18 Jun, 2008 27 commits