1. 03 Apr, 2015 1 commit
    • x86/mm/KASLR: Propagate KASLR status to kernel proper · 78cac48c
      Borislav Petkov authored
      Commit:
      
        e2b32e67 ("x86, kaslr: randomize module base load address")
      
      made module base address randomization unconditional and did not
      account for KASLR being disabled by CONFIG_HIBERNATION or the
      "nokaslr" command line option. For more info see the (now reverted)
      commit:
      
        f47233c2 ("x86/mm/ASLR: Propagate base load address calculation")
      
      In order to propagate KASLR status to the kernel proper, we need a
      single bit in boot_params.hdr.loadflags. We have chosen bit 1, leaving
      the top-down allocated bits free for use by the bootloader.
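
      A minimal sketch of how the kernel proper can then query the bit;
      KASLR_FLAG is an assumed name here for bit 1 of loadflags:

        /* bit 1 of boot_params.hdr.loadflags (name assumed for illustration) */
        #define KASLR_FLAG (1 << 1)

        /* Kernel proper: was the load address randomized by the decompressor? */
        static inline bool kaslr_enabled(void)
        {
                return !!(boot_params.hdr.loadflags & KASLR_FLAG);
        }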
      
      Originally-From: Jiri Kosina <jkosina@suse.cz>
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      78cac48c
  2. 16 Mar, 2015 1 commit
    • Revert "x86/mm/ASLR: Propagate base load address calculation" · 69797daf
      Borislav Petkov authored
      This reverts commit:
      
        f47233c2 ("x86/mm/ASLR: Propagate base load address calculation")
      
      The main reason for the revert is that the new boot flag does not work
      at all currently, and in order to make this work, we need non-trivial
      changes to the x86 boot code which we didn't manage to get done in
      time for merging.
      
      And even if we had, the changes would have been too risky. So instead
      of rushing things and breaking 4.1 booting on boxes left and right, we
      will be strict and conservative and take our time to fix and test this
      properly.
      Reported-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Junjie Mao <eternal.n08@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.o...
      69797daf
  3. 24 Feb, 2015 1 commit
  4. 19 Feb, 2015 1 commit
    • x86/mm/ASLR: Propagate base load address calculation · f47233c2
      Jiri Kosina authored
      Commit:
      
        e2b32e67 ("x86, kaslr: randomize module base load address")
      
      makes the base address for modules unconditionally randomized when
      CONFIG_RANDOMIZE_BASE is defined and the "nokaslr" option is not
      present on the command line.
      
      This is not consistent with how choose_kernel_location() decides
      whether it will randomize the kernel load base.
      
      Namely, CONFIG_HIBERNATION disables kASLR (unless the "kaslr" option
      is explicitly specified on the kernel command line), which makes the
      state space larger than what the module loader is looking at. In other
      words, CONFIG_HIBERNATION && CONFIG_RANDOMIZE_BASE is a valid
      configuration; kASLR would not be applied by default in that case, but
      the module loader is not aware of that.
      
      Instead of fixing the logic in module.c, this patch takes a more
      generic approach. It introduces a new bootparam setup_data type,
      SETUP_KASLR, and uses it to pass along whether KASLR has been applied
      during kernel decompression, setting a global 'kaslr_enabled' variable
      accordingly, so that any kernel code (module loading, livepatching,
      ...) can make decisions based on its value.

      The x86 module loader is converted to make use of this flag.
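
      A sketch of consuming such an entry while walking the standard
      boot_params.hdr.setup_data chain (the parse helper and the one-byte
      payload layout are assumptions for illustration):

        /* Walk the setup_data chain; SETUP_KASLR carries one flag byte. */
        static void __init parse_kaslr_setup(u64 pa_data)
        {
                struct setup_data *data;

                while (pa_data) {
                        /* map the header plus the single payload byte */
                        data = early_memremap(pa_data, sizeof(*data) + 1);
                        if (data->type == SETUP_KASLR)
                                kaslr_enabled = !!data->data[0];
                        pa_data = data->next;
                        early_memunmap(data, sizeof(*data) + 1);
                }
        }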
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Link: https://lkml.kernel.org/r/alpine.LNX.2.00.1502101411280.10719@pobox.suse.cz
      
      
      [ Always dump correct kaslr status when panicking ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      f47233c2
  5. 14 Feb, 2015 1 commit
    • x86_64: add KASan support · ef7f0d6a
      Andrey Ryabinin authored
      
      This patch adds arch specific code for kernel address sanitizer.
      
      16TB of virtual address space is used for shadow memory.  It is
      located in the range [ffffec0000000000 - fffffc0000000000], between
      vmemmap and the %esp fixup stacks.
      
      At an early stage we map the whole shadow region with the zero page.
      Later, after pages are mapped into the direct mapping address range,
      we unmap zero pages from the corresponding shadow (see
      kasan_map_shadow()) and allocate and map real shadow memory, reusing
      the vmemmap_populate() function.
      
      Also, replace __pa with __pa_nodebug before the shadow is initialized.
      With CONFIG_DEBUG_VIRTUAL=y, __pa makes an external function call
      (__phys_addr); __phys_addr is instrumented, so __asan_load could be
      called before the shadow area is initialized.
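
      For reference, KASAN tracks each 8-byte granule of kernel memory with
      one shadow byte; a sketch of the usual address translation, with
      KASAN_SHADOW_OFFSET standing in for the arch-specific constant:

        #define KASAN_SHADOW_SCALE_SHIFT 3   /* 8 bytes per shadow byte */

        static inline void *kasan_mem_to_shadow(const void *addr)
        {
                return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
                        + KASAN_SHADOW_OFFSET;
        }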
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: ...
      ef7f0d6a
  6. 04 Feb, 2015 1 commit
  7. 23 Jan, 2015 1 commit
  8. 17 Nov, 2014 1 commit
    • x86, mpx: On-demand kernel allocation of bounds tables · fe3d197f
      Dave Hansen authored
      This is really the meat of the MPX patch set.  If there is one patch to
      review in the entire series, this is the one.  There is a new ABI here
      and this kernel code also interacts with userspace memory in a
      relatively unusual manner.  (small FAQ below).
      
      Long Description:
      
      This patch adds two prctl() commands to enable or disable the
      management of bounds tables in the kernel, including on-demand kernel
      allocation (see the patch "on-demand kernel allocation of bounds
      tables") and cleanup (see the patch "cleanup unused bound tables").
      Applications do not strictly need the kernel to manage bounds tables,
      and we expect some applications to use MPX without taking advantage of
      this kernel support. This means the kernel cannot simply infer from
      the MPX registers whether an application needs bounds table
      management. The prctl() is an explicit signal from userspace.
      
      PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
      require kernel's help i...
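
      A hedged userspace sketch of opting in to kernel-managed bounds
      tables (the PR_MPX_* values below are the ones this series is expected
      to define; treat them as assumptions if your headers lack them):

        #include <stdio.h>
        #include <sys/prctl.h>

        #ifndef PR_MPX_ENABLE_MANAGEMENT
        #define PR_MPX_ENABLE_MANAGEMENT  43   /* assumed constant */
        #define PR_MPX_DISABLE_MANAGEMENT 44   /* assumed constant */
        #endif

        int main(void)
        {
                /* explicit signal: ask the kernel to manage bounds tables */
                if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0))
                        perror("PR_MPX_ENABLE_MANAGEMENT");
                return 0;
        }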
      fe3d197f
  9. 03 Nov, 2014 1 commit
  10. 28 Oct, 2014 1 commit
  11. 08 Oct, 2014 1 commit
    • x86: Quark: Comment setup_arch() to document TLB/PGE bug · 2075244f
      Bryan O'Donoghue authored
      
      Quark SoC X1000 advertises Page Global Enable for its Translation
      Lookaside Buffer via cpuid. The silicon does not in fact support PGE
      and hence will not flush the TLB when CR4.PGE is rewritten. The Quark
      documentation makes clear the necessity to instead rewrite CR3 in
      order to flush any TLB entries, irrespective of the state of CR4.PGE
      or an individual PTE.PGE.
      
      See Intel Quark Core DevMan_001.pdf section 6.4.11
      
      In setup_arch() in setup.c the code will load_cr3() and then do a
      __flush_tlb_all().

      On Quark the entire TLB will be flushed by the load_cr3(). The
      __flush_tlb_all() has no effect and can be safely ignored.
      
      Later on in the boot process we switch off the flag for cpu_has_pge(),
      which means that subsequent calls to __flush_tlb_all() will call
      __flush_tlb(), not __flush_tlb_global(), flushing the TLB the correct
      way via load_cr3() rather than a CR4.PGE rewrite.
      
      This patch documents the behaviour of flushing the TLB for Quark in
      setup_arch().
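
      The CR3 rewrite the Quark documentation calls for amounts to reloading
      CR3 with its current value; a sketch using the read_cr3()/write_cr3()
      helpers of this era:

        /* On Quark, rewriting CR3 is the only reliable way to flush the TLB. */
        static inline void quark_flush_tlb(void)
        {
                write_cr3(read_cr3());   /* reload current page table root */
        }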
      
      Comment text suggested by Thomas Gleixner
      Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: davej@redhat.com
      Cc: hmh@hmh.eng.br
      Link: http://lkml.kernel.org/r/1412641189-12415-2-git-send-email-pure.logic@nexus-software.ie
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      2075244f
  12. 18 Jul, 2014 1 commit
  13. 04 Jun, 2014 1 commit
    • cma: add placement specifier for "cma=" kernel parameter · 5ea3b1b2
      Akinobu Mita authored
      
      Currently, the "cma=" kernel parameter is used to specify the size of
      CMA, but we can't specify where it is located.  We want to locate CMA
      below 4GB for devices that only support 32-bit addressing on 64-bit
      systems without an IOMMU.
      
      This patch enables specifying the placement of CMA by extending the
      "cma=" kernel parameter.
      
      Examples:
       1. locate 64MB CMA below 4GB by "cma=64M@0-4G"
       2. locate 64MB CMA exactly at 512MB by "cma=64M@512M"
      
      Note that the DMA contiguous memory allocator on x86 assumes that
      page_address() works for the pages to allocate.  This change therefore
      limits the end address of the contiguous memory area to
      max_pfn_mapped, via the argument to dma_contiguous_reserve(), to
      prevent it from being located in the highmem area.
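
      A sketch of parsing the extended syntax with the kernel's memparse()
      helper (the early_cma() handler shape is an assumption here):

        static phys_addr_t size_cmdline, base_cmdline, limit_cmdline;

        /* "cma=<size>[@<base>[-<limit>]]", e.g. "cma=64M@0-4G" */
        static int __init early_cma(char *p)
        {
                size_cmdline = memparse(p, &p);
                if (*p != '@')
                        return 0;
                base_cmdline = memparse(p + 1, &p);
                if (*p == '-')
                        limit_cmdline = memparse(p + 1, &p);
                else
                        limit_cmdline = base_cmdline + size_cmdline; /* exact */
                return 0;
        }
        early_param("cma", early_cma);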
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ea3b1b2
  14. 04 Mar, 2014 3 commits
    • x86/efi: Quirk out SGI UV · a5d90c92
      Borislav Petkov authored
      
      Alex reported hitting the following BUG after the EFI 1:1 virtual
      mapping work was merged,
      
       kernel BUG at arch/x86/mm/init_64.c:351!
       invalid opcode: 0000 [#1] SMP
       Call Trace:
        [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
        [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
        [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
        [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
        [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
        [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
        [<ffffffff8153d955>] ? printk+0x72/0x74
        [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
        [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
        [<ffffffff81535530>] ? rest_init+0x74/0x74
        [<ffffffff81535539>] kernel_init+0x9/0xff
        [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
        [<ffffffff81535530>] ? rest_init+0x74/0x74
      
      Getting this thing to work with the new mapping scheme would need more
      work, so automatically switch to the old memmap layout for SGI UV.
      Acked-by: Russ Anderson <rja@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      a5d90c92
    • x86/efi: Wire up CONFIG_EFI_MIXED · 7d453eee
      Matt Fleming authored
      
      Add the Kconfig option and bump the kernel header version so that boot
      loaders can check whether the handover code is available if they want.
      
      The xloadflags field in the bzImage header is also updated to reflect
      that the kernel supports both entry points by setting both of
      XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
      XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
      guaranteed to be addressable with 32 bits.
      
      Note that no boot loaders should be using the bits set in xloadflags to
      decide which entry point to jump to. The entire scheme is based on the
      concept that 32-bit bootloaders always jump to ->handover_offset and
      64-bit loaders always jump to ->handover_offset + 512. We set both bits
      merely to inform the boot loader that it's safe to use the native
      handover offset even if the machine type in the PE/COFF header claims
      otherwise.
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      7d453eee
    • efi: Move facility flags to struct efi · 3e909599
      Matt Fleming authored
      
      As we grow support for more EFI architectures they're going to want the
      ability to query which EFI features are available on the running system.
      Instead of storing this information in an architecture-specific place,
      stick it in the global 'struct efi', which is already the central
      location for EFI state.
      
      While we're at it, let's change the return value of efi_enabled() to be
      bool and replace all references to 'facility' with 'feature', which is
      the usual word used to describe the attributes of the running system.
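
      A sketch of the resulting interface, with feature bits kept in a flags
      word inside the global 'struct efi' (bit names illustrative):

        enum {
                EFI_BOOT,               /* booted via EFI */
                EFI_CONFIG_TABLES,      /* config tables located */
                EFI_RUNTIME_SERVICES,   /* runtime services usable */
        };

        static inline bool efi_enabled(int feature)
        {
                return test_bit(feature, &efi.flags) != 0;
        }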
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      3e909599
  15. 27 Feb, 2014 1 commit
  16. 28 Jan, 2014 1 commit
  17. 22 Jan, 2014 1 commit
    • x86: memblock: set current limit to max low memory address · 5b6e5295
      Santosh Shilimkar authored
      
      The memblock current limit value is used to limit early boot memory
      allocations below the max low memory address by default, as the kernel
      can only access low memory.

      Hence, set the memblock current limit value to the max mapped low
      memory address instead of the max mapped memory address.
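
      In setup_arch() terms this is essentially a one-liner; a sketch, with
      max_low_pfn_mapped as the existing x86 global for the top of mapped
      low memory:

        /* cap early memblock allocations at the top of mapped low memory */
        memblock_set_current_limit((phys_addr_t)max_low_pfn_mapped << PAGE_SHIFT);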
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b6e5295
  18. 13 Jan, 2014 1 commit
  19. 29 Dec, 2013 2 commits
    • x86: Reserve setup_data ranges late after parsing memmap cmdline · 77ea8c94
      Dave Young authored
      
      Currently e820_reserve_setup_data() is called before parsing early
      params, which works in the normal case. But for memmap=exactmap, the
      final memory ranges are created after parsing the memmap= cmdline
      params, so the earlier e820_reserve_setup_data() has no effect. For
      example, setup_data ranges will still be marked as normal system RAM,
      and when the sysfs driver later ioremaps them the kernel will warn
      about mapping normal RAM.

      This patch fixes it by moving the e820_reserve_setup_data() call after
      the parsing of early params, so the ranges can be set as reserved and
      a later ioremap will be fine with them.
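
      A sketch of the reordering inside setup_arch() (surrounding calls
      elided):

        parse_early_param();

        /* ... memmap=exactmap may have rebuilt the e820 map by now ... */

        e820_reserve_setup_data();  /* moved here from before parse_early_param() */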
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Tested-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      77ea8c94
    • x86/efi: Pass necessary EFI data for kexec via setup_data · 1fec0533
      Dave Young authored
      
      Add a new setup_data type, SETUP_EFI, for kexec use, passing along the
      saved fw_vendor, runtime, config tables and EFI runtime mappings.

      When entering virtual mode, directly map the EFI runtime regions that
      were passed in previously, and skip the call to
      SetVirtualAddressMap().

      Specifically, for the HP z420 workstation we need to save the SMBIOS
      physical address.  The kernel boot sequence proceeds in the following
      order.  Step 2 requires efi.smbios to be the physical address.
      However, I found that on the HP z420 the EFI system table has a
      virtual address for SMBIOS in step 1.  Hence, we need to set it back
      to the physical address with the smbios field in efi_setup_data.
      (When it is still the physical address, this simply sets the same
      value.)
      
      1. efi_init() - Set efi.smbios from EFI system table
      2. dmi_scan_machine() - Temporary map efi.smbios to access SMBIOS table
      3. efi_enter_virtual_mode() - Map EFI ranges
      
      Tested on OVMF+QEMU, a Lenovo ThinkPad, a Dell laptop and an HP z420
      workstation.
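
      A sketch of the payload such a SETUP_EFI entry could carry, mirroring
      the fields the text says are saved (the exact layout is an assumption
      here):

        /* data handed from the kexec'ing kernel to the next kernel */
        struct efi_setup_data {
                u64 fw_vendor;   /* physical address of firmware vendor string */
                u64 runtime;     /* physical address of runtime services table */
                u64 tables;      /* physical address of config tables */
                u64 smbios;      /* physical SMBIOS address (HP z420 note above) */
                u64 reserved[8];
        };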
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Tested-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      1fec0533
  20. 13 Nov, 2013 1 commit
    • x86, acpi, crash, kdump: do reserve_crashkernel() after SRAT is parsed. · fa591c4a
      Tang Chen authored
      
      Memory reserved for the crashkernel could be large.  So we should not
      allocate this memory bottom-up from the end of the kernel image.

      When SRAT is parsed, we will be able to know which memory is
      hotpluggable, and we can avoid allocating this memory for the kernel.
      So reorder reserve_crashkernel() to after SRAT is parsed.
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Toshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fa591c4a
  21. 23 Oct, 2013 1 commit
  22. 13 Oct, 2013 1 commit
  23. 14 Aug, 2013 2 commits
  24. 06 Aug, 2013 1 commit
  25. 14 Jul, 2013 1 commit
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker authored
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  The fix in commit
      5e427ec2 ("x86: Fix bit corruption at CPU resume time") is a good
      example of the nasty type of bugs that can be created with improper
      use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      and are flagged as __cpuinit -- so if we remove the __cpuinit from
      arch-specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
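
      The mechanical change this implies, sketched on a hypothetical helper:

        /* before: tagged for the discardable .cpuinit section */
        static int __cpuinit example_cpu_prepare(unsigned int cpu)
        {
                return 0;
        }

        /* after: the annotation is simply dropped */
        static int example_cpu_prepare(unsigned int cpu)
        {
                return 0;
        }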
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  26. 03 Jul, 2013 1 commit
  27. 01 May, 2013 1 commit
    • dump_stack: implement arch-specific hardware description in task dumps · 98e5e1bf
      Tejun Heo authored
      
      x86 and ia64 can acquire extra hardware identification information
      from DMI and print it along with task dumps; however, the usage isn't
      consistent.
      
      * x86 show_regs() collects vendor, product and board strings and
        prints them out with PID, comm and utsname.  Some of the information
        is printed again later in the same dump.
      
      * warn_slowpath_common() explicitly accesses the DMI board and prints
        it out with the "Hardware name:" label.  This applies to both x86
        and ia64 but is irrelevant on all other archs.
      
      * ia64 doesn't show DMI information on other non-WARN dumps.
      
      This patch introduces an arch-specific hardware description used by
      dump_stack().  It can be set by calling dump_stack_set_arch_desc()
      during boot and, if set, is printed out on a separate line with the
      "Hardware name:" label.
      
      dmi_set_dump_stack_arch_desc() is added, which sets the arch-specific
      description from DMI data.  It uses dmi_ids_string[], which is set
      from dmi_present() and used for the DMI debug message.  It is a
      superset of the information x86 show_regs() is using.  The function is
      called from x86 and ia64 boot code right after dmi_scan_machine().
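
      A sketch of that forwarding step (signature as described; treat the
      body as illustrative):

        /* drivers/firmware/dmi_scan.c: hand the DMI summary string to
         * dump_stack() so task dumps gain a "Hardware name:" line. */
        void __init dmi_set_dump_stack_arch_desc(void)
        {
                dump_stack_set_arch_desc("%s", dmi_ids_string);
        }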
      
      This makes the explicit DMI handling in warn_slowpath_common()
      unnecessary.  Removed.
      
      show_regs() isn't yet converted to use generic debug information
      printing and this patch doesn't remove the duplicate DMI handling in
      x86 show_regs().  The next patch will unify show_regs() handling and
      remove the duplication.
      
      An example WARN dump follows.
      
       WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
       Modules linked in:
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #3
       Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
        0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
        ffffffff8108f500 ffffffff82228240 0000000000000040 ffffffff8234a08e
        0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
       Call Trace:
        [<ffffffff81c614dc>] dump_stack+0x19/0x1b
        [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
        [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff8234a0c3>] init_workqueues+0x35/0x505
        ...
      
      v2: Use the same string as the debug message from dmi_present() which
          also contains BIOS information.  Move hardware name into its own
          line as warn_slowpath_common() did.  This change was suggested by
          Bjorn Helgaas.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98e5e1bf
  28. 25 Apr, 2013 1 commit
  29. 17 Apr, 2013 3 commits
  30. 02 Apr, 2013 1 commit
  31. 07 Mar, 2013 1 commit
  32. 02 Mar, 2013 1 commit
    • x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu authored
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced it on an HP z620 workstation and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready").
      
      It turns out movablemem_map has some problems, and it breaks several
      things:
      
      1. numa_init() is called several times, NOT just for SRAT, so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not simply be removed.  The sequence to consider is: numaq,
         srat, amd, dummy; the fallback path needs to keep working.
      
      2. It simply splits acpi_numa_init() into early_parse_srat():
         a. early_parse_srat() is NOT called for ia64, so ia64 breaks.
         b. the loop
      	for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
            is still left in numa_init(), so it just clears the result from
            early_parse_srat(); it should be moved before that.
         c. it breaks ACPI_TABLE_OVERRIDE, as the ACPI table scan is moved
            early, before the override from the INITRD is settled.
      
      3. The patch TITLE is totally misleading: there is NO x86 in the
         title, but it changes critical x86 code.  That caused the x86 folks
         not to pay attention and find the problem early.  Those patches
         really should have been routed via tip/x86/mm.
      
      4. After that commit, the following ranges can not use movable RAM:
        a. real mode code... well, funny: could legacy Node0 [0,1M) be
           hot-removed?
        b. initrd... it will be freed after booting, so it could be on
           movable RAM...
        c. crashkernel for kdump: it looks like we can not put the kdump
           kernel above 4G anymore.
        d. init_mem_mapping: can not put page tables high anymore.
        e. initmem_init: vmemmap can not be on the high local node anymore.
           That is not good.
      
      If a node is hotpluggable, memory-related ranges like page tables and
      vmemmap could be on that node without problems, and should be on that
      node.
      
      We have workaround patches that could fix some of the problems, but
      some can not be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that make sure the kernel puts page
      tables and vmemmap on local node RAM instead of pushing them down to
      node0.  We also need to find a way to put other kernel-used RAM on
      local node RAM.
      Reported-by: Tim Gardner <tim.gardner@canonical.com>
      Reported-by: Don Morris <don.morris@hp.com>
      Bisected-by: Don Morris <don.morris@hp.com>
      Tested-by: Don Morris <don.morris@hp.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  33. 24 Feb, 2013 1 commit
    • acpi, memory-hotplug: parse SRAT before memblock is ready · e8d19552
      Tang Chen authored
      
      On Linux, pages used by the kernel cannot be migrated.  As a result,
      if a memory range is used by the kernel, it cannot be hot-removed.  So
      if we want to hot-remove memory, we should prevent the kernel from
      using it.
      
      The way this is currently done is to specify a memory range via the
      movablemem_map boot option and set it as ZONE_MOVABLE.
      
      But while the system is booting, memblock will allocate memory and
      reserve it for the kernel.  Before we parse SRAT and know the node
      memory ranges, memblock is already working, and it may allocate memory
      in ranges that are to be set as ZONE_MOVABLE.  This memory can then be
      used by the kernel and never be freed.

      So, let's parse SRAT before memblock is first called.  That is early
      enough.
      
      The first call of memblock_find_in_range_node() is in:
      
        setup_arch()
          |-->setup_real_mode()
      
      So, this patch adds a function early_parse_srat() to parse SRAT, and
      calls it before setup_real_mode() is called.
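
      A sketch of the call placement in setup_arch() (surrounding code
      elided):

        early_parse_srat();   /* learn hotpluggable ranges from SRAT first */

        setup_real_mode();    /* first memblock_find_in_range_node() user */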
      
      NOTE:
      
      1) early_parse_srat() is called before numa_init() and has already
         initialized numa_meminfo.  So DO NOT clear numa_nodes_parsed in
         numa_init() and DO NOT zero numa_meminfo in numa_init(), otherwise
         we will lose the memory NUMA info.

      2) I don't know why the count of memory affinities parsed from SRAT
         was used as the return value of the original acpi_numa_init().  So
         I add a static variable srat_mem_cnt to remember this count and use
         it as the return value of the new acpi_numa_init().
      
      [mhocko@suse.cz: parse SRAT before memblock is ready fix]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e8d19552
  34. 14 Feb, 2013 1 commit