1. 08 Apr, 2014 10 commits
    • lib/percpu_counter.c: fix bad percpu counter state during suspend · e39435ce
      Jens Axboe authored
      I got a bug report yesterday from Laszlo Ersek in which he states that
      his kvm instance fails to suspend.  Laszlo bisected it down to
      commit 1cf7e9c6 ("virtio_blk: blk-mq support"), where virtio-blk is
      converted to use the blk-mq infrastructure.
      
      After digging a bit, it became clear that the issue was with the queue
      drain.  blk-mq tracks queue usage in a percpu counter, which is
      incremented on request alloc and decremented when the request is freed.
      The initial hunt was for an inconsistency in blk-mq, but everything
      seemed fine.  In fact, the counter only returned crazy values when
      suspend was in progress.
      
      When a CPU is unplugged, the percpu counter code merges that CPU's state with
      the general state.  blk-mq takes care to register a hotcpu notifier with
      the appropriate priority, so we know it runs after the percpu counter
      notifier.  However, the percpu counter notifier only merges the state
      when the CPU is fully gone.  This leaves a state transition where the
      CPU going away is no longer in the online mask, yet it still holds
      private values.  This means that in this state, percpu_counter_sum()
      returns invalid results, and the suspend then hangs waiting for
      abs(dead-cpu-value) requests to complete which of course will never
      happen.
      
      Fix this by clearing the state earlier, so we never have a case where
      the CPU isn't in the online mask but still holds private state.  This bug
      has been there since forever; I guess we don't have a lot of users where
      percpu counters need to be reliable during the suspend cycle.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Reported-by: Laszlo Ersek <lersek@redhat.com>
      Tested-by: Laszlo Ersek <lersek@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
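The failure mode described in the percpu counter commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_percpu_counter`, `toy_sum()`, `toy_fold()` and the `cpu_online` array are hypothetical names invented here, not the kernel's data structures, and the model ignores locking and batching entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Toy model of a percpu counter: a shared count plus per-CPU deltas. */
struct toy_percpu_counter {
    int64_t count;          /* global part */
    int32_t pcpu[NR_CPUS];  /* per-CPU private deltas */
};

/* Hypothetical stand-in for the online CPU mask. */
bool cpu_online[NR_CPUS] = { true, true, true, true };

/* Sum the global part and the deltas of CPUs still in the online mask. */
int64_t toy_sum(const struct toy_percpu_counter *c)
{
    int64_t sum = c->count;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_online[cpu])
            sum += c->pcpu[cpu];
    return sum;
}

/* Fold one CPU's private delta into the global part and clear it. */
void toy_fold(struct toy_percpu_counter *c, int cpu)
{
    c->count += c->pcpu[cpu];
    c->pcpu[cpu] = 0;
}
```

If a CPU is dropped from `cpu_online` before `toy_fold()` runs for it, `toy_sum()` silently loses that CPU's delta. Folding first and clearing the online bit afterwards keeps the sum consistent, which is the ordering the commit message argues for.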
    • autofs4: check dev ioctl size before allocating · e53d77eb
      Sasha Levin authored
      There wasn't any check of the size passed from userspace before trying
      to allocate the memory required.
      
      This meant that userspace might request more space than allowed,
      triggering an OOM.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
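The general pattern behind the autofs4 fix, validating a caller-supplied size before allocating, can be sketched in userspace C. The bounds `TOY_MIN_SIZE` and `TOY_MAX_SIZE` and the helper `toy_copy_param()` are assumptions made up for this illustration; they are not the limits or functions used by autofs4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bounds for illustration; not the values used by autofs4. */
#define TOY_MIN_SIZE 16u
#define TOY_MAX_SIZE 4096u

/*
 * Copy a caller-supplied buffer of `size` bytes, rejecting sizes
 * outside [TOY_MIN_SIZE, TOY_MAX_SIZE] before any allocation happens.
 * Returns NULL if the size is rejected or the allocation fails.
 */
void *toy_copy_param(const void *src, size_t size)
{
    void *dst;

    if (size < TOY_MIN_SIZE || size > TOY_MAX_SIZE)
        return NULL;

    dst = malloc(size);
    if (dst)
        memcpy(dst, src, size);
    return dst;
}
```

The check runs before `malloc()`, so an oversized request is rejected cheaply instead of putting pressure on the allocator.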
    • mm: vmscan: do not swap anon pages just because free+file is low · 0bf1457f
      Johannes Weiner authored
      Page reclaim force-scans / swaps anonymous pages when file cache drops
      below the high watermark of a zone in order to prevent what little cache
      remains from thrashing.
      
      However, on bigger machines the high watermark value can be quite large
      and when the workload is dominated by a static anonymous/shmem set, the
      file set might just be a small window of used-once cache.  In such
      situations, the VM starts swapping heavily when instead it should be
      recycling the no longer used cache.
      
      This is a longer-standing problem, but it's more likely to trigger after
      commit 81c0a2bb ("mm: page_alloc: fair zone allocator policy")
      because file pages can no longer accumulate in a single zone and are
      dispersed into smaller fractions among the available zones.
      
      To resolve this, do not force scan anon when file pages are low but
      instead rely on the scan/rotation ratios to make the right prediction.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: <stable@kernel.org> [3.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux · e9f37d3a
      Linus Torvalds authored
      Pull drm updates from Dave Airlie:
       "Highlights:
      
         - drm:
      
           Generic display port aux features, primary plane support, drm
           master management fixes, logging cleanups, enforced locking checks
           (instead of docs), documentation improvements, minor number
           handling cleanup, pseudofs for shared inodes.
      
         - ttm:
      
           add ability to allocate from both ends
      
         - i915:
      
           broadwell features, power domain and runtime pm, per-process
           address space infrastructure (not enabled)
      
         - msm:
      
           power management, hdmi audio support
      
         - nouveau:
      
           ongoing GPU fault recovery, initial maxwell support, random fixes
      
         - exynos:
      
           refactored driver to clean up a lot of abstraction, DP support
           moved into drm, LVDS bridge support added, parallel panel support
      
         - gma500:
      
           SGX MMU support, SGX irq handling, asle irq work fixes
      
         - radeon:
      
           video engine bringup, ring handling fixes, use dp aux helpers
      
         - vmwgfx:
      
           add rendernode support"
      
      * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits)
        DRM: armada: fix corruption while loading cursors
        drm/dp_helper: don't return EPROTO for defers (v2)
        drm/bridge: export ptn3460_init function
        drm/exynos: remove MODULE_DEVICE_TABLE definitions
        ARM: dts: exynos4412-trats2: enable exynos/fimd node
        ARM: dts: exynos4210-trats: enable exynos/fimd node
        ARM: dts: exynos4412-trats2: add panel node
        ARM: dts: exynos4210-trats: add panel node
        ARM: dts: exynos4: add MIPI DSI Master node
        drm/panel: add S6E8AA0 driver
        ARM: dts: exynos4210-universal_c210: add proper panel node
        drm/panel: add ld9040 driver
        panel/ld9040: add DT bindings
        panel/s6e8aa0: add DT bindings
        drm/exynos: add DSIM driver
        exynos/dsim: add DT bindings
        drm/exynos: disallow fbdev initialization if no device is connected
        drm/mipi_dsi: create dsi devices only for nodes with reg property
        drm/mipi_dsi: add flags to DSI messages
        Skip intel_crt_init for Dell XPS 8700
        ...
    • Heiko Carstens's avatar
    • Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs · a7963eb7
      Linus Torvalds authored
      Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
       "various cleanups for ext2, ext3, udf, isofs, a documentation update
        for quota, and a fix of a race in reiserfs readdir implementation"
      
      * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
        reiserfs: fix race in readdir
        ext2: acl: remove unneeded include of linux/capability.h
        ext3: explicitly remove inode from orphan list after failed direct io
        fs/isofs/inode.c add __init to init_inodecache()
        ext3: Speedup WB_SYNC_ALL pass
        fs/quota/Kconfig: Update filesystems
        ext3: Update outdated comment before ext3_ordered_writepage()
        ext3: Update PF_MEMALLOC handling in ext3_write_inode()
        ext2/3: use prandom_u32() instead of get_random_bytes()
        ext3: remove an unneeded check in ext3_new_blocks()
        ext3: remove unneeded check in ext3_ordered_writepage()
        fs: Mark function as static in ext3/xattr_security.c
        fs: Mark function as static in ext3/dir.c
        fs: Mark function as static in ext2/xattr_security.c
        ext3: Add __init macro to init_inodecache
        ext2: Add __init macro to init_inodecache
        udf: Add __init macro to init_inodecache
        fs: udf: parse_options: blocksize check
    • Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · b003d770
      Linus Torvalds authored
      Pull kbuild changes from Michal Marek:
       - cleanups in the main Makefiles and Documentation/DocBook/Makefile
       - make O=...  directory is automatically created if needed
       - mrproper/distclean removes the old include/linux/version.h to make
         life easier when bisecting across the commit that moved the version.h
         file
      
      * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild: docbook: fix the include error when executing "make help"
        kbuild: create a build directory automatically for out-of-tree build
        kbuild: remove redundant '.*.cmd' pattern from make distclean
        kbuild: move "quote" to Kbuild.include to be consistent
        kbuild: docbook: use $(obj) and $(src) rather than specific path
        kbuild: unconditionally clobber include/linux/version.h on distclean
        kbuild: docbook: specify KERNELDOC dependency correctly
        kbuild: docbook: include cmd files more simply
        kbuild: specify build_docproc as a phony target
    • Merge tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 3573d386
      Linus Torvalds authored
      Pull ARC changes from Vineet Gupta:
       - Support for external initrd from Noam
       - Fix broken serial console in nsimosci Virtual Platform
       - Reuse of ENTRY/END assembler macros across hand asm code
       - Other minor fixes here and there
      
      * tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: [nsimosci] Unbork console
        ARC: [nsimosci] Change .dts to use generic 8250 UART
        ARC: [SMP] General Fixes
        ARC: Remove unused DT template file
        ARC: [clockevent] simplify timer ISR
        ARC: [clockevent] can't be SoC specific
        ARC: Remove ARC_HAS_COH_RTSC
        ARC: switch to generic ENTRY/END assembler annotations
        ARC: support external initrd
        ARC: add uImage to .gitignore
        ARC: [arcfpga] Fix __initconst data const-correctness
    • DRM: armada: fix corruption while loading cursors · c39b0695
      Russell King authored
      Loading cursors to the LCD controller's SRAM can be corrupted when the
      configured pixel clock is relatively slow.  This seems to be caused
      when we write back-to-back to the SRAM registers.
      
      There doesn't appear to be any status register we can read to check
      when an access has completed.
      
      Inserting a dummy read between the writes appears to fix the problem.
      
      Cc: <stable@vger.kernel.org> # 3.13
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • Merge tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · c8d9762a
      Linus Torvalds authored
      Pull Xen build fix from David Vrabel:
       "Fix arm build of drivers/xen/events/
      
        The merge of irq-core-for-linus branch broke it"
      
      * tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        Xen: do hv callback accounting only on x86
  2. 07 Apr, 2014 30 commits
    • Merge branch 'akpm' (incoming from Andrew) · 26c12d93
      Linus Torvalds authored
      Merge second patch-bomb from Andrew Morton:
       - the rest of MM
       - zram updates
       - zswap updates
       - exit
       - procfs
       - exec
       - wait
       - crash dump
       - lib/idr
       - rapidio
       - adfs, affs, bfs, ufs
       - cris
       - Kconfig things
       - initramfs
       - small amount of IPC material
       - percpu enhancements
       - early ioremap support
       - various other misc things
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
        MAINTAINERS: update Intel C600 SAS driver maintainers
        fs/ufs: remove unused ufs_super_block_third pointer
        fs/ufs: remove unused ufs_super_block_second pointer
        fs/ufs: remove unused ufs_super_block_first pointer
        fs/ufs/super.c: add __init to init_inodecache()
        doc/kernel-parameters.txt: add early_ioremap_debug
        arm64: add early_ioremap support
        arm64: initialize pgprot info earlier in boot
        x86: use generic early_ioremap
        mm: create generic early_ioremap() support
        x86/mm: sparse warning fix for early_memremap
        lglock: map to spinlock when !CONFIG_SMP
        percpu: add preemption checks to __this_cpu ops
        vmstat: use raw_cpu_ops to avoid false positives on preemption checks
        slub: use raw_cpu_inc for incrementing statistics
        net: replace __this_cpu_inc in route.c with raw_cpu_inc
        modules: use raw_cpu_write for initialization of per cpu refcount.
        mm: use raw_cpu ops for determining current NUMA node
        percpu: add raw_cpu_ops
        slub: fix leak of 'name' in sysfs_slab_add
        ...
    • MAINTAINERS: update Intel C600 SAS driver maintainers · fdc5813f
      Lukasz Dorau authored
      Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
      Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ufs: remove unused ufs_super_block_third pointer · fe4487d1
      Christian Engelmayer authored
      Pointer 'usb3' to struct ufs_super_block_third acquired via
      ubh_get_usb_third() is never used in function
      ufs_read_cylinder_structures().  Thus remove it.
      
      Detected by Coverity: CID 139939.
      Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ufs: remove unused ufs_super_block_second pointer · 48968a11
      Christian Engelmayer authored
      Pointer 'usb2' to struct ufs_super_block_second acquired via
      ubh_get_usb_second() is never used in function ufs_statfs().  Thus
      remove it.
      
      Detected by Coverity: CID 139940.
      Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ufs: remove unused ufs_super_block_first pointer · 6e0bd34c
      Christian Engelmayer authored
      Remove occurrences of unused pointers to struct ufs_super_block_first
      that were acquired via ubh_get_usb_first().
      
      Detected by Coverity: CID 139929 - CID 139936, CID 139940.
      Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/ufs/super.c: add __init to init_inodecache() · 76ee4735
      Fabian Frederick authored
      init_inodecache is only called by __init init_ufs_fs.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • doc/kernel-parameters.txt: add early_ioremap_debug · 56aeeba8
      Mark Salter authored
      Add description of early_ioremap_debug kernel parameter.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm64: add early_ioremap support · bf4b558e
      Mark Salter authored
      Add support for early IO or memory mappings which are needed before the
      normal ioremap() is usable.  This also adds fixmap support for permanent
      fixed mappings such as that used by the earlyprintk device register
      region.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arm64: initialize pgprot info earlier in boot · 0bf757c7
      Mark Salter authored
      Presently, paging_init() calls init_mem_pgprot() to initialize pgprot
      values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc.
      
      The new fixmap and early_ioremap support also needs to use these macros
      before paging_init() is called.  This patch moves the init_mem_pgprot()
      call out of paging_init() and into setup_arch() so that pgprot_default
      gets initialized in time for fixmap and early_ioremap.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: use generic early_ioremap · 5b7c73e0
      Mark Salter authored
      Move x86 over to the generic early ioremap implementation.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: create generic early_ioremap() support · 9e5c33d7
      Mark Salter authored
      This patch creates a generic implementation of early_ioremap() support
      based on the existing x86 implementation.  early_ioremap() is useful for
      early boot code which needs to temporarily map I/O or memory regions
      before normal mapping functions such as ioremap() are available.
      
      Some architectures have an optional MMU.  In the no-MMU case, the remap
      functions simply return the passed-in physical address and the unmap
      functions do nothing.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
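The no-MMU behaviour described in the generic early_ioremap() commit, where remapping returns the physical address and unmapping does nothing, might look roughly like the following. The names `toy_phys_addr_t`, `toy_early_ioremap_nommu()` and `toy_early_iounmap_nommu()` are hypothetical and exist only for this sketch; the kernel's own prototypes differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical physical-address type for this sketch. */
typedef uintptr_t toy_phys_addr_t;

/*
 * No-MMU case: physical and virtual addresses coincide, so "mapping"
 * a region just returns the physical address unchanged.
 */
void *toy_early_ioremap_nommu(toy_phys_addr_t phys, unsigned long size)
{
    (void)size;  /* nothing to reserve without an MMU */
    return (void *)phys;
}

/* No-MMU case: there is no mapping to tear down. */
void toy_early_iounmap_nommu(void *addr, unsigned long size)
{
    (void)addr;
    (void)size;
}
```

Keeping the same call signatures in both configurations lets callers stay identical whether or not an MMU is present.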
    • x86/mm: sparse warning fix for early_memremap · 6b550f6f
      Dave Young authored
      This patch series takes the common bits from the x86 early ioremap
      implementation and creates a generic implementation which may be used by
      other architectures.  The early ioremap interfaces are intended for
      situations where boot code needs to make temporary virtual mappings
      before the normal ioremap interfaces are available.  Typically, this
      means before paging_init() has run.
      
      This patch (of 6):
      
      There are a lot of sparse warnings for code like the following:
      
        void *a = early_memremap(phys_addr, size);
      
      early_memremap() is intended to map kernel memory with the ioremap
      facility, so the returned pointer should be a kernel RAM pointer rather
      than an __iomem one.
      
      To make the function clearer and suppress the sparse warnings, this patch
      does the following two things:
      1. cast to (__force void *) for the return value of early_memremap
      2. add early_memunmap function and pass (__force void __iomem *) to iounmap
      
      From Boris:
        "Ingo told me yesterday, it makes sense too.  I'd guess we can try it.
         FWIW, all callers of early_memremap use the memory they get remapped
         as normal memory so we should be safe"
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
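The two casts listed in the early_memremap commit can be sketched in userspace as follows. Everything prefixed `toy_` is a hypothetical stand-in backed by a plain array, not the x86 implementation, and the fallback definitions of `__iomem` and `__force` are assumptions that mimic how the annotations vanish when the code is not run through sparse.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse annotations; they expand to nothing for a normal compiler. */
#ifdef __CHECKER__
# define __iomem __attribute__((noderef, address_space(2)))
# define __force __attribute__((force))
#else
# define __iomem
# define __force
#endif

/* Hypothetical backing store standing in for a mapped region. */
char toy_ram[64];

/* Returns an __iomem pointer, as an ioremap-style function would. */
void __iomem *toy_early_ioremap(size_t offset, size_t size)
{
    if (offset > sizeof(toy_ram) || size > sizeof(toy_ram) - offset)
        return NULL;
    return (__force void __iomem *)(toy_ram + offset);
}

void toy_early_iounmap(void __iomem *addr, size_t size)
{
    (void)addr;
    (void)size;
}

/* 1. Cast the result to a plain kernel pointer for memory users. */
void *toy_early_memremap(size_t offset, size_t size)
{
    return (__force void *)toy_early_ioremap(offset, size);
}

/* 2. Cast back to __iomem when handing the pointer to the unmap side. */
void toy_early_memunmap(void *addr, size_t size)
{
    toy_early_iounmap((__force void __iomem *)addr, size);
}
```

With this split, callers that treat the mapping as ordinary memory use the `memremap`/`memunmap` pair and never touch an `__iomem` pointer directly, so a checker like sparse has nothing to warn about at the call sites.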
    • lglock: map to spinlock when !CONFIG_SMP · 64b47e8f
      Josh Triplett authored
      When the system has only one CPU, lglock is effectively a spinlock; map
      it directly to spinlock to eliminate the indirection and duplicate code.
      
      In addition to removing overhead, this drops 1.6k of code with a
      defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • percpu: add preemption checks to __this_cpu ops · 188a8140
      Christoph Lameter authored
      We define a check function in order to avoid trouble with the include
      files.  Then the higher level __this_cpu macros are modified to invoke
      the preemption check.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
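The split between checked and unchecked per-CPU operations that this series introduces can be modelled in a few lines of userspace C. This is a rough sketch only: `toy_preempt_count`, `toy_preempt_check()`, `toy_raw_cpu_add()` and `toy_checked_cpu_add()` are hypothetical names, and a single global integer stands in for a real per-CPU variable.

```c
#include <assert.h>
#include <stdio.h>

int toy_preempt_count;   /* > 0 means preemption is disabled */
int toy_check_failures;  /* number of checks that fired */
int toy_counter;         /* stands in for a per-CPU variable */

/* Separate check function, invoked by the checked operation only. */
void toy_preempt_check(const char *op)
{
    if (toy_preempt_count == 0) {
        toy_check_failures++;
        fprintf(stderr, "%s operation in preemptible code\n", op);
    }
}

/* Unchecked variant: for callers that tolerate races. */
void toy_raw_cpu_add(int *var, int val)
{
    *var += val;
}

/* Checked variant: complains when preemption is not disabled. */
void toy_checked_cpu_add(int *var, int val)
{
    toy_preempt_check("toy_checked_cpu_add");
    toy_raw_cpu_add(var, val);
}
```

Callers that knowingly accept races, such as statistics counters, would pick the raw variant and stay silent, while everything else gets a diagnostic when it runs preemptibly.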
    • vmstat: use raw_cpu_ops to avoid false positives on preemption checks · 293b6a4c
      Christoph Lameter authored
      vm counters are allowed to be racy.  Use raw_cpu_ops to avoid the
      local_irq_disable overhead and to avoid preemption checks which will be
      added to the __this_cpu operations.
      
      [akpm@linux-foundation.org: Add comment.  Again.]
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: use raw_cpu_inc for incrementing statistics · 88da03a6
      Christoph Lameter authored
      Statistics are not critical to the operation of the allocation but
      should also not cause too much overhead.
      
      When __this_cpu_inc is altered to check if preemption is disabled this
      triggers.  Use raw_cpu_inc to avoid the checks.  Using this_cpu_ops may
      cause interrupt disable/enable sequences on various arches which may
      significantly impact allocator performance.
      
      [akpm@linux-foundation.org: add comment]
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • net: replace __this_cpu_inc in route.c with raw_cpu_inc · 3ed66e91
      Christoph Lameter authored
      The RT_CACHE_STAT_INC macro triggers the new preemption checks
      for __this_cpu ops.
      
      I do not see any other synchronization that would allow the use of a
      __this_cpu operation here.  However, in commit dbd2915c ("[IPV4]:
      RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
      raw_smp_processor_id() here because "we do not care" about races.  In
      the past we agreed that the price of disabling interrupts here to get
      consistent counters would be too high.  These counters may be inaccurate
      due to race conditions.
      
      The use of a __this_cpu op already improves on what commit dbd2915c
      did, since the single instruction emitted on x86 does not allow the
      race to occur anymore.  However, non-x86 platforms could still
      experience a race here.
      
      Trace:
      
        __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          __ip_route_output_key+0x575/0x8c0
          ip_route_output_flow+0x27/0x70
          udp_sendmsg+0x825/0xa20
          inet_sendmsg+0x85/0xc0
          sock_sendmsg+0x9c/0xd0
          ___sys_sendmsg+0x37c/0x390
          __sys_sendmsg+0x49/0x90
          SyS_sendmsg+0x12/0x20
          tracesys+0xe1/0xe6
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • modules: use raw_cpu_write for initialization of per cpu refcount. · 08f141d3
      Christoph Lameter authored
      The initialization of a structure is not subject to synchronization.
      The use of __this_cpu would trigger a false positive with the additional
      preemption checks for __this_cpu ops.
      
      So simply disable the check through the use of raw_cpu ops.
      
      Trace:
      
        __this_cpu_write operation in preemptible [00000000] code: modprobe/286
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 3 PID: 286 Comm: modprobe Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          dump_stack+0x4e/0x82
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          load_module+0xcfd/0x2650
          SyS_init_module+0xa6/0xd0
          tracesys+0xe1/0xe6
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use raw_cpu ops for determining current NUMA node · dc322a99
      Christoph Lameter authored
      With the preempt checking logic for __this_cpu_ops we will get false
      positives from locations in the code that use numa_node_id.
      
      Before the __this_cpu ops were introduced there were no checks for
      preemption present either.  raw_smp_processor_id() was used.  See
      
        http://www.spinics.net/lists/linux-numa/msg00641.html
      
      Therefore we need to use raw_cpu_read here to avoid false positives.
      
      Note that this issue has been discussed in prior years.  If the process
      changes nodes after retrieving the current numa node then that is
      acceptable since most uses of numa_node etc are for optimization and not
      for correctness.
      
      There were suggestions to implement a raw_numa_node_id in order to do
      preempt checks for numa_node_id as well.  But I think we had better defer
      that to another patch, since that would mean investigating how
      numa_node_id() is used throughout the kernel, which would increase the
      scope of this patchset significantly.  After all, preemption was never
      checked before when numa_node_id() was used.
      
      Some sample traces:
      
      __this_cpu_read operation in preemptible [00000000] code: login/1456
      caller is __this_cpu_preempt_check+0x2b/0x2d
      CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3b-dirty #185
      Call Trace:
        dump_stack+0x4e/0x82
        check_preemption_disabled+0xc5/0xe0
        __this_cpu_preempt_check+0x2b/0x2d
        get_task_policy+0x1d/0x49
        get_vma_policy+0x14/0x76
        alloc_pages_vma+0x35/0xff
        handle_mm_fault+0x290/0x73b
        __do_page_fault+0x3fe/0x44d
        do_page_fault+0x9/0xc
        page_fault+0x22/0x30
        generic_file_aio_read+0x38e/0x624
        do_sync_read+0x54/0x73
        vfs_read+0x9d/0x12a
        SyS_read+0x47/0x7e
        cstar_dispatch+0x7/0x23
      
      caller is __this_cpu_preempt_check+0x2b/0x2d
      CPU: 0 PID: 1456 Comm: login Not tainted 3.12.0-rc4-cl-00062-g2fe80d3b-dirty #185
      Call Trace:
        dump_stack+0x4e/0x82
        check_preemption_disabled+0xc5/0xe0
        __this_cpu_preempt_check+0x2b/0x2d
        alloc_pages_current+0x8f/0xbc
        __page_cache_alloc+0xb/0xd
        __do_page_cache_readahead+0xf4/0x219
        ra_submit+0x1c/0x20
        ondemand_readahead+0x28c/0x2b4
        page_cache_sync_readahead+0x38/0x3a
        generic_file_aio_read+0x261/0x624
        do_sync_read+0x54/0x73
        vfs_read+0x9d/0x12a
        SyS_read+0x47/0x7e
        cstar_dispatch+0x7/0x23
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • percpu: add raw_cpu_ops · b3ca1c10
      Christoph Lameter authored
      The kernel has never been audited to ensure that this_cpu operations are
      consistently used throughout the kernel.  The code generated in many
      places can be improved through the use of this_cpu operations (which
      use a segment register for relocation of per cpu offsets instead of
      performing address calculations).
      
      The patch set also addresses various consistency issues in general with
      the per cpu macros.
      
      A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
         because checks are skipped. This is typically shown through a raw_
         prefix. So this patch set changes the places where __this_cpu_ptr()
         is used to raw_cpu_ptr().
      
      B. Some have long wished that __this_cpu operations
         would check for preemption. However, there are cases where preemption
         checks need to be skipped. This patch set adds raw_cpu operations that
         do not check for preemption and then adds preemption checks to the
         __this_cpu operations.
      
      C. The use of __get_cpu_var is always a reference to a percpu variable
         that can also be handled via a this_cpu operation. This patch set
         replaces all uses of __get_cpu_var with this_cpu operations.
      
      D. We can then use this_cpu RMW operations in various places replacing
         sequences of instructions by a single one.
      
      E. The use of this_cpu operations throughout will allow other arches than
         x86 to implement optimized references and RMW operations to work with
         per cpu local data.
      
      F. The use of this_cpu operations opens up the possibility to
         further optimize code that relies on synchronization through
         per cpu data.
      
      The patch set works in a couple of stages:
      
      I.   Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
           It also converts the existing __this_cpu_xx_# primitives in the x86
           code to raw_cpu_xx_#.
      
      II.  Patches 2-4 use the raw_cpu operations in places that would give
           us false positives once the preemption checks are enabled.
      
      III. Patch 5 adds preemption checks to __this_cpu operations to allow
           checking if preemption is properly disabled when these functions
           are used.
      
      IV.  Patches 6-20 simply replace uses of __get_cpu_var with
           this_cpu_ptr.  They do not depend on any changes to the percpu
           code.  No preemption tests are skipped if they are applied.
      
      V.   Patches 21-46 are conversion patches that use this_cpu operations
           in various kernel subsystems/drivers or arch code.
      
      VI.  Patches 47/48 (not included in this series) remove no longer used
           functions (__this_cpu_ptr and __get_cpu_var).  These should only be
           applied after all the conversion patches have made it and after we
           have done additional passes through the kernel to ensure that none of
           the uses of these functions remain.
      
      This patch (of 46):
      
      The patches following this one will add preemption checks to __this_cpu
      ops so we need to have an alternative way to use this_cpu operations
      without preemption checks.
      
      raw_cpu_ops will be the basis for all other ops since these will be the
      operations that do not implement any checks.
      
      Primitive operations are renamed by this patch from __this_cpu_xxx to
      raw_cpu_xxx.
      
      Also change the uses of the x86 percpu primitives in preempt.h.
      These depend directly on asm/percpu.h (header #include nesting issue).
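      
      The split between unchecked and checked operations can be pictured
      with a toy userspace model.  This is only a sketch in Python, not
      kernel code: the class PerCpuVar, its fields, and the method name
      checked_this_cpu_read() are made up for illustration, and the check
      shown here stands in for what the later patches in the series add to
      the __this_cpu ops.

```python
class PerCpuVar:
    """Toy userspace model of one per-cpu variable (not kernel code)."""

    def __init__(self, nr_cpus):
        self.values = [0] * nr_cpus   # one private slot per CPU
        self.cpu = 0                  # CPU the "task" currently runs on
        self.preempt_count = 0        # > 0 means preemption is disabled
        self.warnings = []            # stands in for the kernel log

    def preempt_disable(self):
        self.preempt_count += 1

    def preempt_enable(self):
        self.preempt_count -= 1

    def raw_cpu_read(self):
        # raw_ variant: no checks at all, just the access.
        return self.values[self.cpu]

    def checked_this_cpu_read(self):
        # Hypothetical stand-in for __this_cpu_read() with preemption
        # checks: complain if the caller could migrate, then fall back
        # to the raw access.
        if self.preempt_count == 0:
            self.warnings.append("__this_cpu_read operation in preemptible code")
        return self.raw_cpu_read()
```

      In this model the raw read never logs anything, while the checked
      read logs a warning unless preempt_disable() was called first, which
      is why the raw ops have to exist before the checks are turned on.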
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Alex Shi <alex.shi@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bryan Wu <cooloney@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: David Daney <david.daney@cavium.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wim Van Sebroeck <wim@iguana.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: fix leak of 'name' in sysfs_slab_add · 54b6a731
      Dave Jones authored
      The failure paths of sysfs_slab_add don't release the allocation of
      'name' made by create_unique_id() a few lines above the context of the
      diff below.  Create a common exit path to make it more obvious what
      needs freeing.
      
      [vdavydov@parallels.com: free the name only if !unmergeable]
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: rework sysfs layout for memcg caches · 9a41707b
      Vladimir Davydov authored
      Currently, we try to arrange sysfs entries for memcg caches in the same
      manner as for global caches.  Apart from turning /sys/kernel/slab into a
      mess when there are a lot of kmem-active memcgs created, it actually
      does not work properly - we won't create more than one link to a memcg
      cache in case its parent is merged with another cache.  For instance, if
      A is a root cache merged with another root cache B, we will have the
      following sysfs setup:
      
        X
        A -> X
        B -> X
      
      where X is some unique id (see create_unique_id()).  Now if memcgs M and
      N start to allocate from cache A (or B, which is the same), we will get:
      
        X
        X:M
        X:N
        A -> X
        B -> X
        A:M -> X:M
        A:N -> X:N
      
      Since B is an alias for A, we won't get entries B:M and B:N, which is
      confusing.
      
      It is more logical to have entries for memcg caches under the
      corresponding root cache's sysfs directory.  This would allow us to keep
      the sysfs layout clean and avoid inconsistencies like the one described
      above.
      
      This patch does the trick.  It creates a "cgroup" kset in each root
      cache kobject to keep its child caches there.
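      
      The difference between the two layouts can be sketched with plain
      dictionaries.  This is an illustrative model only: the function names
      are invented, and the child entry names inside the "cgroup" kset
      (here just the memcg names M and N) are an assumption, since the
      message does not say how the entries are named.

```python
def flat_layout(aliases, unique_id, memcgs, used_alias):
    """Old scheme: memcg entries live next to the root entries."""
    entries = {unique_id: None}                 # real directory
    for m in memcgs:
        entries[f"{unique_id}:{m}"] = None      # real memcg directories
    for a in aliases:
        entries[a] = unique_id                  # symlink: alias -> id
    for m in memcgs:
        # Only one link per memcg cache is created.
        entries[f"{used_alias}:{m}"] = f"{unique_id}:{m}"
    return entries


def nested_layout(aliases, unique_id, memcgs):
    """New scheme: each root cache directory holds a "cgroup" kset."""
    root_dir = {"cgroup": {m: {} for m in memcgs}}
    tree = {unique_id: root_dir}
    for a in aliases:
        tree[a] = unique_id                     # symlink: alias -> id
    return tree


def children(tree, name):
    """Follow an alias symlink and list the memcg caches under it."""
    target = tree[name]
    if isinstance(target, str):
        target = tree[target]
    return sorted(target["cgroup"])
```

      With aliases A and B, flat_layout() yields A:M and A:N but no B:M or
      B:N, whereas children() on the nested layout returns the same list
      for A, B and X.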
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: adjust memcg caches when creating cache alias · 84d0ddd6
      Vladimir Davydov authored
      Otherwise, kzalloc() called from a memcg won't clear the whole object.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg, slab: do not destroy children caches if parent has aliases · b8529907
      Vladimir Davydov authored
      Currently we destroy children caches at the very beginning of
      kmem_cache_destroy().  This is wrong, because the root cache will not
      necessarily be destroyed in the end - if it has aliases (refcount > 0),
      kmem_cache_destroy() will simply decrement its refcount and return.  In
      this case, at best we will get a bunch of warnings in dmesg, like this
      one:
      
        kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
        CPU: 1 PID: 7139 Comm: modprobe Tainted: G    B   W    3.13.0+ #117
        Call Trace:
          dump_stack+0x49/0x5b
          kmem_cache_destroy+0xdf/0xf0
          kmem_cache_destroy_memcg_children+0x97/0xc0
          kmem_cache_destroy+0xf/0xf0
          xfs_mru_cache_uninit+0x21/0x30 [xfs]
          exit_xfs_fs+0x2e/0xc44 [xfs]
          SyS_delete_module+0x198/0x1f0
          system_call_fastpath+0x16/0x1b
      
      At worst - if kmem_cache_destroy() races with an allocation from a
      memcg cache - the kernel will panic.
      
      This patch fixes this by moving the destruction of child caches after
      the check for whether the cache has aliases.  Plus, it forbids destroying
      a root cache if it still has child caches, because each child cache keeps
      a reference to its parent.
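      
      The ordering problem can be reproduced in a small userspace model.
      This is a sketch under stated assumptions, not the kernel code: the
      Cache class and the functions destroy_old(), destroy_new() and
      destroy_children() are hypothetical, and backing out by restoring the
      refcount is one plausible way to model "forbids destroying a root
      cache if it still has children".

```python
class Cache:
    """Toy stand-in for a kmem cache (not the kernel structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.refcount = 1        # one user; each alias adds one more
        self.children = []       # memcg caches created from this root
        self.objects = 0         # live objects still allocated
        self.destroyed = False
        if parent is not None:
            parent.children.append(self)


def destroy_children(root):
    """Destroy idle children; return True if none are left."""
    for child in list(root.children):
        if child.objects == 0:
            child.destroyed = True
            root.children.remove(child)
    return not root.children


def destroy_old(root):
    """Old order: children go first, even if the root survives."""
    destroy_children(root)
    root.refcount -= 1
    if root.refcount > 0:
        return                   # root is still aliased
    root.destroyed = True


def destroy_new(root):
    """New order: check for aliases first, then children, then root."""
    root.refcount -= 1
    if root.refcount > 0:
        return                   # still aliased: leave children alone
    if not destroy_children(root):
        root.refcount += 1       # a child is still alive: back out
        return
    root.destroyed = True
```

      With a root whose refcount is 2, destroy_old() kills the child while
      the root lives on, and destroy_new() leaves both untouched until the
      last reference is dropped.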
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg, slab: unregister cache from memcg before starting to destroy it · 051dd460
      Vladimir Davydov authored
      Currently, memcg_unregister_cache(), which deletes the cache being
      destroyed from the memcg_slab_caches list, is called after
      __kmem_cache_shutdown() (see kmem_cache_destroy()), which starts to
      destroy the cache.
      
      As a result, one can access a partially destroyed cache while traversing
      a memcg_slab_caches list, which can have deadly consequences (for
      instance, cache_show() called for each cache on a memcg_slab_caches list
      from mem_cgroup_slabinfo_read() will dereference pointers to already
      freed data).
      
      To fix this, let's call memcg_unregister_cache() before cache
      destruction begins, and call memcg_register_cache() again on failure.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg, slab: separate memcg vs root cache creation paths · 794b1248
      Vladimir Davydov authored
      Memcg-awareness turned kmem_cache_create() into a dirty interweaving of
      memcg-only and except-for-memcg calls.  To clean this up, let's move the
      code responsible for memcg cache creation to a separate function.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg, slab: cleanup memcg cache creation · 5722d094
      Vladimir Davydov authored
      This patch cleans up the memcg cache creation path as follows:
      
      - Move memcg cache name creation to a separate function to be called
        from kmem_cache_create_memcg().  This allows us to get rid of the mutex
        protecting the temporary buffer used for the name formatting, because
        the whole cache creation path is protected by the slab_mutex.
      
      - Get rid of memcg_create_kmem_cache().  This function serves as a proxy
        to kmem_cache_create_memcg().  After separating the cache name creation
        path, it would be reduced to a function call, so let's inline it.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg, slab: never try to merge memcg caches · a44cb944
      Vladimir Davydov authored
      When a kmem cache is created (kmem_cache_create_memcg()), we first try to
      find a compatible cache that already exists and can handle requests from
      the new cache, i.e.  has the same object size, alignment, ctor, etc.  If
      there is such a cache, we do not create any new caches, instead we simply
      increment the refcount of the cache found and return it.
      
      Currently we do this procedure not only when creating root caches, but
      also for memcg caches.  However, there is no point in that, because, as
      every memcg cache has exactly the same parameters as its parent and cache
      merging cannot be turned off at runtime (only at boot by passing
      "slub_nomerge"), the root caches of any two potentially mergeable memcg
      caches should be merged already, i.e.  it must be the same root cache, and
      therefore we couldn't even get to the memcg cache creation, because it
      already exists.
      
      The only exception is boot caches - they are explicitly forbidden to be
      merged by setting their refcount to -1.  There are currently only two of
      them - kmem_cache and kmem_cache_node, which are used in slab internals (I
      do not count kmalloc caches as their refcount is set to 1 immediately
      after creation).  Since they are prevented from merging up front, I
      guess we should avoid merging their children too.
      
      So let's remove the useless code responsible for merging memcg caches.
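      
      The merge lookup described above can be modelled in a few lines.
      This is a deliberately simplified sketch, not the allocator's real
      logic: the Cache dataclass and both function names are hypothetical,
      compatibility is reduced to an exact match on size and alignment, and
      the "root:memcg" child name is only a placeholder.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    name: str
    size: int
    align: int
    refcount: int = 1            # -1 marks a cache that must never merge


def create_root_cache(caches, name, size, align):
    """Reuse a compatible cache if one exists, else create a new one."""
    for c in caches:
        if c.refcount < 0:
            continue             # boot caches are excluded from merging
        if (c.size, c.align) == (size, align):
            c.refcount += 1      # merged: the caller becomes an alias
            return c
    new = Cache(name, size, align)
    caches.append(new)
    return new


def create_memcg_cache(caches, root, memcg):
    """After the patch: always create, never search for a merge target."""
    child = Cache(f"{root.name}:{memcg}", root.size, root.align)
    caches.append(child)
    return child
```

      Two root caches with the same parameters collapse into one object
      with refcount 2, a cache with refcount -1 is skipped, and a memcg
      cache is always a fresh object that copies its root's parameters.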
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • asm/system.h: um: arch_align_stack() moved to asm/exec.h · cf7bc58f
      David Howells authored
      arch_align_stack() moved to asm/exec.h, so change the comment referring to
      asm/system.h which no longer exists.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • asm/system.h: clean asm/system.h from docs · 95663285
      David Howells authored
      Remove references to asm/system.h from the docs, as nothing should refer
      to that header anymore.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>