1. 17 Jul, 2007 23 commits
    • Lumpy Reclaim V4 · 5ad333eb
      Andy Whitcroft authored
      When we are out of memory of a suitable size we enter reclaim.  The current
      reclaim algorithm targets pages in LRU order, which is great for fairness at
      order-0 but highly unsuitable if you desire pages at higher orders.  To get
      pages of higher order we must shoot down a very high proportion of memory;
      >95% in a lot of cases.
      
      This patch set adds a lumpy reclaim algorithm to the allocator.  It targets
      groups of pages at the specified order anchored at the end of the active and
      inactive lists.  This encourages groups of pages at the requested orders to
      move from active to inactive, and active to free lists.  This behaviour is
      only triggered out of direct reclaim when higher order pages have been
      requested.
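
The grouping of pages "at the specified order" can be illustrated with a small userspace sketch. It assumes that a group is the naturally aligned block of 2^order page frames containing the anchor page; the message does not spell this out, and the helper names `lumpy_block_start` and `lumpy_block_end` are hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper (not from the patch): first page frame number of
 * the naturally aligned 2^order block that contains pfn.
 */
static unsigned long lumpy_block_start(unsigned long pfn, unsigned int order)
{
    /* The shift below is only defined for orders smaller than the word size. */
    assert(order < sizeof(unsigned long) * CHAR_BIT);
    return pfn & ~((1UL << order) - 1);
}

/* Hypothetical helper: one past the last page frame number of that block. */
static unsigned long lumpy_block_end(unsigned long pfn, unsigned int order)
{
    return lumpy_block_start(pfn, order) + (1UL << order);
}
```

Under this assumption, an order-3 request anchored at page frame 1027 would consider frames 1024 through 1031 as one candidate group.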
      
      This patch set is particularly effective when utilised with an
      anti-fragmentation scheme which groups pages of similar reclaimability
      together.
      
      This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms
      the foundation.  Credit to Mel Gorman for sanity checking.
      
      Mel said:
      
        The patches have an application with hugepage pool resizing.
      
        When lumpy-reclaim is used with ZONE_MOVABLE, the hugepages pool can
        be resized with greater reliability.  Testing on a desktop machine with 2GB
        of RAM showed that growing the hugepage pool with ZONE_MOVABLE on its own
        was very slow as the success rate was quite low.  Without lumpy-reclaim,
        each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages.
        With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.
      
      [akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]
      [bunk@stusta.de: static declarations for internal functions]
      [a.p.zijlstra@chello.nl: initial lumpy V2 implementation]
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add a movablecore= parameter for sizing ZONE_MOVABLE · 7e63efef
      Mel Gorman authored
      This patch adds a new parameter for sizing ZONE_MOVABLE called
      movablecore=.  While kernelcore= is used to specify the minimum amount of
      memory that must be available for all allocation types, movablecore= is
      used to specify the minimum amount of memory that is used for migratable
      allocations.  The amount of memory used for migratable allocations
      determines, for example, how large the huge page pool can be resized to
      at runtime.
      
      How movablecore is actually handled is that the total number of pages in
      the system is calculated and a value is set for kernelcore that is
      
      kernelcore == totalpages - movablecore
      
      Both kernelcore= and movablecore= can be safely specified at the same time.
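
The arithmetic above can be sketched in plain C. This is a minimal illustration, not the patch's code: the function names are hypothetical, the clamp for an oversized movablecore is an added safeguard, and resolving the case where both parameters are given by taking the larger kernelcore requirement is an assumption, since the message only says that specifying both is safe.

```c
/*
 * Hypothetical helper: kernelcore (in pages) implied by a movablecore=
 * request, following kernelcore == totalpages - movablecore.
 */
static unsigned long kernelcore_from_movablecore(unsigned long totalpages,
                                                 unsigned long movablecore)
{
    /* Safeguard (assumption): never let the subtraction wrap around. */
    if (movablecore > totalpages)
        movablecore = totalpages;
    return totalpages - movablecore;
}

/*
 * Hypothetical helper: combine an explicit kernelcore= value with the one
 * implied by movablecore=.  Taking the larger requirement is an assumption.
 */
static unsigned long effective_kernelcore(unsigned long totalpages,
                                          unsigned long kernelcore,
                                          unsigned long movablecore)
{
    unsigned long implied = kernelcore_from_movablecore(totalpages, movablecore);

    return kernelcore > implied ? kernelcore : implied;
}
```

For example, with 1000 pages in total and a movablecore of 300 pages, the implied kernelcore is 700 pages.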
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • handle kernelcore=: generic · ed7ed365
      Mel Gorman authored
      This patch adds the kernelcore= parameter for x86.
      
      Once all patches are applied, a new command-line parameter and a new
      sysctl exist.  This patch adds the necessary documentation.
      
      From: Yasunori Goto <y-goto@jp.fujitsu.com>
      
        When the "kernelcore" boot option is specified, the kernel can't boot on
        ia64 because of an infinite loop.  In addition, the parsing code can be
        handled in an architecture-independent manner.
      
        This patch uses common code to handle the kernelcore= parameter.  It is
        only available to architectures that support arch-independent zone-sizing
        (i.e.  define CONFIG_ARCH_POPULATES_NODE_MAP).  Other architectures will
        ignore the boot parameter.
      
      [bunk@stusta.de: make cmdline_parse_kernelcore() static]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Allow huge page allocations to use GFP_HIGH_MOVABLE · 396faf03
      Mel Gorman authored
      Huge pages are not movable so are not allocated from ZONE_MOVABLE.  However,
      as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it
      can be used to satisfy hugepage allocations even when the system has been
      running a long time.  This allows an administrator to resize the hugepage pool
      at runtime depending on the size of ZONE_MOVABLE.
      
      This patch adds a new sysctl called hugepages_treat_as_movable.  When a
      non-zero value is written to it, future allocations for the huge page pool
      will use ZONE_MOVABLE.  Despite huge pages being non-movable, we do not
      introduce additional external fragmentation of note as huge pages are always
      the largest contiguous block we care about.
      
      [akpm@linux-foundation.org: various fixes]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Create the ZONE_MOVABLE zone · 2a1e274a
      Mel Gorman authored
      The following 8 patches against 2.6.20-mm2 create a zone called ZONE_MOVABLE
      that is only usable by allocations that specify both __GFP_HIGHMEM and
      __GFP_MOVABLE.  This has the effect of keeping all non-movable pages within a
      single memory partition while allowing movable allocations to be satisfied
      from either partition.  The patches may be applied with the list-based
      anti-fragmentation patches that group pages together based on mobility.
      
      The size of the zone is determined by a kernelcore= parameter specified at
      boot-time.  This specifies how much memory is usable by non-movable
      allocations and the remainder is used for ZONE_MOVABLE.  Any range of pages
      within ZONE_MOVABLE can be released by migrating the pages or by reclaiming.
      
      When selecting a zone to take pages from for ZONE_MOVABLE, there are two
      things to consider.  First, only memory from the highest populated zone is
      used for ZONE_MOVABLE.  On the x86, this is probably going to be ZONE_HIGHMEM
      but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64.  Second,
      the amount of memory usable by the kernel will be spread evenly throughout
      NUMA nodes where possible.  If the nodes are not of equal size, the amount of
      memory usable by the kernel on some nodes may be greater than others.
      
      By default, the zone is not as useful for hugetlb allocations because they are
      pinned and non-migratable (currently at least).  A sysctl is provided that
      allows huge pages to be allocated from that zone.  This means that the huge
      page pool can be resized to the size of ZONE_MOVABLE during the lifetime of
      the system assuming that pages are not mlocked.  Despite huge pages being
      non-movable, we do not introduce additional external fragmentation of note as
      huge pages are always the largest contiguous block we care about.
      
      Credit goes to Andy Whitcroft for catching a large variety of problems during
      review of the patches.
      
      This patch creates an additional zone, ZONE_MOVABLE.  This zone is only usable
      by allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE.  Hot-added
      memory continues to be placed in its existing destination as there is no
      mechanism to redirect it to a specific zone.
      
      [y-goto@jp.fujitsu.com: Fix section mismatch of memory hotplug related code]
      [akpm@linux-foundation.org: various fixes]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      Mel Gorman authored
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using __GFP_MOVABLE can be either migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
      
      An API function very similar to alloc_zeroed_user_highpage() is added for
      __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable().  The
      flags used by alloc_zeroed_user_highpage() are not changed because it would
      change the semantics of an existing API.  After this patch is applied there
      are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
      be marked deprecated if this patch is merged.
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickins.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickins for catching issues with shmem swap vector
      and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Fix read/truncate race · a32ea1e1
      NeilBrown authored
      do_generic_mapping_read currently samples the i_size at the start and doesn't
      do so again unless it needs to call ->readpage to load a page.  After
      ->readpage it has to re-sample i_size as a truncate may have caused that page
      to be filled with zeros, and the read() call should not see these.
      
      However there are other activities that might cause ->readpage to be called on
      a page between the time that do_generic_mapping_read samples i_size and when
      it finds that it has an uptodate page.  These include at least read-ahead and
      possibly another thread performing a read.
      
      So do_generic_mapping_read must sample i_size *after* it has an uptodate page.
      Thus the current sampling at the start and after a read can be replaced with
      a sampling before the copy-out.
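
The effect of sampling i_size just before the copy-out can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: `bytes_readable` and `SKETCH_PAGE_SIZE` are hypothetical names, and the caller is assumed to pass an i_size value that was read only after the page was found uptodate.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumed page size, for illustration only */

/*
 * Hypothetical helper: number of bytes that may be copied out of page
 * `index`, starting at byte `offset` within that page, given an i_size
 * value sampled after the page became uptodate.
 */
static unsigned long bytes_readable(unsigned long long isize,
                                    unsigned long index,
                                    unsigned long offset)
{
    unsigned long long end_index;
    unsigned long nr = SKETCH_PAGE_SIZE;

    assert(offset < SKETCH_PAGE_SIZE);

    if (isize == 0)
        return 0;                       /* empty file: nothing to copy */

    end_index = (isize - 1) / SKETCH_PAGE_SIZE;
    if (index > end_index)
        return 0;                       /* page lies wholly beyond i_size */

    if (index == end_index) {
        /* Last page: only the bytes below i_size are valid data. */
        nr = (unsigned long)((isize - 1) % SKETCH_PAGE_SIZE) + 1;
        if (nr <= offset)
            return 0;
    }
    return nr - offset;
}
```

Because the size is read this late, a page that a concurrent truncate dropped and zero-filled before the sample is clamped to the new i_size instead of being copied out in full.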
      
      The same change is applied to __generic_file_splice_read.
      
      Note that this fixes any race with truncate_complete_page, but does not fix a
      possible race with truncate_partial_page.  If a partial truncate happens after
      do_generic_mapping_read samples i_size and before the copy-out, the NULs that
      truncate_partial_page places in the page could be copied out incorrectly.
      
      I think the best fix for that is to *not* zero out parts of the page in
      truncate_partial_page, but rather to zero out the tail of a page when
      increasing i_size.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove ptep_test_and_clear_dirty and ptep_clear_flush_dirty · e21ea246
      Martin Schwidefsky authored
      Nobody is using ptep_test_and_clear_dirty and ptep_clear_flush_dirty.  Remove
      the functions from all architectures.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove ptep_establish() · f0e47c22
      Martin Schwidefsky authored
      The last user of ptep_establish in mm/ is long gone.  Remove the architecture
      primitive as well.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • parse error, drivers/i2c/busses/i2c-pmcmsp.c · 5ee403f5
      Yoann Padioleau authored
      Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
      Cc: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · ae6f4a8b
      Linus Torvalds authored
      * 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
        drm: add idr_init to drm_stub.c
        drm: fix problem with SiS typedef with sisfb enabled.
    • drm: add idr_init to drm_stub.c · 45ea5dcd
      Dave Airlie authored
      Brown paper bag for me: this patch chunk didn't make it in the first application.
      Signed-off-by: Dave Airlie <airlied@linux.ie>
    • drm: fix problem with SiS typedef with sisfb enabled. · cca5307d
      Dave Airlie authored
      Reported by: Avuton Olrich <avuton@gmail.com>
      Signed-off-by: Dave Airlie <airlied@linux.ie>
    • Merge branch 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · a5fcaa21
      Linus Torvalds authored
      * 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
        drm: convert drawable code to using idr
        drm: convert drm context code to use Linux idr
    • drm: convert drawable code to using idr · d4e2cbe9
      Dave Airlie authored
      This converts the code for allocating drawables to the Linux idr.
      
      Fixes from: Michel Dänzer <michel@tungstengraphics.com>, Kristian Høgsberg <krh@redhat.com>
      Signed-off-by: Dave Airlie <airlied@linux.ie>
    • drm: convert drm context code to use Linux idr · 62968144
      Dave Airlie authored
      This converts the drm context allocator to an idr, using the new idr
      interface features from Kristian.
      
      Fixes from Kristian Hoegsberg <krh@redhat.com>
      Signed-off-by: Dave Airlie <airlied@linux.ie>
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · 489de302
      Linus Torvalds authored
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (209 commits)
        [POWERPC] Create add_rtc() function to enable the RTC CMOS driver
        [POWERPC] Add H_ILLAN_ATTRIBUTES hcall number
        [POWERPC] xilinxfb: Parameterize xilinxfb platform device registration
        [POWERPC] Oprofile support for Power 5++
        [POWERPC] Enable arbitary speed tty ioctls and split input/output speed
        [POWERPC] Make drivers/char/hvc_console.c:khvcd() static
        [POWERPC] Remove dead code for preventing pread() and pwrite() calls
        [POWERPC] Remove unnecessary #undef printk from prom.c
        [POWERPC] Fix typo in Ebony default DTS
        [POWERPC] Check for NULL ppc_md.init_IRQ() before calling
        [POWERPC] Remove extra return statement
        [POWERPC] pasemi: Don't auto-select CONFIG_EMBEDDED
        [POWERPC] pasemi: Rename platform
        [POWERPC] arch/powerpc/kernel/sysfs.c: Move NUMA exports
        [POWERPC] Add __read_mostly support for powerpc
        [POWERPC] Modify sched_clock() to make CONFIG_PRINTK_TIME more sane
        [POWERPC] Create a dummy zImage if no valid platform has been selected
        [POWERPC] PS3: Bootwrapper support.
        [POWERPC] powermac i2c: Use mutex
        [POWERPC] Schedule removal of arch/ppc
        ...
      
      Fixed up conflicts manually in:
      
      	Documentation/feature-removal-schedule.txt
      	arch/powerpc/kernel/pci_32.c
      	arch/powerpc/kernel/pci_64.c
      	include/asm-powerpc/pci.h
      
      and asked the powerpc people to double-check the result.
    • Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 · 1f1c2881
      Linus Torvalds authored
      * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (37 commits)
        forcedeth bug fix: realtek phy
        forcedeth bug fix: vitesse phy
        forcedeth bug fix: cicada phy
        atl1: reorder atl1_main functions
        atl1: fix excessively indented code
        atl1: cleanup atl1_main
        atl1: header file cleanup
        atl1: remove irq_sem
        cdc-subset to support new vendor/product ID
        8139cp: implement the missing dev->tx_timeout
        myri10ge: Remove nonsensical limit in the tx done routine
        gianfar: kill unused header
        EP93XX_ETH must select MII
        macb: Add multicast capability
        macb: Use generic PHY layer
        s390: add barriers to qeth driver
        s390: scatter-gather for inbound traffic in qeth driver
        eHEA: Introducing support vor DLPAR memory add
        Fix a potential NULL pointer dereference in free_shared_mem() in drivers/net/s2io.c
        [PATCH] softmac: Fix ESSID problem
        ...
    • Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 · 7608a864
      Linus Torvalds authored
      * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
        [SERIAL] SUNHV: Fix jerky console on LDOM guests.
        [SPARC64]: Fix race between MD update and dr-cpu add.
        [SPARC64]: SMP build fix.
    • [HRTIMER] Fix cpu pointer arg to clockevents_notify() · 7713a7d1
      David Miller authored
      All of the clockevent notifiers expect a pointer to
      an "unsigned int" cpu argument, but hrtimer_cpu_notify()
      passes in a pointer to a long.
      
      [ Discussed with and ok by Thomas Gleixner ]
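
The type mismatch described above can be sketched in userspace C. This is a hedged illustration, not the kernel change itself: `notifier_cpu` is a hypothetical stand-in for a clockevents notifier, and `notify_with_converted_cpu` shows one way to hand it a correctly typed argument, namely converting the pointer-sized value to an `unsigned int` local and passing that local's address.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a clockevents notifier: it reads the CPU
 * number through a pointer to unsigned int, as the message describes.
 */
static unsigned int notifier_cpu(void *arg)
{
    return *(unsigned int *)arg;
}

/*
 * Hypothetical caller: hcpu carries the CPU number as a pointer-sized
 * value.  Passing the address of a long would let the notifier read only
 * part of it (the wrong part on big-endian 64-bit machines), so the value
 * is first converted into an unsigned int local.
 */
static unsigned int notify_with_converted_cpu(void *hcpu)
{
    unsigned int cpu;

    assert((long)hcpu >= 0);            /* CPU numbers are never negative */
    cpu = (unsigned int)(long)hcpu;
    return notifier_cpu(&cpu);
}
```

With this arrangement the notifier sees the same CPU number regardless of the width or byte order of `long`.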
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [SERIAL] SUNHV: Fix jerky console on LDOM guests. · f798634d
      David S. Miller authored
      Mixing putchar() and write() hvcalls does not work 100%
      correctly.  But we should be using write() all the time
      if we can, even from ->start_tx(), anyways.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Fix race between MD update and dr-cpu add. · 778feeb4
      David S. Miller authored
      We need to make sure the MD update occurs before we try to
      process dr-cpu configure requests.  MD update and dr-cpu
      were being processed by separate threads, so occasionally
      that did not happen.
      
      Fix this by executing all domain services data packets from
      a single thread, in order.
      
      This will help simplify some other things as well.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: SMP build fix. · 3ac66e33
      Fabio Massimo Di Nitto authored
      The UP build fix had some unintended consequences.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Jul, 2007 17 commits