1. 23 Jun, 2006 40 commits
    • [PATCH] lsm: add task_setioprio hook · 03e68060
      James Morris authored
      Implement an LSM hook for setting a task's IO priority, similar to the hook
      for setting a task's nice value.
      
      A previous version of this LSM hook was included in an older version of
      multiadm by Jan Engelhardt, although I don't recall it being submitted
      upstream.
      
      Also included is the corresponding SELinux hook, which re-uses the setsched
      permission in the process class.
      Signed-off-by: James Morris <jmorris@namei.org>
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] move_pages: fix 32 -> 64 bit compat function · 9216dfad
      Christoph Lameter authored
      The third parameter is defined as a pointer to an array of virtual
      addresses, which gives us some trouble.  The existing code calculated the
      wrong address in the array since I used void to avoid having to specify a
      type.
      
      I now use the correct type "compat_uptr_t __user *" in the definition of
      the function in kernel/compat.c.
      
      However, I used __u32 in syscalls.h.  I would have to include compat.h
      there in order to provide the same definition, which would create an ugly
      include situation.
      
      On both ia64 and x86_64 compat_uptr_t is u32. So this works although
      parameter declarations differ.
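
The addressing problem described above can be sketched in user space. This is only an illustration under stated assumptions: the helper names are hypothetical and none of this is kernel code. It assumes the wrong address came from indexing an untyped (void) pointer, where the index acts as a byte offset, while entries in a 32-bit pointer array are 4 bytes apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Correct: entry i of a packed array of 32-bit pointers starts at
 * byte offset i * sizeof(uint32_t). */
static uint32_t read_compat_entry(const void *base, size_t i)
{
    uint32_t value;

    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + i * sizeof(uint32_t),
           sizeof(value));
    return value;
}

/* Buggy variant (hypothetical): the index is used as a byte offset,
 * so every entry after the first is read from the wrong address. */
static uint32_t read_compat_entry_bytewise(const void *base, size_t i)
{
    uint32_t value;

    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + i, sizeof(value));
    return value;
}
```

Both helpers agree on entry 0, which is why a bug of this kind can go unnoticed when only a single page is passed.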
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sys_move_pages: 32bit support (i386, x86_64) · 1b2db9fb
      Christoph Lameter authored
      sys_move_pages() support for 32bit (i386 plus x86_64 compat layer)
      
      Add support for move_pages() on i386 and also add the compat functions
      necessary to run 32 bit binaries on x86_64.
      
      Add compat_sys_move_pages to the x86_64 32bit binary layer.  Note that it is
      not up to date so I added the missing pieces.  Not sure if this is done the
      right way.
      
      [akpm@osdl.org: compile fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sys_move_pages: x86_64 support · b63d64a3
      Christoph Lameter authored
      sys_move_pages support for x86_64
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: sys_move_pages(): support moving of individual pages · 742755a1
      Christoph Lameter authored
      move_pages() is used to move individual pages of a process. The function can
      be used to determine the location of pages and to move them onto the desired
      node. move_pages() returns status information for each page.
      
      long move_pages(pid, number_of_pages_to_move,
      		addresses_of_pages[],
      		nodes[] or NULL,
      		status[],
      		flags);
      
      The addresses of pages is an array of void * pointing to the
      pages to be moved.
      
      The nodes array contains the node numbers that the pages should be moved
      to. If a NULL is passed instead of an array then no pages are moved but
      the status array is updated. The status request may be used to determine
      the page state before issuing another move_pages() to move pages.
      
      The status array will contain the state of all individual page migration
      attempts when the function terminates. The status array is only valid if
      move_pages() completed successfully.
      
      Possible page states in status[]:
      
      0..MAX_NUMNODES	The page is now on the indicated node.
      
      -ENOENT		Page is not present
      
      -EACCES		Page is mapped by multiple processes and can only
      		be moved if MPOL_MF_MOVE_ALL is specified.
      
      -EPERM		The page has been mlocked by a process/driver and
      		cannot be moved.
      
      -EBUSY		Page is busy and cannot be moved. Try again later.
      
      -EFAULT		Invalid address (no VMA or zero page).
      
      -ENOMEM		Unable to allocate memory on target node.
      
      -EIO		Unable to write back page. The page must be written
      		back in order to move it since the page is dirty and the
      		filesystem does not provide a migration function that
      		would allow the moving of dirty pages.
      
      -EINVAL		A dirty page cannot be moved. The filesystem does not provide
      		a migration function and has no ability to write back pages.
      
      The flags parameter indicates what types of pages to move:
      
      MPOL_MF_MOVE	Move pages that are only mapped by the process.
      
      MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
      		Requires sufficient capabilities.
      
      Possible return codes from move_pages()
      
      -ENOENT		No pages found that would require moving. All pages
      		are either already on the target node, not present, had an
      		invalid address or could not be moved because they were
      		mapped by multiple processes.
      
      -EINVAL		Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
      		to migrate pages in a kernel thread.
      
      -EPERM		MPOL_MF_MOVE_ALL specified without sufficient privileges,
      		or an attempt to move a process belonging to another user.
      
      -EACCES		One of the target nodes is not allowed by the current cpuset.
      
      -ENODEV		One of the target nodes is not online.
      
      -ESRCH		Process does not exist.
      
      -E2BIG		Too many pages to move.
      
      -ENOMEM		Not enough memory to allocate control array.
      
      -EFAULT		Parameters could not be accessed.
      
      A test program for move_pages() may be found with the patches
      on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
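
As a rough illustration of how a caller might interpret the per-page values in status[], here is a minimal sketch. The enum, the function names and the grouping into four outcomes are assumptions made for this example; only the meaning of the individual codes comes from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical grouping of the per-page status values. */
enum toy_page_outcome {
    TOY_PAGE_ON_NODE,         /* status >= 0: the value is the node number */
    TOY_PAGE_RETRY_LATER,     /* -EBUSY: page is busy, try again later */
    TOY_PAGE_NEEDS_MOVE_ALL,  /* -EACCES: shared page, needs MPOL_MF_MOVE_ALL */
    TOY_PAGE_NOT_MOVED        /* any other listed error code */
};

static enum toy_page_outcome toy_classify_page_status(int status)
{
    if (status >= 0)
        return TOY_PAGE_ON_NODE;
    if (status == -EBUSY)
        return TOY_PAGE_RETRY_LATER;
    if (status == -EACCES)
        return TOY_PAGE_NEEDS_MOVE_ALL;
    /* -ENOENT, -EPERM, -EFAULT, -ENOMEM, -EIO, -EINVAL end up here. */
    return TOY_PAGE_NOT_MOVED;
}

/* Count how many entries of a status array fall into one outcome. */
static size_t toy_count_outcome(const int *status, size_t n,
                                enum toy_page_outcome want)
{
    size_t count = 0;

    assert(status != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (toy_classify_page_status(status[i]) == want)
            count++;
    return count;
}
```

A caller could use the retry count to decide whether issuing another move_pages() call is worthwhile.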
      
      From: Christoph Lameter <clameter@sgi.com>
      
        Detailed results for sys_move_pages()
      
        Pass a pointer to an integer to get_new_page() that may be used to
        indicate where the completion status of a migration operation should be
        placed.  This allows sys_move_pages() to report back exactly what happened
        to each page.
      
        Wish there would be a better way to do this. Looks a bit hacky.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: use allocator function for migrate_pages() · 95a402c3
      Christoph Lameter authored
      Instead of passing a list of new pages, pass a function to allocate a new
      page.  This allows the correct placement of MPOL_INTERLEAVE pages during page
      migration.  It also further simplifies the callers of migrate pages.
      migrate_pages() becomes similar to migrate_pages_to() so drop
      migrate_pages_to().  The batching of new page allocations becomes unnecessary.
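
The callback idea can be sketched in plain C. This is a toy model, not the kernel interface: struct toy_page, the callback type and all function names are hypothetical, and a "page" here only records its node number. It shows why handing the migration loop an allocator callback makes interleaved placement easy, since the callback decides the target for each page individually.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a page: only remembers which node it lives on. */
struct toy_page {
    int node;
};

/* State for a round-robin (interleave-like) placement policy. */
struct toy_interleave_state {
    int next_node;
    int nr_nodes;
};

/* Callback that picks the target node for the replacement of one page. */
typedef int (*toy_new_page_fn)(const struct toy_page *old,
                               uintptr_t private_data);

static int toy_pick_interleaved(const struct toy_page *old,
                                uintptr_t private_data)
{
    struct toy_interleave_state *st =
        (struct toy_interleave_state *)private_data;
    int node = st->next_node;

    (void)old; /* this policy ignores the old page */
    st->next_node = (st->next_node + 1) % st->nr_nodes;
    return node;
}

/* Ask the callback for a target per page; return how many pages moved. */
static size_t toy_migrate_pages(struct toy_page *pages, size_t n,
                                toy_new_page_fn get_new_page,
                                uintptr_t private_data)
{
    size_t moved = 0;

    assert(get_new_page != NULL);
    for (size_t i = 0; i < n; i++) {
        int target = get_new_page(&pages[i], private_data);

        if (target != pages[i].node) {
            pages[i].node = target;
            moved++;
        }
    }
    return moved;
}
```

With a pre-built list of new pages, the caller would have to predict the placement of every page in advance; with the callback, no batching is needed.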
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: handle freeing of pages in migrate_pages() · aaa994b3
      Christoph Lameter authored
      Do not leave pages on the lists passed to migrate_pages().  Seems that we will
      not need any postprocessing of pages.  This will simplify the handling of
      pages by the callers of migrate_pages().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: simplify migrate_pages() · e24f0b8f
      Christoph Lameter authored
      Currently migrate_pages() is a mess with lots of gotos.  Extract two
      functions from migrate_pages() and get rid of the gotos.
      
      Plus we can just unconditionally set the locked bit on the new page since we
      are the only one holding a reference.  Locking is to stop others from
      accessing the page once we establish references to the new page.
      
      Remove the list_del from move_to_lru in order to have finer control over list
      processing.
      
      [akpm@osdl.org: add debug check]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] printk() should not be called under zone->lock · 8f9de51a
      Kirill Korotaev authored
      This patch fixes printk() under zone->lock in show_free_areas().  It can be
      unsafe to call printk() under this lock, since the caller can try to
      allocate/free some memory and self-deadlock on this lock.  I found memory
      allocation/freeing in both netconsole and serial console.

      This issue was hit in reality when meminfo was periodically printed for
      debug purposes and netconsole was used.
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Ralf Baechle's avatar
      d501e62b
    • [PATCH] kernel-doc for mm/filemap.c · 485bb99b
      Randy Dunlap authored
      mm/filemap.c:
      - add lots of kernel-doc;
      - fix some typos and kernel-doc errors;
      - drop some blank lines between function close and EXPORT_SYMBOL();
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: kmalloc, kzalloc comments cleanup and fix · 800590f5
      Paul Drynoff authored
      - Move comments for kmalloc to the right place; currently they are near __do_kmalloc
      
      - Comments for kzalloc
      
      - More detailed comments for kmalloc
      
      - Appearance of "kmalloc" and "kzalloc" man pages after "make mandocs"
      
      [rdunlap@xenotime.net: simplification]
      Signed-off-by: Paul Drynoff <pauldrynoff@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • KAMEZAWA Hiroyuki's avatar
    • [PATCH] initialise total_memory() earlier · bd1e22b8
      Andrew Morton authored
      Initialise total_memory earlier in boot: if for some reason we run page
      reclaim early in boot, we don't want total_memory to be zero when we use
      it as a divisor.
      
      And rename total_memory to vm_total_pages to avoid naming clashes with
      architectures.
      
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Martin Bligh <mbligh@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm/slab.c: fix early init assumption · e0a42726
      Ingo Molnar authored
      The SLAB bootstrap code assumes that the first two kmalloc caches created
      (the INDEX_AC and INDEX_L3 kmalloc caches) won't be off-slab.  But due to the
      AC and L3 structure size increase in lockdep, one of them ended up being
      off-slab, subsequently crashing with:
      
      Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
       [<ffffffff80267478>] kmem_cache_alloc+0x26/0x7d
      
      The fix is to introduce a bootstrap flag and to use it to prevent off-slab
      caches being created so early during bootup.
      
      (The calculation for off-slab caches is quite complex, so I didn't want to
      complicate things by introducing yet another INDEX_ calculation; the flag
      approach is simpler and smaller.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix update_mmu_cache in fremap.c · 668e0d8f
      Hugh Dickins authored
      There are two calls to update_mmu_cache in fremap.c, both defective.
      The one in install_page needs to be accompanied by lazy_mmu_prot_update
      (some other cleanup time, move that into ia64 update_mmu_cache itself); and
      the one in install_file_pte should be removed since the pte is not present.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] remove unused o_flags from do_shmat · 185606fc
      Hugh Dickins authored
      Remove the unused variable o_flags from do_shmat.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swapoff: use atomic_inc_not_zero() on mm_users · 70af7c5c
      Hugh Dickins authored
      Now that we have atomic_inc_not_zero, it's more elegant for try_to_unuse to
      use that on mm_users: doesn't actually matter at present, but safer to be
      sure that once mm_users has gone to 0, nothing raises it for an instant.
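
The semantics relied on here can be sketched with C11 atomics in user space. This is an illustrative model only: toy_inc_not_zero is a hypothetical name, and the kernel's atomic_inc_not_zero() is implemented differently per architecture. The point is that the increment only happens while the counter is still non-zero, so a counter that has dropped to 0 is never raised again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Increment *counter unless it is zero.
 * Returns true if the increment happened, false if the counter was 0. */
static bool toy_inc_not_zero(atomic_int *counter)
{
    int old;

    assert(counter != NULL);
    old = atomic_load(counter);
    while (old != 0) {
        /* On failure, old is refreshed with the current value and the
         * loop re-checks it against zero before trying again. */
        if (atomic_compare_exchange_weak(counter, &old, old + 1))
            return true;
    }
    return false;
}
```

A plain increment followed by a check would briefly raise a zero counter, which is the window the commit message wants to rule out.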
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] add page_mkwrite() vm_operations method · 9637a5ef
      David Howells authored
      Add a new VMA operation to notify a filesystem or other driver about the
      MMU generating a fault because userspace attempted to write to a page
      mapped through a read-only PTE.
      
      This facility permits the filesystem or driver to:
      
       (*) Implement storage allocation/reservation on attempted write, and so to
           deal with problems such as ENOSPC more gracefully (perhaps by generating
           SIGBUS).
      
       (*) Delay making the page writable until the contents have been written to a
           backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
           It permits the filesystem to have some guarantee about the state of the
           cache.
      
       (*) Account and limit number of dirty pages. This is one piece of the puzzle
           needed to make shared writable mapping work safely in FUSE.
      
      Needed by cachefs (Or is it cachefiles?  Or fscache? <head spins>).
      
      At least four other groups have stated an interest in it or a desire to use
      the functionality it provides: FUSE, OCFS2, NTFS and JFFS2.  Also, things like
      EXT3 really ought to use it to deal with the case of shared-writable mmap
      encountering ENOSPC before we permit the page to be dirtied.
      
      From: Peter Zijlstra <a.p.zijlstra@chello.nl>
      
        get_user_pages(.write=1, .force=1) can generate COW hits on read-only
        shared mappings; this patch traps those as page_mkwrite candidates and
        fails to handle them the old way.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Joel Becker <Joel.Becker@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: fix swap unused warning · bd96b9eb
      Con Kolivas authored
      If CONFIG_SWAP is not defined we get:
      
      mm/vmscan.c: In function ‘remove_mapping’:
      mm/vmscan.c:387: warning: unused variable ‘swap’
      
      Convert defines in swap.h into blank inline functions to fix this warning
      and be consistent.
      Signed-off-by: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sparsemem: record nid during memory present · 30c253e6
      Andy Whitcroft authored
      Record the node id as we mark sections for instantiation.  Use this nid
      during instantiation to direct allocations.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Mike Kravetz <kravetz@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Bob Picco <bob.picco@hp.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Martin Bligh <mbligh@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: verify pointers before free · ddc2e812
      Pekka Enberg authored
      Passing an invalid pointer to kfree() and kmem_cache_free() is likely to
      cause bad memory corruption or even take down the whole system because the
      bad pointer is likely reused immediately due to the per-CPU caches.  Until
      now, we have not done any verification for this when CONFIG_DEBUG_SLAB is
      disabled.
      
      As suggested by Linus, add PageSlab check to page_to_cache() and
      page_to_slab() to verify pointers passed to kfree().  Also, move the
      stronger check from cache_free_debugcheck() to kmem_cache_free() to ensure
      the passed pointer actually belongs to the cache we're about to free the
      object to.
      
      For page_to_cache() and page_to_slab(), the assertions should have
      virtually no extra cost (two instructions, no data cache pressure) and for
      kmem_cache_free() the overhead should be minimal.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Christoph Lameter's avatar
      8d3c138b
    • [PATCH] More page migration: use migration entries for file pages · 04e62a29
      Christoph Lameter authored
      This implements the use of migration entries to preserve ptes of file-backed
      pages during migration.  Processes can therefore be migrated back and forth
      without losing their connection to pagecache pages.
      
      Note that we implement the migration entries only for linear mappings.
      Nonlinear mappings still require the unmapping of the ptes for migration.
      
      And another writepage() ugliness shows up.  writepage() can drop the page
      lock.  Therefore we have to remove migration ptes before calling writepages()
      in order to avoid having migration entries point to unlocked pages.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] More page migration: do not inc/dec rss counters · 442c9137
      Christoph Lameter authored
      If we install a migration entry then the rss does not really decrease, since
      the page is just moved somewhere else.  We can save ourselves the work of
      decrementing and later incrementing, which would eventually just cause
      cacheline bouncing.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: modify core logic · 6c5240ae
      Christoph Lameter authored
      Use the migration entries for page migration
      
      This modifies the migration code to use the new migration entries.  It now
      becomes possible to migrate anonymous pages without having to add a swap
      entry.
      
      We add a couple of new functions to replace migration entries with the proper
      ptes.
      
      We cannot take the tree_lock for migrating anonymous pages anymore.  However,
      we know that we hold the only remaining reference to the page when the page
      count reaches 1.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: rip out swap based logic · d75a0fcd
      Christoph Lameter authored
      Rip the page migration logic out.
      
      Remove all code that has to do with swapping during page migration.
      
      This also guts the ability to migrate pages to swap.  No one used that, so
      let's let it go for good.
      
      Page migration should be a bit broken after this patch.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Christoph Lameter authored
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  migration entries can
      only be encountered when the page they are pointing to is locked.  This limits
      the number of places one has to fix.  We also check in copy_pte_range and in
      mprotect_pte_range() for migration ptes.
      
      We check for migration ptes in do_swap_cache and call a function that will
      then wait on the page lock.  This allows us to effectively stop all accesses
      to a page.
      
      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
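
The pitfall described in this note can be reproduced with a toy model. Everything below is hypothetical apart from the value 30 for the read type, which is quoted above; the write type 31 is an assumption based on the two migration types taking the upper two swapfile slots. Passing the read type where a boolean "write" flag is expected silently yields a write entry, because 30 is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy swap types; 31 for the write type is an assumption of this sketch. */
enum {
    TOY_MIGRATION_READ  = 30,
    TOY_MIGRATION_WRITE = 31
};

/* The argument is a boolean "write" flag, not a swap type. */
static int toy_make_migration_entry(int write)
{
    return write ? TOY_MIGRATION_WRITE : TOY_MIGRATION_READ;
}

static bool toy_is_write_migration_entry(int type)
{
    return type == TOY_MIGRATION_WRITE;
}

/* Explicit downgrade to a read entry, independent of any flag value. */
static int toy_make_migration_entry_read(int type)
{
    assert(type == TOY_MIGRATION_READ || type == TOY_MIGRATION_WRITE);
    return TOY_MIGRATION_READ;
}
```

Calling toy_make_migration_entry(TOY_MIGRATION_READ) returns the write type, while the check-then-downgrade pair cannot be misread in this way.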
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: move fallback handling into special function · 8351a6e4
      Christoph Lameter authored
      Move the fallback code into a new fallback function and make the function
      behave like any other migration function.  This requires retaking the lock if
      pageout() drops it.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: pass "mapping" to migration functions · 2d1db3b1
      Christoph Lameter authored
      Change handling of address spaces.
      
      Pass a pointer to the address space in which the page is migrated to all
      migration functions.  This avoids repeatedly having to retrieve the address
      space pointer from the page and check it for validity.  The old page
      mapping will change once migration has gone to a certain step, so it is less
      confusing to have the pointer always available.
      
      Move the setting of the mapping and index for the new page into
      migrate_pages().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: extract try_to_unmap from migration functions · c3fcf8a5
      Christoph Lameter authored
      Extract try_to_unmap and rename remove_references -> move_mapping
      
      try_to_unmap() may significantly change the page state by, for example,
      setting the dirty bit.  It is therefore best to unmap in migrate_pages()
      before calling any migration functions.
      
      migrate_page_remove_references() will then only move the new page in place of
      the old page in the mapping.  Rename the function to
      migrate_page_move_mapping().
      
      This allows us to get rid of the special unmapping for the fallback path.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: drop nr_refs in remove_references() · 5b5c7120
      Christoph Lameter authored
      Drop nr_refs parameter from migrate_page_remove_references()
      
      The nr_refs parameter is not really useful since the number of remaining
      references is always
      
      1 for anonymous pages without a mapping
      2 for pages with a mapping
      3 for pages with a mapping and PagePrivate set.
      
      Remove the early check for the number of references since we are checking
      page_mapcount() earlier.  Ultimately only the refcount matters after the
      tree_lock has been obtained.
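
The three cases listed above can be written as a small helper. This is a sketch with a hypothetical name, not the kernel function; the combination "no mapping but PagePrivate set" is not covered by the message, so the sketch rejects it with an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Remaining references expected on a page that is ready to migrate:
 * 1 for an anonymous page without a mapping,
 * 2 for a page with a mapping,
 * 3 for a page with a mapping and PagePrivate set. */
static int toy_expected_refs(bool has_mapping, bool has_private)
{
    /* Case not described in the commit message; refuse it here. */
    assert(has_mapping || !has_private);

    return 1 + (has_mapping ? 1 : 0) + (has_private ? 1 : 0);
}
```

Because the value follows directly from the page state, passing it around as a separate nr_refs parameter adds nothing.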
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: remove useless definitions · e7340f73
      Christoph Lameter authored
      Remove the exports for migrate_page_remove_references() and
      migrate_page_copy(), which are unlikely to be used directly by filesystems
      implementing migration.  The exports were useful when buffer_migrate_page()
      lived in fs/buffer.c, but it has now been moved to migrate.c in the
      migration reorg.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e7340f73
    • Christoph Lameter's avatar
      [PATCH] page migration cleanup: group functions · 1d8b85cc
      Christoph Lameter authored
      Reorder functions in migrate.c.  Group all migration functions for struct
      address_space_operations together.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1d8b85cc
    • Christoph Lameter's avatar
      [PATCH] page migration cleanup: rename "ignrefs" to "migration" · 7352349a
      Christoph Lameter authored
      "migration" is a better name since it is only used by page migration.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7352349a
    • OGAWA Hirofumi's avatar
      [PATCH] writeback: fix range handling · 111ebb6e
      OGAWA Hirofumi authored
      When a writeback_control's `start' and `end' fields are used to
      indicate a one-byte range starting at file offset zero, the required
      values of .start=0,.end=0 mean that the ->writepages() implementation
      has no way of telling that it is being asked to perform a range
      request, because we currently overload (start == 0 && end == 0)
      to mean "this is not a write-a-range request".
      
      To make all this sane, the patch changes how writeback_control
      expresses a range.
      
      A caller that invokes ->writepages() to write pages now always sets
      a range: either range_start/range_end or range_cyclic.
      
      If range_cyclic is true, ->writepages() treats the range as cyclic;
      otherwise it just uses range_start and range_end.
      
      This patch does the following:
      
          - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h.
            -1 is usually ok for range_end (type is long long).  But if someone did
      
      		range_end += val;		/* range_end is "val - 1" */
      		u64val = range_end >> bits;	/* u64val is "~(0ULL)" */
      
            or something similar, the result would be wrong.  So this adds LLONG_MAX
            to avoid nasty things, and uses LLONG_MAX for range_end.
      
          - All callers of ->writepages() now set range_start/end or range_cyclic.
      
          - Fix updates of ->writeback_index.  The existing behaviour already seems
            a bit strange: if a scan starts at 0 and is ended by the nr_to_write
            check, storing that last index may reduce the chance of scanning the
            end of the file.  So this updates ->writeback_index only if
            range_cyclic is true or the whole file is scanned.
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: "Vladimir V. Saveliev" <vs@namesys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      111ebb6e
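
      The range convention described in the commit above might look roughly
      like this.  It is a minimal sketch, not the kernel's writeback_control:
      struct wb_range and the wb_*() helpers are hypothetical, and treating a
      cyclic request as covering every offset is a simplification.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, reduced version of the range fields named in the commit. */
struct wb_range {
	long long range_start;
	long long range_end;
	bool range_cyclic;
};

/* Whole-file request: explicit bounds instead of start == 0 && end == 0. */
struct wb_range wb_whole_file(void)
{
	struct wb_range r = { 0, LLONG_MAX, false };
	return r;
}

/* One-byte range at offset zero, now distinguishable from "no range". */
struct wb_range wb_single_byte_at_zero(void)
{
	struct wb_range r = { 0, 0, false };
	return r;
}

/* Cyclic request: the start/end fields are not consulted. */
struct wb_range wb_cyclic(void)
{
	struct wb_range r = { 0, 0, true };
	return r;
}

/* Does the request cover the given byte offset? */
bool wb_covers(const struct wb_range *r, long long offset)
{
	if (r->range_cyclic)
		return true;	/* simplification: a cyclic scan may reach any offset */
	return offset >= r->range_start && offset <= r->range_end;
}
```

      With explicit fields, .range_start=0,.range_end=0 unambiguously means
      "byte zero only", and a whole-file request uses LLONG_MAX rather than -1
      so that later arithmetic on range_end does not wrap through zero.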
    • Peter Zijlstra's avatar
      [PATCH] buglet in radix_tree_tag_set · 4c91c364
      Peter Zijlstra authored
      The comment states: 'Setting a tag on a not-present item is a BUG.' Hence
      if 'index' is larger than the maxindex, the item _cannot_ be present; it
      should also be a BUG.
      
      Also, the current code allows the following statement (assume a fresh tree):
      
        radix_tree_tag_set(root, 16, 1);
      
      to fail silently, but when preceded by:
      
        radix_tree_insert(root, 32, item);
      
      it would BUG, because the height has been extended by the insert.
      
      In neither case was 16 present.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4c91c364
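
      The inconsistency described in the commit above can be modelled without
      a real radix tree.  The sketch below is an assumption-laden toy, not the
      kernel implementation: MAP_SHIFT, maxindex_for_height() and the
      tag_set_bugs_*() predicates are hypothetical names, and it assumes 64
      slots per node, a fresh tree of height 0, and a tree of height 1 after
      inserting index 32.

```c
#include <limits.h>
#include <stdbool.h>

#define MAP_SHIFT 6	/* assumption: 64 slots per node */

/* Largest index a tree of the given height can hold. */
unsigned long maxindex_for_height(unsigned int height)
{
	unsigned int shift = height * MAP_SHIFT;

	if (shift >= sizeof(unsigned long) * CHAR_BIT)
		return ULONG_MAX;
	return (1UL << shift) - 1;
}

/*
 * Old behaviour as described: an index beyond maxindex is ignored
 * silently; only an in-range but absent item triggers a BUG.
 */
bool tag_set_bugs_old(unsigned int height, unsigned long index, bool present)
{
	if (index > maxindex_for_height(height))
		return false;
	return !present;
}

/* Proposed behaviour: any not-present item is a BUG, in range or not. */
bool tag_set_bugs_new(unsigned int height, unsigned long index, bool present)
{
	if (index > maxindex_for_height(height))
		return true;
	return !present;
}
```

      Under these assumptions, tagging index 16 in a fresh tree fails silently
      with the old check but would BUG once the tree has height 1, even though
      16 is absent in both cases; the new check reports both consistently.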
    • Pekka Enberg's avatar
      [PATCH] slab: redzone double-free detection · 58ce1fd5
      Pekka Enberg authored
      At present our slab debugging tells us that it detected a double-free or
      corruption - it does not distinguish between them.  Sometimes it's useful
      to be able to differentiate between these two kinds of error.
      
      Add double-free detection to redzone verification when freeing an object.
      As explained by Manfred, when we are freeing an object, both redzones
      should be RED_ACTIVE.  However, if both are RED_INACTIVE, we are trying to
      free an object that was already freed.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      58ce1fd5
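
      The redzone check described in the commit above can be sketched as a
      three-way classification.  This is an illustration only: the enum and
      verify_redzone_on_free() are hypothetical names, and the two marker
      values are used here simply as distinct placeholders.

```c
/* Placeholder marker values; only their distinctness matters here. */
#define RED_INACTIVE	0x5A2CF071UL	/* object is free */
#define RED_ACTIVE	0x170FC2A5UL	/* object is allocated */

enum redzone_verdict {
	REDZONE_OK,		/* both redzones active: a normal free */
	REDZONE_DOUBLE_FREE,	/* both inactive: object was already freed */
	REDZONE_CORRUPT		/* anything else: a redzone was overwritten */
};

/* Classify the two redzone words of an object that is about to be freed. */
enum redzone_verdict verify_redzone_on_free(unsigned long rz1, unsigned long rz2)
{
	if (rz1 == RED_ACTIVE && rz2 == RED_ACTIVE)
		return REDZONE_OK;
	if (rz1 == RED_INACTIVE && rz2 == RED_INACTIVE)
		return REDZONE_DOUBLE_FREE;
	return REDZONE_CORRUPT;
}
```

      Splitting the failure case this way lets the debug output say "double
      free" when both markers are intact but inactive, and reserve "corruption"
      for markers that match neither expected value.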
    • Hua Zhong's avatar
      [PATCH] likely cleanup: remove unlikely in sys_mprotect() · b344e05c
      Hua Zhong authored
      With likely/unlikely profiling on my not-so-busy, typical development
      system there are 5k misses vs 2k hits.  So I guess we should remove the
      unlikely.
      Signed-off-by: Hua Zhong <hzhong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b344e05c
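
      The hit/miss reasoning in the commit above can be illustrated with a toy
      counter.  This is not the kernel's profiling infrastructure: struct
      branch_stats, profiled_unlikely() and hint_is_justified() are
      hypothetical names, a "miss" is assumed to mean the condition was true
      despite the unlikely() annotation, and __builtin_expect is the GCC/Clang
      builtin.

```c
/* Hypothetical per-branch counters. */
struct branch_stats {
	unsigned long hits;	/* prediction held: condition was false */
	unsigned long misses;	/* prediction failed: condition was true */
};

/* Evaluate a condition annotated as unlikely and record the outcome. */
int profiled_unlikely(struct branch_stats *s, int cond)
{
	if (cond)
		s->misses++;
	else
		s->hits++;
	return (int)__builtin_expect(!!cond, 0);
}

/* The hint is only worth keeping if it is right more often than wrong. */
int hint_is_justified(const struct branch_stats *s)
{
	return s->hits > s->misses;
}
```

      With 2k hits against 5k misses, hint_is_justified() returns 0, which is
      the situation that motivates dropping the annotation.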
    • Nick Piggin's avatar
      [PATCH] radix-tree: small · cfd9b7df
      Nick Piggin authored
      Reduce radix tree node memory usage by about a factor of 4 for small files
      (< 64K).  The trade-off is additional pointer traversal and memory usage
      for large files with dense pagecache.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cfd9b7df