Commit 89261aab authored by Andrew Morton, committed by Linus Torvalds

[PATCH] make the pagecache lock irq-safe.

Intro to these patches:

- Major surgery against the pagecache, radix-tree and writeback code.  This
  work is to address the O_DIRECT-vs-buffered data exposure horrors which
  we've been struggling with for months.

  As a side-effect, 32 bytes are saved from struct inode and eight bytes
  are removed from struct page, at a cost of approximately 2.5 bits per page
  in the radix tree nodes on 4k pagesize, assuming the pagecache is densely
  populated.  Not all pages are pagecache; other pages gain the full
  eight-byte saving.

  This change will break any arch code which is using page->list and will
  also break any arch code which is using page->lru of memory which was
  obtained from slab.

  The basic problem which we (mainly Daniel McNeil) have been struggling
  with is in getting a really reliable fsync() across the page lists while
  other processes are performing writeback against the same file.  It's like
  juggling four bars of wet soap with your eyes shut while someone is
  whacking you with a baseball bat.  Daniel pretty much has the problem
  plugged but I suspect that's just because we don't have testcases to
  trigger the remaining problems.  The complexity and additional locking
  which those patches add is worrisome.

  So the approach taken here is to remove the page lists altogether and
  replace the list-based writeback and wait operations with in-order
  radix-tree walks.

  The radix-tree code has been enhanced to support "tagging" of pages, for
  later searches for pages which have a particular tag set.  This means that
  we can ask the radix tree code "find me the next 16 dirty pages starting at
  pagecache index N" and it will do that in O(log64(N)) time.
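
  As a rough illustration (not code from this patch: the tag-lookup names
  below, radix_tree_gang_lookup_tag and PAGECACHE_TAG_DIRTY, are the shape
  this interface takes later in the series, and the helper is hypothetical),
  a writeback walk can batch up dirty pages like this:

	/*
	 * Sketch only: gather the next batch of dirty pages at or after
	 * *index, using the radix-tree dirty tag rather than the old
	 * mapping->dirty_pages list.
	 */
	static unsigned int next_dirty_batch(struct address_space *mapping,
			pgoff_t *index, struct page **pages, unsigned int nr)
	{
		unsigned int i, found;

		spin_lock_irq(&mapping->tree_lock);
		found = radix_tree_gang_lookup_tag(&mapping->page_tree,
				(void **)pages, *index, nr,
				PAGECACHE_TAG_DIRTY);
		for (i = 0; i < found; i++)
			page_cache_get(pages[i]);	/* pin before unlocking */
		if (found)
			*index = pages[found - 1]->index + 1;
		spin_unlock_irq(&mapping->tree_lock);
		return found;
	}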

  This affects I/O scheduling potentially quite significantly.  It is no
  longer the case that the kernel will submit pages for I/O in the order in
  which the application dirtied them.  We instead submit them in file-offset
  order all the time.

  This is likely to be advantageous when applications are seeking all over
  a large file, randomly writing small amounts of data.  I haven't performed
  much benchmarking, but tiobench random write throughput seems to be
  increased by 30%.  Other tests appear to be unaltered.  dbench may have got
  10-20% quicker, but it's variable.

  There is one large file which everyone seeks all over randomly writing
  small amounts of data: the blockdev mapping which caches filesystem
  metadata.  The kernel's IO submission patterns for this are now ideal.


  Because writeback and wait-for-writeback use a tree walk instead of a
  list walk they are no longer livelockable.  This probably means that we no
  longer need to hold i_sem across O_SYNC writes and perhaps fsync() and
  fdatasync().  This may be beneficial for databases: multiple processes
  writing and syncing different parts of the same file at the same time can
  now all submit and wait upon writes to just their own little bit of the
  file, so we can get a lot more data into the queues.

  It is trivial to implement a part-file-fdatasync() as well, so
  applications can say "sync the file from byte N to byte M", and multiple
  applications can do this concurrently.  This is easy for ext2 filesystems,
  but probably needs lots of work for data-journalled filesystems and XFS,
  and it probably doesn't offer much benefit over an i_sem-less O_SYNC write.
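
  For illustration only, the user-visible shape of such a ranged sync (this
  is the sync_file_range() interface that mainline eventually grew, shown as
  a userspace sketch rather than anything added by this patch):

	/* Flush and wait on just bytes [start, start+len) of an open file. */
	#define _GNU_SOURCE
	#include <fcntl.h>

	static int sync_byte_range(int fd, off64_t start, off64_t len)
	{
		return sync_file_range(fd, start, len,
				SYNC_FILE_RANGE_WAIT_BEFORE |
				SYNC_FILE_RANGE_WRITE |
				SYNC_FILE_RANGE_WAIT_AFTER);
	}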


  These patches can end up making ext3 (even) slower:

	for i in 1 2 3 4
	do
		dd if=/dev/zero of=$i bs=1M count=2000 &
	done          

  runs awfully slow on SMP.  This is, yet again, because all the file
  blocks are jumbled up and the per-file linear writeout causes tons of
  seeking.  The above test runs sweetly on UP because on UP we don't
  allocate blocks to different files in parallel.

  Mingming and Badari are working on getting block reservation working for
  ext3 (preallocation on steroids).  That should fix ext3 up.


This patch:

- Later, we'll need to access the radix trees from inside disk I/O
  completion handlers.  So make mapping->page_lock irq-safe.  And rename it
  to tree_lock to reliably break any missed conversions.
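
  In sketch form, the locking rule this establishes (the completion-side
  handler below is hypothetical; the real interrupt-context users arrive in
  later patches of the series):

	/* Process context: interrupts are known to be enabled here. */
	static struct page *lookup_from_process_context(
			struct address_space *mapping, unsigned long index)
	{
		struct page *page;

		spin_lock_irq(&mapping->tree_lock);
		page = radix_tree_lookup(&mapping->page_tree, index);
		if (page)
			page_cache_get(page);
		spin_unlock_irq(&mapping->tree_lock);
		return page;
	}

	/*
	 * I/O completion runs in interrupt context, where the irq state is
	 * unknown, so it must use the irqsave variants.
	 */
	static void hypothetical_end_io(struct address_space *mapping,
			unsigned long index)
	{
		unsigned long flags;

		spin_lock_irqsave(&mapping->tree_lock, flags);
		/* e.g. update the tree's per-page state for 'index' */
		spin_unlock_irqrestore(&mapping->tree_lock, flags);
	}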
parent 8691fb83
@@ -396,7 +396,7 @@ asmlinkage long sys_fdatasync(unsigned int fd)
  * Hack idea: for the blockdev mapping, i_bufferlist_lock contention
  * may be quite high. This code could TryLock the page, and if that
  * succeeds, there is no need to take private_lock. (But if
- * private_lock is contended then so is mapping->page_lock).
+ * private_lock is contended then so is mapping->tree_lock).
  */
 static struct buffer_head *
 __find_get_block_slow(struct block_device *bdev, sector_t block, int unused)
@@ -867,14 +867,14 @@ int __set_page_dirty_buffers(struct page *page)
 	spin_unlock(&mapping->private_lock);
 	if (!TestSetPageDirty(page)) {
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		if (page->mapping) {	/* Race with truncate? */
 			if (!mapping->backing_dev_info->memory_backed)
 				inc_page_state(nr_dirty);
 			list_del(&page->list);
 			list_add(&page->list, &mapping->dirty_pages);
 		}
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 	}
@@ -1254,7 +1254,7 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size)
  * inode to its superblock's dirty inode list.
  *
  * mark_buffer_dirty() is atomic. It takes bh->b_page->mapping->private_lock,
- * mapping->page_lock and the global inode_lock.
+ * mapping->tree_lock and the global inode_lock.
  */
 void fastcall mark_buffer_dirty(struct buffer_head *bh)
 {
...
@@ -898,11 +898,9 @@ static void cifs_copy_cache_pages(struct address_space *mapping,
 		if(list_empty(pages))
 			break;
-		spin_lock(&mapping->page_lock);
 		page = list_entry(pages->prev, struct page, list);
 		list_del(&page->list);
-		spin_unlock(&mapping->page_lock);
 		if (add_to_page_cache(page, mapping, page->index, GFP_KERNEL)) {
 			page_cache_release(page);
@@ -962,14 +960,10 @@ cifs_readpages(struct file *file, struct address_space *mapping,
 	pagevec_init(&lru_pvec, 0);
 	for(i = 0;i<num_pages;) {
-		spin_lock(&mapping->page_lock);
-		if(list_empty(page_list)) {
-			spin_unlock(&mapping->page_lock);
+		if(list_empty(page_list))
 			break;
-		}
 		page = list_entry(page_list->prev, struct page, list);
 		offset = (loff_t)page->index << PAGE_CACHE_SHIFT;
-		spin_unlock(&mapping->page_lock);
 		/* for reads over a certain size could initiate async read ahead */
@@ -989,12 +983,10 @@ cifs_readpages(struct file *file, struct address_space *mapping,
 			cFYI(1,("Read error in readpages: %d",rc));
 			/* clean up remaing pages off list */
-			spin_lock(&mapping->page_lock);
 			while (!list_empty(page_list) && (i < num_pages)) {
 				page = list_entry(page_list->prev, struct page, list);
 				list_del(&page->list);
 			}
-			spin_unlock(&mapping->page_lock);
 			break;
 		} else if (bytes_read > 0) {
 			pSMBr = (struct smb_com_read_rsp *)smb_read_data;
...
@@ -159,10 +159,10 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
 	 * read speculatively by this cpu before &= ~I_DIRTY -- mikulas
 	 */
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	if (wait || !wbc->for_kupdate || list_empty(&mapping->io_pages))
 		list_splice_init(&mapping->dirty_pages, &mapping->io_pages);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	spin_unlock(&inode_lock);
 	ret = do_writepages(mapping, wbc);
...
@@ -187,7 +187,7 @@ void inode_init_once(struct inode *inode)
 	sema_init(&inode->i_sem, 1);
 	init_rwsem(&inode->i_alloc_sem);
 	INIT_RADIX_TREE(&inode->i_data.page_tree, GFP_ATOMIC);
-	spin_lock_init(&inode->i_data.page_lock);
+	spin_lock_init(&inode->i_data.tree_lock);
 	init_MUTEX(&inode->i_data.i_shared_sem);
 	atomic_set(&inode->i_data.truncate_count, 0);
 	INIT_LIST_HEAD(&inode->i_data.private_list);
...
@@ -635,7 +635,7 @@ mpage_writepages(struct address_space *mapping,
 	if (get_block == NULL)
 		writepage = mapping->a_ops->writepage;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	while (!list_empty(&mapping->io_pages) && !done) {
 		struct page *page = list_entry(mapping->io_pages.prev,
 					struct page, list);
@@ -655,10 +655,10 @@ mpage_writepages(struct address_space *mapping,
 		list_add(&page->list, &mapping->locked_pages);
 		page_cache_get(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		/*
-		 * At this point we hold neither mapping->page_lock nor
+		 * At this point we hold neither mapping->tree_lock nor
 		 * lock on the page itself: the page may be truncated or
 		 * invalidated (changing page->mapping to NULL), or even
 		 * swizzled back from swapper_space to tmpfs file mapping.
@@ -695,12 +695,12 @@ mpage_writepages(struct address_space *mapping,
 			unlock_page(page);
 		}
 		page_cache_release(page);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 	}
 	/*
 	 * Leave any remaining dirty pages on ->io_pages
 	 */
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	if (bio)
 		mpage_bio_submit(WRITE, bio);
 	return ret;
...
@@ -322,7 +322,7 @@ struct backing_dev_info;
 struct address_space {
 	struct inode *host;		/* owner: inode, block_device */
 	struct radix_tree_root page_tree;	/* radix tree of all pages */
-	spinlock_t page_lock;		/* and spinlock protecting it */
+	spinlock_t tree_lock;		/* and spinlock protecting it */
 	struct list_head clean_pages;	/* list of clean pages */
 	struct list_head dirty_pages;	/* list of dirty pages */
 	struct list_head locked_pages;	/* list of locked pages */
...
@@ -380,9 +380,7 @@ static void shm_get_stat(unsigned long *rss, unsigned long *swp)
 		if (is_file_hugepages(shp->shm_file)) {
 			struct address_space *mapping = inode->i_mapping;
-			spin_lock(&mapping->page_lock);
 			*rss += (HPAGE_SIZE/PAGE_SIZE)*mapping->nrpages;
-			spin_unlock(&mapping->page_lock);
 		} else {
 			struct shmem_inode_info *info = SHMEM_I(inode);
 			spin_lock(&info->lock);
...
@@ -59,7 +59,7 @@
  * ->private_lock (__free_pte->__set_page_dirty_buffers)
  * ->swap_list_lock
  * ->swap_device_lock (exclusive_swap_page, others)
- * ->mapping->page_lock
+ * ->mapping->tree_lock
  *
  * ->i_sem
  * ->i_shared_sem (truncate->invalidate_mmap_range)
@@ -78,12 +78,12 @@
  *
  * ->inode_lock
  * ->sb_lock (fs/fs-writeback.c)
- * ->mapping->page_lock (__sync_single_inode)
+ * ->mapping->tree_lock (__sync_single_inode)
  *
  * ->page_table_lock
 * ->swap_device_lock (try_to_unmap_one)
 * ->private_lock (try_to_unmap_one)
- * ->page_lock (try_to_unmap_one)
+ * ->tree_lock (try_to_unmap_one)
  * ->zone.lru_lock (follow_page->mark_page_accessed)
  *
  * ->task->proc_lock
@@ -93,7 +93,7 @@
 /*
  * Remove a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
- * is safe. The caller must hold a write_lock on the mapping's page_lock.
+ * is safe. The caller must hold a write_lock on the mapping's tree_lock.
  */
 void __remove_from_page_cache(struct page *page)
 {
@@ -114,9 +114,9 @@ void remove_from_page_cache(struct page *page)
 	if (unlikely(!PageLocked(page)))
 		PAGE_BUG(page);
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	__remove_from_page_cache(page);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 }
 static inline int sync_page(struct page *page)
@@ -148,9 +148,9 @@ static int __filemap_fdatawrite(struct address_space *mapping, int sync_mode)
 	if (mapping->backing_dev_info->memory_backed)
 		return 0;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	list_splice_init(&mapping->dirty_pages, &mapping->io_pages);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	ret = do_writepages(mapping, &wbc);
 	return ret;
 }
@@ -185,7 +185,7 @@ int filemap_fdatawait(struct address_space * mapping)
 restart:
 	progress = 0;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	while (!list_empty(&mapping->locked_pages)) {
 		struct page *page;
@@ -199,7 +199,7 @@ int filemap_fdatawait(struct address_space * mapping)
 		if (!PageWriteback(page)) {
 			if (++progress > 32) {
 				if (need_resched()) {
-					spin_unlock(&mapping->page_lock);
+					spin_unlock_irq(&mapping->tree_lock);
 					__cond_resched();
 					goto restart;
 				}
@@ -209,16 +209,16 @@ int filemap_fdatawait(struct address_space * mapping)
 		progress = 0;
 		page_cache_get(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		wait_on_page_writeback(page);
 		if (PageError(page))
 			ret = -EIO;
 		page_cache_release(page);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 	}
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	/* Check for outstanding write errors */
 	if (test_and_clear_bit(AS_ENOSPC, &mapping->flags))
@@ -267,7 +267,7 @@ int add_to_page_cache(struct page *page, struct address_space *mapping,
 	if (error == 0) {
 		page_cache_get(page);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		error = radix_tree_insert(&mapping->page_tree, offset, page);
 		if (!error) {
 			SetPageLocked(page);
@@ -275,7 +275,7 @@ int add_to_page_cache(struct page *page, struct address_space *mapping,
 		} else {
 			page_cache_release(page);
 		}
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		radix_tree_preload_end();
 	}
 	return error;
@@ -411,11 +411,11 @@ struct page * find_get_page(struct address_space *mapping, unsigned long offset)
 	 * We scan the hash list read-only. Addition to and removal from
 	 * the hash-list needs a held write-lock.
 	 */
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	page = radix_tree_lookup(&mapping->page_tree, offset);
 	if (page)
 		page_cache_get(page);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return page;
 }
@@ -428,11 +428,11 @@ struct page *find_trylock_page(struct address_space *mapping, unsigned long offs
 {
 	struct page *page;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	page = radix_tree_lookup(&mapping->page_tree, offset);
 	if (page && TestSetPageLocked(page))
 		page = NULL;
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return page;
 }
@@ -454,15 +454,15 @@ struct page *find_lock_page(struct address_space *mapping,
 {
 	struct page *page;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 repeat:
 	page = radix_tree_lookup(&mapping->page_tree, offset);
 	if (page) {
 		page_cache_get(page);
 		if (TestSetPageLocked(page)) {
-			spin_unlock(&mapping->page_lock);
+			spin_unlock_irq(&mapping->tree_lock);
 			lock_page(page);
-			spin_lock(&mapping->page_lock);
+			spin_lock_irq(&mapping->tree_lock);
 			/* Has the page been truncated while we slept? */
 			if (page->mapping != mapping || page->index != offset) {
@@ -472,7 +472,7 @@ struct page *find_lock_page(struct address_space *mapping,
 			}
 		}
 	}
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return page;
 }
@@ -546,12 +546,12 @@ unsigned int find_get_pages(struct address_space *mapping, pgoff_t start,
 	unsigned int i;
 	unsigned int ret;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	ret = radix_tree_gang_lookup(&mapping->page_tree,
 				(void **)pages, start, nr_pages);
 	for (i = 0; i < ret; i++)
 		page_cache_get(pages[i]);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	return ret;
 }
...
@@ -472,12 +472,12 @@ int write_one_page(struct page *page, int wait)
 	if (wait)
 		wait_on_page_writeback(page);
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	list_del(&page->list);
 	if (test_clear_page_dirty(page)) {
 		list_add(&page->list, &mapping->locked_pages);
 		page_cache_get(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		ret = mapping->a_ops->writepage(page, &wbc);
 		if (ret == 0 && wait) {
 			wait_on_page_writeback(page);
@@ -487,7 +487,7 @@ int write_one_page(struct page *page, int wait)
 		page_cache_release(page);
 	} else {
 		list_add(&page->list, &mapping->clean_pages);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		unlock_page(page);
 	}
 	return ret;
@@ -515,7 +515,7 @@ int __set_page_dirty_nobuffers(struct page *page)
 	struct address_space *mapping = page->mapping;
 	if (mapping) {
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		if (page->mapping) {	/* Race with truncate? */
 			BUG_ON(page->mapping != mapping);
 			if (!mapping->backing_dev_info->memory_backed)
@@ -523,7 +523,7 @@ int __set_page_dirty_nobuffers(struct page *page)
 			list_del(&page->list);
 			list_add(&page->list, &mapping->dirty_pages);
 		}
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		if (!PageSwapCache(page))
 			__mark_inode_dirty(mapping->host,
 						I_DIRTY_PAGES);
...
@@ -230,7 +230,7 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 	/*
 	 * Preallocate as many pages as we will need.
 	 */
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	for (page_idx = 0; page_idx < nr_to_read; page_idx++) {
 		unsigned long page_offset = offset + page_idx;
@@ -241,16 +241,16 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 		if (page)
 			continue;
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		page = page_cache_alloc_cold(mapping);
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		if (!page)
 			break;
 		page->index = page_offset;
 		list_add(&page->list, &page_pool);
 		ret++;
 	}
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	/*
 	 * Now start the IO. We ignore I/O errors - if the page is not
...
@@ -25,7 +25,7 @@ extern struct address_space_operations swap_aops;
 struct address_space swapper_space = {
 	.page_tree	= RADIX_TREE_INIT(GFP_ATOMIC),
-	.page_lock	= SPIN_LOCK_UNLOCKED,
+	.tree_lock	= SPIN_LOCK_UNLOCKED,
 	.clean_pages	= LIST_HEAD_INIT(swapper_space.clean_pages),
 	.dirty_pages	= LIST_HEAD_INIT(swapper_space.dirty_pages),
 	.io_pages	= LIST_HEAD_INIT(swapper_space.io_pages),
@@ -182,9 +182,9 @@ void delete_from_swap_cache(struct page *page)
 	entry.val = page->index;
-	spin_lock(&swapper_space.page_lock);
+	spin_lock_irq(&swapper_space.tree_lock);
 	__delete_from_swap_cache(page);
-	spin_unlock(&swapper_space.page_lock);
+	spin_unlock_irq(&swapper_space.tree_lock);
 	swap_free(entry);
 	page_cache_release(page);
@@ -195,8 +195,8 @@ int move_to_swap_cache(struct page *page, swp_entry_t entry)
 	struct address_space *mapping = page->mapping;
 	int err;
-	spin_lock(&swapper_space.page_lock);
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&swapper_space.tree_lock);
+	spin_lock(&mapping->tree_lock);
 	err = radix_tree_insert(&swapper_space.page_tree, entry.val, page);
 	if (!err) {
@@ -204,8 +204,8 @@ int move_to_swap_cache(struct page *page, swp_entry_t entry)
 		___add_to_page_cache(page, &swapper_space, entry.val);
 	}
-	spin_unlock(&mapping->page_lock);
-	spin_unlock(&swapper_space.page_lock);
+	spin_unlock(&mapping->tree_lock);
+	spin_unlock_irq(&swapper_space.tree_lock);
 	if (!err) {
 		if (!swap_duplicate(entry))
@@ -231,8 +231,8 @@ int move_from_swap_cache(struct page *page, unsigned long index,
 	entry.val = page->index;
-	spin_lock(&swapper_space.page_lock);
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&swapper_space.tree_lock);
+	spin_lock(&mapping->tree_lock);
 	err = radix_tree_insert(&mapping->page_tree, index, page);
 	if (!err) {
@@ -240,8 +240,8 @@ int move_from_swap_cache(struct page *page, unsigned long index,
 		___add_to_page_cache(page, mapping, index);
 	}
-	spin_unlock(&mapping->page_lock);
-	spin_unlock(&swapper_space.page_lock);
+	spin_unlock(&mapping->tree_lock);
+	spin_unlock_irq(&swapper_space.tree_lock);
 	if (!err) {
 		swap_free(entry);
...
@@ -253,10 +253,10 @@ static int exclusive_swap_page(struct page *page)
 		/* Is the only swap cache user the cache itself? */
 		if (p->swap_map[swp_offset(entry)] == 1) {
 			/* Recheck the page count with the pagecache lock held.. */
-			spin_lock(&swapper_space.page_lock);
+			spin_lock_irq(&swapper_space.tree_lock);
 			if (page_count(page) - !!PagePrivate(page) == 2)
 				retval = 1;
-			spin_unlock(&swapper_space.page_lock);
+			spin_unlock_irq(&swapper_space.tree_lock);
 		}
 		swap_info_put(p);
 	}
@@ -324,13 +324,13 @@ int remove_exclusive_swap_page(struct page *page)
 	retval = 0;
 	if (p->swap_map[swp_offset(entry)] == 1) {
 		/* Recheck the page count with the pagecache lock held.. */
-		spin_lock(&swapper_space.page_lock);
+		spin_lock_irq(&swapper_space.tree_lock);
 		if ((page_count(page) == 2) && !PageWriteback(page)) {
 			__delete_from_swap_cache(page);
 			SetPageDirty(page);
 			retval = 1;
 		}
-		spin_unlock(&swapper_space.page_lock);
+		spin_unlock_irq(&swapper_space.tree_lock);
 	}
 	swap_info_put(p);
...
@@ -62,7 +62,7 @@ truncate_complete_page(struct address_space *mapping, struct page *page)
  * This is for invalidate_inode_pages(). That function can be called at
  * any time, and is not supposed to throw away dirty pages. But pages can
  * be marked dirty at any time too. So we re-check the dirtiness inside
- * ->page_lock. That provides exclusion against the __set_page_dirty
+ * ->tree_lock. That provides exclusion against the __set_page_dirty
  * functions.
  */
 static int
@@ -74,13 +74,13 @@ invalidate_complete_page(struct address_space *mapping, struct page *page)
 	if (PagePrivate(page) && !try_to_release_page(page, 0))
 		return 0;
-	spin_lock(&mapping->page_lock);
+	spin_lock_irq(&mapping->tree_lock);
 	if (PageDirty(page)) {
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		return 0;
 	}
 	__remove_from_page_cache(page);
-	spin_unlock(&mapping->page_lock);
+	spin_unlock_irq(&mapping->tree_lock);
 	ClearPageUptodate(page);
 	page_cache_release(page);	/* pagecache ref */
 	return 1;
...
@@ -354,7 +354,6 @@ shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_scanned)
 				goto keep_locked;
 			if (!may_write_to_queue(mapping->backing_dev_info))
 				goto keep_locked;
-			spin_lock(&mapping->page_lock);
 			if (test_clear_page_dirty(page)) {
 				int res;
 				struct writeback_control wbc = {
@@ -364,9 +363,6 @@ shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_scanned)
 					.for_reclaim = 1,
 				};
-				list_move(&page->list, &mapping->locked_pages);
-				spin_unlock(&mapping->page_lock);
 				SetPageReclaim(page);
 				res = mapping->a_ops->writepage(page, &wbc);
 				if (res < 0)
@@ -381,7 +377,6 @@ shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_scanned)
 				}
 				goto keep;
 			}
-			spin_unlock(&mapping->page_lock);
 		}
 		/*
@@ -415,7 +410,7 @@ shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_scanned)
 		if (!mapping)
 			goto keep_locked;	/* truncate got there first */
-		spin_lock(&mapping->page_lock);
+		spin_lock_irq(&mapping->tree_lock);
 		/*
 		 * The non-racy check for busy page. It is critical to check
@@ -423,7 +418,7 @@ shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_scanned)
 		 * not in use by anybody. (pagecache + us == 2)
 		 */
 		if (page_count(page) != 2 || PageDirty(page)) {
-			spin_unlock(&mapping->page_lock);
+			spin_unlock_irq(&mapping->tree_lock);
 			goto keep_locked;
 		}
@@ -431,7 +426,7 @@ shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_scanned)
 		if (PageSwapCache(page)) {
 			swp_entry_t swap = { .val = page->index };
 			__delete_from_swap_cache(page);
-			spin_unlock(&mapping->page_lock);
+			spin_unlock_irq(&mapping->tree_lock);
 			swap_free(swap);
 			__put_page(page);	/* The pagecache ref */
 			goto free_it;
@@ -439,7 +434,7 @@ shrink_list(struct list_head *page_list, unsigned int gfp_mask, int *nr_scanned)
 #endif /* CONFIG_SWAP */
 		__remove_from_page_cache(page);
-		spin_unlock(&mapping->page_lock);
+		spin_unlock_irq(&mapping->tree_lock);
 		__put_page(page);
 free_it: