Commit 2cde04e9 authored by Kazuo Ito's avatar Kazuo Ito Committed by Trond Myklebust

pNFS: Avoid read/modify/write when it is not necessary

As the block and SCSI layouts can only read/write fixed-length
blocks, we must perform read-modify-write when data to be written is
not aligned to a block boundary or smaller than the block size.
(612aa983 pnfs: add flag to force read-modify-write in ->write_begin)

The current code tries to see if we have to do read-modify-write
on block-oriented pNFS layouts by just checking !PageUptodate(page),
but the same condition also applies for overwriting of any uncached
potions of existing files, making such operations excessively slow
even it is block-aligned.

The change does not affect the optimization for modify-write-read
cases (38c73044 NFS: read-modify-write page updating),
because partial update of !PageUptodate() pages can only happen
in layouts that can do arbitrary length read/write and never
in block-based ones.

Testing results:

We ran fio on one of the pNFS clients running 4.20 kernel
(vanilla and patched) in this configuration to read/write/overwrite
files on the storage array, exported as pnfs share by the server.

 pNFS clients ---1G Ethernet--- pNFS server
 (HP DL360 G8)                  (HP DL360 G8)
       |                              |
       |                              |
       +------8G Fiber Channel--------+
                     |
               Storage Array
                 (HP P6350)

Throughput of overwrite (both buffered and O_SYNC) is noticeably
improved.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
buffered |         4|  21.3 |  232   |
overwrite|        32|  22.2 |  256   |
         |       512|  22.4 |  260   |
---------+----------+----------------+
O_SYNC   |         4|   3.84|    4.77|
overwrite|        32|  12.2 |   32.0 |
         |       512|  18.5 |  152   |
---------+----------+----------------+

Read and write (buffered and O_SYNC) by the same client remain unchanged
by the patch either negatively or positively, as they should do.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
read     |         4| 548   |  550   |
         |        32| 547   |  551   |
         |       512| 548   |  551   |
---------+----------+----------------+
buffered |         4| 237   |  244   |
write    |        32| 261   |  268   |
         |       512| 265   |  272   |
---------+----------+----------------+
O_SYNC   |         4|   0.46|    0.46|
write    |        32|   3.60|    3.57|
         |       512| 105   |  106   |
---------+----------+----------------+
Signed-off-by: default avatarKazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Tested-by: default avatarHiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp>
Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
parent 97ae91bb
...@@ -276,6 +276,12 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync); ...@@ -276,6 +276,12 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync);
* then a modify/write/read cycle when writing to a page in the * then a modify/write/read cycle when writing to a page in the
* page cache. * page cache.
* *
* Some pNFS layout drivers can only read/write at a certain block
* granularity like all block devices and therefore we must perform
* read/modify/write whenever a page hasn't read yet and the data
* to be written there is not aligned to a block boundary and/or
* smaller than the block size.
*
* The modify/write/read cycle may occur if a page is read before * The modify/write/read cycle may occur if a page is read before
* being completely filled by the writer. In this situation, the * being completely filled by the writer. In this situation, the
* page must be completely written to stable storage on the server * page must be completely written to stable storage on the server
...@@ -291,26 +297,32 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync); ...@@ -291,26 +297,32 @@ EXPORT_SYMBOL_GPL(nfs_file_fsync);
* and that the new data won't completely replace the old data in * and that the new data won't completely replace the old data in
* that range of the file. * that range of the file.
*/ */
static int nfs_want_read_modify_write(struct file *file, struct page *page, static bool nfs_full_page_write(struct page *page, loff_t pos, unsigned int len)
loff_t pos, unsigned len)
{ {
unsigned int pglen = nfs_page_length(page); unsigned int pglen = nfs_page_length(page);
unsigned int offset = pos & (PAGE_SIZE - 1); unsigned int offset = pos & (PAGE_SIZE - 1);
unsigned int end = offset + len; unsigned int end = offset + len;
if (pnfs_ld_read_whole_page(file->f_mapping->host)) { return !pglen || (end >= pglen && !offset);
if (!PageUptodate(page) && !PagePrivate(page)) }
return 1;
return 0;
}
if ((file->f_mode & FMODE_READ) && /* open for read? */ static bool nfs_want_read_modify_write(struct file *file, struct page *page,
!PageUptodate(page) && /* Uptodate? */ loff_t pos, unsigned int len)
!PagePrivate(page) && /* i/o request already? */ {
pglen && /* valid bytes of file? */ /*
(end < pglen || offset)) /* replace all valid bytes? */ * Up-to-date pages, those with ongoing or full-page write
return 1; * don't need read/modify/write
return 0; */
if (PageUptodate(page) || PagePrivate(page) ||
nfs_full_page_write(page, pos, len))
return false;
if (pnfs_ld_read_whole_page(file->f_mapping->host))
return true;
/* Open for reading too? */
if (file->f_mode & FMODE_READ)
return true;
return false;
} }
/* /*
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment