Commit cdde581d authored by Kirill Smelkov

.

parent 3054e4a3
@@ -431,43 +431,44 @@ class tFile:
         # mmap the file past the end up to _max_tracked_pages and setup
         # invariants on which we rely to verify OS cache state:
         #
-        # 1. lock pages with MLOCK_ONFAULT: this way when a page is read by
+        # 1. lock pages with MLOCK_ONFAULT: this way after a page is read by
         #    mmap access we have the guarantee from kernel that the page will
         #    stay in pagecache.
         #
-        # 2. madvise in interleaved mode blocks memory to be either
-        #    MADV_NORMAL or MAD_RANDOM. This adjusts kernel readahead (which
-        #    triggers for MADV_NORMAL memory) to not go over to next block and
-        #    thus a read access to one block won't trigger implicit read access
-        #    to neighbour block.
+        # 2. madvise memory with MADV_NORMAL and MADV_RANDOM in interleaved
+        #    mode. This adjusts kernel readahead (which triggers for MADV_NORMAL
+        #    vma) to not go over to next block and thus a read access to one
+        #    block won't trigger implicit read access to its neighbour block.
         #
         # https://www.quora.com/What-heuristics-does-the-adaptive-readahead-implementation-in-the-Linux-kernel-use
         # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/madvise.c?h=v5.2-rc4#n51
         #
         # don't disable readahead universally, since enabled readahead helps
-        # to test how wcfs handles simultaneous read vs wcfs uploading data
-        # for the same block into OS cache.
+        # to test how wcfs handles simultaneous read triggered by async
+        # kernel readahead vs wcfs uploading data for the same block into OS
+        # cache. Also, fully enabled readahead is how wcfs is actually used.
         assert t.blksize % mm.PAGE_SIZE == 0
         t.fmmap = mm.map_ro(t.f.fileno(), 0, t._max_tracked_pages*t.blksize)
         mm.lock(t.fmmap, mm.MLOCK_ONFAULT)
         for blk in range(t._max_tracked_pages):
             blkmmap = t.fmmap[blk*t.blksize:(blk+1)*t.blksize]
-            # FIXME somehow does not completely prevent readahead to go into MADV_RANDOM page
-            # NOTE with MADV_RANDOM the kernel issues 4K sized reads; wcfs
-            # starts uploading into cache almost immediately, but the kernel
-            # still issues many reads to read the full 2MB of the block. This
-            # works slow.
-            # XXX -> make read(while-uploading) wait for uploading to complete
-            # and only then return? (maybe it will help performance even in normal case)
-            #mm.advise(blkmmap, (mm.MADV_NORMAL, mm.MADV_RANDOM)[blk%2])
-            #mm.advise(blkmmap, (mm.MADV_RANDOM, mm.MADV_NORMAL)[blk%2])
-            #mm.advise(blkmmap, mm.MADV_NORMAL)
-            #mm.advise(blkmmap, mm.MADV_RANDOM)
-            # XXX vvv works - at the end of every block there is MAD_RANDOM
-            # range which is wider than RA window (XXX implicit) and so RA
-            # triggered before that, even if it overlaps with that last 1/4,
-            # don't trigger RA that overlaps with next block.
+            # NOTE the kernel does not start readahead from access to
+            # MADV_RANDOM vma, but for MADV_NORMAL vma it starts readahead which
+            # can go _beyond_ vma that was used to decide RA start. For this
+            # reason - to prevent RA started at one block to overlap with the
+            # next block, we put MADV_RANDOM vma at the end of every block
+            # covering last 1/4 of it.
+            # XXX implicit assumption that RA window is < 1/4·blksize
+            #
+            # NOTE with a block completely covered by MADV_RANDOM the kernel
+            # issues 4K sized reads; wcfs starts uploading into cache almost
+            # immediately, but the kernel still issues many reads to read the
+            # full 2MB of the block. This works slow.
+            # XXX -> investigate and maybe make read(while-uploading) wait for
+            # uploading to complete and only then return? (maybe it will help
+            # performance even in normal case)
             mm.advise(blkmmap[len(blkmmap)*3//4:], mm.MADV_RANDOM)
         tdb._files.add(t)
...
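The tail-of-block MADV_RANDOM trick adopted by this commit can be sketched with the stdlib mmap module alone (the `mm` module in the diff is wendelin.core's own wrapper). This is a minimal sketch, assuming Linux and Python ≥ 3.8 (where `mmap.mmap.madvise` exists); it leaves out the MLOCK_ONFAULT locking step, which would need the mlock2(2) syscall via ctypes, and the 2 MB block size and block count are just illustrative values:

```python
import mmap
import tempfile

BLKSIZE = 2 * 1024 * 1024   # 2 MB, the block size mentioned in the comments
NBLK = 4                    # number of blocks to map (arbitrary for the sketch)

# create a sparse NBLK-block file to stand in for the wcfs-served file
f = tempfile.TemporaryFile()
f.truncate(NBLK * BLKSIZE)

m = mmap.mmap(f.fileno(), NBLK * BLKSIZE, prot=mmap.PROT_READ)

# cover the last 1/4 of every block with MADV_RANDOM: readahead started
# from an access in the MADV_NORMAL part of a block has to cross this
# range before reaching the next block, so - assuming the RA window is
# smaller than BLKSIZE/4 - it does not pull the neighbour block into
# the page cache.
for blk in range(NBLK):
    tail = blk * BLKSIZE + BLKSIZE * 3 // 4   # page-aligned since BLKSIZE is
    m.madvise(mmap.MADV_RANDOM, tail, BLKSIZE // 4)

# touching a page in the MADV_NORMAL part of block 0 may now trigger
# readahead, but that readahead stays within block 0
first_byte = m[0]
```

The same layout is what `mm.advise(blkmmap[len(blkmmap)*3//4:], mm.MADV_RANDOM)` produces in the test: one MADV_NORMAL region per block followed by an MADV_RANDOM guard range at its tail.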