    virtmem: Benchmarks for pagefault handling · 3cfc2728
    Kirill Smelkov authored
    Benchmark the time it takes for virtmem to handle a pagefault with a
    no-op loadblk, with loadblk implemented both in C and in Python.

    On my computer the results are:
    
    	name          µs/op
    	PagefaultC    269 ± 0%
    	pagefault_py  291 ± 0%
    
    In other words, this is quite a long time.
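To make the idea of a userspace pagefault benchmark with a no-op load concrete, here is a minimal, Linux-specific sketch. It assumes the common SIGSEGV + mprotect technique, which may differ from how virtmem actually handles faults. `bench_pagefault`, `on_segv` and `nfault` are hypothetical names. The sketch maps anonymous memory instead of fallocate'ing pages on /dev/shm, so it leaves out the tmpfs allocation cost entirely, and its numbers are not comparable to the table above. Note that POSIX does not list mprotect as async-signal-safe. Calling it from a SIGSEGV handler works on Linux in practice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t nfault;  /* number of faults handled so far */
static long pagesize;

/* Fault handler: a "no-op loadblk" that only makes the faulting page
 * accessible; the page content stays zero. */
static void on_segv(int sig, siginfo_t *si, void *ctx) {
    (void)sig;
    (void)ctx;
    uintptr_t page = (uintptr_t)si->si_addr & ~((uintptr_t)pagesize - 1);
    if (mprotect((void *)page, (size_t)pagesize, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    nfault++;
}

/* Touch npage (> 0) fresh pages once each and return the average µs per
 * fault, or -1 on error. */
double bench_pagefault(size_t npage) {
    pagesize = sysconf(_SC_PAGESIZE);
    size_t len = npage * (size_t)pagesize;
    char *mem = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(mem, len);
        return -1;
    }

    nfault = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < npage; i++)
        ((volatile char *)mem)[i * (size_t)pagesize] = 1;  /* faults once */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    munmap(mem, len);
    double us = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
    return us / (double)npage;
}
```

For example, `bench_pagefault(1000)` touches 1000 pages and reports the per-fault average. This covers signal delivery, mprotect, and the kernel's own anonymous-page fault.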
    
    It turned out that most of this time is spent fallocate'ing pages on
    tmpfs (/dev/shm). Part of the above 269 µs/op goes to freeing
    (reclaiming) pages when the benchmark working set exceeds the size of
    /dev/shm, and part goes to allocating them.
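The allocate/free split described above can be measured in isolation on tmpfs. Below is a minimal sketch that times fallocate'ing a block and then freeing it again, one block at a time. It assumes that pages are freed by punching a hole with `FALLOC_FL_PUNCH_HOLE`. That is a standard Linux fallocate mode, but this sketch does not claim it is exactly what ram_shmfs.c does. `bench_alloc_free` and `now_us` are hypothetical helpers. The sketch is not the t_fallocate.c benchmark referenced in this commit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* fallocate, FALLOC_FL_* (Linux, needs _GNU_SOURCE) */
#include <stdio.h>
#include <stdlib.h>     /* mkstemp */
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in microseconds. */
static double now_us(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec * 1e6 + t.tv_nsec / 1e3;
}

/* Allocate and then free nblk (> 0) blocks of blksize bytes, one block at a
 * time, in a temporary file under dir (e.g. "/dev/shm"). Stores the average
 * µs per block for allocating in *alloc_us and for freeing in *free_us.
 * Returns 0 on success, -1 on error. */
int bench_alloc_free(const char *dir, int nblk, off_t blksize,
                     double *alloc_us, double *free_us) {
    char path[4096];
    snprintf(path, sizeof path, "%s/t_fallocXXXXXX", dir);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file stays usable until fd is closed */

    double ta = 0, tf = 0;
    for (int i = 0; i < nblk; i++) {
        off_t off = (off_t)i * blksize;
        double t0 = now_us();
        /* allocate the pages backing this block */
        if (fallocate(fd, 0, off, blksize) != 0)
            goto fail;
        double t1 = now_us();
        /* free them again by punching a hole (file size is kept) */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, blksize) != 0)
            goto fail;
        double t2 = now_us();
        ta += t1 - t0;
        tf += t2 - t1;
    }
    close(fd);
    *alloc_us = ta / nblk;
    *free_us = tf / nblk;
    return 0;

fail:
    close(fd);
    return -1;
}
```

The working set never exceeds one block, because each block is freed right after it is allocated. This keeps the measurement independent of the /dev/shm size limit.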
    
    If I limit the working set (via npage in benchmem.c) to be smaller than
    the whole /dev/shm, it drops to ~170 µs/op, and with additional tracing
    it looks roughly like this:
    
    	.. on_pagefault_start   0.954 µs
    	.. vma_on_pagefault_pre 0.954 µs
    	.. ramh_alloc_page_pre  0.954 µs
    	.. ramh_alloc_page      169.992 µs
    	.. vma_on_pagefault     172.853 µs
    	.. vma_on_pagefault_pre 172.853 µs
    	.. vma_on_pagefault     174.046 µs
    	.. on_pagefault_end     174.046 µs
    	.. whole:               171.900 µs
    
    so almost all of the time is spent in ramh_alloc_page, which does the fallocate:
    
    	https://lab.nexedi.com/nexedi/wendelin.core/blob/f11386a4/bigfile/ram_shmfs.c#L125
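The working-set limit mentioned above keeps npage below what fits into /dev/shm. Such a limit could be derived from the filesystem's free space. Below is a minimal sketch, assuming the working set consists of npage blocks of a fixed size (presumably 2M in the runs above). `max_npage` and `npage_fit` are hypothetical helpers, not part of benchmem.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/* Number of whole blocks of blksize bytes that fit into avail bytes. */
static size_t npage_fit(unsigned long long avail, size_t blksize) {
    return blksize != 0 ? (size_t)(avail / blksize) : 0;
}

/* Upper bound for npage such that npage blocks of blksize bytes fit into
 * the space currently available on the filesystem at dir (e.g. "/dev/shm").
 * Returns 0 on error. */
size_t max_npage(const char *dir, size_t blksize) {
    struct statvfs st;
    if (statvfs(dir, &st) != 0)
        return 0;
    return npage_fit((unsigned long long)st.f_bavail * st.f_frsize, blksize);
}
```

statvfs reports free space only at the moment of the call. Other users of /dev/shm can still shrink it, so leaving some headroom below `max_npage("/dev/shm", 2 << 20)` is probably wise.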
    
    A simple benchmark[1] confirmed that fallocate on tmpfs is indeed
    relatively slow[2] (and that on recent kernels it has regressed somewhat
    compared to Linux 3.16). The profile flamegraph for that benchmark[3]
    shows that shmem_fallocate is not that slow for one hardware page
    (<1 µs), but when a request comes for a whole region it internally
    processes it page by page, and so accumulates the ~170 µs for 2M.
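As a rough consistency check, assume 4 KiB hardware pages (the x86-64 default) and read "2M" as 2 MiB. The per-page cost implied by the ~170 µs figure is then:

```latex
\frac{2\,\mathrm{MiB}}{4\,\mathrm{KiB}}
  = \frac{2^{21}\,\mathrm{B}}{2^{12}\,\mathrm{B}}
  = 2^{9} = 512\ \text{pages},
\qquad
\frac{170\,\mu\mathrm{s}}{512\ \text{pages}} \approx 0.33\,\mu\mathrm{s}\ \text{per page}
```

This agrees with the <1 µs per-page cost seen in the flamegraph.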
    
    I briefly tried rerunning the benchmark with huge pages enabled on /dev/shm via
    
    	mount /dev/shm -o huge=always,remount
    
    both as a regular user and as root, but it ran several times slower.
    This is probably something to investigate further later.
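When repeating the huge-page experiment, it may help to first confirm that the remount actually took effect. Below is a minimal sketch that reads the huge= option of a mount from a mount table such as /proc/mounts, using glibc's setmntent/getmntent/hasmntopt. `tmpfs_huge_mode` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <mntent.h>   /* setmntent, getmntent, hasmntopt, endmntent */
#include <stdio.h>
#include <string.h>

/* Look up the mount at `target` in the mount table file `mounts` (e.g.
 * "/proc/mounts") and copy the value of its huge= option into buf (n > 0).
 * Returns 1 if huge= is set, 0 if the mount has no huge= option, and -1 if
 * the mount is not listed or the table cannot be opened. If `target` is
 * listed several times (stacked mounts), the last entry wins. */
int tmpfs_huge_mode(const char *mounts, const char *target, char *buf, size_t n) {
    FILE *f = setmntent(mounts, "r");
    if (f == NULL)
        return -1;

    int rc = -1;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL) {
        if (strcmp(m->mnt_dir, target) != 0)
            continue;
        rc = 0;
        buf[0] = '\0';
        char *opt = hasmntopt(m, "huge");   /* points at "huge=..." or NULL */
        if (opt != NULL && opt[4] == '=') {
            const char *val = opt + 5;
            size_t len = strcspn(val, ",");
            if (len >= n)
                len = n - 1;                /* truncate to fit buf */
            memcpy(buf, val, len);
            buf[len] = '\0';
            rc = 1;
        }
    }
    endmntent(f);
    return rc;
}
```

After the remount above, `tmpfs_huge_mode("/proc/mounts", "/dev/shm", buf, sizeof buf)` would be expected to return 1 with "always" in buf. This assumes the kernel supports huge pages on tmpfs. A return value of 0 means no huge= option is shown, which typically corresponds to the tmpfs default (never).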
    
    [1] https://lab.nexedi.com/kirr/misc/blob/4f84a06e/tmpfs/t_fallocate.c
    [2] https://lab.nexedi.com/kirr/misc/blob/4f84a06e/tmpfs/1.txt
    [3] https://lab.nexedi.com/kirr/misc/raw/4f84a06e/tmpfs/fallocate-2M-nohuge.svg