    virtmem: Benchmarks for pagefault handling
    Benchmark the time it takes for virtmem to handle a pagefault with a
    no-op loadblk, for loadblk implemented both in C and in Python.
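
    For reference, a minimal sketch of what the no-op loadblk amounts to
    on the C side (the types here are simplified stand-ins, not the real
    declarations from wendelin.core's bigfile.h):

    	/* no-op loadblk: any time measured around the pagefault is then
    	 * virtmem + RAM-allocation overhead, not data loading */
    	#include <stdint.h>

    	typedef struct BigFile BigFile;   /* stand-in for the real BigFile */
    	typedef int64_t blk_t;            /* block number */

    	static int noop_loadblk(BigFile *file, blk_t blk, void *buf)
    	{
    	    (void)file; (void)blk; (void)buf;   /* intentionally empty */
    	    return 0;                           /* 0 = success */
    	}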
    
    On my computer it is:
    
    	name          µs/op
    	PagefaultC    269 ± 0%
    	pagefault_py  291 ± 0%
    
    In other words, quite a lot of time.
    
    It turned out to be mostly spent fallocate'ing pages on tmpfs from
    /dev/shm. Part of the above 269 µs/op is taken by freeing (reclaiming)
    pages back when the benchmark working size exceeds the /dev/shm size,
    and part by allocating them.
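
    Roughly (a hand-written illustration, not wendelin.core's actual
    code), the per-pagefault allocation boils down to an fallocate() of
    one hardware page in a file living on /dev/shm:

    	/* illustration: allocate pages on tmpfs one by one via fallocate() */
    	#define _GNU_SOURCE
    	#include <fcntl.h>
    	#include <stdio.h>
    	#include <stdlib.h>
    	#include <unistd.h>

    	int main(void)
    	{
    	    long pagesize = sysconf(_SC_PAGESIZE);
    	    char path[]   = "/dev/shm/falloc-demo-XXXXXX";
    	    int  fd       = mkstemp(path);  /* tmpfs-backed file */
    	    if (fd < 0) { perror("mkstemp"); return 1; }
    	    unlink(path);                   /* keep it anonymous */

    	    /* one fallocate per page, as virtmem does per pagefault */
    	    for (off_t off = 0; off < 16 * pagesize; off += pagesize)
    	        if (fallocate(fd, /*mode=*/0, off, pagesize) < 0) {
    	            perror("fallocate"); return 1;
    	        }

    	    close(fd);
    	    return 0;
    	}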
    
    If I limit the working size (via npage in benchmem.c) to be less than
    the whole of /dev/shm, it drops to ~170 µs/op, and with additional
    tracing it shows up as something like this:
    
        	.. on_pagefault_start   0.954 µs
        	.. vma_on_pagefault_pre 0.954 µs
        	.. ramh_alloc_page_pre  0.954 µs
        	.. ramh_alloc_page      169.992 µs
        	.. vma_on_pagefault     172.853 µs
        	.. vma_on_pagefault_pre 172.853 µs
        	.. vma_on_pagefault     174.046 µs
        	.. on_pagefault_end     174.046 µs
        	.. whole:               171.900 µs
    
    so almost all the time is spent in ramh_alloc_page, which is doing the fallocate:
    
    	https://lab.nexedi.com/nexedi/wendelin.core/blob/f11386a4/bigfile/ram_shmfs.c#L125
    
    A simple benchmark[1] confirmed that fallocate on tmpfs is indeed
    relatively slow[2] (and that on recent kernels it has regressed
    somewhat compared to Linux 3.16). A profile flamegraph for that
    benchmark[3] shows the internals of shmem_fallocate: for 1 hardware
    page it is not all that slow (e.g. <1 µs), but a request for a larger
    region is internally performed page by page, and so it accumulates to
    that ~170 µs for 2M.
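
    To see that accumulation directly, here is a small timing sketch in
    the spirit of [1] (my own rewrite, not the benchmark itself): it
    times fallocate() of a single hardware page vs a whole 2M region on
    tmpfs:

    	/* time fallocate() on tmpfs: 1 hardware page vs whole 2M region */
    	#define _GNU_SOURCE
    	#include <fcntl.h>
    	#include <stdio.h>
    	#include <stdlib.h>
    	#include <time.h>
    	#include <unistd.h>

    	static double bench_falloc_us(int fd, off_t len)
    	{
    	    struct timespec t0, t1;
    	    clock_gettime(CLOCK_MONOTONIC, &t0);
    	    if (fallocate(fd, /*mode=*/0, 0, len) < 0) {
    	        perror("fallocate"); exit(1);
    	    }
    	    clock_gettime(CLOCK_MONOTONIC, &t1);

    	    /* punch the range back out so the next run re-allocates */
    	    fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, len);

    	    return (t1.tv_sec - t0.tv_sec) * 1e6 +
    	           (t1.tv_nsec - t0.tv_nsec) * 1e-3;
    	}

    	int main(void)
    	{
    	    char path[] = "/dev/shm/t-falloc-XXXXXX";
    	    int  fd = mkstemp(path);
    	    if (fd < 0) { perror("mkstemp"); return 1; }
    	    unlink(path);

    	    printf("4K : %8.3f us\n", bench_falloc_us(fd, 4096));
    	    printf("2M : %8.3f us\n", bench_falloc_us(fd, 2*1024*1024));

    	    close(fd);
    	    return 0;
    	}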
    
    I briefly tried to rerun the benchmark with huge pages activated on
    /dev/shm via
    
    	mount /dev/shm -o huge=always,remount
    
    as both a regular user and as root, but it executed several times
    slower. Probably something to investigate more later.
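
    One thing to check when revisiting this (assumption: Linux ≥ 4.8,
    where the tmpfs huge= option appeared) is whether tmpfs actually
    serves huge pages after the remount, e.g. by watching ShmemHugePages
    in /proc/meminfo:

    	/* print ShmemHugePages from /proc/meminfo (Linux >= 4.8) */
    	#include <stdio.h>
    	#include <string.h>

    	int main(void)
    	{
    	    char line[256];
    	    FILE *f = fopen("/proc/meminfo", "r");
    	    if (!f) { perror("/proc/meminfo"); return 1; }
    	    while (fgets(line, sizeof(line), f))
    	        if (strncmp(line, "ShmemHugePages:", 15) == 0)
    	            fputs(line, stdout);
    	    fclose(f);
    	    return 0;
    	}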
    
    [1] https://lab.nexedi.com/kirr/misc/blob/4f84a06e/tmpfs/t_fallocate.c
    [2] https://lab.nexedi.com/kirr/misc/blob/4f84a06e/tmpfs/1.txt
    [3] https://lab.nexedi.com/kirr/misc/raw/4f84a06e/tmpfs/fallocate-2M-nohuge.svg