nexedi / wendelin.core · Issues · #5 · Closed
Created Oct 31, 2016 by Kirill Smelkov (@kirr), Owner

Memory is managed per-process not per whole-system

We had a case where several processes using wendelin.core were working together at the same time. If the first process starts with some time advance and allocates most of the memory, the processes that come later frequently run into an OOM condition.

Here is what's happening:

  • wendelin.core inside every process allocates RAM from /dev/shm

  • when a page cannot be allocated, wendelin.core tries to free some already-allocated memory and then retries the allocation

  • if wendelin.core sees it cannot free any memory, it reports that out of memory (OOM) happened.

    https://lab.nexedi.com/nexedi/wendelin.core/blob/e73e22ea/bigfile/virtmem.c#L566

  • the problem is: under memory pressure, different processes do not feel the pressure caused by each other. If process A is tight on memory, it only tries to reclaim memory that A itself allocated, and there is no signal to a second process B to reclaim some of its memory.

  • so what happens is: the first process runs and, after some time, has allocated almost all memory in /dev/shm:

    root@debian-iliya:/srv/slapgrid/slappart17/srv/runner/CollatzConjecture# df -h /dev/shm/
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           1.5G  1.5G   51M  97% /dev/shm
    
    root@debian-iliya:/srv/slapgrid/slappart17/srv/runner/CollatzConjecture# lsof /dev/shm/
    COMMAND   PID USER   FD   TYPE DEVICE      SIZE/OFF    NODE NAME
    python  21681 root  DEL    REG   0,17               3314022 /dev/shm/ramh.SeNAWD
    python  21681 root    9u   REG   0,17 1072080879616 3302369 /dev/shm/ramh.ANROVP (deleted)
    python  21681 root   10u   REG   0,17 1099442421760 3314022 /dev/shm/ramh.SeNAWD (deleted)
  • when the second process is run, it has only a very small pool of free RAM left in /dev/shm (51M in the above example), and needs to reclaim constantly.

  • the first process also continues to allocate memory and do periodic reclaims, so at some point in time a situation can arise where process 2 really has no memory left to allocate from /dev/shm.

The only solution is to make the reclaim part of the virtual memory manager work not per-process but system-wide. This is what wendelin.core v2 is supposed to do by moving the virtual memory manager back into the kernel.

/cc @Tyagov, @Thetechguy, @klaus
