nexedi / wendelin.core · Issues · #5 · Closed
Created Oct 31, 2016 by Kirill Smelkov (@kirr), Owner

Memory is managed per-process not per whole-system

We had a case where several processes using wendelin.core were working together at the same time. If the first process starts with some time advance and allocates most of the memory, the processes that come later frequently run into an OOM condition.

Here is what's happening:

  • wendelin.core inside every process allocates RAM from /dev/shm

  • when a page cannot be allocated, wendelin.core tries to free some already-allocated memory and then retries the allocation

  • if wendelin.core sees it cannot free any memory, it reports that out of memory (OOM) happened.

    https://lab.nexedi.com/nexedi/wendelin.core/blob/e73e22ea/bigfile/virtmem.c#L566

  • the problem is: under memory pressure, different processes do not feel the pressure caused by each other. If process A is tight on memory, it only tries to reclaim memory that A itself allocated, and there is no signal to a second process B to reclaim some of its memory.

  • so what happens is: the first process runs and, after some time, has allocated almost all memory in /dev/shm:

    root@debian-iliya:/srv/slapgrid/slappart17/srv/runner/CollatzConjecture# df -h /dev/shm/
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           1.5G  1.5G   51M  97% /dev/shm
    
    root@debian-iliya:/srv/slapgrid/slappart17/srv/runner/CollatzConjecture# lsof /dev/shm/
    COMMAND   PID USER   FD   TYPE DEVICE      SIZE/OFF    NODE NAME
    python  21681 root  DEL    REG   0,17               3314022 /dev/shm/ramh.SeNAWD
    python  21681 root    9u   REG   0,17 1072080879616 3302369 /dev/shm/ramh.ANROVP (deleted)
    python  21681 root   10u   REG   0,17 1099442421760 3314022 /dev/shm/ramh.SeNAWD (deleted)
  • when the second process is run, it has only a very small pool of free RAM left in /dev/shm (51M in the above example), and needs to reclaim constantly.

  • the first process also continues to allocate memory and do periodic reclaims, so at some point in time a situation can arise where process 2 really has no memory left to allocate from /dev/shm.

The only solution is to make the reclaim part of the virtual memory manager work not per-process but system-wide. This is what wendelin.core v2 is supposed to do by moving the virtual memory manager back into the kernel.

/cc @Tyagov, @Thetechguy, @klaus
