    [PATCH] Unpinned futexes v2: indexing changes · 968f11a8
    Jamie Lokier authored
    This changes the way futexes are indexed, so that they don't pin pages. 
    It also fixes some bugs with private mappings and COW pages.
    
    Currently, all futexes look up the page at the userspace address and pin
    it, using the pair (page,offset) as an index into a table of waiting
    futexes.  Any page with a futex waiting on it remains pinned in RAM,
    which is a problem when many futexes are used, especially with FUTEX_FD.
    
    Another problem is that the page is not always the correct one,
    because it can be replaced later by a COW (copy-on-write) operation.
    This can happen when waiting on a futex without writing to it after
    fork(), exec() or mmap(), if the page is then written to before
    attempting to wake a futex at the same address.
    
    There are two symptoms of the COW problem:
     - The wrong process can receive wakeups.
     - A process can fail to receive required wakeups.
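
    A toy user-space model (not kernel code; every name in it is
    hypothetical) shows how a (page,offset) key goes stale across COW.  A
    parent waits after fork() without writing, so its key names the frame
    it still shares with the child.  A later write gives the parent a
    private frame, so a wake from the parent no longer matches its own
    waiter, while a wake from the child at the same address still does:

```c
#include <stdbool.h>

/* Toy model, not kernel code: a process maps virtual page numbers to
 * physical frame numbers.  All names here are hypothetical. */
#define NPAGES 4

struct toy_mm {
    int frame[NPAGES];   /* virtual page -> physical frame */
    bool cow[NPAGES];    /* page is currently shared copy-on-write */
};

static int next_free_frame = 100;

/* Old-style futex key: the physical frame plus the offset in the page. */
struct page_key { int frame; unsigned offset; };

static struct page_key old_key(const struct toy_mm *mm, unsigned vpage,
                               unsigned offset)
{
    struct page_key k = { mm->frame[vpage], offset };
    return k;
}

/* fork(): the child shares every frame, and both sides become COW. */
static void toy_fork(struct toy_mm *parent, struct toy_mm *child)
{
    for (int i = 0; i < NPAGES; i++) {
        child->frame[i] = parent->frame[i];
        parent->cow[i] = child->cow[i] = true;
    }
}

/* A write to a COW page gives the writer a fresh private frame. */
static void toy_write(struct toy_mm *mm, unsigned vpage)
{
    if (mm->cow[vpage]) {
        mm->frame[vpage] = next_free_frame++;
        mm->cow[vpage] = false;
    }
}

static bool same_key(struct page_key a, struct page_key b)
{
    return a.frame == b.frame && a.offset == b.offset;
}
```

    In this model the parent's waiter key is {11, 8}; after toy_write()
    the parent computes {100, 8} (a missed wakeup), while the child still
    computes {11, 8} (a wakeup delivered to the wrong process).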
    
    This patch fixes both by changing the indexing so that VM_SHARED
    mappings use the triple (inode,offset,index), and private mappings use
    the pair (mm,virtual_address).
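
    The two key shapes can be sketched as a tagged union.  This is an
    illustration under stated assumptions, not the patch's actual
    definition: struct futex_key_sketch and key_equal() are hypothetical
    names, and opaque pointers stand in for struct inode and struct
    mm_struct so that it compiles outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the two key shapes; opaque pointers replace
 * struct inode and struct mm_struct so this builds in user space. */
enum key_kind { KEY_SHARED, KEY_PRIVATE };

struct futex_key_sketch {
    enum key_kind kind;
    union {
        struct {                  /* VM_SHARED: (inode, offset, index) */
            const void *inode;
            unsigned long index;  /* page index within the file */
            unsigned offset;      /* byte offset within the page */
        } shared;
        struct {                  /* private: (mm, virtual address) */
            const void *mm;
            uintptr_t uaddr;
        } priv;
    } u;
};

/* Two waiters match only if every field of the same key shape matches. */
static bool key_equal(const struct futex_key_sketch *a,
                      const struct futex_key_sketch *b)
{
    if (a->kind != b->kind)
        return false;
    if (a->kind == KEY_SHARED)
        return a->u.shared.inode == b->u.shared.inode &&
               a->u.shared.index == b->u.shared.index &&
               a->u.shared.offset == b->u.shared.offset;
    return a->u.priv.mm == b->u.priv.mm &&
           a->u.priv.uaddr == b->u.priv.uaddr;
}
```

    Neither shape mentions a struct page, which is why nothing needs to
    be pinned while a futex is waiting.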
    
    The former correctly handles all shared mappings, including tmpfs and
    therefore all kinds of shared memory (IPC shm, /dev/shm and
    MAP_ANON|MAP_SHARED).  This works because every VM_SHARED mapping has
    an associated non-NULL vma->vm_file, and hence an inode.  (For
    anonymous shared mappings this is ensured in do_mmap_pgoff, which
    calls shmem_zero_setup.)
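
    The shared key can be computed from the vma with a little arithmetic:
    the page index is the mapping's file offset in pages (vm_pgoff) plus
    the number of whole pages between vm_start and the address, and the
    byte offset is the address modulo the page size.  A sketch assuming
    4 KiB pages, with hypothetical stand-ins for the vma and the key,
    shows two processes that map the same file page at different virtual
    addresses arriving at the same key:

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Minimal stand-in for the vma fields used here (hypothetical type). */
struct vma_sketch {
    uintptr_t vm_start;       /* first virtual address of the mapping */
    unsigned long vm_pgoff;   /* file offset of vm_start, in pages */
    const void *inode;        /* backing file; never NULL for VM_SHARED */
};

struct shared_key { const void *inode; unsigned long index; unsigned offset; };

/* Derive (inode, index, offset) from a virtual address inside the vma. */
static struct shared_key shared_key_for(const struct vma_sketch *vma,
                                        uintptr_t addr)
{
    struct shared_key k;
    k.inode = vma->inode;
    k.index = vma->vm_pgoff + ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT);
    k.offset = (unsigned)(addr & (SKETCH_PAGE_SIZE - 1));
    return k;
}
```

    For example, if process A maps the file from page 0 at 0x10000000 and
    process B maps it from page 2 at 0x7f000000, addresses 0x10003010 and
    0x7f001010 both yield (inode, index 3, offset 0x10).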
    
    The latter handles all private mappings, both files and anonymous.  It
    isn't affected by COW, because it doesn't care about the actual pages,
    just the virtual address.
    
    The patch has a few bonuses:
    
            1. It removes the vcache implementation, as only futexes were
               using it, and they don't any more.
    
            2. Removing the vcache should make COW page faults a bit faster.
    
            3. Futex operations no longer take the page table lock, walk
               the page table, fault in pages that aren't mapped in the
               page table, or do a vcache hash lookup; they are mostly a
               simple offset calculation with one hash for the futex
               table.  So they should be noticeably faster.
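
    A sketch of the "one hash" step: the key's words are combined and
    reduced to a bucket index in a power-of-two table.  The multiplicative
    hash, the 256-bucket table size and the function names are assumptions
    chosen for illustration, loosely in the spirit of the kernel's
    hash_long(), and are not the patch's exact code.

```c
#include <stdint.h>

#define SKETCH_HASH_BITS 8   /* 256 buckets; hypothetical table size */

/* Multiplicative (Fibonacci) hashing: multiply by 2^64/phi and keep the
 * top `bits` bits.  Used here as a stand-in for the kernel's hash. */
static unsigned hash_bits64(uint64_t v, unsigned bits)
{
    return (unsigned)((v * 0x9E3779B97F4A7C15ULL) >> (64 - bits));
}

/* Combine the key's words (page index or address, inode or mm pointer,
 * offset) and pick a bucket in the futex table. */
static unsigned futex_bucket(uint64_t word, const void *ptr, unsigned offset)
{
    uint64_t v = word + (uint64_t)(uintptr_t)ptr + offset;
    return hash_bits64(v, SKETCH_HASH_BITS);
}
```

    Because the inputs are plain integers and pointers already at hand,
    no page table walk or page lookup is needed before hashing.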
    
    Special thanks to Hugh Dickins, Andrew Morton and Rusty Russell for
    insightful feedback.  All suggestions are included.