    bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY · fc970227
    Andrii Nakryiko authored
    Add the ability to memory-map the contents of a BPF array map. This is
    extremely useful for working with BPF global data from userspace programs.
    It avoids the typical bpf_map_{lookup,update}_elem operations, improving
    both performance and usability.
    
    Map freezing required special consideration, to avoid having a writable
    memory view into a frozen map. To solve this, map freezing and mmap-ing
    now happen under a mutex:
      - if the map is already frozen, no writable mapping is allowed;
      - if the map has active writable memory mappings (accounted in
        map->writecnt), map freezing keeps failing with -EBUSY;
      - once the number of writable memory mappings drops to zero, map
        freezing can be performed again.
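
    The three rules above can be sketched as a small userspace model. This is
    not the kernel code: struct model_map and the model_*() helpers are
    hypothetical names, a pthread mutex stands in for the kernel mutex, and
    -EPERM for a writable mapping of a frozen map is an assumption (the text
    above only says such a mapping is not allowed).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the freeze/mmap state; not the kernel's struct bpf_map. */
struct model_map {
	pthread_mutex_t freeze_mutex;	/* serializes freezing and mmap-ing */
	bool frozen;
	int writecnt;			/* number of active writable mappings */
};

static void model_map_init(struct model_map *m)
{
	pthread_mutex_init(&m->freeze_mutex, NULL);
	m->frozen = false;
	m->writecnt = 0;
}

/* Returns 0 on success; -EPERM (assumed error code) when a writable
 * mapping of an already frozen map is requested. */
static int model_mmap(struct model_map *m, bool writable)
{
	int err = 0;

	pthread_mutex_lock(&m->freeze_mutex);
	if (writable) {
		if (m->frozen)
			err = -EPERM;
		else
			m->writecnt++;
	}
	pthread_mutex_unlock(&m->freeze_mutex);
	return err;
}

/* Drops the accounting for one mapping created by model_mmap(). */
static void model_munmap(struct model_map *m, bool writable)
{
	pthread_mutex_lock(&m->freeze_mutex);
	if (writable) {
		assert(m->writecnt > 0);
		m->writecnt--;
	}
	pthread_mutex_unlock(&m->freeze_mutex);
}

/* Returns 0 on success, -EBUSY while writable mappings are still active. */
static int model_freeze(struct model_map *m)
{
	int err = 0;

	pthread_mutex_lock(&m->freeze_mutex);
	if (m->writecnt > 0)
		err = -EBUSY;
	else
		m->frozen = true;
	pthread_mutex_unlock(&m->freeze_mutex);
	return err;
}
```

    In this model a freeze attempt fails while a writable mapping exists,
    succeeds once it is unmapped, and read-only mappings stay possible
    afterwards.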
    
    Only non-per-CPU plain arrays are supported right now. Maps with spinlocks
    can't be memory mapped either.
    
    For a BPF_F_MMAPABLE array, memory allocation has to be done through
    vmalloc() to be mmap()'able. We also need to make sure that the array data
    memory is page-sized and page-aligned, so we over-allocate memory in such a
    way that struct bpf_array sits at the end of a single page of memory, with
    array->value aligned with the start of the second page. On deallocation we
    need to accommodate this memory arrangement to free the vmalloc()'ed memory
    correctly.
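
    The layout arithmetic can be illustrated in userspace. This is a minimal
    sketch under stated assumptions: struct model_array is a hypothetical
    stand-in for struct bpf_array, aligned_alloc() replaces vmalloc(), and
    the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for struct bpf_array: a small header followed
 * by the value area. */
struct model_array {
	uint32_t max_entries;
	uint32_t elem_size;
	char value[] __attribute__((aligned(8)));
};

static size_t page_size(void)
{
	return (size_t)sysconf(_SC_PAGESIZE);
}

/* Round n up to a multiple of the page size. */
static size_t page_align(size_t n)
{
	size_t ps = page_size();

	return (n + ps - 1) / ps * ps;
}

/* Over-allocate so the header ends exactly at a page boundary and
 * value[] starts on the next page. */
static struct model_array *model_array_alloc(uint32_t max_entries,
					     uint32_t elem_size)
{
	size_t hdr = page_align(sizeof(struct model_array));
	size_t data = page_align((size_t)max_entries * elem_size);
	char *raw = aligned_alloc(page_size(), hdr + data);
	struct model_array *a;

	if (!raw)
		return NULL;
	/* Place the header so that a->value == raw + hdr. */
	a = (struct model_array *)(raw + hdr -
				   offsetof(struct model_array, value));
	a->max_entries = max_entries;
	a->elem_size = elem_size;
	return a;
}

/* Recover the start of the allocation: the header lives inside the
 * first page, so rounding down to a page boundary finds it. */
static void *model_array_raw_ptr(struct model_array *a)
{
	return (void *)((uintptr_t)a & ~(uintptr_t)(page_size() - 1));
}

static void model_array_free(struct model_array *a)
{
	free(model_array_raw_ptr(a));
}
```

    The pointer handed around is not the pointer returned by the allocator,
    which is why freeing has to round down to the page boundary first.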
    
    One important consideration concerns how the memory-mapping subsystem
    functions. It provides a few optional callbacks, among them open() and
    close(). close() is called for each memory region that is unmapped, so
    that users can decrease their reference counters and free up resources, if
    necessary. open() is *almost* symmetrical: it's called for each memory region
    that is being mapped, **except** the very first one. So bpf_map_mmap does
    the initial refcnt bump, while open() does any extra ones after that. Thus
    the number of close() calls equals the number of open() calls plus one.
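
    The open()/close() asymmetry can be modelled with plain counters. This is
    a sketch only: struct model_ref and the model_*() helpers are hypothetical
    names, and the counters exist purely to make the "plus one" relation
    visible.

```c
#include <assert.h>

/* Hypothetical reference-count model; not the kernel's data structures. */
struct model_ref {
	int refcnt;		/* references held by memory regions */
	int open_calls;
	int close_calls;
};

/* The initial mmap(): open() is not called for the first region, so
 * the mmap handler has to take the first reference itself. */
static void model_mmap_initial(struct model_ref *r)
{
	r->refcnt++;
}

/* Called for every additional region, e.g. when a region is split. */
static void model_vm_open(struct model_ref *r)
{
	r->refcnt++;
	r->open_calls++;
}

/* Called for every region that goes away, including the first one. */
static void model_vm_close(struct model_ref *r)
{
	assert(r->refcnt > 0);
	r->refcnt--;
	r->close_calls++;
}
```

    For example, one initial mapping that is later split into two regions
    sees one open() call and two close() calls, and the count returns to
    zero only because the mmap handler took the first reference.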

    Signed-off-by: Andrii Nakryiko <andriin@fb.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Song Liu <songliubraving@fb.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: https://lore.kernel.org/bpf/20191117172806.2195367-4-andriin@fb.com