    bpf: decouple the lifetime of cgroup_bpf from cgroup itself · 4bfc0bb2
    Roman Gushchin authored
    Currently the lifetime of bpf programs attached to a cgroup is bound
    to the lifetime of the cgroup itself. It means that if a user
    forgets to detach (or intentionally avoids detaching) a bpf program before
    removing the cgroup, it will stay attached up to the release of the
    cgroup. Since the cgroup can stay in the dying state (the state
    between being rmdir()'ed and being released) for a very long time, it
    leads to a waste of memory. It also prevents implementing
    memcg-based memory accounting for bpf objects, because a circular
    reference dependency would occur: charged memory pages pin the
    corresponding memory cgroup, and if the memory cgroup pins
    the attached bpf program, nothing will ever be released.
    
    A dying cgroup cannot contain any processes, so the only chance for
    an attached bpf program to be executed is a live socket associated
    with the cgroup. So in order to release all bpf data early, let's
    count associated sockets using a new percpu refcounter. On cgroup
    removal the counter is transitioned to the atomic mode, and as soon
    as it reaches 0, all bpf programs are detached.
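
The counting scheme described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: every name here (`model_cgroup_bpf`, `model_get`, `model_put`, `model_offline`) is hypothetical, and a single C11 atomic stands in for the kernel's percpu refcounter, so the fast percpu mode and the switch to atomic mode are not modeled. What it does show is the lifetime rule: the cgroup holds one base reference, each associated socket holds one more, and the programs are detached when the last reference goes away.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-cgroup bpf state. */
struct model_cgroup_bpf {
    atomic_long refcnt;   /* 1 base reference + 1 per associated socket */
    bool progs_attached;  /* true while bpf programs are still attached */
};

static void model_init(struct model_cgroup_bpf *b)
{
    atomic_init(&b->refcnt, 1);  /* base reference held by the live cgroup */
    b->progs_attached = true;
}

/* A socket becomes associated with the cgroup. */
static void model_get(struct model_cgroup_bpf *b)
{
    atomic_fetch_add(&b->refcnt, 1);
}

/* Drop one reference; whoever drops the last one detaches the programs. */
static void model_put(struct model_cgroup_bpf *b)
{
    /* atomic_fetch_sub returns the previous value: 1 means we hit 0. */
    if (atomic_fetch_sub(&b->refcnt, 1) == 1)
        b->progs_attached = false;
}

/* Cgroup removal: give up the base reference (assumed analogue of
 * killing the percpu refcounter in the kernel). */
static void model_offline(struct model_cgroup_bpf *b)
{
    model_put(b);
}
```

In this model, removing a cgroup that still has one live socket leaves the programs attached, and they are detached only when that socket drops its reference.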
    
    Because cgroup_bpf_release() can block, it can't be called from
    the percpu ref counter callback directly, so instead an asynchronous
    work is scheduled.
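
The deferral can be pictured with the following userspace sketch. It is not the kernel code: the queue, `deferred_work`, `deferred_bpf`, `release_fn`, and `run_queue` are all hypothetical names invented for illustration, and a fixed-size array stands in for a workqueue. The point is only the split of responsibilities: the refcount callback does nothing but enqueue, and the release that may block runs later from a context where blocking is allowed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item, loosely modeled on a kernel work struct. */
struct deferred_work {
    void (*fn)(struct deferred_work *w);
    bool pending;
};

struct deferred_bpf {
    struct deferred_work release_work;
    bool released;  /* set once the (possibly blocking) release has run */
};

#define QUEUE_CAP 8
static struct deferred_work *work_queue[QUEUE_CAP];
static size_t work_queue_len;

/* The real release: in this model it may block, so it must only be
 * called from run_queue(), never from the refcount callback. */
static void do_release(struct deferred_work *w)
{
    /* Recover the enclosing object from the embedded work item. */
    struct deferred_bpf *b = (struct deferred_bpf *)
        ((char *)w - offsetof(struct deferred_bpf, release_work));
    b->released = true;
}

/* Refcount-reached-zero callback: must not block, so it only enqueues. */
static void release_fn(struct deferred_bpf *b)
{
    b->release_work.fn = do_release;
    if (!b->release_work.pending && work_queue_len < QUEUE_CAP) {
        b->release_work.pending = true;
        work_queue[work_queue_len++] = &b->release_work;
    }
}

/* Worker context: drain the queue and run each deferred release. */
static void run_queue(void)
{
    for (size_t i = 0; i < work_queue_len; i++) {
        struct deferred_work *w = work_queue[i];
        w->pending = false;
        w->fn(w);
    }
    work_queue_len = 0;
}
```

After `release_fn()` returns, the object is queued but not yet released; the release happens only once `run_queue()` is invoked.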
    
    The reference counter is not socket specific, and can be used for any
    other types of programs, which can be executed from a cgroup-bpf hook
    outside of the process context, should such a need arise in the future.
    Signed-off-by: Roman Gushchin <guro@fb.com>
    Cc: jolsa@redhat.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
cgroup.c 167 KB