    net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. · 401cb7da
    Sebastian Andrzej Siewior authored
    The XDP redirect process has two stages:
    - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
      packet and makes decisions. While doing that, the per-CPU variable
      bpf_redirect_info is used.
    
    - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
      and it may also access other per-CPU variables like xskmap_flush_list.
    
    At the very end of the NAPI callback, xdp_do_flush() is invoked which
    does not access bpf_redirect_info but will touch the individual per-CPU
    lists.
    
    The per-CPU variables are only used in the NAPI callback, hence disabling
    bottom halves is the only protection mechanism. Users from preemptible
    context (like cpu_map_kthread_run()) explicitly disable bottom halves
    for protection.
    Without the locking in local_bh_disable() on PREEMPT_RT, this data
    structure requires explicit locking.
    
    PREEMPT_RT has forced-threaded interrupts enabled and every
    NAPI-callback runs in a thread. If each thread has its own data
    structure then locking can be avoided.
    
    Create a struct bpf_net_context which contains struct bpf_redirect_info.
    Define the variable on the stack, use bpf_net_ctx_set() to save a pointer
    to it and bpf_net_ctx_clear() to remove it again.
    bpf_net_ctx_set() may nest. For instance, a function can be used from
    within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and
    from NET_TX_SOFTIRQ, which does not. Therefore only the first invocation
    updates the pointer.
    Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
    bpf_redirect_info. The returned data structure is zero-initialized to
    ensure nothing is leaked from the stack. This is done on first usage of
    the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
    note that initialisation is required. The first invocation of
    bpf_net_ctx_get_ri() will memset() the data structure and update
    bpf_redirect_info::kern_flags.
    bpf_redirect_info::nh is excluded from memset because it is only used
    once BPF_F_NEIGH is set, which also sets the nh member. The kern_flags
    member is moved past nh to exclude it from memset.
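
The lazy zero-initialisation described above can be sketched in userspace C. This is a minimal model, not the kernel code: the struct holds only a simplified subset of members, RI_F_INIT and get_ri_model() are hypothetical names, and the context is passed in explicitly, whereas the commit describes bpf_net_ctx_get_ri() as retrieving the current one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RI_F_INIT 0x1u	/* hypothetical "already zeroed" flag bit */

/* Simplified layout: members before nh are zeroed lazily, nh and kern_flags are not. */
struct bpf_redirect_info {
	unsigned long long tgt_index;
	void *tgt_value;
	unsigned int flags;
	unsigned int map_id;
	struct { unsigned char addr[16]; } nh;	/* only meaningful once a neighbour flag is set */
	unsigned int kern_flags;		/* placed after nh so memset() skips it */
};

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

/* Model of the accessor: zero everything in front of nh on first use only. */
static struct bpf_redirect_info *get_ri_model(struct bpf_net_context *ctx)
{
	if (!(ctx->ri.kern_flags & RI_F_INIT)) {
		memset(&ctx->ri, 0, offsetof(struct bpf_redirect_info, nh));
		ctx->ri.kern_flags |= RI_F_INIT;
	}
	return &ctx->ri;
}

/* Hypothetical demo: returns 1 if the lazy initialisation behaves as described. */
static int lazy_init_demo(void)
{
	struct bpf_net_context ctx;
	struct bpf_redirect_info *ri;

	memset(&ctx, 0xff, sizeof(ctx));	/* simulate stale stack contents */
	ctx.ri.kern_flags = 0;			/* what the set step does: "needs init" */

	ri = get_ri_model(&ctx);
	if (ri->tgt_index != 0 || ri->tgt_value != NULL ||
	    ri->flags != 0 || ri->map_id != 0)
		return 0;			/* leading members must be zeroed */
	if (ri->nh.addr[0] != 0xff)
		return 0;			/* nh must be left untouched */

	ri->map_id = 7;
	ri = get_ri_model(&ctx);		/* second call must not zero again */
	return ri->map_id == 7 && (ri->kern_flags & RI_F_INIT) != 0;
}
```

Keeping the flag behind nh means a single memset() bounded by offsetof() covers exactly the members that need clearing, without touching nh or the flag itself.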
    
    The pointer to bpf_net_context is saved in the task's task_struct. Always
    using the bpf_net_context approach has the advantage that there is
    almost no difference between PREEMPT_RT and non-PREEMPT_RT builds.
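
As a rough illustration of the per-task pointer and the nesting rule, the following userspace sketch uses a thread-local variable as a stand-in for the task_struct member. The struct is reduced to one member, and inner_user() and outer_user() are hypothetical callers; this is an assumption-laden model rather than the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the kernel structs have more members. */
struct bpf_redirect_info {
	unsigned int kern_flags;
};

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

/* Stand-in for the pointer kept in the task's task_struct. */
static _Thread_local struct bpf_net_context *current_bpf_net_ctx;

/* Install ctx unless an outer caller already did; NULL means "nested". */
static struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *ctx)
{
	if (current_bpf_net_ctx != NULL)
		return NULL;
	ctx->ri.kern_flags = 0;	/* redirect info still needs initialisation */
	current_bpf_net_ctx = ctx;
	return ctx;
}

/* Only the caller whose set() succeeded removes the pointer again. */
static void bpf_net_ctx_clear(struct bpf_net_context *ctx)
{
	if (ctx != NULL)
		current_bpf_net_ctx = NULL;
}

/* Hypothetical helper that may run with or without a context installed. */
static int inner_user(void)
{
	struct bpf_net_context ctx, *installed;
	int was_outermost;

	installed = bpf_net_ctx_set(&ctx);
	was_outermost = (installed != NULL);
	/* ... work that relies on current_bpf_net_ctx ... */
	bpf_net_ctx_clear(installed);
	return was_outermost;
}

/* Hypothetical outer caller, in the role net_rx_action() plays. */
static int outer_user(void)
{
	struct bpf_net_context ctx, *installed;
	int inner_was_outermost;

	installed = bpf_net_ctx_set(&ctx);
	inner_was_outermost = inner_user();
	/* The nested set()/clear() pair must leave the outer pointer in place. */
	assert(current_bpf_net_ctx == &ctx);
	bpf_net_ctx_clear(installed);
	return inner_was_outermost;
}
```

Called on its own, inner_user() installs and removes its own context; called from outer_user(), its bpf_net_ctx_set() returns NULL, so the matching bpf_net_ctx_clear() is a no-op and the outer context stays valid.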
    
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Cc: Eduard Zingerman <eddyz87@gmail.com>
    Cc: Hao Luo <haoluo@google.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: John Fastabend <john.fastabend@gmail.com>
    Cc: KP Singh <kpsingh@kernel.org>
    Cc: Martin KaFai Lau <martin.lau@linux.dev>
    Cc: Song Liu <song@kernel.org>
    Cc: Stanislav Fomichev <sdf@google.com>
    Cc: Yonghong Song <yonghong.song@linux.dev>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>