• Eduard Zingerman's avatar
    bpf: Allow reads from uninit stack · 6715df8d
    Eduard Zingerman authored
    This commits updates the following functions to allow reads from
    uninitialized stack locations when env->allow_uninit_stack option is
    enabled:
    - check_stack_read_fixed_off()
    - check_stack_range_initialized(), called from:
      - check_stack_read_var_off()
      - check_helper_mem_access()
    
    Such change allows to relax logic in stacksafe() to treat STACK_MISC
    and STACK_INVALID in a same way and make the following stack slot
    configurations equivalent:
    
      |  Cached state    |  Current state   |
      |   stack slot     |   stack slot     |
      |------------------+------------------|
      | STACK_INVALID or | STACK_INVALID or |
      | STACK_MISC       | STACK_SPILL   or |
      |                  | STACK_MISC    or |
      |                  | STACK_ZERO    or |
      |                  | STACK_DYNPTR     |
    
    This leads to significant verification speed gains (see below).
    
    The idea was suggested by Andrii Nakryiko [1] and initial patch was
    created by Alexei Starovoitov [2].
    
    Currently the env->allow_uninit_stack is allowed for programs loaded
    by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities.
    
    A number of test cases from verifier/*.c were expecting uninitialized
    stack access to be an error. These test cases were updated to execute
    in unprivileged mode (thus preserving the tests).
    
    The test progs/test_global_func10.c expected "invalid indirect read
    from stack" error message because of the access to uninitialized
    memory region. This error is no longer possible in privileged mode.
    The test is updated to provoke an error "invalid indirect access to
    stack" because of access to invalid stack address (such error is not
    verified by progs/test_global_func*.c series of tests).
    
    The following tests had to be removed because these can't be made
    unprivileged:
    - verifier/sock.c:
      - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init
      stack_value"
      BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode.
    - verifier/var_off.c:
      - "indirect variable-offset stack access, max_off+size > max_initialized"
      - "indirect variable-offset stack access, uninitialized"
      These tests verify that access to uninitialized stack values is
      detected when stack offset is not a constant. However, variable
      stack access is prohibited in unprivileged mode, thus these tests
      are no longer valid.
    
     * * *
    
    Here is veristat log comparing this patch with current master on a
    set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
    and cilium BPF binaries (see [3]):
    
    $ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log
    File                        Program                     States (A)  States (B)  States    (DIFF)
    --------------------------  --------------------------  ----------  ----------  ----------------
    bpf_host.o                  tail_handle_ipv6_from_host         349         244    -105 (-30.09%)
    bpf_host.o                  tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
    bpf_lxc.o                   tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
    bpf_sock.o                  cil_sock4_connect                   70          48     -22 (-31.43%)
    bpf_sock.o                  cil_sock4_sendmsg                   68          46     -22 (-32.35%)
    bpf_xdp.o                   tail_handle_nat_fwd_ipv4          1554         803    -751 (-48.33%)
    bpf_xdp.o                   tail_lb_ipv4                      6457        2473   -3984 (-61.70%)
    bpf_xdp.o                   tail_lb_ipv6                      7249        3908   -3341 (-46.09%)
    pyperf600_bpf_loop.bpf.o    on_event                           287         145    -142 (-49.48%)
    strobemeta.bpf.o            on_event                         15915        4772  -11143 (-70.02%)
    strobemeta_nounroll2.bpf.o  on_event                         17087        3820  -13267 (-77.64%)
    xdp_synproxy_kern.bpf.o     syncookie_tc                     21271        6635  -14636 (-68.81%)
    xdp_synproxy_kern.bpf.o     syncookie_xdp                    23122        6024  -17098 (-73.95%)
    --------------------------  --------------------------  ----------  ----------  ----------------
    
    Note: I limited selection by states_pct<-30%.
    
    Inspection of differences in pyperf600_bpf_loop behavior shows that
    the following patch for the test removes almost all differences:
    
        - a/tools/testing/selftests/bpf/progs/pyperf.h
        + b/tools/testing/selftests/bpf/progs/pyperf.h
        @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx)
                }
    
                if (event->pthread_match || !pidData->use_tls) {
        -               void* frame_ptr;
        -               FrameData frame;
        +               void* frame_ptr = 0;
        +               FrameData frame = {};
                        Symbol sym = {};
                        int cur_cpu = bpf_get_smp_processor_id();
    
    W/o this patch the difference comes from the following pattern
    (for different variables):
    
        static bool get_frame_data(... FrameData *frame ...)
        {
            ...
            bpf_probe_read_user(&frame->f_code, ...);
            if (!frame->f_code)
                return false;
            ...
            bpf_probe_read_user(&frame->co_name, ...);
            if (frame->co_name)
                ...;
        }
    
        int __on_event(struct bpf_raw_tracepoint_args *ctx)
        {
            FrameData frame;
            ...
            get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback
            ...
        }
    
        SEC("raw_tracepoint/kfree_skb")
        int on_event(struct bpf_raw_tracepoint_args* ctx)
        {
            ...
            ret |= __on_event(ctx);
            ret |= __on_event(ctx);
            ...
        }
    
    With regards to value `frame->co_name` the following is important:
    - Because of the conditional `if (!frame->f_code)` each call to
      __on_event() produces two states, one with `frame->co_name` marked
      as STACK_MISC, another with it as is (and marked STACK_INVALID on a
      first call).
    - The call to bpf_probe_read_user() does not mark stack slots
      corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks
      these slots as BPF_MISC, this happens because of the following loop
      in the check_helper_call():
    
    	for (i = 0; i < meta.access_size; i++) {
    		err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B,
    				       BPF_WRITE, -1, false);
    		if (err)
    			return err;
    	}
    
      Note the size of the write, it is a one byte write for each byte
      touched by a helper. The BPF_B write does not lead to write marks
      for the target stack slot.
    - Which means that w/o this patch when second __on_event() call is
      verified `if (frame->co_name)` will propagate read marks first to a
      stack slot with STACK_MISC marks and second to a stack slot with
      STACK_INVALID marks and these states would be considered different.
    
    [1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/
    [2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/
    [3] git@github.com:anakryiko/cilium.git
    Suggested-by: default avatarAndrii Nakryiko <andrii@kernel.org>
    Co-developed-by: default avatarAlexei Starovoitov <ast@kernel.org>
    Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
    Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    6715df8d
sock.c 20.9 KB