• Eduard Zingerman's avatar
    bpf: exact states comparison for iterator convergence checks · 2793a8b0
    Eduard Zingerman authored
    Convergence for open coded iterators is computed in is_state_visited()
    by examining states with branches count > 1 and using states_equal().
    states_equal() computes sub-state relation using read and precision marks.
    Read and precision marks are propagated from children states,
    thus are not guaranteed to be complete inside a loop when branches
    count > 1. This could be demonstrated using the following unsafe program:
    
         1. r7 = -16
         2. r6 = bpf_get_prandom_u32()
         3. while (bpf_iter_num_next(&fp[-8])) {
         4.   if (r6 != 42) {
         5.     r7 = -32
         6.     r6 = bpf_get_prandom_u32()
         7.     continue
         8.   }
         9.   r0 = r10
        10.   r0 += r7
        11.   r8 = *(u64 *)(r0 + 0)
        12.   r6 = bpf_get_prandom_u32()
        13. }
    
    Here verifier would first visit path 1-3, create a checkpoint at 3
    with r7=-16, continue to 4-7,3 with r7=-32.
    
    Because instructions at 9-12 had not been visitied yet existing
    checkpoint at 3 does not have read or precision mark for r7.
    Thus states_equal() would return true and verifier would discard
    current state, thus unsafe memory access at 11 would not be caught.
    
    This commit fixes this loophole by introducing exact state comparisons
    for iterator convergence logic:
    - registers are compared using regs_exact() regardless of read or
      precision marks;
    - stack slots have to have identical type.
    
    Unfortunately, this is too strict even for simple programs like below:
    
        i = 0;
        while(iter_next(&it))
          i++;
    
    At each iteration step i++ would produce a new distinct state and
    eventually instruction processing limit would be reached.
    
    To avoid such behavior speculatively forget (widen) range for
    imprecise scalar registers, if those registers were not precise at the
    end of the previous iteration and do not match exactly.
    
    This a conservative heuristic that allows to verify wide range of
    programs, however it precludes verification of programs that conjure
    an imprecise value on the first loop iteration and use it as precise
    on the second.
    
    Test case iter_task_vma_for_each() presents one of such cases:
    
            unsigned int seen = 0;
            ...
            bpf_for_each(task_vma, vma, task, 0) {
                    if (seen >= 1000)
                            break;
                    ...
                    seen++;
            }
    
    Here clang generates the following code:
    
    <LBB0_4>:
          24:       r8 = r6                          ; stash current value of
                    ... body ...                       'seen'
          29:       r1 = r10
          30:       r1 += -0x8
          31:       call bpf_iter_task_vma_next
          32:       r6 += 0x1                        ; seen++;
          33:       if r0 == 0x0 goto +0x2 <LBB0_6>  ; exit on next() == NULL
          34:       r7 += 0x10
          35:       if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000
    
    <LBB0_6>:
          ... exit ...
    
    Note that counter in r6 is copied to r8 and then incremented,
    conditional jump is done using r8. Because of this precision mark for
    r6 lags one state behind of precision mark on r8 and widening logic
    kicks in.
    
    Adding barrier_var(seen) after conditional is sufficient to force
    clang use the same register for both counting and conditional jump.
    
    This issue was discussed in the thread [1] which was started by
    Andrew Werner <awerner32@gmail.com> demonstrating a similar bug
    in callback functions handling. The callbacks would be addressed
    in a followup patch.
    
    [1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@gmail.com/Co-developed-by: default avatarAndrii Nakryiko <andrii.nakryiko@gmail.com>
    Co-developed-by: default avatarAlexei Starovoitov <alexei.starovoitov@gmail.com>
    Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
    Link: https://lore.kernel.org/r/20231024000917.12153-4-eddyz87@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    2793a8b0
iters_task_vma.c 829 Bytes