1. 30 Mar, 2020 1 commit
    • Daniel Borkmann's avatar
      bpf: Undo incorrect __reg_bound_offset32 handling · f2d67fec
      Daniel Borkmann authored
      Anatoly has been fuzzing with kBdysch harness and reported a hang in
      one of the outcomes:
      
        0: (b7) r0 = 808464432
        1: (7f) r0 >>= r0
        2: (14) w0 -= 808464432
        3: (07) r0 += 808464432
        4: (b7) r1 = 808464432
        5: (de) if w1 s<= w0 goto pc+0
         R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020;0x10000001f)) R1_w=invP808464432 R10=fp0
        6: (07) r0 += -2144337872
        7: (14) w0 -= -1607454672
        8: (25) if r0 > 0x30303030 goto pc+0
         R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0
        9: (76) if w0 s>= 0x303030 goto pc+2
        12: (95) exit
      
        from 8 to 9: safe
      
        from 5 to 6: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020;0x10000001f)) R1_w=invP808464432 R10=fp0
        6: (07) r0 += -2144337872
        7: (14) w0 -= -1607454672
        8: (25) if r0 > 0x30303030 goto pc+0
         R0_w=invP(id=0,umin_value=271581184,umax_value=271581311,var_off=(0x10300000;0x7f)) R1_w=invP808464432 R10=fp0
        9: safe
      
        from 8 to 9: safe
        verification time 589 usec
        stack depth 0
        processed 17 insns (limit 1000000) [...]
      
      The underlying program was xlated as follows:
      
        # bpftool p d x i 9
         0: (b7) r0 = 808464432
         1: (7f) r0 >>= r0
         2: (14) w0 -= 808464432
         3: (07) r0 += 808464432
         4: (b7) r1 = 808464432
         5: (de) if w1 s<= w0 goto pc+0
         6: (07) r0 += -2144337872
         7: (14) w0 -= -1607454672
         8: (25) if r0 > 0x30303030 goto pc+0
         9: (76) if w0 s>= 0x303030 goto pc+2
        10: (05) goto pc-1
        11: (05) goto pc-1
        12: (95) exit
      
      The verifier rewrote original instructions it recognized as dead code with
      'goto pc-1', but reality differs from verifier simulation in that we're
      actually able to trigger a hang due to hitting the 'goto pc-1' instructions.
      
      Taking different examples to make the issue more obvious: in this example
      we're probing bounds on a completely unknown scalar variable in r1:
      
        [...]
        5: R0_w=inv1 R1_w=inv(id=0) R10=fp0
        5: (18) r2 = 0x4000000000
        7: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R10=fp0
        7: (18) r3 = 0x2000000000
        9: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R10=fp0
        9: (18) r4 = 0x400
        11: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R10=fp0
        11: (18) r5 = 0x200
        13: R0_w=inv1 R1_w=inv(id=0) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
        13: (2d) if r1 > r2 goto pc+4
         R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
        14: R0_w=inv1 R1_w=inv(id=0,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
        14: (ad) if r1 < r3 goto pc+3
         R0_w=inv1 R1_w=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2_w=inv274877906944 R3_w=inv137438953472 R4_w=inv1024 R5_w=inv512 R10=fp0
        15: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7fffffffff)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
        15: (2e) if w1 > w4 goto pc+2
         R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
        16: R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
        16: (ae) if w1 < w5 goto pc+1
         R0=inv1 R1=inv(id=0,umin_value=137438953472,umax_value=274877906944,var_off=(0x0; 0x7f00000000)) R2=inv274877906944 R3=inv137438953472 R4=inv1024 R5=inv512 R10=fp0
        [...]
      
      We're first probing lower/upper bounds via jmp64, later we do a similar
      check via jmp32 and examine the resulting var_off there. After fall-through
      in insn 14, we get the following bounded r1 with 0x7fffffffff unknown marked
      bits in the variable section.
      
      Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000:
      
        max: 0b100000000000000000000000000000000000000 / 0x4000000000
        var: 0b111111111111111111111111111111111111111 / 0x7fffffffff
        min: 0b010000000000000000000000000000000000000 / 0x2000000000
      
      Now, in insn 15 and 16, we perform a similar probe with lower/upper bounds
      in jmp32.
      
      Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000 and
                          w1 <= 0x400        and w1 >= 0x200:
      
        max: 0b100000000000000000000000000000000000000 / 0x4000000000
        var: 0b111111100000000000000000000000000000000 / 0x7f00000000
        min: 0b010000000000000000000000000000000000000 / 0x2000000000
      
      The lower/upper bounds haven't changed since they have high bits set in
      u64 space and the jmp32 tests can only refine bounds in the low bits.
      
      However, for the var part the expectation would have been 0x7f000007ff
      or something less precise up to 0x7fffffffff. A outcome of 0x7f00000000
      is not correct since it would contradict the earlier probed bounds
      where we know that the result should have been in [0x200,0x400] in u32
      space. Therefore, tests with such info will lead to wrong verifier
      assumptions later on like falsely predicting conditional jumps to be
      always taken, etc.
      
      The issue here is that __reg_bound_offset32()'s implementation from
      commit 581738a6 ("bpf: Provide better register bounds after jmp32
      instructions") makes an incorrect range assumption:
      
        static void __reg_bound_offset32(struct bpf_reg_state *reg)
        {
              u64 mask = 0xffffFFFF;
              struct tnum range = tnum_range(reg->umin_value & mask,
                                             reg->umax_value & mask);
              struct tnum lo32 = tnum_cast(reg->var_off, 4);
              struct tnum hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32);
      
              reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range));
        }
      
      In the above walk-through example, __reg_bound_offset32() as-is chose
      a range after masking with 0xffffffff of [0x0,0x0] since umin:0x2000000000
      and umax:0x4000000000 and therefore the lo32 part was clamped to 0x0 as
      well. However, in the umin:0x2000000000 and umax:0x4000000000 range above
      we'd end up with an actual possible interval of [0x0,0xffffffff] for u32
      space instead.
      
      In case of the original reproducer, the situation looked as follows at
      insn 5 for r0:
      
        [...]
        5: R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x0; 0x1ffffffff)) R1_w=invP808464432 R10=fp0
                                     0x30303030           0x13030302f
        5: (de) if w1 s<= w0 goto pc+0
         R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x30303020; 0x10000001f)) R1_w=invP808464432 R10=fp0
                                   0x30303030           0x13030302f
        [...]
      
      After the fall-through, we similarly forced the var_off result into
      the wrong range [0x30303030,0x3030302f] suggesting later on that fixed
      bits must only be of 0x30303020 with 0x10000001f unknowns whereas such
      assumption can only be made when both bounds in hi32 range match.
      
      Originally, I was thinking to fix this by moving reg into a temp reg and
      use proper coerce_reg_to_size() helper on the temp reg where we can then
      based on that define the range tnum for later intersection:
      
        static void __reg_bound_offset32(struct bpf_reg_state *reg)
        {
              struct bpf_reg_state tmp = *reg;
              struct tnum lo32, hi32, range;
      
              coerce_reg_to_size(&tmp, 4);
              range = tnum_range(tmp.umin_value, tmp.umax_value);
              lo32 = tnum_cast(reg->var_off, 4);
              hi32 = tnum_lshift(tnum_rshift(reg->var_off, 32), 32);
              reg->var_off = tnum_or(hi32, tnum_intersect(lo32, range));
        }
      
      In the case of the concrete example, this gives us a more conservative unknown
      section. Thus, after knowing r1 <= 0x4000000000 and r1 >= 0x2000000000 and
                                   w1 <= 0x400        and w1 >= 0x200:
      
        max: 0b100000000000000000000000000000000000000 / 0x4000000000
        var: 0b111111111111111111111111111111111111111 / 0x7fffffffff
        min: 0b010000000000000000000000000000000000000 / 0x2000000000
      
      However, above new __reg_bound_offset32() has no effect on refining the
      knowledge of the register contents. Meaning, if the bounds in hi32 range
      mismatch we'll get the identity function given the range reg spans
      [0x0,0xffffffff] and we cast var_off into lo32 only to later on binary
      or it again with the hi32.
      
      Likewise, if the bounds in hi32 range match, then we mask both bounds
      with 0xffffffff, use the resulting umin/umax for the range to later
      intersect the lo32 with it. However, _prior_ called __reg_bound_offset()
      did already such intersection on the full reg and we therefore would only
      repeat the same operation on the lo32 part twice.
      
      Given this has no effect and the original commit had false assumptions,
      this patch reverts the code entirely which is also more straight forward
      for stable trees: apparently 581738a6 got auto-selected by Sasha's
      ML system and misclassified as a fix, so it got sucked into v5.4 where
      it should never have landed. A revert is low-risk also from a user PoV
      since it requires a recent kernel and llc to opt-into -mcpu=v3 BPF CPU
      to generate jmp32 instructions. A proper bounds refinement would need a
      significantly more complex approach which is currently being worked, but
      no stable material [0]. Hence revert is best option for stable. After the
      revert, the original reported program gets rejected as follows:
      
        1: (7f) r0 >>= r0
        2: (14) w0 -= 808464432
        3: (07) r0 += 808464432
        4: (b7) r1 = 808464432
        5: (de) if w1 s<= w0 goto pc+0
         R0_w=invP(id=0,umin_value=808464432,umax_value=5103431727,var_off=(0x0; 0x1ffffffff)) R1_w=invP808464432 R10=fp0
        6: (07) r0 += -2144337872
        7: (14) w0 -= -1607454672
        8: (25) if r0 > 0x30303030 goto pc+0
         R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x3fffffff)) R1_w=invP808464432 R10=fp0
        9: (76) if w0 s>= 0x303030 goto pc+2
         R0=invP(id=0,umax_value=3158063,var_off=(0x0; 0x3fffff)) R1=invP808464432 R10=fp0
        10: (30) r0 = *(u8 *)skb[808464432]
        BPF_LD_[ABS|IND] uses reserved fields
        processed 11 insns (limit 1000000) [...]
      
        [0] https://lore.kernel.org/bpf/158507130343.15666.8018068546764556975.stgit@john-Precision-5820-Tower/T/
      
      Fixes: 581738a6 ("bpf: Provide better register bounds after jmp32 instructions")
      Reported-by: default avatarAnatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200330160324.15259-2-daniel@iogearbox.net
      f2d67fec
  2. 29 Mar, 2020 12 commits
  3. 28 Mar, 2020 15 commits
  4. 26 Mar, 2020 6 commits
    • YueHaibing's avatar
      bpf: Remove unused vairable 'bpf_xdp_link_lops' · f54a5bba
      YueHaibing authored
      kernel/bpf/syscall.c:2263:34: warning: 'bpf_xdp_link_lops' defined but not used [-Wunused-const-variable=]
       static const struct bpf_link_ops bpf_xdp_link_lops;
                                        ^~~~~~~~~~~~~~~~~
      
      commit 70ed506c ("bpf: Introduce pinnable bpf_link abstraction")
      involded this unused variable, remove it.
      Reported-by: default avatarHulk Robot <hulkci@huawei.com>
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200326031613.19372-1-yuehaibing@huawei.com
      f54a5bba
    • Andrii Nakryiko's avatar
      bpf: Factor out attach_type to prog_type mapping for attach/detach · e28784e3
      Andrii Nakryiko authored
      Factor out logic mapping expected program attach type to program type and
      subsequent handling of program attach/detach. Also list out all supported
      cgroup BPF program types explicitly to prevent accidental bugs once more
      program types are added to a mapping. Do the same for prog_query API.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200325065746.640559-3-andriin@fb.com
      e28784e3
    • Andrii Nakryiko's avatar
      bpf: Factor out cgroup storages operations · 00c4eddf
      Andrii Nakryiko authored
      Refactor cgroup attach/detach code to abstract away common operations
      performed on all types of cgroup storages. This makes high-level logic more
      apparent, plus allows to reuse more code across multiple functions.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200325065746.640559-2-andriin@fb.com
      00c4eddf
    • John Fastabend's avatar
      bpf: Test_verifier, #70 error message updates for 32-bit right shift · aa131ed4
      John Fastabend authored
      After changes to add update_reg_bounds after ALU ops and adding ALU32
      bounds tracking the error message is changed in the 32-bit right shift
      tests.
      
      Test "#70/u bounds check after 32-bit right shift with 64-bit input FAIL"
      now fails with,
      
      Unexpected error message!
      	EXP: R0 invalid mem access
      	RES: func#0 @0
      
      7: (b7) r1 = 2
      8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP2 R10=fp0 fp-8_w=mmmmmmmm
      8: (67) r1 <<= 31
      9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967296 R10=fp0 fp-8_w=mmmmmmmm
      9: (74) w1 >>= 31
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
      10: (14) w1 -= 2
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=invP4294967294 R10=fp0 fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      math between map_value pointer and 4294967294 is not allowed
      
      And test "#70/p bounds check after 32-bit right shift with 64-bit input
      FAIL" now fails with,
      
      Unexpected error message!
      	EXP: R0 invalid mem access
      	RES: func#0 @0
      
      7: (b7) r1 = 2
      8: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv2 R10=fp0 fp-8_w=mmmmmmmm
      8: (67) r1 <<= 31
      9: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967296 R10=fp0 fp-8_w=mmmmmmmm
      9: (74) w1 >>= 31
      10: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv0 R10=fp0 fp-8_w=mmmmmmmm
      10: (14) w1 -= 2
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,imm=0) R1_w=inv4294967294 R10=fp0 fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      last_idx 11 first_idx 0
      regs=2 stack=0 before 10: (14) w1 -= 2
      regs=2 stack=0 before 9: (74) w1 >>= 31
      regs=2 stack=0 before 8: (67) r1 <<= 31
      regs=2 stack=0 before 7: (b7) r1 = 2
      math between map_value pointer and 4294967294 is not allowed
      
      Before this series we did not trip the "math between map_value pointer..."
      error because check_reg_sane_offset is never called in
      adjust_ptr_min_max_vals(). Instead we have a register state that looks
      like this at line 11*,
      
      11: R0_w=map_value(id=0,off=0,ks=8,vs=8,
                         smin_value=0,smax_value=0,
                         umin_value=0,umax_value=0,
                         var_off=(0x0; 0x0))
          R1_w=invP(id=0,
                    smin_value=0,smax_value=4294967295,
                    umin_value=0,umax_value=4294967295,
                    var_off=(0xfffffffe; 0x0))
          R10=fp(id=0,off=0,
                 smin_value=0,smax_value=0,
                 umin_value=0,umax_value=0,
                 var_off=(0x0; 0x0)) fp-8_w=mmmmmmmm
      11: (0f) r0 += r1
      
      In R1 'smin_val != smax_val' yet we have a tnum_const as seen
      by 'var_off(0xfffffffe; 0x0))' with a 0x0 mask. So we hit this check
      in adjust_ptr_min_max_vals()
      
       if ((known && (smin_val != smax_val || umin_val != umax_val)) ||
            smin_val > smax_val || umin_val > umax_val) {
             /* Taint dst register if offset had invalid bounds derived from
              * e.g. dead branches.
              */
             __mark_reg_unknown(env, dst_reg);
             return 0;
       }
      
      So we don't throw an error here and instead only throw an error
      later in the verification when the memory access is made.
      
      The root cause in verifier without alu32 bounds tracking is having
      'umin_value = 0' and 'umax_value = U64_MAX' from BPF_SUB which we set
      when 'umin_value < umax_val' here,
      
       if (dst_reg->umin_value < umax_val) {
          /* Overflow possible, we know nothing */
          dst_reg->umin_value = 0;
          dst_reg->umax_value = U64_MAX;
       } else { ...}
      
      Later in adjust_calar_min_max_vals we previously did a
      coerce_reg_to_size() which will clamp the U64_MAX to U32_MAX by
      truncating to 32bits. But either way without a call to update_reg_bounds
      the less precise bounds tracking will fall out of the alu op
      verification.
      
      After latest changes we now exit adjust_scalar_min_max_vals with the
      more precise umin value, due to zero extension propogating bounds from
      alu32 bounds into alu64 bounds and then calling update_reg_bounds.
      This then causes the verifier to trigger an earlier error and we get
      the error in the output above.
      
      This patch updates tests to reflect new error message.
      
      * I have a local patch to print entire verifier state regardless if we
       believe it is a constant so we can get a full picture of the state.
       Usually if tnum_is_const() then bounds are also smin=smax, etc. but
       this is not always true and is a bit subtle. Being able to see these
       states helps understand dataflow imo. Let me know if we want something
       similar upstream.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507161475.15666.3061518385241144063.stgit@john-Precision-5820-Tower
      aa131ed4
    • John Fastabend's avatar
      bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds() · 294f2fc6
      John Fastabend authored
      Currently, for all op verification we call __red_deduce_bounds() and
      __red_bound_offset() but we only call __update_reg_bounds() in bitwise
      ops. However, we could benefit from calling __update_reg_bounds() in
      BPF_ADD, BPF_SUB, and BPF_MUL cases as well.
      
      For example, a register with state 'R1_w=invP0' when we subtract from
      it,
      
       w1 -= 2
      
      Before coerce we will now have an smin_value=S64_MIN, smax_value=U64_MAX
      and unsigned bounds umin_value=0, umax_value=U64_MAX. These will then
      be clamped to S32_MIN, U32_MAX values by coerce in the case of alu32 op
      as done in above example. However tnum will be a constant because the
      ALU op is done on a constant.
      
      Without update_reg_bounds() we have a scenario where tnum is a const
      but our unsigned bounds do not reflect this. By calling update_reg_bounds
      after coerce to 32bit we further refine the umin_value to U64_MAX in the
      alu64 case or U32_MAX in the alu32 case above.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507151689.15666.566796274289413203.stgit@john-Precision-5820-Tower
      294f2fc6
    • John Fastabend's avatar
      bpf: Verifer, refactor adjust_scalar_min_max_vals · 07cd2631
      John Fastabend authored
      Pull per op ALU logic into individual functions. We are about to add
      u32 versions of each of these by pull them out the code gets a bit
      more readable here and nicer in the next patch.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158507149518.15666.15672349629329072411.stgit@john-Precision-5820-Tower
      07cd2631
  5. 25 Mar, 2020 3 commits
  6. 23 Mar, 2020 3 commits