1. 11 Apr, 2019: 6 commits
2. 10 Apr, 2019: 19 commits
    • libbpf: fix crash in XDP socket part with new larger BPF_LOG_BUF_SIZE · 50bd645b
      Magnus Karlsson authored
      In commit da11b417 ("libbpf: teach libbpf about log_level bit 2"),
      the BPF_LOG_BUF_SIZE was increased to 16M. The XDP socket part of
      libbpf allocated the log_buf on the stack, but with the new 16M buffer
      size this no longer works, as the buffer is far too large for the
      stack. Change the code so it uses a 16K buffer instead.
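      A minimal sketch of the size concern, assuming the 16M and 16K figures
      above and an 8 MiB stack limit (a common Linux default, which the patch
      itself does not state). The macro and function names are hypothetical
      and only model a stack-allocated log buffer; they are not libbpf code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sizes quoted in the commit message; the macro names are illustrative. */
#define LARGE_LOG_BUF_SIZE (16u * 1024 * 1024) /* 16M BPF_LOG_BUF_SIZE */
#define XSK_LOG_BUF_SIZE   (16u * 1024)        /* 16K buffer used by the fix */

/* Assumption: 8 MiB is a common default stack limit (ulimit -s) on Linux. */
#define TYPICAL_STACK_LIMIT (8u * 1024 * 1024)

/* A stack array of this size would exceed the whole stack on its own. */
static bool exceeds_typical_stack(size_t size)
{
	return size >= TYPICAL_STACK_LIMIT;
}

/* Hypothetical stand-in for the XDP socket program load: 16K on the stack is fine. */
static size_t load_with_stack_log(const char *msg)
{
	char log_buf[XSK_LOG_BUF_SIZE];

	/* Pretend the verifier wrote msg into the log (bounded copy). */
	strncpy(log_buf, msg, sizeof(log_buf) - 1);
	log_buf[sizeof(log_buf) - 1] = '\0';
	return strlen(log_buf);
}
```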
      
      Fixes: da11b417 ("libbpf: teach libbpf about log_level bit 2")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf, bpftool: fix a few ubsan warnings · 69a0f9ec
      Yonghong Song authored
      The issue was reported at https://github.com/libbpf/libbpf/issues/28.

      Per the C standard, for
        void *memcpy(void *dest, const void *src, size_t n)
      if "dest" or "src" is NULL, the behavior of memcpy is undefined,
      regardless of whether "n" is 0 or not. Clang's UBSan reported three
      such instances in bpf.c with the following pattern:
        memcpy(dest, 0, 0).

      In practice, no known compiler causes issues when the copy size
      is 0, but let us still fix the issue to silence the UBSan warnings.
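      A minimal sketch of one common way to avoid the pattern: skip the call
      entirely when there is nothing to copy. safe_memcpy() is a hypothetical
      wrapper for illustration, not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * memcpy() with a NULL dest or src is undefined behavior even when n == 0,
 * so only call it when there is actually something to copy.
 */
static void *safe_memcpy(void *dest, const void *src, size_t n)
{
	if (n == 0)
		return dest; /* nothing to copy; never hand NULL to memcpy() */
	assert(dest != NULL && src != NULL);
	return memcpy(dest, src, n);
}
```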
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'support-global-data' · 6316f783
      Alexei Starovoitov authored
      Daniel Borkmann says:
      
      ====================
      This series is a major rework of previously submitted libbpf
      patches [0] in order to add global data support for BPF. The
      kernel has been extended with proper infrastructure that allows
      for full .bss/.data/.rodata sections on the BPF loader side, based
      upon feedback from LPC discussions [1]. Support for the latter is
      then also added to libbpf in this series, which allows for more
      natural C-like programming of BPF programs. For more information
      on the loader, please refer to the 'bpf, libbpf: support global
      data/bss/rodata sections' patch in this series.
      
      Thanks a lot!
      
        v5 -> v6:
         - Removed synchronize_rcu() from map freeze (Jann)
         - Rest as-is
        v4 -> v5:
         - Removed index selection again for ldimm64 (Alexei)
         - Adapted related test cases and added new ones to test
           rejection of off != 0
        v3 -> v4:
         - Various fixes in BTF verification e.g. to disallow
           Var and DataSec to be an intermediate type during resolve (Martin)
         - More BTF test cases added
         - Few cleanups in key-less BTF commit (Martin)
         - Bump libbpf minor version from 2 to 3
         - Renamed and simplified read-only locking
         - Various minor improvements all over the place
        v2 -> v3:
         - Implement BTF support in kernel, libbpf, bpftool, add tests
         - Fix idx + off conversion (Andrii)
         - Document lower / higher bits for direct value access (Andrii)
         - Add tests with small value size (Andrii)
         - Add index selection into ldimm64 (Andrii)
         - Fix missing fdput() (Jann)
         - Reject invalid flags in BPF_F_*_PROG (Jakub)
         - Complete rework of libbpf support, includes:
          - Add objname to map name (Stanislav)
          - Make .rodata map full read-only after setup (Andrii)
          - Merge relocation handling into single one (Andrii)
          - Store global maps into obj->maps array (Andrii, Alexei)
          - Debug message when skipping section (Andrii)
          - Reject non-static global data till we have
            semantics for sharing them (Yonghong, Andrii, Alexei)
          - More test cases and completely reworked prog test (Alexei)
         - Fixes, cleanups, etc all over the set
         - Not yet addressed:
          - Make BTF mandatory for these maps (Alexei)
          -> Waiting till BTF support for these lands first
        v1 -> v2:
          - Instead of 32-bit static data, implement full global
            data support (Alexei)
      
        [0] https://patchwork.ozlabs.org/cover/1040290/
        [1] http://vger.kernel.org/lpc-bpf2018.html#session-3
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, selftest: add test cases for BTF Var and DataSec · c861168b
      Daniel Borkmann authored
      Extend test_btf with various positive and negative tests around
      BTF verification of the Var and DataSec kinds. All pass as well:
      
        # ./test_btf
        [...]
        BTF raw test[4] (global data test #1): OK
        BTF raw test[5] (global data test #2): OK
        BTF raw test[6] (global data test #3): OK
        BTF raw test[7] (global data test #4, unsupported linkage): OK
        BTF raw test[8] (global data test #5, invalid var type): OK
        BTF raw test[9] (global data test #6, invalid var type (fwd type)): OK
        BTF raw test[10] (global data test #7, invalid var type (fwd type)): OK
        BTF raw test[11] (global data test #8, invalid var size): OK
        BTF raw test[12] (global data test #9, invalid var size): OK
        BTF raw test[13] (global data test #10, invalid var size): OK
        BTF raw test[14] (global data test #11, multiple section members): OK
        BTF raw test[15] (global data test #12, invalid offset): OK
        BTF raw test[16] (global data test #13, invalid offset): OK
        BTF raw test[17] (global data test #14, invalid offset): OK
        BTF raw test[18] (global data test #15, not var kind): OK
        BTF raw test[19] (global data test #16, invalid var referencing sec): OK
        BTF raw test[20] (global data test #17, invalid var referencing var): OK
        BTF raw test[21] (global data test #18, invalid var loop): OK
        BTF raw test[22] (global data test #19, invalid var referencing var): OK
        BTF raw test[23] (global data test #20, invalid ptr referencing var): OK
        BTF raw test[24] (global data test #21, var included in struct): OK
        BTF raw test[25] (global data test #22, array of var): OK
        [...]
        PASS:167 SKIP:0 FAIL:0
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, selftest: test global data/bss/rodata sections · b915ebe6
      Joe Stringer authored
      Add tests for libbpf relocation of static variable references
      into the .data, .rodata and .bss sections of the ELF, and also add
      a read-only test for .rodata. All pass:
      
        # ./test_progs
        [...]
        test_global_data:PASS:load program 0 nsec
        test_global_data:PASS:pass global data run 925 nsec
        test_global_data_number:PASS:relocate .bss reference 925 nsec
        test_global_data_number:PASS:relocate .data reference 925 nsec
        test_global_data_number:PASS:relocate .rodata reference 925 nsec
        test_global_data_number:PASS:relocate .bss reference 925 nsec
        test_global_data_number:PASS:relocate .data reference 925 nsec
        test_global_data_number:PASS:relocate .rodata reference 925 nsec
        test_global_data_number:PASS:relocate .bss reference 925 nsec
        test_global_data_number:PASS:relocate .bss reference 925 nsec
        test_global_data_number:PASS:relocate .rodata reference 925 nsec
        test_global_data_number:PASS:relocate .rodata reference 925 nsec
        test_global_data_number:PASS:relocate .rodata reference 925 nsec
        test_global_data_string:PASS:relocate .rodata reference 925 nsec
        test_global_data_string:PASS:relocate .data reference 925 nsec
        test_global_data_string:PASS:relocate .bss reference 925 nsec
        test_global_data_string:PASS:relocate .data reference 925 nsec
        test_global_data_string:PASS:relocate .bss reference 925 nsec
        test_global_data_struct:PASS:relocate .rodata reference 925 nsec
        test_global_data_struct:PASS:relocate .bss reference 925 nsec
        test_global_data_struct:PASS:relocate .rodata reference 925 nsec
        test_global_data_struct:PASS:relocate .data reference 925 nsec
        test_global_data_rdonly:PASS:test .rodata read-only map 925 nsec
        [...]
        Summary: 229 PASSED, 0 FAILED
      
      Note that the map helper signatures have been changed to avoid
      warnings when passing in const data.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, selftest: test {rd, wr}only flags and direct value access · fb2abb73
      Daniel Borkmann authored
      Extend test_verifier with various test cases around the two kernel
      extensions, that is, {rd,wr}only map support as well as direct map
      value access. All pass, except one that is skipped because xskmap
      is not present on the test machine:
      
        # ./test_verifier
        [...]
        #948/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
        #949/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
        #950/p XDP pkt read, pkt_data <= pkt_meta', good access OK
        #951/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
        #952/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
        Summary: 1410 PASSED, 1 SKIPPED, 0 FAILED
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: bpftool support for dumping data/bss/rodata sections · 817998af
      Daniel Borkmann authored
      Teach bpftool to handle the BTF Var and DataSec kinds so that they
      can be dumped via btf_dumper_type(). The value holds a single
      object keyed by the section name, which in turn holds an array of
      the variables it dumps. Each variable is an object of its own,
      printed along with its name. From there, further type information
      is dumped along with the corresponding value information.
      
      Example output from .rodata:
      
        # ./bpftool m d i 150
        [{
                "value": {
                    ".rodata": [{
                            "load_static_data.bar": 18446744073709551615
                        },{
                            "num2": 24
                        },{
                            "num5": 43947
                        },{
                            "num6": 171
                        },{
                            "str0": [97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,0,0,0,0,0,0
                            ]
                        },{
                            "struct0": {
                                "a": 42,
                                "b": 4278120431,
                                "c": 1229782938247303441
                            }
                        },{
                            "struct2": {
                                "a": 0,
                                "b": 0,
                                "c": 0
                            }
                        }
                    ]
                }
            }
        ]
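      A simplified sketch of the output shape described above, reduced to
      integer-valued variables and a compact single-line layout. The
      struct var_val type and dump_datasec() are hypothetical and do not
      reflect bpftool's actual implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one dumped variable. */
struct var_val {
	const char *name;
	unsigned long long value;
};

/*
 * One object keyed by the section name, holding an array of
 * single-variable objects. Returns the number of characters written,
 * or -1 if buf is too small.
 */
static int dump_datasec(char *buf, size_t len, const char *sec,
			const struct var_val *vars, size_t nr_vars)
{
	size_t off;
	int n;

	n = snprintf(buf, len, "{\"%s\": [", sec);
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;
	for (size_t i = 0; i < nr_vars; i++) {
		n = snprintf(buf + off, len - off, "%s{\"%s\": %llu}",
			     i ? "," : "", vars[i].name, vars[i].value);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	n = snprintf(buf + off, len - off, "]}");
	if (n < 0 || (size_t)n >= len - off)
		return -1;
	return (int)(off + (size_t)n);
}
```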
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, libbpf: add support for BTF Var and DataSec · 1713d68b
      Daniel Borkmann authored
      This adds libbpf support for the BTF Var and DataSec kinds. The main
      point here is that libbpf needs to do some preparatory work before
      the whole BTF object can be loaded into the kernel, namely fixing up
      the DataSec size, which is taken from the ELF section size, and the
      offsets of non-static variables, which need to be looked up in the
      ELF symbol table.

      Upstream LLVM doesn't fix these up since, at the time of BTF
      emission, it is too early in the compilation process for this
      information to be available; hence the loader needs to take care
      of it.
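      A minimal sketch of such a fixup, assuming simplified stand-ins for
      the BTF and ELF data. struct var_desc, struct elf_sym_lite and
      fixup_datasec() are hypothetical; only struct var_secinfo mirrors the
      layout of the UAPI struct btf_var_secinfo. Static variables are
      skipped here, since only the non-static offsets are described as
      needing a fixup.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the UAPI struct btf_var_secinfo {type, offset, size}. */
struct var_secinfo {
	uint32_t type;
	uint32_t offset;
	uint32_t size;
};

/* Hypothetical, simplified ELF symbol: name plus st_value. */
struct elf_sym_lite {
	const char *name;
	uint32_t value;
};

/* Hypothetical per-variable info the loader would get from BTF. */
struct var_desc {
	const char *name;
	int is_static;
};

/*
 * DataSec size comes from the ELF section size; each non-static
 * variable's offset comes from the symbol with the same name.
 * Returns 0 on success, -1 if a symbol is missing.
 */
static int fixup_datasec(uint32_t *datasec_size, uint32_t elf_sec_size,
			 struct var_secinfo *vsi, const struct var_desc *vars,
			 size_t nr_vars, const struct elf_sym_lite *syms,
			 size_t nr_syms)
{
	*datasec_size = elf_sec_size;
	for (size_t i = 0; i < nr_vars; i++) {
		size_t j;

		if (vars[i].is_static)
			continue; /* left untouched in this sketch */
		for (j = 0; j < nr_syms; j++)
			if (strcmp(syms[j].name, vars[i].name) == 0)
				break;
		if (j == nr_syms)
			return -1;
		vsi[i].offset = syms[j].value;
	}
	return 0;
}
```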
      
      Note that deduplication handling is out of scope for this work
      and needs to be addressed in a future commit.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://reviews.llvm.org/D59441
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, libbpf: support global data/bss/rodata sections · d859900c
      Daniel Borkmann authored
      This work adds BPF loader support for global data sections
      to libbpf. This allows writing BPF programs in a more natural,
      C-like way by being able to define global variables and const
      data.
      
      Back at LPC 2018 [0] we presented a first prototype which
      implemented support for global data sections by extending the BPF
      syscall so that union bpf_attr would get an additional memory/size
      pair for each section passed during prog load, in order to later
      add this base address into the ldimm64 instruction along with
      the user-provided offset when accessing a variable. The consensus
      from LPC was that, for proper upstream support, it would be
      more desirable to use maps instead of a bpf_attr extension, as
      this would allow for introspection of these sections as well
      as potential live updates of their content. This work follows
      this path by taking the following steps on the loader side:
      
       1) In the bpf_object__elf_collect() step, we pick up ".data",
          ".rodata", and ".bss" section information.
      
       2) If present, in bpf_object__init_internal_map() we add
          maps to the obj's map array that correspond to each
          of the present sections. Given that section size and
          access properties can differ, a single-entry array map
          is created per section, with a value size corresponding
          to the ELF section size of .data, .bss or .rodata. These
          internal maps are integrated into the normal map
          handling of libbpf such that, when the user traverses all
          obj maps, they can be differentiated from user-created
          ones via bpf_map__is_internal(). Later, when these maps
          are actually created in the kernel via
          bpf_object__create_maps(), the content of the .data and
          .rodata sections is copied into the map through
          bpf_map_update_elem(). For .bss this is not necessary
          since an array map is already zero-initialized by default.
          Additionally, the .rodata map is frozen as read-only
          after setup, such that writes are possible from neither
          the program nor the syscall side.
      
       3) In the bpf_program__collect_reloc() step, we record the
          corresponding map, insn index, and relocation type for
          the global data.
      
       4) Last but not least, in the actual relocation step in
          bpf_program__relocate(), we mark the ldimm64 instruction
          with src_reg = BPF_PSEUDO_MAP_VALUE, storing the map's file
          descriptor in the first imm field (similar to what is done
          for BPF_PSEUDO_MAP_FD) and, since ldimm64 is 2-insn wide,
          the access offset into the section in the second imm field.
          Given these maps have only a single element, ldimm64's off
          remains zero in both parts.
      
       5) On the kernel side, this specially marked
          BPF_PSEUDO_MAP_VALUE load will then store the actual target
          address, that is, the actual map value base address plus
          offset, in order to have a 'map-lookup'-free access. The
          destination register is then marked in the verifier as
          PTR_TO_MAP_VALUE, containing the fixed offset as reg->off
          and the backing BPF map as reg->map_ptr. In other words, it
          is treated like any other normal map value on the
          verification side, only with efficient, direct value access
          instead of an actual call to the map lookup helper as in the
          typical case.
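      To make steps 4) and 5) concrete, here is a minimal sketch of a
      relocated ldimm64 pair, such as the `r3 = map[id:2237][0]+16` load in
      the dump further below. struct insn is a local mirror of struct
      bpf_insn, and the opcode (0x18) and BPF_PSEUDO_MAP_VALUE (2) values
      are taken from the UAPI header; emit_map_value_ld() is a hypothetical
      helper, not libbpf code.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of struct bpf_insn from linux/bpf.h. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define LD_IMM64_OPCODE  0x18 /* BPF_LD | BPF_DW | BPF_IMM */
#define PSEUDO_MAP_VALUE 2    /* BPF_PSEUDO_MAP_VALUE */

/*
 * Map fd in the first imm, offset into the section in the second imm,
 * and off == 0 in both halves since the map has a single element.
 */
static void emit_map_value_ld(struct insn out[2], uint8_t dst, int map_fd,
			      uint32_t sec_off)
{
	out[0] = (struct insn){ .code = LD_IMM64_OPCODE, .dst_reg = dst,
				.src_reg = PSEUDO_MAP_VALUE, .off = 0,
				.imm = map_fd };
	/* Second half of ldimm64: opcode and registers stay zero. */
	out[1] = (struct insn){ .code = 0, .dst_reg = 0, .src_reg = 0,
				.off = 0, .imm = (int32_t)sec_off };
}
```

      At load time the verifier then replaces the fd/offset pair with the
      actual address inside the map value, as described in step 5).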
      
      Currently, only support for static global variables has been
      added, and libbpf rejects non-static global variables from
      loading. This restriction can be lifted once we have proper
      semantics for how BPF will treat multi-object BPF loads. On the
      BTF side, libbpf will set the value type id to the types
      corresponding to the ".bss", ".data" and ".rodata" names, which
      LLVM will emit without the object name prefix. The key type will
      be left as zero, thus making use of the key-less BTF option in
      array maps.
      
      A simple example dump of a program using global vars in each
      section:
      
        # bpftool prog
        [...]
        6784: sched_cls  name load_static_dat  tag a7e1291567277844  gpl
              loaded_at 2019-03-11T15:39:34+0000  uid 0
              xlated 1776B  jited 993B  memlock 4096B  map_ids 2238,2237,2235,2236,2239,2240
      
        # bpftool map show id 2237
        2237: array  name test_glo.bss  flags 0x0
              key 4B  value 64B  max_entries 1  memlock 4096B
        # bpftool map show id 2235
        2235: array  name test_glo.data  flags 0x0
              key 4B  value 64B  max_entries 1  memlock 4096B
        # bpftool map show id 2236
        2236: array  name test_glo.rodata  flags 0x80
              key 4B  value 96B  max_entries 1  memlock 4096B
      
        # bpftool prog dump xlated id 6784
        int load_static_data(struct __sk_buff * skb):
        ; int load_static_data(struct __sk_buff *skb)
           0: (b7) r6 = 0
        ; test_reloc(number, 0, &num0);
           1: (63) *(u32 *)(r10 -4) = r6
           2: (bf) r2 = r10
        ; int load_static_data(struct __sk_buff *skb)
           3: (07) r2 += -4
        ; test_reloc(number, 0, &num0);
           4: (18) r1 = map[id:2238]
           6: (18) r3 = map[id:2237][0]+0    <-- direct addr in .bss area
           8: (b7) r4 = 0
           9: (85) call array_map_update_elem#100464
          10: (b7) r1 = 1
        ; test_reloc(number, 1, &num1);
        [...]
        ; test_reloc(string, 2, str2);
         120: (18) r8 = map[id:2237][0]+16   <-- same here at offset +16
         122: (18) r1 = map[id:2239]
         124: (18) r3 = map[id:2237][0]+16
         126: (b7) r4 = 0
         127: (85) call array_map_update_elem#100464
         128: (b7) r1 = 120
        ; str1[5] = 'x';
         129: (73) *(u8 *)(r9 +5) = r1
        ; test_reloc(string, 3, str1);
         130: (b7) r1 = 3
         131: (63) *(u32 *)(r10 -4) = r1
         132: (b7) r9 = 3
         133: (bf) r2 = r10
        ; int load_static_data(struct __sk_buff *skb)
         134: (07) r2 += -4
        ; test_reloc(string, 3, str1);
         135: (18) r1 = map[id:2239]
         137: (18) r3 = map[id:2235][0]+16   <-- direct addr in .data area
         139: (b7) r4 = 0
         140: (85) call array_map_update_elem#100464
         141: (b7) r1 = 111
        ; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
         142: (73) *(u8 *)(r8 +6) = r1       <-- further access based on .bss data
         143: (b7) r1 = 108
         144: (73) *(u8 *)(r8 +5) = r1
        [...]
      
      For the Cilium use case in particular, this enables migrating configuration
      constants from the Cilium daemon's generated header defines into global
      data sections such that expensive runtime recompilations with LLVM can
      be avoided altogether. Instead, the ELF file becomes effectively a
      "template", meaning, it is compiled only once (!) and the Cilium daemon
      will then rewrite relevant configuration data from the ELF's .data or
      .rodata sections directly instead of recompiling the program. The
      updated ELF is then loaded into the kernel and atomically replaces
      the existing program in the networking datapath. More info in [0].
      
      Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
      for static variables").
      
        [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
            http://vger.kernel.org/lpc-bpf2018.html#session-3

      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, libbpf: refactor relocation handling · f8c7a4d4
      Joe Stringer authored
      Adjust the code for relocations slightly, with no functional changes,
      so that upcoming patches introducing support for relocations into
      the .data, .rodata and .bss sections can be added independently
      of these changes.
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: sync {btf, bpf}.h uapi header from tools infrastructure · c83fef6b
      Daniel Borkmann authored
      Pull in latest changes from both headers, so we can make use of
      them in libbpf.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: allow for key-less BTF in array map · 2824ecb7
      Daniel Borkmann authored
      Given we'll be reusing BPF array maps for global data/bss/rodata
      sections, we need a way to associate the BTF DataSec type as the
      map value type. In the usual case, we have the ugly
      BPF_ANNOTATE_KV_PAIR() macro hack, introduced in 38d5d3b3 ("bpf:
      Introduce BPF_ANNOTATE_KV_PAIR"), to get the initial map-to-type
      association going. While adding more users of it is discouraged
      anyway, it also won't work for global data, since the use of an
      array map is a BPF loader detail and therefore unknown at
      compilation time. For array maps with just a single entry, we
      therefore make a BTF exception: the key type is declared optional
      if the value type is of DataSec type. LLVM is guaranteed to emit
      the latter, and this also aligns with how we regard global data
      maps as just a plain buffer area that reuses existing map
      facilities, allowing for things like introspection with existing
      tools.
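      A minimal sketch of the exception as a predicate. The enums and both
      functions are hypothetical simplifications; the kernel's actual check
      operates on BTF type ids and map attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inputs. */
enum map_kind { MAP_ARRAY, MAP_HASH };
enum btf_kind_lite { KIND_INT, KIND_STRUCT, KIND_DATASEC };

/* Key type may be omitted only for a single-entry array with a DataSec value. */
static bool btf_key_type_optional(enum map_kind type, uint32_t max_entries,
				  enum btf_kind_lite value_kind)
{
	return type == MAP_ARRAY && max_entries == 1 &&
	       value_kind == KIND_DATASEC;
}

static bool btf_key_value_ok(enum map_kind type, uint32_t max_entries,
			     uint32_t key_type_id, enum btf_kind_lite value_kind)
{
	if (key_type_id != 0)
		return true; /* key present: the usual rules apply (not modeled) */
	return btf_key_type_optional(type, max_entries, value_kind);
}
```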
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: kernel side support for BTF Var and DataSec · 1dc92851
      Daniel Borkmann authored
      This work adds kernel-side verification, logging and seq_show dumping
      of the BTF Var and DataSec kinds, which are emitted by the latest LLVM.
      The following constraints apply:
      
      BTF Var must have:
      
      - Its kind_flag is 0
      - Its vlen is 0
      - Must point to a valid type
      - Type must not resolve to a forward type
      - Size of underlying type must be > 0
      - Must have a valid name
      - Can only be a source type, not sink or intermediate one
      - Name may include dots (e.g. in case of static variables
        inside functions)
      - Cannot be a member of a struct/union
      - So far, linkage can only be either static or global/allocated
      
      BTF DataSec must have:
      
      - Its kind_flag is 0
      - Its vlen cannot be 0
      - Its size cannot be 0
      - Must have a valid name
      - Can only be a source type, not sink or intermediate one
      - Name may include dots (e.g. to represent .bss, .data, .rodata etc)
      - Cannot be a member of a struct/union
      - Inner btf_var_secinfo array with {type,offset,size} triple
        must be sorted by offset in ascending order
      - Type must always point to BTF Var
      - BTF resolved size of Var must be <= size provided by triple
      - DataSec size must be >= sum of triple sizes (thus holes
        are allowed)
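      A minimal sketch of a subset of the DataSec rules above, operating on
      already-resolved Var sizes. struct var_secinfo mirrors the UAPI struct
      btf_var_secinfo; datasec_ok() is hypothetical, and its overlap and
      in-bounds checks go slightly beyond the list as stated and are
      assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the UAPI struct btf_var_secinfo {type, offset, size} triple. */
struct var_secinfo {
	uint32_t type;
	uint32_t offset;
	uint32_t size;
};

/* var_sizes[i] is the resolved BTF size of the Var that entry i points to. */
static bool datasec_ok(uint32_t sec_size, const struct var_secinfo *vsi,
		       const uint32_t *var_sizes, size_t vlen)
{
	uint64_t sum = 0, last_end = 0;

	if (vlen == 0 || sec_size == 0)
		return false;
	for (size_t i = 0; i < vlen; i++) {
		/* Ascending offsets; this sketch also rejects overlaps. */
		if (vsi[i].offset < last_end)
			return false;
		/* Resolved Var size must be > 0 and fit the triple's size. */
		if (var_sizes[i] == 0 || var_sizes[i] > vsi[i].size)
			return false;
		last_end = (uint64_t)vsi[i].offset + vsi[i].size;
		/* Assumption: each entry must also lie inside the section. */
		if (last_end > sec_size)
			return false;
		sum += vsi[i].size;
	}
	/* Holes are allowed, so only require the section to cover the sum. */
	return sum <= sec_size;
}
```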
      
      btf_var_resolve(), btf_ptr_resolve() and btf_modifier_resolve()
      are quite similar at a high level, but each comes with slight,
      subtle differences. They could potentially be refactored a bit
      in the future; this hasn't been done here to ease review.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add specification for BTF Var and DataSec kinds · f063c889
      Daniel Borkmann authored
      This adds the BTF specification and UAPI bits for supporting the BTF
      Var and DataSec kinds. This follows LLVM upstream commit ac4082b77e07
      ("[BPF] Add BTF Var and DataSec Support"), which has been merged
      recently. Var itself describes a global variable, and DataSec
      describes ELF sections, e.g. data/bss/rodata sections, that hold one
      or multiple global variables.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: allow . char as part of the object name · 3e0ddc4f
      Daniel Borkmann authored
      Trivial addition to allow '.', aside from '_', as a "special"
      character in the object name. This is used to allow substrings in
      map names from the loader side, such as ".bss", ".data" and
      ".rodata", but could also be useful for other purposes.
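      A minimal sketch of the resulting name rule. The 16-byte limit
      (BPF_OBJ_NAME_LEN, including the terminating NUL) is taken from the
      UAPI header; obj_name_valid() is a hypothetical helper rather than the
      kernel function. "test_glo.rodata", as shown by bpftool elsewhere in
      this log, is 15 characters and therefore fits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* BPF_OBJ_NAME_LEN: 16 bytes including the NUL terminator. */
#define OBJ_NAME_LEN 16

/* Alphanumerics plus the "special" characters '_' and '.'. */
static bool obj_name_valid(const char *name)
{
	size_t len = strlen(name);

	if (len >= OBJ_NAME_LEN)
		return false;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		if (!isalnum(c) && c != '_' && c != '.')
			return false;
	}
	return true;
}
```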
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add syscall side map freeze support · 87df15de
      Daniel Borkmann authored
      This patch adds a new BPF_MAP_FREEZE command which allows
      "freezing" the map globally as read-only / immutable from the
      syscall side.

      Map permission handling has been refactored into map_get_sys_perms(),
      which drops FMODE_CAN_WRITE in case of a locked map. The main use case
      is to allow for setting up .rodata sections from the BPF ELF which
      are loaded into the kernel: the BPF loader first allocates the map,
      sets up the map value by copying the .rodata section into it and,
      once complete, calls BPF_MAP_FREEZE on the map fd to prevent further
      modifications.
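      A minimal user-space model of that loader workflow (copy .rodata in,
      freeze, further writes refused). struct sim_map and its functions are
      hypothetical stand-ins; returning -EPERM for writes after freezing is
      an assumption modeled on the dropped FMODE_CAN_WRITE permission.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a single-entry array map. */
struct sim_map {
	unsigned char value[64];
	bool frozen;
};

/* Syscall-side update: refused once the map is frozen. */
static int sim_map_update(struct sim_map *m, const void *val, size_t len)
{
	if (m->frozen)
		return -EPERM;
	if (len > sizeof(m->value))
		return -E2BIG;
	if (len)
		memcpy(m->value, val, len);
	return 0;
}

/* One-way switch to read-only from the syscall side. */
static int sim_map_freeze(struct sim_map *m)
{
	m->frozen = true;
	return 0;
}
```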
      
      Right now BPF_MAP_FREEZE only takes the map fd as argument, while the
      remaining bpf_attr members are required to be zero. I didn't add
      write-only locking here as a counterpart since I don't have a concrete
      use case for it on my side, and I think it probably makes more sense
      to wait until there actually is one. In that case, bpf_attr can be
      extended as usual with a flag field and/or others, where flag 0 means
      that we lock the map read-only; hence this doesn't prevent adding
      further extensions to BPF_MAP_FREEZE as needed.
      
      A map creation flag like BPF_F_WRONCE was not considered for a couple
      of reasons: i) in case of a generic implementation, a map can consist
      of more than just one element, thus multiple map updates could be
      needed to bring the map into a state where it can then be made
      immutable; ii) WRONCE indicates exactly one write before the map is
      then set immutable. A generic implementation would atomically set a
      bit on map update entry (if unset), indicating that every subsequent
      update from then onwards will need to bail out there. However, map
      updates can fail, so upon failure that flag would need to be unset
      again and the update attempt would need to be repeated for the map
      to eventually be made immutable. While this can be made race-free,
      this approach feels less clean and, in combination with reason i),
      it's not generic enough. A dedicated BPF_MAP_FREEZE command directly
      sets the flag, and the caller has the guarantee that, upon successful
      return, the map is immutable from the syscall side for any future
      syscall invocations that would alter the map state, which is also
      more intuitive from an API point of view. A command name such as
      BPF_MAP_LOCK has been avoided as it's too close to BPF map spin
      locks (which already have the BPF_F_LOCK flag). BPF_MAP_FREEZE
      is so far only enabled for privileged users.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add program side {rd, wr}only support for maps · 591fe988
      Daniel Borkmann authored
      This work adds two new map creation flags, BPF_F_RDONLY_PROG
      and BPF_F_WRONLY_PROG, in order to allow for read-only or
      write-only BPF maps from the BPF program side.

      Today we have BPF_F_RDONLY and BPF_F_WRONLY, but these only
      apply to the system call side, meaning the BPF program has full
      read/write access to the map as usual, while bpf(2) calls with
      the map fd can either only read from or only write into the map,
      depending on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
      allow for the exact opposite, such that the verifier rejects
      program loads if a write into a read-only map or a read from a
      write-only map is detected. For the read-only case, helpers that
      would alter the map state, such as map deletion, update, etc., are
      also forbidden for programs. As opposed to the two BPF_F_RDONLY /
      BPF_F_WRONLY flags, BPF_F_RDONLY_PROG as well as BPF_F_WRONLY_PROG
      really do correspond to the map lifetime.
      
      We've enabled this generic map extension for various non-special
      maps holding normal user data: array, hash, lru, lpm, local
      storage, queue and stack. Further generic map types could follow
      in the future depending on use case. The main use case here is
      to forbid writes into .rodata map values from the verifier side.
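      A minimal sketch of the verifier-side decision for plain loads and
      stores. The flag values are those of the UAPI header (BPF_F_RDONLY_PROG
      is 1U << 7, i.e. 0x80, matching the "flags 0x80" that bpftool reports
      for the test_glo.rodata map elsewhere in this log);
      prog_access_allowed() is hypothetical and ignores the helper-call
      restrictions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in linux/bpf.h. */
#define F_RDONLY_PROG (1U << 7) /* BPF_F_RDONLY_PROG */
#define F_WRONLY_PROG (1U << 8) /* BPF_F_WRONLY_PROG */

enum prog_access { ACCESS_READ, ACCESS_WRITE };

/* No writes into RDONLY_PROG maps, no reads from WRONLY_PROG maps. */
static bool prog_access_allowed(uint32_t map_flags, enum prog_access access)
{
	if (access == ACCESS_WRITE)
		return !(map_flags & F_RDONLY_PROG);
	return !(map_flags & F_WRONLY_PROG);
}
```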
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: do not retain flags that are not tied to map lifetime · be70bcd5
      Daniel Borkmann authored
      Both BPF_F_WRONLY / BPF_F_RDONLY flags are tied to the map file
      descriptor, but not to the map object itself! Meaning, at map
      creation time BPF_F_RDONLY can be set to make the map read-only
      from syscall side, but this holds only for the returned fd, so
      any other fd either retrieved via bpf file system or via map id
      for the very same underlying map object can have read-write access
      instead.
      
      Given that, keeping the two flags around in the map_flags attribute
      and exposing them to user space upon map dump is misleading and
      may lead to false conclusions. Since these two flags are not
      tied to the map object, let's also not store them as a map property.
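      A sketch of the idea, using a hypothetical helper name and
      illustrative bit values (assumed, not taken from the uapi header):
      the fd-scoped syscall flags are masked out before map_flags is
      stored, while a flag that does belong to the map object (here a
      stand-in for BPF_F_NO_PREALLOC) is kept.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values (assumed); real code uses the uapi constants. */
#define SKETCH_F_NO_PREALLOC (1U << 0) /* example of a map-lifetime flag */
#define SKETCH_F_RDONLY      (1U << 3) /* fd-scoped, syscall side only */
#define SKETCH_F_WRONLY      (1U << 4) /* fd-scoped, syscall side only */

/* Hypothetical helper: drop flags that only describe the returned fd, so
 * they are neither stored as a map property nor reported in map dumps. */
static uint32_t retain_map_lifetime_flags(uint32_t attr_flags)
{
	return attr_flags & ~(SKETCH_F_RDONLY | SKETCH_F_WRONLY);
}
```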
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      be70bcd5
    • bpf: implement lookup-free direct value access for maps · d8eca5bb
      Daniel Borkmann authored
      This generic extension to BPF maps allows for directly loading
      an address residing inside a BPF map value as a single BPF
      ldimm64 instruction!
      
      The idea is similar to what BPF_PSEUDO_MAP_FD does today: it is a
      special src_reg flag for the ldimm64 instruction indicating that
      the first part of the double insn's imm field holds a file
      descriptor, which the verifier then replaces with the full 64-bit
      address of the map across both imm parts. For the newly added
      BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
      the first part of the double insn's imm field is again a file
      descriptor corresponding to the map, and the second part of the
      imm field is an offset into the value. The verifier will then
      replace both imm parts with an address that points into the BPF
      map value at the given value offset for maps that support this
      operation. Currently, only array maps with a single entry are
      supported.
      It is possible to support more than just a single map element by
      reusing both 16-bit off fields of the insns as a map index, so a
      full array map lookup could be expressed that way. This hasn't
      been implemented here due to the lack of a concrete use case, but
      could easily be added in the future in a compatible way, since
      both off fields currently have to be 0 and would correctly
      denote map index 0.
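      The encoding described above can be sketched in C. The struct below
      is a local stand-in that mirrors the field order of struct bpf_insn.
      The opcode 0x18 (BPF_LD | BPF_DW | BPF_IMM) and the value 2 for
      BPF_PSEUDO_MAP_VALUE are assumptions stated here. The address patch
      only models what the verifier does conceptually; it is not its code.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in mirroring the field order of struct bpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define SKETCH_LD_IMM64         0x18 /* assumed: BPF_LD | BPF_DW | BPF_IMM */
#define SKETCH_PSEUDO_MAP_VALUE 2    /* assumed value of BPF_PSEUDO_MAP_VALUE */

/* Loader side: first imm = map fd, second imm = offset into the value.
 * Both off fields stay 0, i.e. map index 0. */
static void emit_map_value_load(struct sketch_insn insn[2], uint8_t dst,
				int map_fd, uint32_t value_off)
{
	insn[0] = (struct sketch_insn){ .code = SKETCH_LD_IMM64, .dst_reg = dst,
		.src_reg = SKETCH_PSEUDO_MAP_VALUE, .off = 0, .imm = map_fd };
	insn[1] = (struct sketch_insn){ .code = 0, .dst_reg = 0, .src_reg = 0,
		.off = 0, .imm = (int32_t)value_off };
}

/* Verifier side, conceptually: replace both imm halves with the 64-bit
 * address of (value base + offset), low 32 bits in the first insn. */
static void patch_with_address(struct sketch_insn insn[2], uint64_t value_base)
{
	uint64_t addr = value_base + (uint32_t)insn[1].imm;

	insn[0].imm = (int32_t)(uint32_t)addr;
	insn[1].imm = (int32_t)(uint32_t)(addr >> 32);
}

/* What the ldimm64 pair loads into dst_reg at run time. */
static uint64_t loaded_value(const struct sketch_insn insn[2])
{
	return (uint64_t)(uint32_t)insn[0].imm |
	       ((uint64_t)(uint32_t)insn[1].imm << 32);
}
```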
      
      BPF_PSEUDO_MAP_VALUE is a distinct flag because otherwise, with
      BPF_PSEUDO_MAP_FD, we could not distinguish offset 0 between a
      load of the map pointer and a load of the map's value at offset 0.
      Changing BPF_PSEUDO_MAP_FD's encoding into off-by-one to
      distinguish between a regular map pointer and a map value pointer
      would add unnecessary complexity and raise the barrier for
      debuggability, and is thus less suitable. Using the second part of
      the imm field as an offset into the value does /not/ come with
      limitations, since the maximum possible value size is in u32
      universe anyway.
      
      This optimization allows for efficiently retrieving an address
      to a map value memory area without having to issue a helper call
      (which needs to prepare registers according to the calling
      convention, etc.), without needing the extra NULL test, and without
      having to add the offset to the value base pointer in an additional
      instruction. The verifier then treats the destination register as
      PTR_TO_MAP_VALUE with a constant reg->off taken from the
      user-passed offset in the second imm field, and guarantees that
      this is within bounds of the map value. Any subsequent operations
      are treated as normal map value handling, without anything extra
      needed on the verification side.
      
      The two map operations for direct value access have been added to
      the array map for now. In the future, other types could be supported
      as well depending on the use case. The main use case for this commit
      is to allow BPF loader support for global variables that
      reside in .data/.rodata/.bss sections, such that we can directly
      load their address with minimal additional infrastructure
      required. Loader support for the libbpf library has been added in
      subsequent commits.
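      A hedged sketch of the array-map side follows. The map struct and
      callback name are hypothetical simplifications. The callback rejects
      anything other than a single-entry array, rejects any offset outside
      the value, and otherwise returns the address the verifier would patch
      into the ldimm64 pair.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified array map: one contiguous value area. */
struct sketch_array_map {
	uint32_t max_entries;
	uint32_t value_size;
	uint8_t *value; /* start of element 0 */
};

/* Only single-entry arrays are supported, and the offset must lie inside
 * the value. Returns 0 and fills *addr on success. */
static int sketch_direct_value_addr(const struct sketch_array_map *map,
				    uint32_t off, uint64_t *addr)
{
	if (map->max_entries != 1)
		return -EOPNOTSUPP;
	if (off >= map->value_size)
		return -EINVAL;
	*addr = (uint64_t)(uintptr_t)(map->value + off);
	return 0;
}
```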
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d8eca5bb
  3. 07 Apr, 2019 1 commit
  4. 05 Apr, 2019 9 commits
    • Merge branch 'bpf-varstack-fixes' · 347807d3
      Daniel Borkmann authored
      Andrey Ignatov says:
      
      ====================
      v2->v3:
      - sanity check max value for variable offset.
      
      v1->v2:
      - rely on meta = NULL to reject var_off stack access to uninit buffer.
      
      This patch set is a follow-up for discussion [1].
      
      It fixes variable offset stack access handling for raw and unprivileged
      mode, rejecting both of them, and sanity checks max variable offset value.
      
      Patch 1 handles raw (uninitialized) mode.
      Patch 2 adds test for raw mode.
      Patch 3 handles unprivileged mode.
      Patch 4 adds test for unprivileged mode.
      Patch 5 adds sanity check for max value of variable offset.
      Patch 6 adds test for variable offset max value checking.
      Patch 7 is a minor fix in verbose log.
      
      Unprivileged mode is an interesting case, since one (and only?) way to come
      up with a variable offset is to use pointer arithmetic, and pointer
      arithmetic is already prohibited in unprivileged mode. I'm not sure that's
      enough though, and it seems like a good idea to still reject variable
      offset for unpriv in check_stack_boundary(). Please see patches 3 and 4
      for more details on this.
      
      [1] https://marc.info/?l=linux-netdev&m=155419526427742&w=2
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      347807d3
    • bpf: Add missed newline in verifier verbose log · 1fbd20f8
      Andrey Ignatov authored
      check_stack_access(), which prints a verbose log, is called from
      adjust_ptr_min_max_vals(), which prints its own verbose log, and the
      two messages now run together, e.g.:
      
        variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16
        size=1R2 stack pointer arithmetic goes out of range, prohibited for
        !root
      
      Add the missing newline so that the log is more readable:
        variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1
        R2 stack pointer arithmetic goes out of range, prohibited for !root
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1fbd20f8
    • selftests/bpf: Test unbounded var_off stack access · 07f91962
      Andrey Ignatov authored
      Test the case where reg->smax_value is too small/big and can overflow,
      and, separately, the cases where min and max values are outside of stack bounds.
      
      Example of output:
        # ./test_verifier
        #856/p indirect variable-offset stack access, unbounded OK
        #857/p indirect variable-offset stack access, max out of bound OK
        #858/p indirect variable-offset stack access, min out of bound OK
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      07f91962
    • bpf: Sanity check max value for var_off stack access · 107c26a7
      Andrey Ignatov authored
      As discussed in [1], the max value of a variable offset has to be checked
      for overflow on stack access; otherwise the verifier would accept code like this:
      
        0: (b7) r2 = 6
        1: (b7) r3 = 28
        2: (7a) *(u64 *)(r10 -16) = 0
        3: (7a) *(u64 *)(r10 -8) = 0
        4: (79) r4 = *(u64 *)(r1 +168)
        5: (c5) if r4 s< 0x0 goto pc+4
         R1=ctx(id=0,off=0,imm=0) R2=inv6 R3=inv28
         R4=inv(id=0,umax_value=9223372036854775807,var_off=(0x0;
         0x7fffffffffffffff)) R10=fp0,call_-1 fp-8=mmmmmmmm fp-16=mmmmmmmm
        6: (17) r4 -= 16
        7: (0f) r4 += r10
        8: (b7) r5 = 8
        9: (85) call bpf_getsockopt#57
        10: (b7) r0 = 0
        11: (95) exit
      
      where R4 obviously has an unbounded max value.
      
      Fix it by checking that reg->smax_value is inside the (-BPF_MAX_VAR_OFF;
      BPF_MAX_VAR_OFF) range.
      
      reg->smax_value is used instead of reg->umax_value because stack
      pointers are calculated using a negative offset from fp. This is the
      opposite of, e.g., map access, where the offset must be non-negative and
      umax_value is used.
      
      Also dedicated verbose logs are added for both min and max bound check
      failures to have diagnostics consistent with variable offset handling in
      check_map_access().
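      The check can be sketched as follows. This is an illustration under
      stated assumptions, not the verifier's code. The constants are assumed
      to mirror BPF_MAX_VAR_OFF (1 << 29) and MAX_BPF_STACK (512). Bounding
      smin as well as smax is this sketch's own choice, made so that the
      additions stay overflow-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_VAR_OFF (1 << 29) /* assumed to mirror BPF_MAX_VAR_OFF */
#define SKETCH_MAX_STACK   512       /* assumed to mirror MAX_BPF_STACK */

/* Fixed-offset bounds: [off, off + size) must lie within [-512, 0). */
static bool stack_range_ok(int64_t off, int64_t size)
{
	return off < 0 && off >= -SKETCH_MAX_STACK && size > 0 && off + size <= 0;
}

/* Hypothetical variable-offset check: first bound smax (and, in this
 * sketch, smin) to (-MAX_VAR_OFF, MAX_VAR_OFF), then check both ends of
 * the possible range against the stack bounds. */
static bool var_stack_access_ok(int64_t smin, int64_t smax,
				int64_t fixed_off, int64_t size)
{
	if (smax >= SKETCH_MAX_VAR_OFF || smax <= -SKETCH_MAX_VAR_OFF)
		return false; /* unbounded max value */
	if (smin >= SKETCH_MAX_VAR_OFF || smin <= -SKETCH_MAX_VAR_OFF)
		return false; /* unbounded min value */
	return stack_range_ok(smin + fixed_off, size) &&
	       stack_range_ok(smax + fixed_off, size);
}
```

      With the program from the example, R4 ends up with smin = -16 and
      smax = INT64_MAX - 16. As a result, var_stack_access_ok(-16,
      INT64_MAX - 16, 0, 8) returns false.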
      
      [1] https://marc.info/?l=linux-netdev&m=155433357510597&w=2
      
      Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      107c26a7
    • selftests/bpf: Test indirect var_off stack access in unpriv mode · 2c6927db
      Andrey Ignatov authored
      Test that the verifier rejects indirect stack access with variable offset
      in unprivileged mode and accepts the same code in privileged mode.
      
      Since pointer arithmetic is prohibited in unprivileged mode, the verifier
      should reject the program even before it gets to the helper call that uses
      the variable offset, i.e. at the point where that variable offset is being
      constructed.
      
      Example of output:
        # ./test_verifier
        ...
        #859/u indirect variable-offset stack access, priv vs unpriv OK
        #859/p indirect variable-offset stack access, priv vs unpriv OK
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2c6927db
    • bpf: Reject indirect var_off stack access in unpriv mode · 088ec26d
      Andrey Ignatov authored
      Proper support of indirect stack access with variable offset in
      unprivileged mode (!root) requires corresponding support in Spectre
      masking for stack ALU in retrieve_ptr_limit().
      
      There is no use case for variable offset in unprivileged mode, though, so
      make the verifier reject such accesses for simplicity.
      
      Pointer arithmetic is one (and only?) way to cause a variable offset, and
      it's already rejected in unpriv mode, so the verifier won't even get to the
      helper function whose argument contains the variable offset, e.g.:
      
        0: (7a) *(u64 *)(r10 -16) = 0
        1: (7a) *(u64 *)(r10 -8) = 0
        2: (61) r2 = *(u32 *)(r1 +0)
        3: (57) r2 &= 4
        4: (17) r2 -= 16
        5: (0f) r2 += r10
        variable stack access var_off=(0xfffffffffffffff0; 0x4) off=-16 size=1R2
        stack pointer arithmetic goes out of range, prohibited for !root
      
      Still it looks like a good idea to reject variable offset indirect stack
      access for unprivileged mode in check_stack_boundary() explicitly.
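      A minimal sketch of the rule follows, with a hypothetical helper. A
      tristate number (a value plus a mask of unknown bits) is constant when
      its mask is zero. A non-constant stack offset is accepted only for
      privileged loaders. The privileged flag here is an assumed stand-in
      for whatever capability check the verifier actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal tristate number: bits set in mask are unknown. */
struct sketch_tnum {
	uint64_t value;
	uint64_t mask;
};

static bool sketch_tnum_is_const(struct sketch_tnum t)
{
	return t.mask == 0;
}

/* Hypothetical gate: a variable (non-constant) indirect stack offset is
 * only accepted for privileged loaders. */
static bool indirect_stack_offset_allowed(struct sketch_tnum var_off,
					  bool privileged)
{
	if (sketch_tnum_is_const(var_off))
		return true;
	return privileged;
}
```

      For the var_off=(0xfffffffffffffff0; 0x4) from the log above, the
      mask is non-zero, so the unprivileged case is rejected.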
      
      Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      088ec26d
    • selftests/bpf: Test indirect var_off stack access in raw mode · f68a5b44
      Andrey Ignatov authored
      Test that the verifier rejects indirect access to uninitialized stack with
      a variable offset.
      
      Example of output:
        # ./test_verifier
        ...
        #859/p indirect variable-offset stack access, uninitialized OK
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f68a5b44
    • bpf: Reject indirect var_off stack access in raw mode · f2bcd05e
      Andrey Ignatov authored
      It's hard to guarantee that the whole memory is marked as initialized on
      helper return if uninitialized stack is accessed with a variable offset,
      since the specific bounds are unknown to the verifier. This may cause
      uninitialized stack contents to leak.
      
      Reject such an access in check_stack_boundary() to prevent possible
      leaking.
      
      There are no known use-cases for indirect uninitialized stack access
      with variable offset so it shouldn't break anything.
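      One way to picture the raw-mode problem is the hypothetical model
      below. It is not the verifier's implementation. With a constant
      offset, the bytes a helper writes are known and can be marked
      initialized afterwards. With a variable offset they are not known, so
      this sketch instead requires every byte the access could touch to be
      initialized already. Offsets are assumed to have been bounds-checked
      beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STACK 512

/* init[i] describes the stack byte at offset i - SKETCH_STACK. */
struct sketch_stack {
	bool init[SKETCH_STACK];
};

static bool byte_initialized(const struct sketch_stack *st, int off)
{
	return st->init[off + SKETCH_STACK];
}

/* Hypothetical check for a helper access covering
 * [min_off, max_off + size). raw_mode means the helper writes into the
 * buffer rather than reading it. */
static bool stack_helper_access_ok(const struct sketch_stack *st, int min_off,
				   int max_off, int size, bool raw_mode)
{
	bool var_off = min_off != max_off;

	if (raw_mode && !var_off)
		return true; /* known bytes, marked initialized after the call */
	for (int off = min_off; off < max_off + size; off++)
		if (!byte_initialized(st, off))
			return false;
	return true;
}
```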
      
      Fixes: 2011fccf ("bpf: Support variable offset stack access from helpers")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f2bcd05e
    • samples/bpf: fix build with new clang · 636e78b1
      Alexei Starovoitov authored
      clang started to error on invalid asm clobber usage in x86 headers,
      and many bpf program samples failed to build with the message:
      
        CLANG-bpf  /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.o
      In file included from /data/users/ast/bpf-next/samples/bpf/xdp_redirect_kern.c:14:
      In file included from ../include/linux/in.h:23:
      In file included from ../include/uapi/linux/in.h:24:
      In file included from ../include/linux/socket.h:8:
      In file included from ../include/linux/uio.h:14:
      In file included from ../include/crypto/hash.h:16:
      In file included from ../include/linux/crypto.h:26:
      In file included from ../include/linux/uaccess.h:5:
      In file included from ../include/linux/sched.h:15:
      In file included from ../include/linux/sem.h:5:
      In file included from ../include/uapi/linux/sem.h:5:
      In file included from ../include/linux/ipc.h:9:
      In file included from ../include/linux/refcount.h:72:
      ../arch/x86/include/asm/refcount.h:72:36: error: asm-specifier for input or output variable conflicts with asm clobber list
                                               r->refs.counter, e, "er", i, "cx");
                                                                            ^
      ../arch/x86/include/asm/refcount.h:86:27: error: asm-specifier for input or output variable conflicts with asm clobber list
                                               r->refs.counter, e, "cx");
                                                                   ^
      2 errors generated.
      
      Override volatile() to work around the problem.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      636e78b1
  5. 04 Apr, 2019 2 commits
  6. 03 Apr, 2019 3 commits
    • Merge branch 'bpf-verifier-scalability' · cc441a69
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      v1->v2:
      - fixed typo in patch 1
      - added a patch to convert kcalloc to kvcalloc
      - added a patch to make the 16-bit jump offset check verbose
      - added a test with 1m insns
      
      This patch set is the first step to be able to accept large programs.
      The verifier still suffers from its brute force algorithm and
      large programs can easily hit 1M insn_processed limit.
      A lot more work is necessary to be able to verify large programs.
      
      v1:
      Realize two key ideas to speed up verification by ~20 times:
      1. Every 'branching' instruction records all verifier states,
         but not all of them are useful for search pruning.
         Add a simple heuristic to keep states that were successful in search pruning
         and remove those that were not.
      2. mark_reg_read walks the parentage chain of registers to mark parents as LIVE_READ.
         Once a register is marked, there is no need to re-mark it in the future.
         Hence stop walking the chain once the first LIVE_READ is seen.
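      Both ideas can be sketched in C. The structs and names are
      hypothetical simplifications. The real parentage walk also handles
      write screening, which is omitted here. The eviction threshold
      miss_cnt > hit_cnt * 3 + 3 is an assumed example value, not taken
      from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Idea 1: per-state pruning statistics. */
struct sketch_state_entry {
	unsigned int hit_cnt;  /* times this state pruned the search */
	unsigned int miss_cnt; /* times it was compared and did not */
};

/* Assumed heuristic: drop a state once misses clearly dominate hits. */
static bool sketch_should_evict(const struct sketch_state_entry *e)
{
	return e->miss_cnt > e->hit_cnt * 3 + 3;
}

/* Idea 2: each register state points at the state it was derived from. */
struct sketch_reg {
	struct sketch_reg *parent;
	bool live_read;
};

/* Mark ancestors as read, stopping at the first one already marked: since
 * the chain never changes, everything above it was marked by an earlier
 * walk. Returns how many links were visited. */
static int sketch_mark_reg_read(struct sketch_reg *reg)
{
	int steps = 0;

	for (struct sketch_reg *p = reg->parent; p; p = p->parent) {
		steps++;
		if (p->live_read)
			break;
		p->live_read = true;
	}
	return steps;
}
```

      A second read of the same register then visits a single link instead
      of the whole chain. This is where the mark_reg_read savings come from.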
      
      The 1st optimization gives a 10x speedup on large programs,
      and the 2nd optimization reduces the cost of mark_reg_read from ~40% of cpu to <1%.
      Combined, they deliver a ~20x speedup on large programs.
      
      Faster and bounded verification time allows increasing the insn_processed
      limit from 130k to 1 million.
      In the worst case it takes 1/10 of a second to process that many instructions,
      and peak memory consumption is peak_states * sizeof(struct bpf_verifier_state),
      which is around ~5Mbyte.
      
      Increase the insn_per_program limit for root to the insn_processed limit.
      
      Add verification stats and stress tests for verifier scalability.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      cc441a69
    • selftests/bpf: synthetic tests to push verifier limits · 8aa2d4b4
      Alexei Starovoitov authored
      Add a test to generate 1m ld_imm64 insns to stress the verifier.
      
      Bump the size of fill_ld_abs_vlan_push_pop test from 4k to 29k
      and jump_around_ld_abs from 4k to 5.5k.
      Larger sizes are not possible due to 16-bit offset encoding
      in jump instructions.
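      The 16-bit limit follows from how a BPF jump encodes its target. The
      signed 16-bit off field is relative to the next instruction, so
      target = pc + 1 + off. Below is a minimal sketch of the reachability
      check, using a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a jump at instruction index pc can reach instruction index
 * target, given that the offset must fit in a signed 16-bit field. */
static bool jump_fits(int64_t pc, int64_t target)
{
	int64_t off = target - (pc + 1);

	return off >= INT16_MIN && off <= INT16_MAX;
}
```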
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      8aa2d4b4
    • selftests/bpf: add few verifier scale tests · e5e7a8f2
      Alexei Starovoitov authored
      Add 3 basic tests that stress verifier scalability.
      
      test_verif_scale1.c calls the non-inlined jhash() function 90 times at
      different positions in the packet.
      This test simulates network packet parsing.
      The jhash function is ~140 instructions and the main program is ~1200 insns.
      
      test_verif_scale2.c force-inlines the jhash() function 90 times.
      This program is ~15k instructions long.
      
      test_verif_scale3.c calls the non-inlined jhash() function 90 times,
      but this time jhash has to process 32 bytes from the packet
      instead of 14 bytes as in tests 1 and 2.
      The jhash function is ~230 insns and the main program is ~1200 insns.
      
      $ test_progs -s
      can be used to see verifier stats.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e5e7a8f2