1. 16 Apr, 2019 13 commits
  2. 13 Apr, 2019 4 commits
  3. 12 Apr, 2019 23 commits
    • Andrey Ignatov's avatar
      bpf: Fix distinct pointer types warning for ARCH=i386 · 51356ac8
      Andrey Ignatov authored
      Fix a new warning reported by kbuild for make ARCH=i386:
      
         In file included from kernel/bpf/cgroup.c:11:0:
         kernel/bpf/cgroup.c: In function '__cgroup_bpf_run_filter_sysctl':
         include/linux/kernel.h:827:29: warning: comparison of distinct pointer types lacks a cast
            (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
                                      ^
         include/linux/kernel.h:841:4: note: in expansion of macro '__typecheck'
            (__typecheck(x, y) && __no_side_effects(x, y))
             ^~~~~~~~~~~
         include/linux/kernel.h:851:24: note: in expansion of macro '__safe_cmp'
           __builtin_choose_expr(__safe_cmp(x, y), \
                                 ^~~~~~~~~~
         include/linux/kernel.h:860:19: note: in expansion of macro '__careful_cmp'
          #define min(x, y) __careful_cmp(x, y, <)
                            ^~~~~~~~~~~~~
      >> kernel/bpf/cgroup.c:837:17: note: in expansion of macro 'min'
            ctx.new_len = min(PAGE_SIZE, *pcount);
                          ^~~
      
      Fixes: 4e63acdf ("bpf: Introduce bpf_sysctl_{get,set}_new_value helpers")
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      51356ac8
    • Alexei Starovoitov's avatar
      Merge branch 'bpf-sysctl-hook' · a43d0508
      Alexei Starovoitov authored
      Andrey Ignatov says:
      
      ====================
      v2->v3:
      - simplify C based selftests by relying on variable offset stack access.
      
      v1->v2:
      - add fs/proc/proc_sysctl.c mainteners to Cc:.
      
      The patch set introduces new BPF hook for sysctl.
      
      It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
      BPF_CGROUP_SYSCTL.
      
      BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
      that accesses (read/write) to sysctl can be controlled for specific cgroup
      and either allowed or denied, or traced.
      
      The hook has access to sysctl name, current sysctl value and (on write
      only) to new sysctl value via corresponding helpers. New sysctl value can
      be overridden by program. Both name and values (current/new) are
      represented as strings same way they're visible in /proc/sys/. It is up to
      program to parse these strings.
      
      To help with parsing the most common kind of sysctl value, vector of
      integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
      semantic similar to user space strtol(3) and strtoul(3).
      
      The hook also provides bpf_sysctl context with two fields:
      * @write indicates whether sysctl is being read (= 0) or written (= 1);
      * @file_pos is sysctl file position to read from or write to, can be
        overridden.
      
      The hook allows to make better isolation for containerized applications
      that are run as root so that one container can't change a sysctl and affect
      all other containers on a host, make changes to allowed sysctl in a safer
      way and simplify sysctl tracing for cgroups.
      
      Patch 1 is preliminary refactoring.
      Patch 2 adds new program and attach types.
      Patches 3-5 implement helpers to access sysctl name and value.
      Patch 6 adds file_pos field to bpf_sysctl context.
      Patch 7 updates UAPI in tools.
      Patches 8-9 add support for the new hook to libbpf and corresponding test.
      Patches 10-14 add selftests for the new hook.
      Patch 15 adds support for new arg types to verifier: pointer to integer.
      Patch 16 adds bpf_strto{l,ul} helpers to parse integers from sysctl value.
      Patch 17 updates UAPI in tools.
      Patch 18 updates bpf_helpers.h.
      Patch 19 adds selftests for pointer to integer in verifier.
      Patches 20-21 add selftests for bpf_strto{l,ul}, including integration
                    C based test for sysctl value parsing.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a43d0508
    • Andrey Ignatov's avatar
      selftests/bpf: C based test for sysctl and strtoX · 7568f4cb
      Andrey Ignatov authored
      Add C based test for a few bpf_sysctl_* helpers and bpf_strtoul.
      
      Make sure that sysctl can be identified by name and that multiple
      integers can be parsed from sysctl value with bpf_strtoul.
      
      net/ipv4/tcp_mem is chosen as a testing sysctl, it contains 3 unsigned
      longs, they all are parsed and compared (val[0] < val[1] < val[2]).
      
      Example of output:
        # ./test_sysctl
        ...
        Test case: C prog: deny all writes .. [PASS]
        Test case: C prog: deny access by name .. [PASS]
        Test case: C prog: read tcp_mem .. [PASS]
        Summary: 39 PASSED, 0 FAILED
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7568f4cb
    • Andrey Ignatov's avatar
      selftests/bpf: Test bpf_strtol and bpf_strtoul helpers · 8549ddc8
      Andrey Ignatov authored
      Test that bpf_strtol and  bpf_strtoul helpers can be used to convert
      provided buffer to long or unsigned long correspondingly and return both
      correct result and number of consumed bytes, or proper errno.
      
      Example of output:
        # ./test_sysctl
        ..
        Test case: bpf_strtoul one number string .. [PASS]
        Test case: bpf_strtoul multi number string .. [PASS]
        Test case: bpf_strtoul buf_len = 0, reject .. [PASS]
        Test case: bpf_strtoul supported base, ok .. [PASS]
        Test case: bpf_strtoul unsupported base, EINVAL .. [PASS]
        Test case: bpf_strtoul buf with spaces only, EINVAL .. [PASS]
        Test case: bpf_strtoul negative number, EINVAL .. [PASS]
        Test case: bpf_strtol negative number, ok .. [PASS]
        Test case: bpf_strtol hex number, ok .. [PASS]
        Test case: bpf_strtol max long .. [PASS]
        Test case: bpf_strtol overflow, ERANGE .. [PASS]
        Summary: 36 PASSED, 0 FAILED
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      8549ddc8
    • Andrey Ignatov's avatar
      selftests/bpf: Test ARG_PTR_TO_LONG arg type · c2d5f12e
      Andrey Ignatov authored
      Test that verifier handles new argument types properly, including
      uninitialized or partially initialized value, misaligned stack access,
      etc.
      
      Example of output:
        #456/p ARG_PTR_TO_LONG uninitialized OK
        #457/p ARG_PTR_TO_LONG half-uninitialized OK
        #458/p ARG_PTR_TO_LONG misaligned OK
        #459/p ARG_PTR_TO_LONG size < sizeof(long) OK
        #460/p ARG_PTR_TO_LONG initialized OK
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c2d5f12e
    • Andrey Ignatov's avatar
      selftests/bpf: Add sysctl and strtoX helpers to bpf_helpers.h · 99f57973
      Andrey Ignatov authored
      Add bpf_sysctl_* and bpf_strtoX helpers to bpf_helpers.h.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      99f57973
    • Andrey Ignatov's avatar
      bpf: Sync bpf.h to tools/ · b457e553
      Andrey Ignatov authored
      Sync bpf_strtoX related bpf UAPI changes to tools/.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b457e553
    • Andrey Ignatov's avatar
      bpf: Introduce bpf_strtol and bpf_strtoul helpers · d7a4cb9b
      Andrey Ignatov authored
      Add bpf_strtol and bpf_strtoul to convert a string to long and unsigned
      long correspondingly. It's similar to user space strtol(3) and
      strtoul(3) with a few changes to the API:
      
      * instead of NUL-terminated C string the helpers expect buffer and
        buffer length;
      
      * resulting long or unsigned long is returned in a separate
        result-argument;
      
      * return value is used to indicate success or failure, on success number
        of consumed bytes is returned that can be used to identify position to
        read next if the buffer is expected to contain multiple integers;
      
      * instead of *base* argument, *flags* is used that provides base in 5
        LSB, other bits are reserved for future use;
      
      * number of supported bases is limited.
      
      Documentation for the new helpers is provided in bpf.h UAPI.
      
      The helpers are made available to BPF_PROG_TYPE_CGROUP_SYSCTL programs to
      be able to convert string input to e.g. "ulongvec" output.
      
      E.g. "net/ipv4/tcp_mem" consists of three ulong integers. They can be
      parsed by calling to bpf_strtoul three times.
      
      Implementation notes:
      
      Implementation includes "../../lib/kstrtox.h" to reuse integer parsing
      functions. It's done exactly same way as fs/proc/base.c already does.
      
      Unfortunately existing kstrtoX function can't be used directly since
      they fail if any invalid character is present right after integer in the
      string. Existing simple_strtoX functions can't be used either since
      they're obsolete and don't handle overflow properly.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d7a4cb9b
    • Andrey Ignatov's avatar
      bpf: Introduce ARG_PTR_TO_{INT,LONG} arg types · 57c3bb72
      Andrey Ignatov authored
      Currently the way to pass result from BPF helper to BPF program is to
      provide memory area defined by pointer and size: func(void *, size_t).
      
      It works great for generic use-case, but for simple types, such as int,
      it's overkill and consumes two arguments when it could use just one.
      
      Introduce new argument types ARG_PTR_TO_INT and ARG_PTR_TO_LONG to be
      able to pass result from helper to program via pointer to int and long
      correspondingly: func(int *) or func(long *).
      
      New argument types are similar to ARG_PTR_TO_MEM with the following
      differences:
      * they don't require corresponding ARG_CONST_SIZE argument, predefined
        access sizes are used instead (32bit for int, 64bit for long);
      * it's possible to use more than one such an argument in a helper;
      * provided pointers have to be aligned.
      
      It's easy to introduce similar ARG_PTR_TO_CHAR and ARG_PTR_TO_SHORT
      argument types. It's not done due to lack of use-case though.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      57c3bb72
    • Andrey Ignatov's avatar
      selftests/bpf: Test file_pos field in bpf_sysctl ctx · 9a1027e5
      Andrey Ignatov authored
      Test access to file_pos field of bpf_sysctl context, both read (incl.
      narrow read) and write.
      
        # ./test_sysctl
        ...
        Test case: ctx:file_pos sysctl:read read ok .. [PASS]
        Test case: ctx:file_pos sysctl:read read ok narrow .. [PASS]
        Test case: ctx:file_pos sysctl:read write ok .. [PASS]
        ...
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9a1027e5
    • Andrey Ignatov's avatar
      selftests/bpf: Test bpf_sysctl_{get,set}_new_value helpers · 786047dd
      Andrey Ignatov authored
      Test that new value provided by user space on sysctl write can be read
      by bpf_sysctl_get_new_value and overridden by bpf_sysctl_set_new_value.
      
        # ./test_sysctl
        ...
        Test case: sysctl_get_new_value sysctl:read EINVAL .. [PASS]
        Test case: sysctl_get_new_value sysctl:write ok .. [PASS]
        Test case: sysctl_get_new_value sysctl:write ok long .. [PASS]
        Test case: sysctl_get_new_value sysctl:write E2BIG .. [PASS]
        Test case: sysctl_set_new_value sysctl:read EINVAL .. [PASS]
        Test case: sysctl_set_new_value sysctl:write ok .. [PASS]
        Summary: 22 PASSED, 0 FAILED
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      786047dd
    • Andrey Ignatov's avatar
      selftests/bpf: Test sysctl_get_current_value helper · 11ff34f7
      Andrey Ignatov authored
      Test sysctl_get_current_value on sysctl read and write, buffers with
      enough space and too small buffers to get E2BIG and truncated result,
      etc.
      
        # ./test_sysctl
        ...
        Test case: sysctl_get_current_value sysctl:read ok, gt .. [PASS]
        Test case: sysctl_get_current_value sysctl:read ok, eq .. [PASS]
        Test case: sysctl_get_current_value sysctl:read E2BIG truncated ..  [PASS]
        Test case: sysctl_get_current_value sysctl:read EINVAL .. [PASS]
        Test case: sysctl_get_current_value sysctl:write ok .. [PASS]
        Summary: 16 PASSED, 0 FAILED
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      11ff34f7
    • Andrey Ignatov's avatar
      selftests/bpf: Test bpf_sysctl_get_name helper · 6041c67f
      Andrey Ignatov authored
      Test w/ and w/o BPF_F_SYSCTL_BASE_NAME, buffers with enough space and
      too small buffers to get E2BIG and truncated result, etc.
      
        # ./test_sysctl
        ...
        Test case: sysctl_get_name sysctl_value:base ok .. [PASS]
        Test case: sysctl_get_name sysctl_value:base E2BIG truncated .. [PASS]
        Test case: sysctl_get_name sysctl:full ok .. [PASS]
        Test case: sysctl_get_name sysctl:full E2BIG truncated .. [PASS]
        Test case: sysctl_get_name sysctl:full E2BIG truncated small .. [PASS]
        Summary: 11 PASSED, 0 FAILED
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      6041c67f
    • Andrey Ignatov's avatar
      selftests/bpf: Test BPF_CGROUP_SYSCTL · 1f5fa9ab
      Andrey Ignatov authored
      Add unit test for BPF_PROG_TYPE_CGROUP_SYSCTL program type.
      
      Test that program can allow/deny access.
      Test both valid and invalid accesses to ctx->write.
      
      Example of output:
        # ./test_sysctl
        Test case: sysctl wrong attach_type .. [PASS]
        Test case: sysctl:read allow all .. [PASS]
        Test case: sysctl:read deny all .. [PASS]
        Test case: ctx:write sysctl:read read ok .. [PASS]
        Test case: ctx:write sysctl:write read ok .. [PASS]
        Test case: ctx:write sysctl:read write reject .. [PASS]
        Summary: 6 PASSED, 0 FAILED
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1f5fa9ab
    • Andrey Ignatov's avatar
      selftests/bpf: Test sysctl section name · 7007af63
      Andrey Ignatov authored
      Add unit test to verify that program and attach types are properly
      identified for "cgroup/sysctl" section name.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7007af63
    • Andrey Ignatov's avatar
      libbpf: Support sysctl hook · 063cc9f0
      Andrey Ignatov authored
      Support BPF_PROG_TYPE_CGROUP_SYSCTL program in libbpf: identifying
      program and attach types by section name, probe.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      063cc9f0
    • Andrey Ignatov's avatar
      bpf: Sync bpf.h to tools/ · 196398d4
      Andrey Ignatov authored
      Sync BPF_PROG_TYPE_CGROUP_SYSCTL related bpf UAPI changes to tools/.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      196398d4
    • Andrey Ignatov's avatar
      bpf: Add file_pos field to bpf_sysctl ctx · e1550bfe
      Andrey Ignatov authored
      Add file_pos field to bpf_sysctl context to read and write sysctl file
      position at which sysctl is being accessed (read or written).
      
      The field can be used to e.g. override whole sysctl value on write to
      sysctl even when sys_write is called by user space with file_pos > 0. Or
      BPF program may reject such accesses.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e1550bfe
    • Andrey Ignatov's avatar
      bpf: Introduce bpf_sysctl_{get,set}_new_value helpers · 4e63acdf
      Andrey Ignatov authored
      Add helpers to work with new value being written to sysctl by user
      space.
      
      bpf_sysctl_get_new_value() copies value being written to sysctl into
      provided buffer.
      
      bpf_sysctl_set_new_value() overrides new value being written by user
      space with a one from provided buffer. Buffer should contain string
      representation of the value, similar to what can be seen in /proc/sys/.
      
      Both helpers can be used only on sysctl write.
      
      File position matters and can be managed by an interface that will be
      introduced separately. E.g. if user space calls sys_write to a file in
      /proc/sys/ at file position = X, where X > 0, then the value set by
      bpf_sysctl_set_new_value() will be written starting from X. If program
      wants to override whole value with specified buffer, file position has
      to be set to zero.
      
      Documentation for the new helpers is provided in bpf.h UAPI.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4e63acdf
    • Andrey Ignatov's avatar
      bpf: Introduce bpf_sysctl_get_current_value helper · 1d11b301
      Andrey Ignatov authored
      Add bpf_sysctl_get_current_value() helper to copy current sysctl value
      into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer.
      
      It provides same string as user space can see by reading corresponding
      file in /proc/sys/, including new line, etc.
      
      Documentation for the new helper is provided in bpf.h UAPI.
      
      Since current value is kept in ctl_table->data in a parsed form,
      ctl_table->proc_handler() with write=0 is called to read that data and
      convert it to a string. Such a string can later be parsed by a program
      using helpers that will be introduced separately.
      
      Unfortunately it's not trivial to provide API to access parsed data due to
      variety of data representations (string, intvec, uintvec, ulongvec,
      custom structures, even NULL, etc). Instead it's assumed that user know
      how to handle specific sysctl they're interested in and appropriate
      helpers can be used.
      
      Since ctl_table->proc_handler() expects __user buffer, conversion to
      __user happens for kernel allocated one where the value is stored.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1d11b301
    • Andrey Ignatov's avatar
      bpf: Introduce bpf_sysctl_get_name helper · 808649fb
      Andrey Ignatov authored
      Add bpf_sysctl_get_name() helper to copy sysctl name (/proc/sys/ entry)
      into provided by BPF_PROG_TYPE_CGROUP_SYSCTL program buffer.
      
      By default full name (w/o /proc/sys/) is copied, e.g. "net/ipv4/tcp_mem".
      
      If BPF_F_SYSCTL_BASE_NAME flag is set, only base name will be copied,
      e.g. "tcp_mem".
      
      Documentation for the new helper is provided in bpf.h UAPI.
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      808649fb
    • Andrey Ignatov's avatar
      bpf: Sysctl hook · 7b146ceb
      Andrey Ignatov authored
      Containerized applications may run as root and it may create problems
      for whole host. Specifically such applications may change a sysctl and
      affect applications in other containers.
      
      Furthermore in existing infrastructure it may not be possible to just
      completely disable writing to sysctl, instead such a process should be
      gradual with ability to log what sysctl are being changed by a
      container, investigate, limit the set of writable sysctl to currently
      used ones (so that new ones can not be changed) and eventually reduce
      this set to zero.
      
      The patch introduces new program type BPF_PROG_TYPE_CGROUP_SYSCTL and
      attach type BPF_CGROUP_SYSCTL to solve these problems on cgroup basis.
      
      New program type has access to following minimal context:
      	struct bpf_sysctl {
      		__u32	write;
      	};
      
      Where @write indicates whether sysctl is being read (= 0) or written (=
      1).
      
      Helpers to access sysctl name and value will be introduced separately.
      
      BPF_CGROUP_SYSCTL attach point is added to sysctl code right before
      passing control to ctl_table->proc_handler so that BPF program can
      either allow or deny access to sysctl.
      Suggested-by: default avatarRoman Gushchin <guro@fb.com>
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7b146ceb
    • Andrey Ignatov's avatar
      bpf: Add base proto function for cgroup-bpf programs · b1cd609d
      Andrey Ignatov authored
      Currently kernel/bpf/cgroup.c contains only one program type and one
      proto function cgroup_dev_func_proto(). It'd be useful to have base
      proto function that can be reused for new cgroup-bpf program types
      coming soon.
      
      Introduce cgroup_base_func_proto().
      Signed-off-by: default avatarAndrey Ignatov <rdna@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b1cd609d