1. 13 Oct, 2023 8 commits
    • Merge branch 'Open-coded task_vma iter' · 0e10fd4b
      Andrii Nakryiko authored
      Dave Marchevsky says:
      
      ====================
      At Meta we have a profiling daemon which periodically collects
      information on many hosts. This collection usually involves grabbing
      stacks (user and kernel) using perf_event BPF progs and later symbolicating
      them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on
      remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In
      those cases we must fall back to digging around in /proc/PID/maps to map
      virtual address to (binary, offset). The /proc/PID/maps digging does not
      occur synchronously with stack collection, so the process might already
      be gone, in which case it won't have /proc/PID/maps and we will fail to
      symbolicate.
      
      This 'exited process problem' doesn't occur very often as
      most of the prod services we care to profile are long-lived daemons, but
      there are enough usecases to warrant a workaround: a BPF program which
      can be optionally loaded at data collection time and essentially walks
      /proc/PID/maps. Currently this is done by walking the vma list:
      
        struct vm_area_struct *mmap = BPF_CORE_READ(mm, mmap);
        mmap_next = BPF_CORE_READ(mmap, vm_next); /* in a loop */
      
      Since commit 763ecb03 ("mm: remove the vma linked list") there's no
      longer a vma linked list to walk. Walking the vma maple tree is not as
      simple as hopping struct vm_area_struct->vm_next. Luckily,
      commit f39af059 ("mm: add VMA iterator"), another commit in that series,
      added struct vma_iterator and the for_each_vma macro for easy vma iteration.
      If similar functionality were exposed to BPF programs, it would be perfect
      for our usecase.
      
      This series adds such functionality, specifically a BPF equivalent of
      for_each_vma using the open-coded iterator style.
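      With the kfuncs this series adds, the walk above collapses into a
      bpf_for_each loop. A BPF-side sketch (assumes the kfunc declarations
      from the selftests' bpf_experimental.h; a fragment, not a complete
      program):

```
struct task_struct *task = bpf_get_current_task_btf();
struct vm_area_struct *vma;

/* bpf_for_each(task_vma, ...) expands to bpf_iter_task_vma_new(&it, task, 0),
 * repeated bpf_iter_task_vma_next(&it), and bpf_iter_task_vma_destroy(&it). */
bpf_for_each(task_vma, vma, task, 0) {
        /* record vma->vm_start / vma->vm_end, etc. */
}
```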
      
      Notes:
        * This approach was chosen after discussion on a previous series [0] which
          attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to
          bpf_find_vma.
        * Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not
          drop the vma read lock between iterations. See Alexei's response in [0].
        * The [vsyscall] page isn't really part of task->mm's vmas, but
          /proc/PID/maps returns information about it anyway. The vma iter added
          here does not do the same. See comment on selftest in patch 3.
        * bpf_iter_task_vma allocates a _data struct - containing, among other
          things, struct vma_iterator - using the BPF allocator, and keeps a
          pointer to that bpf_iter_task_vma_data. This is done in order to
          prevent changes to struct ma_state - which is wrapped by struct
          vma_iterator - from necessitating changes to the uapi struct
          bpf_iter_task_vma.
      
      Changelog:
      
      v6 -> v7: https://lore.kernel.org/bpf/20231010185944.3888849-1-davemarchevsky@fb.com/
      
      Patch numbers correspond to their position in v6
      
      Patch 2 ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
        * Add Andrii ack
      Patch 3 ("bpf: Introduce task_vma open-coded iterator kfuncs")
        * Add Andrii ack
        * Add missing __diag_ignore_all for -Wmissing-prototypes (Song)
      Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
        * Remove two unnecessary header includes (Andrii)
        * Remove extraneous !vmas_seen check (Andrii)
      New Patch ("bpf: Add BPF_KFUNC_{START,END}_defs macros")
        * After talking to Andrii, this is an attempt to clean up __diag_ignore_all
          spam everywhere kfuncs are defined. If nontrivial changes are needed,
          let's apply the other 4 and I'll respin as a standalone patch.
      
      v5 -> v6: https://lore.kernel.org/bpf/20231010175637.3405682-1-davemarchevsky@fb.com/
      
      Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter")
        * Remove extraneous blank line. I did this manually to the .patch file
          for v5, which caused BPF CI to complain about failing to apply the
          series
      
      v4 -> v5: https://lore.kernel.org/bpf/20231002195341.2940874-1-davemarchevsky@fb.com/
      
      Patch numbers correspond to their position in v4
      
      New Patch ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c")
        * Patch 2's renaming of this selftest, and associated changes in the
          userspace runner, are split out into this separate commit (Andrii)
      
      Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
        * Remove bpf_iter_task_vma kfuncs from libbpf's bpf_helpers.h, they'll be
          added to selftests' bpf_experimental.h in selftests patch below (Andrii)
        * Split bpf_iter_task_vma.c renaming into separate commit (Andrii)
      
      Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
        * Add bpf_iter_task_vma kfuncs to bpf_experimental.h (Andrii)
        * Remove '?' from prog SEC, open_and_load the skel in one operation (Andrii)
        * Ensure that fclose() always happens in test runner (Andrii)
        * Use global var w/ 1000 (vm_start, vm_end) structs instead of two
          MAP_TYPE_ARRAY's w/ 1k u64s each (Andrii)
      
      v3 -> v4: https://lore.kernel.org/bpf/20230822050558.2937659-1-davemarchevsky@fb.com/
      
      Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
        * Add Andrii ack
      Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
        * Mark bpf_iter_task_vma_new args KF_RCU and remove now-unnecessary !task
          check (Yonghong)
          * Although KF_RCU is a function-level flag, in reality it only applies to
            the task_struct *task parameter, as the other two params are a scalar int
            and a specially-handled KF_ARG_PTR_TO_ITER
        * Remove struct bpf_iter_task_vma definition from uapi headers, define in
          kernel/bpf/task_iter.c instead (Andrii)
      Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter")
        * Use a local var when looping over vmas to track map idx. Update vmas_seen
          global after done iterating. Don't start iterating or update vmas_seen if
          vmas_seen global is nonzero. (Andrii)
        * Move getpgid() call to correct spot - above skel detach. (Andrii)
      
      v2 -> v3: https://lore.kernel.org/bpf/20230821173415.1970776-1-davemarchevsky@fb.com/
      
      Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num")
        * Add Yonghong ack
      
      Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs")
        * UAPI bpf header and tools/ version should match
        * Add bpf_iter_task_vma_kern_data which bpf_iter_task_vma_kern points to,
          bpf_mem_alloc/free it instead of just vma_iterator. (Alexei)
          * Inner data ptr == NULL implies initialization failed
      
      v1 -> v2: https://lore.kernel.org/bpf/20230810183513.684836-1-davemarchevsky@fb.com/
        * Patch 1
          * Now removes the unnecessary BTF_TYPE_EMIT instead of changing the
            type (Yonghong)
        * Patch 2
          * Don't do unnecessary BTF_TYPE_EMIT (Yonghong)
          * Bump task refcount to prevent ->mm reuse (Yonghong)
          * Keep a pointer to vma_iterator in bpf_iter_task_vma, alloc/free
            via BPF mem allocator (Yonghong, Stanislav)
        * Patch 3
      
        [0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/
      ====================
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    • selftests/bpf: Add tests for open-coded task_vma iter · e0e1a7a5
      Dave Marchevsky authored
      The open-coded task_vma iter added earlier in this series allows for
      natural iteration over a task's vmas using existing open-coded iter
      infrastructure, specifically bpf_for_each.
      
      This patch adds a test demonstrating this pattern and validating
      correctness. The vma->vm_start and vma->vm_end addresses of the first
      1000 vmas are recorded and compared to /proc/PID/maps output. As
      expected, both see the same vmas and addresses - with the exception of
      the [vsyscall] vma - which is explained in a comment in the prog_tests
      program.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231013204426.1074286-5-davemarchevsky@fb.com
    • bpf: Introduce task_vma open-coded iterator kfuncs · 4ac45468
      Dave Marchevsky authored
      This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow
      creation and manipulation of struct bpf_iter_task_vma in open-coded
      iterator style. BPF programs can use these kfuncs directly or through
      bpf_for_each macro for natural-looking iteration of all task vmas.
      
      The implementation borrows heavily from bpf_find_vma helper's locking -
      differing only in that it holds the mmap_read lock for all iterations
      while the helper only executes its provided callback on a maximum of 1
      vma. Aside from locking, struct vma_iterator and vma_next do all the
      heavy lifting.
      
      A pointer to an inner data struct, struct bpf_iter_task_vma_data, is the
      only field in struct bpf_iter_task_vma. This is because the inner data
      struct contains a struct vma_iterator (not ptr), whose size is likely to
      change under us. If bpf_iter_task_vma_kern contained vma_iterator directly,
      such a change would require a change in the opaque bpf_iter_task_vma
      struct's size. So it is better to allocate vma_iterator using the BPF
      allocator, and since that alloc must already succeed, we might as well
      allocate all iter fields, thereby freezing struct bpf_iter_task_vma's size.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231013204426.1074286-4-davemarchevsky@fb.com
    • selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c · 45b38941
      Dave Marchevsky authored
      Further patches in this series will add a struct bpf_iter_task_vma,
      which will result in a name collision with the selftest prog renamed in
      this patch. Rename the selftest to avoid the collision.
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231013204426.1074286-3-davemarchevsky@fb.com
    • bpf: Don't explicitly emit BTF for struct btf_iter_num · f10ca5da
      Dave Marchevsky authored
      Commit 6018e1f4 ("bpf: implement numbers iterator") added the
      BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num
      doesn't exist, so only a forward declaration is emitted in BTF:
      
        FWD 'btf_iter_num' fwd_kind=struct
      
      That commit was probably hoping to ensure that struct bpf_iter_num is
      emitted in vmlinux BTF. A previous version of this patch changed the
      line to emit the correct type, but Yonghong confirmed that it would
      definitely be emitted regardless in [0], so this patch simply removes
      the line.
      
      This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't
      causing any issues that I noticed, aside from mild confusion when I
      looked through the code.
      
        [0]: https://lore.kernel.org/bpf/25d08207-43e6-36a8-5e0f-47a913d4cda5@linux.dev/
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20231013204426.1074286-2-davemarchevsky@fb.com
    • bpf: Change syscall_nr type to int in struct syscall_tp_t · ba8ea723
      Artem Savkov authored
      linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support
      for lazy preemption")) that adds an extra member to struct trace_entry.
      This causes the offset of the args field in struct trace_event_raw_sys_enter
      to be different from the one in struct syscall_trace_enter:
      
      struct trace_event_raw_sys_enter {
              struct trace_entry         ent;                  /*     0    12 */
      
              /* XXX last struct has 3 bytes of padding */
              /* XXX 4 bytes hole, try to pack */
      
              long int                   id;                   /*    16     8 */
              long unsigned int          args[6];              /*    24    48 */
              /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
              char                       __data[];             /*    72     0 */
      
              /* size: 72, cachelines: 2, members: 4 */
              /* sum members: 68, holes: 1, sum holes: 4 */
              /* paddings: 1, sum paddings: 3 */
              /* last cacheline: 8 bytes */
      };
      
      struct syscall_trace_enter {
              struct trace_entry         ent;                  /*     0    12 */
      
              /* XXX last struct has 3 bytes of padding */
      
              int                        nr;                   /*    12     4 */
              long unsigned int          args[];               /*    16     0 */
      
              /* size: 16, cachelines: 1, members: 3 */
              /* paddings: 1, sum paddings: 3 */
              /* last cacheline: 16 bytes */
      };
      
      This, in turn, causes perf_event_set_bpf_prog() to fail while running the
      bpf test_profiler testcase, because max_ctx_offset is calculated from the
      former struct while the off limit comes from the latter:
      
        10488         if (is_tracepoint || is_syscall_tp) {
        10489                 int off = trace_event_get_offsets(event->tp_event);
        10490
        10491                 if (prog->aux->max_ctx_offset > off)
        10492                         return -EACCES;
        10493         }
      
      What the bpf program actually gets is a pointer to struct
      syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes
      the problem by aligning struct syscall_tp_t with struct
      syscall_trace_(enter|exit) and changing the tests to use these structs
      to dereference context.
      Signed-off-by: Artem Savkov <asavkov@redhat.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Link: https://lore.kernel.org/bpf/20231013054219.172920-1-asavkov@redhat.com
    • net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set · 9c1292ec
      Martin KaFai Lau authored
      It was reported that there is a compiler warning on the unused variable
      "sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set.
      This patch addresses it the same way as the ipv6 counterpart in
      inet6_getname(): "return sin_addr_len;" instead of
      "return sizeof(*sin);".
      
      Fixes: fefba7d1 ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev
      Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/
    • bpf: Avoid unnecessary audit log for CPU security mitigations · 236334ae
      Yafang Shao authored
      Check cpu_mitigations_off() first to avoid calling capable() when
      mitigations are off. This avoids emitting unnecessary audit log entries.
      
      Fixes: bc5bc309 ("bpf: Inherit system settings for CPU security mitigations")
      Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/CAEf4Bza6UVUWqcWQ-66weZ-nMDr+TFU3Mtq=dumZFD-pSqU7Ow@mail.gmail.com/
      Link: https://lore.kernel.org/bpf/20231013083916.4199-1-laoar.shao@gmail.com
  2. 12 Oct, 2023 7 commits
    • Merge branch 'Add cgroup sockaddr hooks for unix sockets' · d2dc885b
      Martin KaFai Lau authored
      Daan De Meyer says:
      
      ====================
      Changes since v10:
      
      * Removed extra check from bpf_sock_addr_set_sun_path() again in favor of
        calling unix_validate_addr() everywhere in af_unix.c before calling the hooks.
      
      Changes since v9:
      
      * Renamed bpf_sock_addr_set_unix_addr() to bpf_sock_addr_set_sun_path() and
        renamed its arguments to match the new name.
      * Added an extra check to bpf_sock_addr_set_sun_path() to disallow changing the
        address of an unnamed unix socket.
      * Removed unnecessary NULL check on uaddrlen in
        __cgroup_bpf_run_filter_sock_addr().
      
      Changes since v8:
      
      * Added missing test programs to last patch
      
      Changes since v7:
      
      * Fixed formatting nit in comment
      * Renamed from cgroup/connectun to cgroup/connect_unix (and similar for all
        other hooks)
      
      Changes since v6:
      
      * Actually removed bpf_bind() helper for AF_UNIX hooks.
      * Fixed merge conflict
      * Updated comment to mention uaddrlen is read-only for AF_INET[6]
      * Removed unnecessary forward declaration of struct sock_addr_test
      * Removed unused BPF_CGROUP_RUN_PROG_UNIX_CONNECT()
      * Fixed formatting nit reported by checkpatch
      * Added more information to commit message about recvmsg() on connected socket
      
      Changes since v5:
      
      * Fixed kernel version in bpftool documentation (6.3 => 6.7).
      * Added connection mode socket recvmsg() test.
      * Removed bpf_bind() helper for AF_UNIX hooks.
      * Added missing getpeernameun and getsocknameun BPF test programs.
      * Added note for bind() test being unused currently.
      
      Changes since v4:
      
      * Dropped support for intercepting bind() as when using bind() with unix sockets
        and a pathname sockaddr, bind() will create an inode in the filesystem that
        needs to be cleaned up. If the address is rewritten, users might try to clean
        up the wrong file and leak the actual socket file in the filesystem.
      * Changed bpf_sock_addr_set_unix_addr() to use BTF_KFUNC_HOOK_CGROUP_SKB instead
        of BTF_KFUNC_HOOK_COMMON.
      * Removed unix socket related changes from BPF_CGROUP_PRE_CONNECT_ENABLED() as
        unix sockets do not support pre-connect.
      * Added tests for getpeernameun and getsocknameun hooks.
      * We now disallow an empty sockaddr in bpf_sock_addr_set_unix_addr() similar to
        unix_validate_addr().
      * Removed unnecessary cgroup_bpf_enabled() checks
      * Removed unnecessary error checks
      
      Changes since v3:
      
      * Renamed bpf_sock_addr_set_addr() to bpf_sock_addr_set_unix_addr() and
        made it only operate on AF_UNIX sockaddrs. This is because for the other
        families, users usually want to configure more than just the address so
        a generic interface will not fit the bill here. e.g. for AF_INET and AF_INET6,
        users would generally also want to be able to configure the port which the
        current interface doesn't support. So we expose an AF_UNIX specific function
        instead.
      * Made the tests in the new sock addr tests more generic (similar to test_sock_addr.c),
        this should make it easier to migrate the other sock addr tests in the future.
      * Removed the new kfunc hook and attached to BTF_KFUNC_HOOK_COMMON instead
      * Set uaddrlen to 0 when the family is AF_UNSPEC
      * Pass in the addrlen to the hook from IPv6 code
      * Fixed mount directory mkdir() to ignore EEXIST
      
      Changes since v2:
      
      * Configuring the sock addr is now done via a new kfunc bpf_sock_addr_set()
      * The addrlen is exposed as u32 in bpf_sock_addr_kern
      * Selftests are updated to use the new kfunc
      * Selftests are now added as a new sock_addr test in prog_tests/
      * Added BTF_KFUNC_HOOK_SOCK_ADDR for BPF_PROG_TYPE_CGROUP_SOCK_ADDR
      * __cgroup_bpf_run_filter_sock_addr() now returns the modified addrlen
      
      Changes since v1:
      
      * Split into multiple patches instead of one single patch
      * Added unix support for all socket address hooks instead of only connect()
      * Switched approach to expose the socket address length to the bpf hook
        instead of recalculating the socket address length in kernelspace to
        properly support abstract unix socket addresses
      * Modified socket address hook tests to calculate the socket address length
        once and pass it around everywhere instead of recalculating the actual
        unix socket address length on demand.
      * Added some missing section name tests for getpeername()/getsockname()
      
      This patch series extends the cgroup sockaddr hooks to include support for unix
      sockets. To add support for unix sockets, struct bpf_sock_addr_kern is extended
      to expose the socket address length to the bpf program. Along with that, a new
      kfunc bpf_sock_addr_set_unix_addr() is added to safely allow modifying an
      AF_UNIX sockaddr from bpf programs.
      
      I intend to use these new hooks in systemd to reimplement the LogNamespace=
      feature, which allows running multiple instances of systemd-journald to
      process the logs of different services. systemd-journald also processes
      syslog messages, so currently, using log namespaces means all services running
      in the same log namespace have to live in the same private mount namespace
      so that systemd can mount the journal namespace's associated syslog socket
      over /dev/log to properly direct syslog messages from all services running
      in that log namespace to the correct systemd-journald instance. We want to
      relax this requirement so that processes running in disjoint mount namespaces
      can still run in the same log namespace. To achieve this, we can use these
      new hooks to rewrite the socket address of any connect(), sendto(), ...
      syscalls to /dev/log to the socket address of the journal namespace's syslog
      socket instead, which will transparently do the redirection without requiring
      use of a mount namespace and mounting over /dev/log.
      
      Aside from the above usecase, these hooks can more generally be used to
      transparently redirect unix sockets to different addresses as required by
      services.
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • selftests/bpf: Add tests for cgroup unix socket address hooks · 82ab6b50
      Daan De Meyer authored
      These selftests are written in prog_tests style instead of adding
      them to the existing test_sock_addr tests. Migrating the existing
      sock addr tests to prog_tests style is left for future work. This
      commit adds support for testing bind() sockaddr hooks, even though
      there's no unix socket sockaddr hook for bind(). We leave this code
      intact for when the INET and INET6 tests, which do support intercepting
      bind(), are migrated in the future.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-10-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • selftests/bpf: Make sure mount directory exists · af2752ed
      Daan De Meyer authored
      The mount directory for the selftests cgroup tree might not exist,
      so create it ourselves if necessary.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-9-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • documentation/bpf: Document cgroup unix socket address hooks · 3243fef6
      Daan De Meyer authored
      Update the documentation to mention the new cgroup unix sockaddr
      hooks.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-8-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpftool: Add support for cgroup unix socket address hooks · 8b3cba98
      Daan De Meyer authored
      Add the necessary plumbing to hook up the new cgroup unix sockaddr
      hooks into bpftool.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Acked-by: Quentin Monnet <quentin@isovalent.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-7-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • libbpf: Add support for cgroup unix socket address hooks · bf90438c
      Daan De Meyer authored
      Add the necessary plumbing to hook up the new cgroup unix sockaddr
      hooks into libbpf.
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-6-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf: Implement cgroup sockaddr hooks for unix sockets · 859051dd
      Daan De Meyer authored
      These hooks allow intercepting connect(), getsockname(),
      getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
      socket hooks get write access to the address length because the
      address length is not fixed when dealing with unix sockets and
      needs to be modified when a unix socket address is modified by
      the hook. Because abstract unix socket addresses start with a
      NUL byte, we cannot recalculate the socket address length in kernelspace
      after running the hook by computing the length of the unix socket
      path with strlen().
      
      These hooks can be used when users want to multiplex syscalls aimed at a
      single unix socket across multiple different processes behind the scenes,
      by redirecting the connect() and other syscalls to process-specific
      sockets.
      
      We do not implement support for intercepting bind() because when
      using bind() with unix sockets with a pathname address, this creates
      an inode in the filesystem which must be cleaned up. If we rewrite
      the address, the user might try to clean up the wrong file, leaking
      the socket in the filesystem where it is never cleaned up. Until we
      figure out a solution for this (and a use case for intercepting bind()),
      we opt to not allow rewriting the sockaddr in bind() calls.
      
      We also implement recvmsg() support for connected streams so that
      after a connect() that is modified by a sockaddr hook, any corresponding
      recvmsg() on the connected socket can also be modified to make the
      connected program think it is connected to the "intended" remote.
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
      Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
  3. 11 Oct, 2023 3 commits
  4. 09 Oct, 2023 7 commits
  5. 06 Oct, 2023 8 commits
  6. 04 Oct, 2023 7 commits