Commit 52839f31 authored by Alexei Starovoitov's avatar Alexei Starovoitov Committed by Andrii Nakryiko

Merge branch 'no_caller_saved_registers-attribute-for-helper-calls'

Eduard Zingerman says:

====================
no_caller_saved_registers attribute for helper calls

This patch-set seeks to allow using no_caller_saved_registers gcc/clang
attribute with some BPF helper functions (and kfuncs in the future).

As documented in [1], this attribute means that function scratches
only some of the caller saved registers defined by ABI.
For BPF the set of such registers could be defined as follows:
- R0 is scratched only if function is non-void;
- R1-R5 are scratched only if corresponding parameter type is defined
  in the function prototype.

The goal of the patch-set is to implement no_caller_saved_registers
(nocsr for short) in a backwards compatible manner:
- for kernels that support the feature, gain some performance boost
  from better register allocation;
- for kernels that don't support the feature, allow programs execution
  with minor performance losses.

To achieve this, use a scheme suggested by Alexei Starovoitov:
- for nocsr calls clang allocates registers as-if relevant r0-r5
  registers are not scratched by the call;
- as a post-processing step, clang visits each nocsr call and adds
  spill/fill for every live r0-r5;
- stack offsets used for spills/fills are allocated as lowest
  stack offsets in whole function and are not used for any other
  purpose;
- when kernel loads a program, it looks for such patterns
  (nocsr function surrounded by spills/fills) and checks if
  spill/fill stack offsets are used exclusively in nocsr patterns;
- if so, and if current JIT inlines the call to the nocsr function
  (e.g. a helper call), kernel removes unnecessary spill/fill pairs;
- when old kernel loads a program, presence of spill/fill pairs
  keeps BPF program valid, albeit slightly less efficient.

Corresponding clang/llvm changes are available in [2].

The patch-set uses bpf_get_smp_processor_id() function as a canary,
making it the first helper with nocsr attribute.

For example, consider the following program:

  #define __no_csr __attribute__((no_caller_saved_registers))
  #define SEC(name) __attribute__((section(name), used))
  #define bpf_printk(fmt, ...) bpf_trace_printk((fmt), sizeof(fmt), __VA_ARGS__)

  typedef unsigned int __u32;

  static long (* const bpf_trace_printk)(const char *fmt, __u32 fmt_size, ...) = (void *) 6;
  static __u32 (*const bpf_get_smp_processor_id)(void) __no_csr = (void *)8;

  SEC("raw_tp")
  int test(void *ctx)
  {
          __u32 task = bpf_get_smp_processor_id();
  	bpf_printk("ctx=%p, smp=%d", ctx, task);
  	return 0;
  }

  char _license[] SEC("license") = "GPL";

Compiled (using [2]) as follows:

  $ clang --target=bpf -O2 -g -c -o nocsr.bpf.o nocsr.bpf.c
  $ llvm-objdump --no-show-raw-insn -Sd nocsr.bpf.o
    ...
  3rd parameter for printk call     removable spill/fill pair
  .--- 0:       r3 = r1                             |
; |       __u32 task = bpf_get_smp_processor_id();  |
  |    1:       *(u64 *)(r10 - 0x8) = r3 <----------|
  |    2:       call 0x8                            |
  |    3:       r3 = *(u64 *)(r10 - 0x8) <----------'
; |     bpf_printk("ctx=%p, smp=%d", ctx, task);
  |    4:       r1 = 0x0 ll
  |    6:       r2 = 0xf
  |    7:       r4 = r0
  '--> 8:       call 0x6
;       return 0;
       9:       r0 = 0x0
      10:       exit

Here is how the program looks after verifier processing:

  # bpftool prog load ./nocsr.bpf.o /sys/fs/bpf/nocsr-test
  # bpftool prog dump xlated pinned /sys/fs/bpf/nocsr-test

  int test(void * ctx):
     0: (bf) r3 = r1                         <--- 3rd printk parameter
  ; __u32 task = bpf_get_smp_processor_id();
     1: (b4) w0 = 197324                     <--. inlined helper call,
     2: (bf) r0 = &(void __percpu *)(r0)     <--- spill/fill
     3: (61) r0 = *(u32 *)(r0 +0)            <--' pair removed
  ; bpf_printk("ctx=%p, smp=%d", ctx, task);
     4: (18) r1 = map[id:5][0]+0
     6: (b7) r2 = 15
     7: (bf) r4 = r0
     8: (85) call bpf_trace_printk#-125920
  ; return 0;
     9: (b7) r0 = 0
    10: (95) exit

[1] https://clang.llvm.org/docs/AttributeReference.html#no-caller-saved-registers
[2] https://github.com/eddyz87/llvm-project/tree/bpf-no-caller-saved-registers

Change list:
- v3 -> v4:
  - When nocsr spills/fills are removed in the subprogram, allow these
    spills/fills to reside in [-MAX_BPF_STACK-48..MAX_BPF_STACK) range
    (suggested by Alexei);
  - Dropped patches with special handling for bpf_probe_read_kernel()
    (requested by Alexei);
  - Reset aux .nocsr_pattern and .nocsr_spills_num fields in
    check_nocsr_stack_contract() (requested by Andrii).
    Andrii, I have not added an additional flag to
    struct bpf_subprog_info, it currently does not have holes
    and I really don't like adding a bool field there just as an
    alternative indicator that nocsr is disabled.
    Indicator at the moment:
    - nocsr_stack_off >= S16_MIN means that nocsr rewrite is enabled;
    - nocsr_stack_off == S16_MIN means that nocsr rewrite is disabled.
- v2 -> v3:
  - As suggested by Andrii, 'nocsr_stack_off' is no longer checked at
    rewrite time, instead mark_nocsr_patterns() now does two passes
    over BPF program:
    - on a first pass it computes the lowest stack spill offset for
      the subprogram;
    - on a second pass this offset is used to recognize nocsr pattern.
  - As suggested by Alexei, a new mechanic is added to work around a
    situation mentioned by Andrii, when more helper functions are
    marked as nocsr at compile time than current kernel supports:
    - all {spill*,helper call,fill*} patterns are now marked as
      insn_aux_data[*].nocsr_pattern, thus relaxing failure condition
      for check_nocsr_stack_contract();
    - spill/fill pairs are not removed for patterns where helper can't
      be inlined;
    - see mark_nocsr_pattern_for_call() for details an example.
  - As suggested by Alexei, subprogram stack depth is now adjusted
    if all spill/fill pairs could be removed. This adjustment has
    to take place before optimize_bpf_loop(), hence the rewrite
    is moved from do_misc_fixups() to remove_nocsr_spills_fills()
    (again).
  - As suggested by Andrii, special measures are taken to work around
    bpf_probe_read_kernel() access to BPF stack, see patches 11, 12.
    Patch #11 is very simplistic, a more comprehensive solution would
    be to change the type of the third parameter of the
    bpf_probe_read_kernel() from ARG_ANYTHING to something else and
    not only check nocsr contract, but also propagate stack slot
    liveness information. However, such change would require update in
    struct bpf_call_arg_meta processing, which currently implies that
    every memory parameter is followed by a size parameter.
    I can work on these changes, please comment.
  - Stylistic changes suggested by Andrii.
  - Added acks from Andrii.
  - Dropped RFC tag.
- v1 -> v2:
  - assume that functions inlined by either jit or verifier
    conform to no_caller_saved_registers contract (Andrii, Puranjay);
  - allow nocsr rewrite for bpf_get_smp_processor_id()
    on arm64 and riscv64 architectures (Puranjay);
  - __arch_{x86_64,arm64,riscv64} macro for test_loader;
  - moved remove_nocsr_spills_fills() inside do_misc_fixups() (Andrii);
  - moved nocsr pattern detection from check_cfg() to a separate pass
    (Andrii);
  - various stylistic/correctness changes according to Andrii's
    comments.

Revisions:
- v1 https://lore.kernel.org/bpf/20240629094733.3863850-1-eddyz87@gmail.com/
- v2 https://lore.kernel.org/bpf/20240704102402.1644916-1-eddyz87@gmail.com/
- v3 https://lore.kernel.org/bpf/20240715230201.3901423-1-eddyz87@gmail.com/
====================

Link: https://lore.kernel.org/r/20240722233844.1406874-1-eddyz87@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
parents 26672b5c d0ad1f8f
......@@ -808,6 +808,12 @@ struct bpf_func_proto {
bool gpl_only;
bool pkt_access;
bool might_sleep;
/* set to true if helper follows contract for gcc/llvm
* attribute no_caller_saved_registers:
* - void functions do not scratch r0
* - functions taking N arguments scratch only registers r1-rN
*/
bool allow_nocsr;
enum bpf_return_type ret_type;
union {
struct {
......
......@@ -576,6 +576,14 @@ struct bpf_insn_aux_data {
bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
bool call_with_percpu_alloc_ptr; /* {this,per}_cpu_ptr() with prog percpu alloc */
u8 alu_state; /* used in combination with alu_limit */
/* true if STX or LDX instruction is a part of a spill/fill
* pattern for a no_caller_saved_registers call.
*/
u8 nocsr_pattern:1;
/* for CALL instructions, a number of spill/fill pairs in the
* no_caller_saved_registers pattern.
*/
u8 nocsr_spills_num:3;
/* below fields are initialized once */
unsigned int orig_idx; /* original instruction index */
......@@ -645,6 +653,10 @@ struct bpf_subprog_info {
u32 linfo_idx; /* The idx to the main_prog->aux->linfo */
u16 stack_depth; /* max. stack depth used by this function */
u16 stack_extra;
/* offsets in range [stack_depth .. nocsr_stack_off)
* are used for no_caller_saved_registers spills and fills.
*/
s16 nocsr_stack_off;
bool has_tail_call: 1;
bool tail_call_reachable: 1;
bool has_ld_abs: 1;
......@@ -652,6 +664,8 @@ struct bpf_subprog_info {
bool is_async_cb: 1;
bool is_exception_cb: 1;
bool args_cached: 1;
/* true if nocsr stack region is used by functions that can't be inlined */
bool keep_nocsr_stack: 1;
u8 arg_cnt;
struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
......
......@@ -158,6 +158,7 @@ const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
.func = bpf_get_smp_processor_id,
.gpl_only = false,
.ret_type = RET_INTEGER,
.allow_nocsr = true,
};
BPF_CALL_0(bpf_get_numa_node_id)
......
This diff is collapsed.
......@@ -661,6 +661,7 @@ TRUNNER_EXTRA_SOURCES := test_progs.c \
test_loader.c \
xsk.c \
disasm.c \
disasm_helpers.c \
json_writer.c \
flow_dissector_load.h \
ip_check_defrag_frags.h
......
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
#include <bpf/bpf.h>
#include "disasm.h"
struct print_insn_context {
char scratch[16];
char *buf;
size_t sz;
};
static void print_insn_cb(void *private_data, const char *fmt, ...)
{
struct print_insn_context *ctx = private_data;
va_list args;
va_start(args, fmt);
vsnprintf(ctx->buf, ctx->sz, fmt, args);
va_end(args);
}
static const char *print_call_cb(void *private_data, const struct bpf_insn *insn)
{
struct print_insn_context *ctx = private_data;
/* For pseudo calls verifier.c:jit_subprogs() hides original
* imm to insn->off and changes insn->imm to be an index of
* the subprog instead.
*/
if (insn->src_reg == BPF_PSEUDO_CALL) {
snprintf(ctx->scratch, sizeof(ctx->scratch), "%+d", insn->off);
return ctx->scratch;
}
return NULL;
}
struct bpf_insn *disasm_insn(struct bpf_insn *insn, char *buf, size_t buf_sz)
{
struct print_insn_context ctx = {
.buf = buf,
.sz = buf_sz,
};
struct bpf_insn_cbs cbs = {
.cb_print = print_insn_cb,
.cb_call = print_call_cb,
.private_data = &ctx,
};
char *tmp, *pfx_end, *sfx_start;
bool double_insn;
int len;
print_bpf_insn(&cbs, insn, true);
/* We share code with kernel BPF disassembler, it adds '(FF) ' prefix
* for each instruction (FF stands for instruction `code` byte).
* Remove the prefix inplace, and also simplify call instructions.
* E.g.: "(85) call foo#10" -> "call foo".
* Also remove newline in the end (the 'max(strlen(buf) - 1, 0)' thing).
*/
pfx_end = buf + 5;
sfx_start = buf + max((int)strlen(buf) - 1, 0);
if (strncmp(pfx_end, "call ", 5) == 0 && (tmp = strrchr(buf, '#')))
sfx_start = tmp;
len = sfx_start - pfx_end;
memmove(buf, pfx_end, len);
buf[len] = 0;
double_insn = insn->code == (BPF_LD | BPF_IMM | BPF_DW);
return insn + (double_insn ? 2 : 1);
}
/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
#ifndef __DISASM_HELPERS_H
#define __DISASM_HELPERS_H
#include <stdlib.h>
struct bpf_insn;
struct bpf_insn *disasm_insn(struct bpf_insn *insn, char *buf, size_t buf_sz);
#endif /* __DISASM_HELPERS_H */
......@@ -10,7 +10,8 @@
#include "bpf/btf.h"
#include "bpf_util.h"
#include "linux/filter.h"
#include "disasm.h"
#include "linux/kernel.h"
#include "disasm_helpers.h"
#define MAX_PROG_TEXT_SZ (32 * 1024)
......@@ -628,63 +629,6 @@ static bool match_pattern(struct btf *btf, char *pattern, char *text, char *reg_
return false;
}
static void print_insn(void *private_data, const char *fmt, ...)
{
va_list args;
va_start(args, fmt);
vfprintf((FILE *)private_data, fmt, args);
va_end(args);
}
/* Disassemble instructions to a stream */
static void print_xlated(FILE *out, struct bpf_insn *insn, __u32 len)
{
const struct bpf_insn_cbs cbs = {
.cb_print = print_insn,
.cb_call = NULL,
.cb_imm = NULL,
.private_data = out,
};
bool double_insn = false;
int i;
for (i = 0; i < len; i++) {
if (double_insn) {
double_insn = false;
continue;
}
double_insn = insn[i].code == (BPF_LD | BPF_IMM | BPF_DW);
print_bpf_insn(&cbs, insn + i, true);
}
}
/* We share code with kernel BPF disassembler, it adds '(FF) ' prefix
* for each instruction (FF stands for instruction `code` byte).
* This function removes the prefix inplace for each line in `str`.
*/
static void remove_insn_prefix(char *str, int size)
{
const int prefix_size = 5;
int write_pos = 0, read_pos = prefix_size;
int len = strlen(str);
char c;
size = min(size, len);
while (read_pos < size) {
c = str[read_pos++];
if (c == 0)
break;
str[write_pos++] = c;
if (c == '\n')
read_pos += prefix_size;
}
str[write_pos] = 0;
}
struct prog_info {
char *prog_kind;
enum bpf_prog_type prog_type;
......@@ -699,9 +643,10 @@ static void match_program(struct btf *btf,
char *reg_map[][2],
bool skip_first_insn)
{
struct bpf_insn *buf = NULL;
struct bpf_insn *buf = NULL, *insn, *insn_end;
int err = 0, prog_fd = 0;
FILE *prog_out = NULL;
char insn_buf[64];
char *text = NULL;
__u32 cnt = 0;
......@@ -739,12 +684,13 @@ static void match_program(struct btf *btf,
PRINT_FAIL("Can't open memory stream\n");
goto out;
}
if (skip_first_insn)
print_xlated(prog_out, buf + 1, cnt - 1);
else
print_xlated(prog_out, buf, cnt);
insn_end = buf + cnt;
insn = buf + (skip_first_insn ? 1 : 0);
while (insn < insn_end) {
insn = disasm_insn(insn, insn_buf, sizeof(insn_buf));
fprintf(prog_out, "%s\n", insn_buf);
}
fclose(prog_out);
remove_insn_prefix(text, MAX_PROG_TEXT_SZ);
ASSERT_TRUE(match_pattern(btf, pattern, text, reg_map),
pinfo->prog_kind);
......
......@@ -53,6 +53,7 @@
#include "verifier_movsx.skel.h"
#include "verifier_netfilter_ctx.skel.h"
#include "verifier_netfilter_retcode.skel.h"
#include "verifier_nocsr.skel.h"
#include "verifier_or_jmp32_k.skel.h"
#include "verifier_precision.skel.h"
#include "verifier_prevent_map_lookup.skel.h"
......@@ -173,6 +174,7 @@ void test_verifier_meta_access(void) { RUN(verifier_meta_access); }
void test_verifier_movsx(void) { RUN(verifier_movsx); }
void test_verifier_netfilter_ctx(void) { RUN(verifier_netfilter_ctx); }
void test_verifier_netfilter_retcode(void) { RUN(verifier_netfilter_retcode); }
void test_verifier_nocsr(void) { RUN(verifier_nocsr); }
void test_verifier_or_jmp32_k(void) { RUN(verifier_or_jmp32_k); }
void test_verifier_precision(void) { RUN(verifier_precision); }
void test_verifier_prevent_map_lookup(void) { RUN(verifier_prevent_map_lookup); }
......
......@@ -26,6 +26,9 @@
*
* __regex Same as __msg, but using a regular expression.
* __regex_unpriv Same as __msg_unpriv but using a regular expression.
* __xlated Expect a line in a disassembly log after verifier applies rewrites.
* Multiple __xlated attributes could be specified.
* __xlated_unpriv Same as __xlated but for unprivileged mode.
*
* __success Expect program load success in privileged mode.
* __success_unpriv Expect program load success in unprivileged mode.
......@@ -60,14 +63,20 @@
* __auxiliary Annotated program is not a separate test, but used as auxiliary
* for some other test cases and should always be loaded.
* __auxiliary_unpriv Same, but load program in unprivileged mode.
*
* __arch_* Specify on which architecture the test case should be tested.
* Several __arch_* annotations could be specified at once.
* When test case is not run on current arch it is marked as skipped.
*/
#define __msg(msg) __attribute__((btf_decl_tag("comment:test_expect_msg=" msg)))
#define __regex(regex) __attribute__((btf_decl_tag("comment:test_expect_regex=" regex)))
#define __xlated(msg) __attribute__((btf_decl_tag("comment:test_expect_xlated=" msg)))
#define __failure __attribute__((btf_decl_tag("comment:test_expect_failure")))
#define __success __attribute__((btf_decl_tag("comment:test_expect_success")))
#define __description(desc) __attribute__((btf_decl_tag("comment:test_description=" desc)))
#define __msg_unpriv(msg) __attribute__((btf_decl_tag("comment:test_expect_msg_unpriv=" msg)))
#define __regex_unpriv(regex) __attribute__((btf_decl_tag("comment:test_expect_regex_unpriv=" regex)))
#define __xlated_unpriv(msg) __attribute__((btf_decl_tag("comment:test_expect_xlated_unpriv=" msg)))
#define __failure_unpriv __attribute__((btf_decl_tag("comment:test_expect_failure_unpriv")))
#define __success_unpriv __attribute__((btf_decl_tag("comment:test_expect_success_unpriv")))
#define __log_level(lvl) __attribute__((btf_decl_tag("comment:test_log_level="#lvl)))
......@@ -77,6 +86,10 @@
#define __auxiliary __attribute__((btf_decl_tag("comment:test_auxiliary")))
#define __auxiliary_unpriv __attribute__((btf_decl_tag("comment:test_auxiliary_unpriv")))
#define __btf_path(path) __attribute__((btf_decl_tag("comment:test_btf_path=" path)))
#define __arch(arch) __attribute__((btf_decl_tag("comment:test_arch=" arch)))
#define __arch_x86_64 __arch("X86_64")
#define __arch_arm64 __arch("ARM64")
#define __arch_riscv64 __arch("RISCV64")
/* Convenience macro for use with 'asm volatile' blocks */
#define __naked __attribute__((naked))
......
This diff is collapsed.
This diff is collapsed.
......@@ -447,7 +447,6 @@ typedef int (*pre_execution_cb)(struct bpf_object *obj);
struct test_loader {
char *log_buf;
size_t log_buf_sz;
size_t next_match_pos;
pre_execution_cb pre_execution_cb;
struct bpf_object *obj;
......
......@@ -7,6 +7,7 @@
#include <errno.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include "disasm.h"
#include "test_progs.h"
#include "testing_helpers.h"
#include <linux/membarrier.h>
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment