Add new targeted error injection tool

bpf_override_return is a very powerful mechanism for error injection, with the caveat that it requires whitelisting of the functions to be overridden. inject.py takes a call chain and an optional set of predicates as input, and injects the appropriate error when both the call chain and all predicates are satisfied.

Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>

Commit ef4154b1 authored Mar 16, 2018 by Howard McLauchlan

Showing with 581 additions and 0 deletions

README.md +1 -0
man/man8/inject.8 +47 -0
tools/inject.py +452 -0
tools/inject_example.txt +81 -0

--- a/README.md
+++ b/README.md
@@ -113,6 +113,7 @@ pair of .c and .py files, and some are directories of files.
 - tools/[funcslower](tools/funcslower.py): Trace slow kernel or user function calls. [Examples](tools/funcslower_example.txt).
 - tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
 - tools/[hardirqs](tools/hardirqs.py):  Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
+- tools/[inject](tools/inject.py): Targeted error injection with call chain and predicates. [Examples](tools/inject_example.txt).
 - tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
 - tools/[llcstat](tools/llcstat.py): Summarize CPU cache references and misses by process. [Examples](tools/llcstat_example.txt).
 - tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush_example.txt).

--- a/man/man8/inject.8
+++ b/man/man8/inject.8
+.TH inject 8  "2018-03-16" "USER COMMANDS"
+.SH NAME
+inject \- injects appropriate error into function if input call chain and
+predicates are satisfied. Uses Linux eBPF/bcc.
+.SH SYNOPSIS
+.B inject [-h] [-I header] [-v] mode spec
+.SH DESCRIPTION
+inject injects errors into specified kernel functionality when a given call
+chain and associated predicates are satisfied.
+
+This makes use of a Linux 4.16 feature (bpf_override_return()).
+
+Additionally, use of the kmalloc failure mode is only possible with 
+
+	commit f7174d08a5fc ("mm: make should_failslab always available for
+	fault injection")
+
+which is in mm-tree but not yet in mainline (as of 4.16-rc5).
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF, CONFIG_BPF_KPROBE_OVERRIDE, bcc
+.SH OPTIONS
+.TP
+\-h
+Print usage message.
+.TP
+\-v
+Display the generated BPF program, for debugging or modification.
+.TP
+\-I header
+Necessary headers to be included.
+.SH EXAMPLES
+Please see inject_example.txt
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Howard McLauchlan
--- a/tools/inject.py
+++ b/tools/inject.py
+#!/usr/bin/env python3
+#
+# This script generates a BPF program with structure inspired by trace.py. The
+# generated program operates on PID-indexed stacks. Generally speaking,
+# bookkeeping is done at every intermediate function kprobe/kretprobe to enforce
+# the goal of "fail iff this call chain and these predicates".
+#
+# Top level functions (the ones at the end of the call chain) are responsible for
+# creating the pid_struct and deleting it from the map in kprobe and kretprobe
+# respectively.
+#
+# Intermediate functions (between should_fail_whatever and the top level
+# functions) are responsible for updating the stack to indicate "I have been
+# called and one of my predicate(s) passed" in their entry probes. In their exit
+# probes, they do the opposite, popping their stack to maintain correctness.
+# This implementation aims to ensure correctness in edge cases like recursive
+# calls, so there's some additional information stored in pid_struct for that.
+#
+# At the bottom level function (should_fail_whatever), we do a simple check to
+# ensure all necessary calls/predicates have passed before error injection.
+#
+# Note: presently there are a few hacks to get around various rewriter/verifier
+# issues.
+#
+# Note: this tool requires (as of v4.16-rc5):
+# - commit f7174d08a5fc ("mm: make should_failslab always available for fault
+# injection")
+# - CONFIG_BPF_KPROBE_OVERRIDE
+#
+# USAGE: inject [-h] [-I header] [-v]
+#
+# Copyright (c) 2018 Facebook, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 16-Mar-2018   Howard McLauchlan   Created this.
+
+import argparse
+from bcc import BPF
+
+
+class Probe:
+    errno_mapping = {
+        "kmalloc": "-ENOMEM",
+        "bio": "-EIO",
+    }
+
+    @classmethod
+    def configure(cls, mode):
+        cls.mode = mode
+
+    def __init__(self, func, preds, length, entry):
+        # length of call chain
+        self.length = length
+        self.func = func
+        self.preds = preds
+        self.is_entry = entry
+
+    def _bail(self, err):
+        # Probe has no 'spec' attribute; report the probed function instead
+        raise ValueError("error in probe '%s': %s" %
+                (self.func, err))
+
+    def _get_err(self):
+        return Probe.errno_mapping[Probe.mode]
+
+    def _get_if_top(self):
+        # ordering guarantees that if this function is top, the last tup is top
+        chk = self.preds[0][1] == 0
+        if not chk:
+            return ""
+
+        # init the map
+        # don't do an early exit here so the singular case works automatically
+        enter = """
+        /*
+         * Top level function init map
+         */
+        struct pid_struct p_struct = {0, 0};
+        m.insert(&pid, &p_struct);
+        """
+
+        # kill the entry
+        exit = """
+        /*
+         * Top level function clean up map
+         */
+        m.delete(&pid);
+        """
+
+        return enter if self.is_entry else exit
+
+    def _get_heading(self):
+
+        # we need to insert identifier and ctx into self.func
+        # gonna make a lot of formatting assumptions to make this work
+        left = self.func.find("(")
+        right = self.func.rfind(")")
+
+        # self.event and self.func_name need to be accessible
+        self.event = self.func[0:left]
+        self.func_name = self.event + ("_entry" if self.is_entry else "_exit")
+        func_sig = "struct pt_regs *ctx"
+
+        # assume there's something in there, no guarantee it's well formed
+        if right > left + 1 and self.is_entry:
+            func_sig += ", " + self.func[left + 1:right]
+
+        return "int %s(%s)" % (self.func_name, func_sig)
+
+    def _get_entry_logic(self):
+        # there is at least one tup(pred, place) for this function
+        text = """
+
+        if (p->conds_met >= %s)
+                return 0;
+        if (p->conds_met == %s && %s) {
+                p->stack[%s] = p->curr_call;
+                p->conds_met++;
+        }"""
+        text = text % (self.length, self.preds[0][1], self.preds[0][0],
+                self.preds[0][1])
+
+        # for each additional pred
+        for tup in self.preds[1:]:
+            text += """
+        else if (p->conds_met == %s && %s) {
+                p->stack[%s] = p->curr_call;
+                p->conds_met++;
+        }
+            """ % (tup[1], tup[0], tup[1])
+        return text
+
+    def _generate_entry(self):
+        prog = self._get_heading() + """
+{
+        u32 pid = bpf_get_current_pid_tgid();
+        %s
+
+        struct pid_struct *p = m.lookup(&pid);
+
+        if (!p)
+                return 0;
+
+        /*
+         * preparation for predicate, if necessary
+         */
+         %s
+        /*
+         * Generate entry logic
+         */
+        %s
+
+        p->curr_call++;
+
+        return 0;
+}"""
+
+        prog = prog % (self._get_if_top(), self.prep, self._get_entry_logic())
+        return prog
+
+    # only need to check top of stack
+    def _get_exit_logic(self):
+        text = """
+        if (p->conds_met < 1 || p->conds_met >= %s)
+                return 0;
+
+        if (p->stack[p->conds_met - 1] == p->curr_call)
+                p->conds_met--;
+        """
+        return text % str(self.length + 1)
+
+    def _generate_exit(self):
+        prog = self._get_heading() + """
+{
+        u32 pid = bpf_get_current_pid_tgid();
+
+        struct pid_struct *p = m.lookup(&pid);
+
+        if (!p)
+                return 0;
+
+        p->curr_call--;
+
+        /*
+         * Generate exit logic
+         */
+        %s
+        %s
+        return 0;
+}"""
+
+        prog = prog % (self._get_exit_logic(), self._get_if_top())
+
+        return prog
+
+    # Special case for should_fail_whatever
+    def _generate_bottom(self):
+        pred = self.preds[0][0]
+        text = self._get_heading() + """
+{
+        /*
+         * preparation for predicate, if necessary
+         */
+         %s
+        /*
+         * If this is the only call in the chain and predicate passes
+         */
+        if (%s == 1 && %s) {
+                bpf_override_return(ctx, %s);
+                return 0;
+        }
+        u32 pid = bpf_get_current_pid_tgid();
+
+        struct pid_struct *p = m.lookup(&pid);
+
+        if (!p)
+                return 0;
+
+        /*
+         * If all conds have been met and predicate passes
+         */
+        if (p->conds_met == %s && %s)
+                bpf_override_return(ctx, %s);
+        return 0;
+}""" % (self.prep, self.length, pred, self._get_err(), self.length - 1, pred,
+            self._get_err())
+        return text
+
+    # presently parses and replaces STRCMP
+    # STRCMP exists because string comparison is inconvenient and somewhat buggy
+    # https://github.com/iovisor/bcc/issues/1617
+    def _prepare_pred(self):
+        self.prep = ""
+        for i in range(len(self.preds)):
+            new_pred = ""
+            pred = self.preds[i][0]
+            place = self.preds[i][1]
+            start, ind = 0, 0
+            while start < len(pred):
+                ind = pred.find("STRCMP(", start)
+                if ind == -1:
+                    break
+                new_pred += pred[start:ind]
+                # 7 is len("STRCMP(")
+                start = pred.find(")", start + 7) + 1
+
+                # then ind ... start is STRCMP(...)
+                ptr, literal = pred[ind + 7:start - 1].split(",")
+                literal = literal.strip()
+
+                # x->y->z, some string literal
+                # we make unique id with place_ind
+                uuid = "%s_%s" % (place, ind)
+                unique_bool = "is_true_%s" % uuid
+                self.prep += """
+        char *str_%s = %s;
+        bool %s = true;\n""" % (uuid, ptr.strip(), unique_bool)
+
+                check = "\t%s &= *(str_%s++) == '%%s';\n" % (unique_bool, uuid)
+
+                for ch in literal:
+                    self.prep += check % ch
+                self.prep += check % r'\0'
+                new_pred += unique_bool
+
+            new_pred += pred[start:]
+            self.preds[i] = (new_pred, place)
+
+    def generate_program(self):
+        # generate code to work around various rewriter issues
+        self._prepare_pred()
+
+        # special case for bottom
+        if self.preds[-1][1] == self.length - 1:
+            return self._generate_bottom()
+
+        return self._generate_entry() if self.is_entry else self._generate_exit()
+
+    def attach(self, bpf):
+        if self.is_entry:
+            bpf.attach_kprobe(event=self.event,
+                    fn_name=self.func_name)
+        else:
+            bpf.attach_kretprobe(event=self.event,
+                    fn_name=self.func_name)
+
+
+class Tool:
+    # add cases as necessary
+    error_injection_mapping = {
+        "kmalloc": "should_failslab(struct kmem_cache *s, gfp_t gfpflags)",
+        "bio": "should_fail_bio(struct bio *bio)",
+    }
+
+    def __init__(self):
+        parser = argparse.ArgumentParser(description="Fail specified kernel" +
+                " functionality when call chain and predicates are met",
+                formatter_class=argparse.RawDescriptionHelpFormatter)
+        parser.add_argument(metavar="mode", dest="mode",
+                help="indicate which base kernel function to fail")
+        parser.add_argument(metavar="spec", dest="spec",
+                help="specify call chain")
+        parser.add_argument("-I", "--include", action="append",
+                metavar="header",
+                help="additional header files to include in the BPF program")
+        parser.add_argument("-v", "--verbose", action="store_true",
+            help="print BPF program")
+        self.args = parser.parse_args()
+
+        self.program = ""
+        self.spec = self.args.spec
+        self.map = {}
+        self.probes = []
+        self.key = Tool.error_injection_mapping[self.args.mode]
+
+    # create_probes and associated stuff
+    def _create_probes(self):
+        self._parse_spec()
+        Probe.configure(self.args.mode)
+        # self, func, preds, total, entry
+
+        # create all the pair probes
+        for fx, preds in self.map.items():
+
+            # do the enter
+            self.probes.append(Probe(fx, preds, self.length, True))
+
+            if self.key == fx:
+                continue
+
+            # do the exit
+            self.probes.append(Probe(fx, preds, self.length, False))
+
+    def _parse_frames(self):
+        # sentinel
+        data = self.spec + '\0'
+        start, count = 0, 0
+
+        frames = []
+        cur_frame = []
+        i = 0
+
+        while i < len(data):
+            # improper input
+            if count < 0:
+                raise Exception("Check your parentheses")
+            c = data[i]
+            count += c == '('
+            count -= c == ')'
+            if not count:
+                if c == '\0' or (c == '<' and data[i + 1] == '-'):
+                    if len(cur_frame) == 2:
+                        frame = tuple(cur_frame)
+                    elif cur_frame[0][0] == '(':
+                        frame = self.key, cur_frame[0]
+                    else:
+                        frame = cur_frame[0], '(true)'
+                    frames.append(frame)
+                    del cur_frame[:]
+                    i += 1
+                    start = i + 1
+                elif c == ')':
+                    cur_frame.append(data[start:i + 1].strip())
+                    start = i + 1
+            i += 1
+        # improper input
+        if count:
+            raise Exception("Check your parentheses")
+        return frames
+
+    def _parse_spec(self):
+        frames = self._parse_frames()
+        frames.reverse()
+
+        absolute_order = 0
+        for f in frames:
+            # default case
+            func, pred = f[0], f[1]
+
+            if not self._validate_predicate(pred):
+                raise Exception("Malformed predicate: %s" % pred)
+            tup = (pred, absolute_order)
+
+            if func not in self.map:
+                self.map[func] = [tup]
+            else:
+                self.map[func].append(tup)
+
+            absolute_order += 1
+
+        if self.key not in self.map:
+            self.map[self.key] = [('(true)', absolute_order)]
+            absolute_order += 1
+
+        self.length = absolute_order
+
+    def _validate_predicate(self, pred):
+
+        if len(pred) > 0 and pred[0] == "(":
+            open = 1
+            for i in range(1, len(pred)):
+                if pred[i] == "(":
+                    open += 1
+                elif pred[i] == ")":
+                    open -= 1
+            if open != 0:
+                # not well formed, break
+                return False
+
+        return True
+
+    def _def_pid_struct(self):
+        text = """
+struct pid_struct {
+    u64 curr_call; /* book keeping to handle recursion */
+    u64 conds_met; /* stack pointer */
+    u64 stack[%s];
+};
+""" % self.length
+        return text
+
+    def _attach_probes(self):
+        self.bpf = BPF(text=self.program)
+        for p in self.probes:
+            p.attach(self.bpf)
+
+    def _generate_program(self):
+        # leave out auto includes for now
+
+        for include in (self.args.include or []):
+            self.program += "#include <%s>\n" % include
+
+        self.program += self._def_pid_struct()
+        self.program += "BPF_HASH(m, u32, struct pid_struct);\n"
+        for p in self.probes:
+            self.program += p.generate_program() + "\n"
+
+        if self.args.verbose:
+            print(self.program)
+
+    def _main_loop(self):
+        while True:
+            self.bpf.perf_buffer_poll()
+
+    def run(self):
+        self._create_probes()
+        self._generate_program()
+        self._attach_probes()
+        self._main_loop()
+
+
+if __name__ == "__main__":
+    Tool().run()
--- a/tools/inject_example.txt
+++ b/tools/inject_example.txt
+Some examples for inject
+
+inject guarantees the appropriate erroneous return of the specified injection
+mode (kmalloc, bio, etc.) given a call chain and an optional set of predicates. You
+can also optionally print out the generated BPF program for
+modification/debugging purposes.
+
+For example, suppose you want to fail kmalloc() from mount_subtree() when called
+from btrfs_mount():
+
+# ./inject.py kmalloc -I 'linux/mm.h' -I 'linux/fs.h' -v '(true)<-
+mount_subtree(struct vfsmount *mnt, const char *name) (true) <-
+btrfs_mount(struct file_system_type *fs_type, int flags, const char
+*device_name, void *data)'
+
+The first argument indicates the mode (or what to fail). Appropriate headers
+are specified. The verbosity flag prints the generated program.
+
+Note that btrfs_mount() has no accompanying predicate. In such cases the program
+defaults to (true).
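
The spec format used above can be illustrated with a small standalone parser. This is a simplified sketch and not the `_parse_frames` code from inject.py; the helper names `split_frames` and `split_frame` are hypothetical. It splits a spec on `<-` separators that sit outside parentheses, then separates each frame into a function signature and a predicate, filling in `(true)` when no predicate is given.

```python
def split_frames(spec):
    """Split a call-chain spec on '<-' separators found outside parentheses."""
    frames, depth, start, i = [], 0, 0, 0
    while i < len(spec):
        c = spec[i]
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
        elif depth == 0 and spec.startswith("<-", i):
            frames.append(spec[start:i].strip())
            i += 1  # skip the '-' of '<-'
            start = i + 1
        i += 1
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    frames.append(spec[start:].strip())
    return frames


def split_frame(frame):
    """Return (signature, predicate); signature is None for a bare predicate."""
    if frame.startswith("("):
        # A bare predicate applies to the injection point itself.
        return None, frame
    depth = 0
    for i, c in enumerate(frame):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                # The signature ends at the first ')' that closes its '('.
                sig = frame[:i + 1]
                pred = frame[i + 1:].strip() or "(true)"
                return sig, pred
    raise ValueError("no function signature found in frame: %r" % frame)


if __name__ == "__main__":
    spec = ("(true)<- mount_subtree(struct vfsmount *mnt, const char *name) "
            "(true) <- btrfs_mount(struct file_system_type *fs_type, int flags, "
            "const char *device_name, void *data)")
    for frame in split_frames(spec):
        print(split_frame(frame))
```

Frames come out in the order they are written, with the injection point first and the outermost caller last. inject.py reverses that order before numbering the frames.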
+
+Next, let's say we want to hit one of the BUG_ONs in fs/btrfs. As of 4.16-rc3,
+there is a BUG_ON in btrfs_prepare_close_one_device() at fs/btrfs/volumes.c:1002.
+
+To hit this, we can use the following:
+
+# ./inject.py kmalloc -v -I 'linux/mm.h' '(true)<- btrfs_alloc_device(struct
+btrfs_fs_info *fs_info, const u64 *ded, const u8 *uuid)(true)<-
+btrfs_close_devices(struct btrfs_fs_devices *fs_devices)(true)'
+
+While the script was executing, I mounted and unmounted btrfs, causing a
+segfault on umount (since that satisfied the call path indicated). A look at
+dmesg will confirm we successfully hit that BUG_ON and caused a panic.
+
+In general, it's worth noting that the required specificity of the call chain is
+dependent on how much granularity you need. The example above might have
+performed as expected without the intermediate btrfs_alloc_device, but might
+have also done something unexpected (an earlier kmalloc could have failed before
+the one we were targeting).
+
+For hot paths, the approach outlined above isn't enough. If a path is traversed
+very often, we can distinguish distinct calls with function arguments. Let's say
+we want to fail the dentry allocation of a file creatively named 'bananas'. We
+can do the following:
+
+# ./inject.py kmalloc -v -I 'linux/fs.h' '(true) <- d_alloc_parallel(struct
+dentry *parent, const struct qstr *name, wait_queue_head_t *wq)
+(STRCMP(name->name, 'bananas'))'
+
+While this script is executing, any operation that would cause a dentry
+allocation where the name is 'bananas' fails, as expected.
+
+To note, STRCMP is a workaround for some rewriter issues. It will take input of
+the form (x->...->z, 'literal'), and generate some equivalent code that the
+verifier is more friendly about. It's not horribly robust, but works for the
+purposes of making string comparisons a bit easier.
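
The STRCMP expansion described above can be sketched in isolation. This is a simplified illustration and not the `_prepare_pred` code from inject.py; the function name `expand_strcmp` and its `uid` parameter are hypothetical. It assumes the literal arrives without quotes: in a POSIX shell, the inner single quotes in the command above close and reopen the outer quoting, so the tool receives the bare word bananas.

```python
def expand_strcmp(ptr_expr, literal, uid):
    """Generate C text that compares the string at ptr_expr to literal.

    Returns (flag_name, c_code). The flag replaces STRCMP(...) in the
    predicate, and c_code is emitted before the predicate is evaluated.
    """
    flag = "is_true_%s" % uid
    lines = [
        "char *str_%s = %s;" % (uid, ptr_expr),
        "bool %s = true;" % flag,
    ]
    # One comparison per character, so no loop reaches the generated C.
    for ch in literal:
        lines.append("%s &= *(str_%s++) == '%s';" % (flag, uid, ch))
    # Also require the terminating NUL so that prefixes do not match.
    lines.append("%s &= *(str_%s++) == '\\0';" % (flag, uid))
    return flag, "\n".join(lines)


if __name__ == "__main__":
    flag, code = expand_strcmp("name->name", "bananas", "0_1")
    print(code)
    print("predicate becomes: (%s)" % flag)
```

Unrolling the comparison keeps the generated program free of loops and helper calls, at the cost of one statement per character of the literal.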
+
+Finally, we briefly demonstrate how to inject bio failures. The mechanism is
+identical, so any information from above will apply.
+
+Let's say we want to fail bio requests when the request is to some specific
+sector. An example use case would be to fail superblock writes in btrfs. For
+btrfs, we know that there must be a superblock at 65536 bytes, or sector 128.
+This allows us to run the following:
+
+# ./inject.py bio -v -I 'linux/blkdev.h'  '(({struct gendisk *d = bio->bi_disk;
+struct disk_part_tbl *tbl = d->part_tbl; struct hd_struct **parts = (void *)tbl +
+sizeof(struct disk_part_tbl); struct hd_struct **partp = parts + bio->bi_partno;
+struct hd_struct *p = *partp; dev_t disk = p->__dev.devt; disk ==
+MKDEV(254,16);}) && bio->bi_iter.bi_sector == 128)'
+
+The predicate in the command above has two parts. The first is a compound
+statement which shortens to "only if the system is btrfs", but is long due
+to rewriter/verifier shenanigans. The major/minor numbers passed to MKDEV can
+be found by any means; I used Python. The second part simply checks the
+starting sector of bi_iter. While executing, this script effectively fails
+writes to the superblock at sector 128 without affecting other filesystems.
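
One way to look up the major/minor pair from Python is to stat the block device node. This is a minimal sketch, not necessarily what was done for the example above; the helper names `block_dev_numbers` and `mkdev_predicate` are hypothetical, and the device path you pass is whatever node backs the filesystem on your system.

```python
import os
import stat


def block_dev_numbers(path):
    """Return (major, minor) for the block device node at path."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError("%s is not a block device" % path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def mkdev_predicate(major, minor, var="disk"):
    """Format the device comparison used in the predicate.

    var is the name of the C variable holding the dev_t ('disk' above).
    """
    return "%s == MKDEV(%d,%d)" % (var, major, minor)


if __name__ == "__main__":
    import sys
    # Usage: python3 devnum.py /path/to/block/device
    major, minor = block_dev_numbers(sys.argv[1])
    print(mkdev_predicate(major, minor))
```

The userspace encoding of st_rdev differs from the kernel's dev_t layout, which is why the sketch extracts the major and minor numbers and leaves the encoding to the kernel's MKDEV macro.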
+
+As an extension to the above, one could easily fail all btrfs superblock writes
+(we only fail the primary) by calculating the sector number of the mirrors and
+amending the predicate accordingly.
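
The sector arithmetic behind "sector 128" and the mirror calculation can be sketched as follows. The sketch assumes 512-byte sectors, and it assumes the btrfs mirror superblocks sit at byte offsets 64 MiB and 256 GiB. Both values are assumptions to check against the kernel source for your version, and a mirror only exists if the device is large enough to hold it. The helper names are hypothetical.

```python
SECTOR_SIZE = 512  # bytes; consistent with "65536 bytes, or sector 128"

# Assumed superblock byte offsets: primary, first mirror, second mirror.
SUPERBLOCK_OFFSETS = [64 * 1024, 64 * 1024 ** 2, 256 * 1024 ** 3]


def superblock_sectors(offsets=SUPERBLOCK_OFFSETS, sector_size=SECTOR_SIZE):
    """Convert byte offsets to sector numbers."""
    return [off // sector_size for off in offsets]


def sector_predicate(sectors):
    """Build a predicate fragment that matches any of the given sectors."""
    checks = ["bio->bi_iter.bi_sector == %d" % s for s in sectors]
    return "(" + " || ".join(checks) + ")"


if __name__ == "__main__":
    sectors = superblock_sectors()
    print(sectors)
    print(sector_predicate(sectors))
```

The resulting fragment would replace the `bio->bi_iter.bi_sector == 128` part of the predicate, with the device check left unchanged.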