Merge branch 'master' into codegen

92218a24 · Alastair Robertson · 8cabf980 · 492e1f86 · 92218a24 · 92218a24
Commit 92218a24 authored Sep 04, 2018 by Alastair Robertson
36 changed files
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -12,7 +12,7 @@ CONFIG_BPF_EVENTS=y
 To use some BPFtrace features, minimum kernel versions are required:
 - 4.1+ - kprobes
 - 4.3+ - uprobes
- 4.6+ - stack traces, count and quantize builtins (use PERCPU maps for accuracy and efficiency)
+- 4.6+ - stack traces, count and hist builtins (use PERCPU maps for accuracy and efficiency)
 - 4.7+ - tracepoints
 - 4.9+ - timers/profiling


--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@

 BPFtrace is a high-level tracing language for Linux enhanced Berkeley Packet Filter (eBPF) available in recent Linux kernels (4.x). BPFtrace uses LLVM as a backend to compile scripts to BPF-bytecode and makes use of [BCC](https://github.com/iovisor/bcc) for interacting with the Linux BPF system, as well as existing Linux tracing capabilities: kernel dynamic tracing (kprobes), user-level dynamic tracing (uprobes), and tracepoints. The BPFtrace language is inspired by awk and C, and predecessor tracers such as DTrace and SystemTap.

-For instructions on building BPFtrace, see [INSTALL.md](INSTALL.md)
+For instructions on building BPFtrace, see [INSTALL.md](INSTALL.md). There is also a [Reference Guide](docs/reference_guide.md) and [One-Liner Tutorial](docs/tutorial_one_liners.md).

 ## Examples

@@ -40,7 +40,7 @@ kprobe:sys_read

 kretprobe:sys_read / @start[tid] /
 {
-  @times = quantize(nsecs - @start[tid]);
+  @times = hist(nsecs - @start[tid]);
  delete(@start[tid]);
 }
 ```
@@ -157,8 +157,20 @@ Attach script to a statically defined tracepoint in the kernel:

 Tracepoints are guaranteed to be stable between kernel versions, unlike kprobes.

-### timers
-Run the script at specified time intervals:
+### software
+Attach script to kernel software events, executing once per provided count of events, or per a default count if none is given:
+
+`software:faults:100`
+`software:faults:`
+
+### hardware
+Attach script to hardware events (PMCs), executing once per provided count of events, or per a default count if none is given:
+
+`hardware:cache-references:1000000`
+`hardware:cache-references:`
+
+### profile
+Run the script on all CPUs at specified time intervals:

 `profile:hz:99 { ... }`

@@ -168,6 +180,13 @@ Run the script at specified time intervals:

 `profile:us:1500 { ... }`

+### interval
+Run the script once per interval, for printing interval output:
+
+`interval:s:1 { ... }`
+
+`interval:ms:20 { ... }`
+
 ### Multiple attachment points
 A single probe can be attached to multiple events:

@@ -199,13 +218,28 @@ Variables:
 - `arg0`, `arg1`, ... etc. - Arguments to the function being traced
 - `retval` - Return value from function being traced
 - `func` - Name of the function currently being traced
+- `curtask` - Current task_struct as a u64.
+- `rand` - Random number of type u32.

 Functions:
- `quantize(int n)` - Produce a log2 histogram of values of `n`
+- `hist(int n)` - Produce a log2 histogram of values of `n`
+- `lhist(int n, int min, int max, int step)` - Produce a linear histogram of values of `n`
 - `count()` - Count the number of times this function is called
+- `sum(int n)` - Sum this value
+- `min(int n)` - Record the minimum value seen
+- `max(int n)` - Record the maximum value seen
+- `avg(int n)` - Average this value
+- `stats(int n)` - Return the count, average, and total for this value
 - `delete(@x)` - Delete the map element passed in as an argument
 - `str(char *s)` - Returns the string pointed to by `s`
- `printf(char *fmt, ...)` - Write to stdout
+- `printf(char *fmt, ...)` - Print formatted to stdout
+- `print(@x[, int top [, int div]])` - Print a map, with optional top entry count and divisor
+- `clear(@x)` - Delete all key/values from a map
 - `sym(void *p)` - Resolve kernel address
 - `usym(void *p)` - Resolve user space address (incomplete)
 - `reg(char *name)` - Returns the value stored in the named register
+- `join(char *arr[])` - Prints the string array
+- `time(char *fmt)` - Print the current time
+- `exit()` - Quit bpftrace
+
+See the [Reference Guide](docs/reference_guide.md) for more detail.
--- a/docs/reference_guide.md
+++ b/docs/reference_guide.md
+# bpftrace Reference Guide
+
+For a reference summary, see the [README.md](../README.md) for the sections on [Probe types](../README.md#probe-types) and [Builtins](../README.md#builtins).
+
+This is a work in progress. If something is missing or incomplete, check the bpftrace source to see if these docs are just out of date. And if you find something, please file an issue or pull request to update these docs.
+
+## Contents
+
+- [Terminology](#terminology)
+- [Language](#language)
+    - [1. `{...}`: Action Blocks](#1--action-blocks)
+    - [2. `/.../`: Filtering](#2--filtering)
+    - [3. `//`, `/*`: Comments](#3---comments)
+    - [4. `->`: C Struct Navigation](#4---c-struct-navigation)
+- [Probes](#probes)
+    - [1. `kprobe`/`kretprobe`: Dynamic Tracing, Kernel-Level](#1-kprobekretprobe-dynamic-tracing-kernel-level)
+    - [2. `kprobe`/`kretprobe`: Dynamic Tracing, Kernel-Level Arguments](#2-kprobekretprobe-dynamic-tracing-kernel-level-arguments)
+    - [3. `uprobe`/`uretprobe`: Dynamic Tracing, User-Level](#3-uprobeuretprobe-dynamic-tracing-user-level)
+    - [4. `uprobe`/`uretprobe`: Dynamic Tracing, User-Level Arguments](#4-uprobeuretprobe-dynamic-tracing-user-level-arguments)
+    - [5. `tracepoint`: Static Tracing, Kernel-Level](#5-tracepoint-static-tracing-kernel-level)
+    - [6. `tracepoint`: Static Tracing, Kernel-Level Arguments](#6-tracepoint-static-tracing-kernel-level-arguments)
+    - [7. `usdt`: Static Tracing, User-Level](#7-usdt-static-tracing-user-level)
+    - [8. `usdt`: Static Tracing, User-Level Arguments](#8-usdt-static-tracing-user-level-arguments)
+    - [9. `profile`: Timed Sampling Events](#9-profile-timed-sampling-events)
+    - [10. `software`: Pre-defined Software Events](#10-software-pre-defined-software-events)
+    - [11. `hardware`: Pre-defined Hardware Events](#11-hardware-pre-defined-hardware-events)
+- [Variables](#variables)
+    - [1. Builtins](#1-builtins)
+    - [2. `@`, `$`: Basic Variables](#2---basic-variables)
+    - [3. `@[]`: Associative Arrays](#3--associative-arrays)
+    - [4. `count()`: Frequency Counting](#4-count-frequency-counting)
+    - [5. `hist()`, `lhist()`: Histograms](#5-hist-lhist-histograms)
+    - [6. `nsecs`: Timestamps and Time Deltas](#6-nsecs-timestamps-and-time-deltas)
+    - [7. `stack`: Stack Traces, Kernel](#7-stack-stack-traces-kernel)
+    - [8. `ustack`: Stack Traces, User](#8-ustack-stack-traces-user)
+- [Functions](#functions)
+    - [1. Builtins](#1-builtins2)
+    - [2. `printf()`: Print Formatted](#2-printf-print-formatted)
+    - [3. `time()`: Time](#3-time-time)
+    - [4. `join()`: Join](#4-join-join)
+    - [5. `str()`: Strings](#5-str-strings)
+    - [6. `sym()`: Symbol Resolution, Kernel-Level](#6-sym-symbol-resolution-kernel-level)
+    - [7. `usym()`: Symbol Resolution, User-Level](#7-usym-symbol-resolution-user-level)
+    - [8. `reg()`: Registers](#8-reg-registers)
+    - [9. `exit()`: Exit](#9-exit-exit)
+- [Map Functions](#map-functions)
+    - [1. Builtins](#1-builtins3)
+    - [2. `count()`: Count](#2-count-count)
+    - [3. `sum()`: Sum](#3-sum-sum)
+    - [4. `avg()`: Average](#4-avg-average)
+    - [5. `min()`: Minimum](#5-min-minimum)
+    - [6. `max()`: Maximum](#6-max-maximum)
+    - [7. `stats()`: Stats](#7-stats-stats)
+    - [8. `hist()`: Log2 Histogram](#8-hist-log2-histogram)
+    - [9. `lhist()`: Linear Histogram](#9-lhist-linear-histogram)
+    - [10. `print()`: Print Map](#10-print-print-map)
+- [Output](#output)
+    - [1. `printf()`: Per-Event Output](#1-printf-per-event-output)
+    - [2. `interval`: Interval Output](#2-interval-interval-output)
+    - [3. `hist()`, `printf()`: Histogram Printing](#3-hist-printf-histogram-printing)
+- [Errors](#errors)
+
+# Terminology
+
+Term | Description
+---- | -----------
+BPF | Berkeley Packet Filter: a kernel technology originally developed for optimizing the processing of packet filters (eg, tcpdump expressions)
+eBPF | Enhanced BPF: a kernel technology that extends BPF so that it can execute more generic programs on any events, such as the bpftrace programs listed below. It makes use of the BPF sandboxed virtual machine environment. Also note that eBPF is often just referred to as BPF.
+probe | An instrumentation point in software or hardware, that generates events that can execute bpftrace programs.
+static tracing | Hard-coded instrumentation points in code. Since these are fixed, they may be provided as part of a stable API, and documented.
+dynamic tracing | Also known as dynamic instrumentation, this is a technology that can instrument any software event, such as function calls and returns, by live modification of instruction text. Target software usually does not need special capabilities to support dynamic tracing, other than a symbol table that bpftrace can read. Since this instruments all software text, it is not considered a stable API, and the target functions may not be documented outside of their source code.
+tracepoints | A Linux kernel technology for providing static tracing.
+kprobes | A Linux kernel technology for providing dynamic tracing of kernel functions.
+uprobes | A Linux kernel technology for providing dynamic tracing of user-level functions.
+USDT | User Statically-Defined Tracing: static tracing points for user-level software. Some applications support USDT.
+BPF map | A BPF memory object, which is used by bpftrace to create many higher-level objects.
+
+# Language
+
+## 1. `{...}`: Action Blocks
+
+Syntax: `probe[,probe,...] /filter/ { action }`
+
+A bpftrace program can have multiple action blocks. The filter is optional.
+
+Example:
+
+```
+# bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
+Attaching 1 probe...
+opening: /proc/cpuinfo
+opening: /proc/stat
+opening: /proc/diskstats
+opening: /proc/stat
+opening: /proc/vmstat
+[...]
+```
+
+This is a one-liner invocation of bpftrace. The probe is `kprobe:do_sys_open`. When that probe "fires" (the instrumentation event occurred) the action will be executed, which consists of a `printf()` statement. Explanations of the probe and action are in the sections that follow.
+
+## 2. `/.../`: Filtering
+
+Syntax: `/filter/`
+
+Filters (also known as predicates) can be added after probe names. The probe still fires, but it will skip the action unless the filter is true.
+
+Examples:
+
+```
+# bpftrace -e 'kprobe:sys_read /arg2 < 16/ { printf("small read: %d byte buffer\n", arg2); }'
+Attaching 1 probe...
+small read: 8 byte buffer
+small read: 8 byte buffer
+small read: 8 byte buffer
+small read: 8 byte buffer
+small read: 8 byte buffer
+small read: 12 byte buffer
+^C
+```
+
+```
+# bpftrace -e 'kprobe:sys_read /comm == "bash"/ { printf("read by %s\n", comm); }'
+Attaching 1 probe...
+read by bash
+read by bash
+read by bash
+read by bash
+^C
+```
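The filter semantics described above (the probe fires on every event, but the action runs only when the predicate holds) can be modelled outside bpftrace. The following is a minimal Python sketch, not bpftrace's implementation; `run_probe`, the event dictionaries, and their field names are hypothetical, chosen to mirror the `/arg2 < 16/` example.

```python
# Hypothetical model of `probe /filter/ { action }`: every event reaches the
# probe, but the action runs only when the filter evaluates to true.

def run_probe(events, predicate, action):
    """Return (fired, acted): events seen by the probe vs. actions executed."""
    fired = 0
    acted = 0
    for event in events:
        fired += 1                 # the probe fires for every event
        if predicate(event):       # the /filter/ is checked first
            action(event)          # the { action } runs only on a match
            acted += 1
    return fired, acted


# Made-up read() events; arg2 stands for the requested buffer size.
events = [
    {"comm": "bash", "arg2": 8},
    {"comm": "sshd", "arg2": 16384},
    {"comm": "bash", "arg2": 12},
]
small_reads = []
fired, acted = run_probe(
    events,
    predicate=lambda e: e["arg2"] < 16,
    action=lambda e: small_reads.append(e["arg2"]),
)
```

In this model the probe sees all three events, while only the two with a buffer smaller than 16 bytes reach the action. That matches the statement that a filtered probe still fires and only skips the action.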
+
+## 3. `//`, `/*`: Comments
+
+Syntax
+
+```
+// single-line comment
+
+/*
+ * multi-line comment
+ */
+```
+
+These can be used in bpftrace scripts to document your code.
+
+## 4. `->`: C Struct Navigation
+
+**TODO**: see issue [#31](https://github.com/iovisor/bpftrace/issues/31)
+
+Future example:
+
+```
+bpftrace -e 'kprobe:sys_nanosleep { printf("secs: %d\n", arg0->tv_nsec); }'
+```
+
+or
+
+```
+bpftrace -e 'kprobe:sys_nanosleep { printf("secs: %d\n", ((struct timespec *)arg0)->tv_nsec); }'
+```
+
+# Probes
+
+- `kprobe` - kernel function start
+- `kretprobe` - kernel function return
+- `uprobe` - user-level function start
+- `uretprobe` - user-level function return
+- `tracepoint` - kernel static tracepoints
+- `profile` - timed sampling
+
+Some probe types allow wildcards to match multiple probes, eg, `kprobe:SyS_*`.
+
+## 1. `kprobe`/`kretprobe`: Dynamic Tracing, Kernel-Level
+
+Syntax:
+
+```
+kprobe:function_name
+kretprobe:function_name
+```
+
+These use kprobes (a Linux kernel capability). `kprobe` instruments the beginning of a function's execution, and `kretprobe` instruments the end (its return).
+
+Examples:
+
+```
+# bpftrace -e 'kprobe:sys_nanosleep { printf("sleep by %d\n", tid); }'
+Attaching 1 probe...
+sleep by 1396
+sleep by 3669
+sleep by 1396
+sleep by 27662
+sleep by 3669
+^C
+```
+
+## 2. `kprobe`/`kretprobe`: Dynamic Tracing, Kernel-Level Arguments
+
+Syntax: `arg0, arg1, ..., argN`
+
+Arguments can be accessed via these variable names. arg0 is the first argument.
+
+Examples:
+
+```
+# bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'
+Attaching 1 probe...
+opening: /proc/cpuinfo
+opening: /proc/stat
+opening: /proc/diskstats
+opening: /proc/stat
+opening: /proc/vmstat
+[...]
+```
+
+```
+# bpftrace -e 'kprobe:do_sys_open { printf("open flags: %d\n", arg2); }'
+Attaching 1 probe...
+open flags: 557056
+open flags: 32768
+open flags: 32768
+open flags: 32768
+[...]
+```
+
+```
+# bpftrace -e 'kretprobe:do_sys_open { printf("returned: %d\n", retval); }'
+Attaching 1 probe...
+returned: 8
+returned: 21
+returned: -2
+returned: 21
+[...]
+```
+
+## 3. `uprobe`/`uretprobe`: Dynamic Tracing, User-Level
+
+Syntax:
+
+```
+uprobe:library_name:function_name
+uretprobe:library_name:function_name
+```
+
+These use uprobes (a Linux kernel capability). `uprobe` instruments the beginning of a user-level function's execution, and `uretprobe` instruments the end (its return).
+
+Examples:
+
+```
+# bpftrace -e 'uretprobe:/bin/bash:readline { printf("read a line\n"); }'
+Attaching 1 probe...
+read a line
+read a line
+read a line
+read a line
+^C
+```
+
+## 4. `uprobe`/`uretprobe`: Dynamic Tracing, User-Level Arguments
+
+Syntax: `arg0, arg1, ..., argN`
+
+Arguments can be accessed via these variable names. arg0 is the first argument.
+
+Examples:
+
+```
+# bpftrace -e 'uprobe:/bin/bash:readline { printf("arg0: %d\n", arg0); }'
+Attaching 1 probe...
+arg0: 19755784
+arg0: 19755016
+arg0: 19755784
+^C
+```
+
+```
+# bpftrace -e 'uprobe:/lib/x86_64-linux-gnu/libc-2.23.so:fopen { printf("fopen: %s\n", str(arg0)); }'
+Attaching 1 probe...
+fopen: /proc/filesystems
+fopen: /usr/share/locale/locale.alias
+fopen: /proc/self/mountinfo
+^C
+```
+
+```
+# bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: \"%s\"\n", str(retval)); }'
+Attaching 1 probe...
+readline: "echo hi"
+readline: "ls -l"
+readline: "date"
+readline: "uname -r"
+^C
+```
+
+## 5. `tracepoint`: Static Tracing, Kernel-Level
+
+Syntax: `tracepoint:name`
+
+These use tracepoints (a Linux kernel capability).
+
+```
+# bpftrace -e 'tracepoint:block:block_rq_insert { printf("block I/O created by %d\n", tid); }'
+Attaching 1 probe...
+block I/O created by 28922
+block I/O created by 3949
+block I/O created by 883
+block I/O created by 28941
+block I/O created by 28941
+block I/O created by 28941
+[...]
+```
+
+## 6. `tracepoint`: Static Tracing, Kernel-Level Arguments
+
+**TODO**: see issue [#32](https://github.com/iovisor/bpftrace/issues/32)
+
+Future examples:
+
+```
+bpftrace -e 'tracepoint:block:block_rq_insert { printf("sectors: %d\n", args->nr_sector); }'
+```
+
+## 7. `usdt`: Static Tracing, User-Level
+
+Syntax:
+
+```
+usdt:binary_path:probe_name
+usdt:library_path:probe_name
+```
+
+Examples:
+
+```
+# bpftrace -e 'usdt:/root/tick:loop { printf("hi\n"); }'
+Attaching 1 probe...
+hi
+hi
+hi
+hi
+hi
+^C
+```
+
+## 8. `usdt`: Static Tracing, User-Level Arguments
+
+**TODO**: see issue [#33](https://github.com/iovisor/bpftrace/issues/33)
+
+Future example:
+
+```
+bpftrace -e 'usdt:pthread:pthread_create /arg4 != 0/ { printf("created thread\n"); }'
+```
+
+## 9. `profile`: Timed Sampling Events
+
+Syntax:
+
+```
+profile:hz:rate
+profile:s:rate
+profile:ms:rate
+profile:us:rate
+```
+
+These operate using perf_events (a Linux kernel facility, which is also used by the `perf` command).
+
+Examples:
+
+```
+# bpftrace -e 'profile:hz:99 { @[tid] = count(); }'
+Attaching 1 probe...
+^C
+
+@[32586]: 98
+@[0]: 579
+```
+
+## 10. `software`: Pre-defined Software Events
+
+Syntax:
+
+```
+software:event_name:count
+software:event_name:
+```
+
+These are the pre-defined software events provided by the Linux kernel, as commonly traced via the perf utility. They are similar to tracepoints, but there are only about a dozen of these, and they are documented in the perf\_event\_open(2) man page. The event names are:
+
+- `cpu-clock` or `cpu`
+- `task-clock`
+- `page-faults` or `faults`
+- `context-switches` or `cs`
+- `cpu-migrations`
+- `minor-faults`
+- `major-faults`
+- `alignment-faults`
+- `emulation-faults`
+- `dummy`
+- `bpf-output`
+
+The count is the trigger for the probe, which will fire once for every count events. If the count is not provided, a default is used.
+
+Examples:
+
+```
+# bpftrace -e 'software:faults:100 { @[comm] = count(); }'
+Attaching 1 probe...
+^C
+
+@[ls]: 1
+@[pager]: 2
+@[locale]: 2
+@[preconv]: 2
+@[sh]: 3
+@[tbl]: 3
+@[bash]: 4
+@[groff]: 5
+@[grotty]: 7
+@[sleep]: 9
+@[nroff]: 12
+@[troff]: 18
+@[man]: 97
+```
+
+This roughly counts who is causing page faults, by sampling the process name for one in every hundred faults.
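To illustrate what the count means, here is a minimal Python sketch of count-based sampling: a handler runs once per `period` events, so each recorded sample stands for roughly `period` real events. This models the behaviour described above and is not how the kernel or bpftrace implements it; `sample_every` and the synthetic event stream are hypothetical.

```python
from collections import Counter


def sample_every(events, period):
    """Tally the process name of every `period`-th event (1-based)."""
    samples = Counter()
    for index, comm in enumerate(events, start=1):
        if index % period == 0:    # fire once for every `period` events
            samples[comm] += 1
    return samples


# Synthetic fault stream: 900 faults from "man", then 100 from "ls".
events = ["man"] * 900 + ["ls"] * 100
samples = sample_every(events, period=100)

# Each sample represents about `period` faults, so scale up to estimate totals.
estimated = {comm: n * 100 for comm, n in samples.items()}
```

A larger count lowers overhead but makes the estimate coarser: a process that causes fewer than `period` faults may not appear in the output at all.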
+
+## 11. `hardware`: Pre-defined Hardware Events
+
+Syntax:
+
+```
+hardware:event_name:count
+hardware:event_name:
+```
+
+These are the pre-defined hardware events provided by the Linux kernel, as commonly traced by the perf utility. They are implemented using performance monitoring counters (PMCs): hardware resources on the processor. There are about ten of these, and they are documented in the perf\_event\_open(2) man page. The event names are:
+
+- `cpu-cycles` or `cycles`
+- `instructions`
+- `cache-references`
+- `cache-misses`
+- `branch-instructions` or `branches`
+- `bus-cycles`
+- `frontend-stalls`
+- `backend-stalls`
+- `ref-cycles`
+
+The count is the trigger for the probe, which will fire once for every count events. If the count is not provided, a default is used.
+
+Examples:
+
+```
+bpftrace -e 'hardware:cache-misses:1000000 { @[pid] = count(); }'
+```
+
+That would fire once for every 1000000 cache misses. This event usually refers to misses in the last level cache (LLC).
+
+# Variables
+
+## 1. Builtins
+
+- `pid` - Process ID (kernel tgid)
+- `tid` - Thread ID (kernel pid)
+- `uid` - User ID
+- `gid` - Group ID
+- `nsecs` - Nanosecond timestamp
+- `cpu` - Processor ID
+- `comm` - Process name
+- `stack` - Kernel stack trace
+- `ustack` - User stack trace
+- `arg0`, `arg1`, ..., `argN` - Arguments to the traced function
+- `retval` - Return value from traced function
+- `func` - Name of the traced function
+- `name` - Full name of the probe
+- `curtask` - Current task struct as a u64
+- `rand` - Random number as a u32
+
+Many of these are discussed in other sections (use search).
+
+## 2. `@`, `$`: Basic variables
+
+Syntax:
+
+```
+@global_name
+@thread_local_variable_name[tid]
+$scratch_name
+```
+
+bpftrace supports global & per-thread variables (via BPF maps), and scratch variables.
+
+Examples:
+
+### 2.1. Global
+
+Syntax: `@name`
+
+For example, `@start`:
+
+```
+# bpftrace -e 'BEGIN { @start = nsecs; }
+    kprobe:sys_nanosleep /@start != 0/ { printf("at %d ms: sleep\n", (nsecs - @start) / 1000000); }'
+Attaching 2 probes...
+at 437 ms: sleep
+at 647 ms: sleep
+at 1098 ms: sleep
+at 1438 ms: sleep
+at 1648 ms: sleep
+^C
+
+@start: 4064438886907216
+```
+
+### 2.2. Per-Thread
+
+These can be implemented as an associative array keyed on the thread ID. For example, `@start[tid]`:
+
+```
+# bpftrace -e 'kprobe:sys_nanosleep { @start[tid] = nsecs; }
+    kretprobe:sys_nanosleep /@start[tid] != 0/ { printf("slept for %d ms\n", (nsecs - @start[tid]) / 1000000); delete(@start[tid]); }'
+Attaching 2 probes...
+slept for 1000 ms
+slept for 1000 ms
+slept for 1000 ms
+slept for 1009 ms
+slept for 2002 ms
+[...]
+```
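The per-thread pattern above (store a timestamp on entry keyed by thread ID, compute the delta on return, then delete the key) can be sketched in plain Python. This is an illustrative model only; `trace_latency` and the event tuples are hypothetical, and a dictionary stands in for the BPF map.

```python
def trace_latency(events):
    """events: (kind, tid, timestamp_ns) tuples; kind is "entry" or "return"."""
    start = {}            # plays the role of the @start[tid] map
    durations_ms = []
    for kind, tid, now_ns in events:
        if kind == "entry":
            start[tid] = now_ns                        # @start[tid] = nsecs
        elif kind == "return" and start.get(tid, 0) != 0:   # /@start[tid] != 0/
            durations_ms.append((tid, (now_ns - start[tid]) // 1_000_000))
            del start[tid]                             # delete(@start[tid])
    return durations_ms, start


# Made-up timeline: two threads sleep concurrently; thread 27662 returns
# without a recorded entry (for example, it entered before tracing began).
events = [
    ("entry", 1396, 1_000_000_000),
    ("entry", 3669, 1_500_000_000),
    ("return", 1396, 2_000_000_000),
    ("return", 3669, 3_502_000_000),
    ("return", 27662, 4_000_000_000),
]
durations, leftover = trace_latency(events)
```

Keying on the thread ID keeps concurrent calls from overwriting each other's start time, and the filter skips returns whose entry was never seen. Deleting the key after use keeps the map from growing without bound.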
+
+### 2.3. Scratch
+
+Syntax: `$name`
+
+For example, `$delta`:
+
+```
+# bpftrace -e 'kprobe:sys_nanosleep { @start[tid] = nsecs; }
+    kretprobe:sys_nanosleep /@start[tid] != 0/ { $delta = nsecs - @start[tid]; printf("slept for %d ms\n", $delta / 1000000); delete(@start[tid]); }'
+Attaching 2 probes...
+slept for 1000 ms
+slept for 1000 ms
+slept for 1000 ms
+```
+
+## 3. `@[]`: Associative Arrays
+
+Syntax: `@associative_array_name[key_name] = value`
+
+These are implemented using BPF maps.
+
+For example, `@start[tid]`:
+
+```
+# bpftrace -e 'kprobe:sys_nanosleep { @start[tid] = nsecs; }
+    kretprobe:sys_nanosleep /@start[tid] != 0/ { printf("slept for %d ms\n", (nsecs - @start[tid]) / 1000000); delete(@start[tid]); }'
+Attaching 2 probes...
+slept for 1000 ms
+slept for 1000 ms
+slept for 1000 ms
+[...]
+```
+
+## 4. `count()`: Frequency Counting
+
+This is provided by the count() function: see the [Count](#2-count-count) section.
+
+## 5. `hist()`, `lhist()`: Histograms
+
+These are provided by the hist() and lhist() functions. See the [Log2 Histogram](#8-hist-log2-histogram) and [Linear Histogram](#9-lhist-linear-histogram) sections.
+
+## 6. `nsecs`: Timestamps and Time Deltas
+
+Syntax: `nsecs`
+
+These are implemented using bpf_ktime_get_ns().
+
+Examples:
+
+```
+# bpftrace -e 'BEGIN { @start = nsecs; }
+    kprobe:sys_nanosleep /@start != 0/ { printf("at %d ms: sleep\n", (nsecs - @start) / 1000000); }'
+Attaching 2 probes...
+at 437 ms: sleep
+at 647 ms: sleep
+at 1098 ms: sleep
+at 1438 ms: sleep
+^C
+```
+
+## 7. `stack`: Stack Traces, Kernel
+
+Syntax: `stack`
+
+These are implemented using BPF stack maps.
+
+Examples:
+
+```
+# bpftrace -e 'kprobe:ip_output { @[stack] = count(); }'
+Attaching 1 probe...
+[...]
+@[
+ip_output+1
+tcp_transmit_skb+1308
+tcp_write_xmit+482
+tcp_release_cb+225
+release_sock+64
+tcp_sendmsg+49
+sock_sendmsg+48
+sock_write_iter+135
+__vfs_write+247
+vfs_write+179
+sys_write+82
+entry_SYSCALL_64_fastpath+30
+]: 1708
+@[
+ip_output+1
+tcp_transmit_skb+1308
+tcp_write_xmit+482
+__tcp_push_pending_frames+45
+tcp_sendmsg_locked+2637
+tcp_sendmsg+39
+sock_sendmsg+48
+sock_write_iter+135
+__vfs_write+247
+vfs_write+179
+sys_write+82
+entry_SYSCALL_64_fastpath+30
+]: 9048
+@[
+ip_output+1
+tcp_transmit_skb+1308
+tcp_write_xmit+482
+tcp_tasklet_func+348
+tasklet_action+241
+__do_softirq+239
+irq_exit+174
+do_IRQ+74
+ret_from_intr+0
+cpuidle_enter_state+159
+do_idle+389
+cpu_startup_entry+111
+start_secondary+398
+secondary_startup_64+165
+]: 11430
+```
+
+## 8. `ustack`: Stack Traces, User
+
+Syntax: `ustack`
+
+These are implemented using BPF stack maps.
+
+Examples:
+
+```
+# bpftrace -e 'kprobe:do_sys_open /comm == "bash"/ { @[ustack] = count(); }'
+Attaching 1 probe...
+^C
+
+@[
+__open_nocancel+65
+command_word_completion_function+3604
+rl_completion_matches+370
+bash_default_completion+540
+attempt_shell_completion+2092
+gen_completion_matches+82
+rl_complete_internal+288
+rl_complete+145
+_rl_dispatch_subseq+647
+_rl_dispatch+44
+readline_internal_char+479
+readline_internal_charloop+22
+readline_internal+23
+readline+91
+yy_readline_get+152
+yy_readline_get+429
+yy_getc+13
+shell_getc+469
+read_token+251
+yylex+192
+yyparse+777
+parse_command+126
+read_command+207
+reader_loop+391
+main+2409
+__libc_start_main+231
+0x61ce258d4c544155
+]: 9
+@[
+__open_nocancel+65
+command_word_completion_function+3604
+rl_completion_matches+370
+bash_default_completion+540
+attempt_shell_completion+2092
+gen_completion_matches+82
+rl_complete_internal+288
+rl_complete+89
+_rl_dispatch_subseq+647
+_rl_dispatch+44
+readline_internal_char+479
+readline_internal_charloop+22
+readline_internal+23
+readline+91
+yy_readline_get+152
+yy_readline_get+429
+yy_getc+13
+shell_getc+469
+read_token+251
+yylex+192
+yyparse+777
+parse_command+126
+read_command+207
+reader_loop+391
+main+2409
+__libc_start_main+231
+0x61ce258d4c544155
+]: 18
+```
+
+Note that for this example to work, bash had to be recompiled with frame pointers.
+
+# Functions
+
+## 1. Builtins
+
+- `printf(char *fmt, ...)` - Print formatted
+- `time(char *fmt)` - Print formatted time
+- `join(char *arr[])` - Print the array
+- `str(char *s)` - Returns the string pointed to by s
+- `sym(void *p)` - Resolve kernel address
+- `usym(void *p)` - Resolve user space address (incomplete)
+- `reg(char *name)` - Returns the value stored in the named register
+- `exit()` - Quit bpftrace
+
+Some of these are asynchronous: the kernel queues the event, but some time later (milliseconds) it is processed in user-space. The asynchronous actions are: <tt>printf()</tt>, <tt>time()</tt>, and <tt>join()</tt>. Both <tt>sym()</tt> and <tt>usym()</tt>, as well as the variables <tt>stack</tt> and <tt>ustack</tt>, record addresses synchronously, but then do symbol translation asynchronously.
+
+A selection of these are discussed in the following sections.
+
+## 2. `printf()`: Printing
+
+Syntax: `printf(fmt, args)`
+
+This behaves like printf() from C and other languages, with a limited set of format characters. Example:
+
+```
+# bpftrace -e 'kprobe:sys_execve { printf("%s called %s\n", comm, str(arg0)); }'
+Attaching 1 probe...
+bash called /bin/ls
+bash called /usr/bin/man
+man called /apps/nflx-bash-utils/bin/preconv
+man called /usr/local/sbin/preconv
+man called /usr/local/bin/preconv
+man called /usr/sbin/preconv
+man called /usr/bin/preconv
+man called /apps/nflx-bash-utils/bin/tbl
+[...]
+```
+
+## 3. `time()`: Time
+
+Syntax: `time(fmt)`
+
+This prints the current time using the format string supported by libc `strftime(3)`.
+
+```
+# bpftrace -e 'kprobe:sys_nanosleep { time("%H:%M:%S"); }'
+07:11:03
+07:11:09
+^C
+```
+
+If a format string is not provided, it defaults to "%H:%M:%S".
+
+## 4. `join()`: Join
+
+Syntax: `join(char *arr[])`
+
+This joins the array of strings with a space character, and prints it out. This current version does not return a string, so it cannot be used as an argument in printf(). Example:
+
+```
+# bpftrace -e 'kprobe:sys_execve { join(arg1); }'
+Attaching 1 probe...
+ls --color=auto
+man ls
+preconv -e UTF-8
+preconv -e UTF-8
+preconv -e UTF-8
+preconv -e UTF-8
+preconv -e UTF-8
+tbl
+[...]
+```
+
+## 5. `str()`: Strings
+
+Syntax: `str(char *s)`
+
+Returns the string pointed to by s. This was used in the earlier printf() example, since arg0 to sys_execve() is <tt>const char *filename</tt>:
+
+```
+# bpftrace -e 'kprobe:sys_execve { printf("%s called %s\n", comm, str(arg0)); }'
+Attaching 1 probe...
+bash called /bin/ls
+bash called /usr/bin/man
+man called /apps/nflx-bash-utils/bin/preconv
+man called /usr/local/sbin/preconv
+man called /usr/local/bin/preconv
+man called /usr/sbin/preconv
+man called /usr/bin/preconv
+man called /apps/nflx-bash-utils/bin/tbl
+[...]
+```
+
+## 6. `sym()`: Symbol resolution, kernel-level
+
+Syntax: `sym(addr)`
+
+Examples:
+
+```
+# ./build/src/bpftrace -e 'kprobe:sys_nanosleep { printf("%s\n", sym(reg("ip"))); }'
+Attaching 1 probe...
+sys_nanosleep
+sys_nanosleep
+```
+
+## 7. `usym()`: Symbol resolution, user-level
+
+Syntax: `usym(addr)`
+
+Examples:
+
+```
+# bpftrace -e 'uprobe:/bin/bash:readline { printf("%s\n", usym(reg("ip"))); }'
+Attaching 1 probe...
+readline
+readline
+readline
+^C
+```
+
+## 8. `reg()`: Registers
+
+Syntax: `reg(char *name)`
+
+Examples:
+
+```
+# ./src/bpftrace -e 'kprobe:tcp_sendmsg { @[sym(reg("ip"))] = count(); }'
+Attaching 1 probe...
+^C
+
+@[tcp_sendmsg]: 7
+```
+
+See src/arch/x86_64.cpp for the register name list.
+
+## 9. `exit()`: Exit
+
+Syntax: `exit()`
+
+This exits bpftrace, and can be combined with an interval probe to record statistics for a certain duration. Example:
+
+```
+# bpftrace -e 'kprobe:do_sys_open { @opens = count(); } interval:s:1 { exit(); }'
+Attaching 2 probes...
+@opens: 119
+```
+
+# Map Functions
+
+## 1. Builtins
+
+- `count()` - Count the number of times this function is called
+- `sum(int n)` - Sum the value
+- `avg(int n)` - Average the value
+- `min(int n)` - Record the minimum value seen
+- `max(int n)` - Record the maximum value seen
+- `stats(int n)` - Return the count, average, and total for this value
+- `hist(int n)` - Produce a log2 histogram of values of n
+- `lhist(int n, int min, int max, int step)` - Produce a linear histogram of values of n
+- `delete(@x[key])` - Delete the map element passed in as an argument
+- `print(@x[, top [, div]])` - Print the map, optionally the top entries only and with a divisor
+- `clear(@x)` - Delete all keys from the map
+- `zero(@x)` - Set all map values to zero
+
+Some of these are asynchronous: the kernel queues the event, but some time later (milliseconds) it is processed in user-space. The asynchronous actions are: <tt>print()</tt>, <tt>clear()</tt>, and <tt>zero()</tt>.
+
+## 2. `count()`: Count
+
+Syntax: `@counter_name[optional_keys] = count()`
+
+This is implemented using a BPF map.
+
+For example, `@reads`:
+
+```
+# bpftrace -e 'kprobe:sys_read { @reads = count();  }'
+Attaching 1 probe...
+^C
+
+@reads: 119
+```
+
+That shows there were 119 calls to sys_read() while tracing.
+
+This next example includes the `comm` variable as a key, so that the value is broken down by each process name. For example, `@reads[comm]`:
+
+```
+# bpftrace -e 'kprobe:sys_read { @reads[comm] = count(); }'
+Attaching 1 probe...
+^C
+
+@reads[sleep]: 4
+@reads[bash]: 5
+@reads[ls]: 7
+@reads[snmp-pass]: 8
+@reads[snmpd]: 14
+@reads[sshd]: 14
+```
+
+## 3. `sum()`: Sum
+
+Syntax: `@counter_name[optional_keys] = sum(value)`
+
+This is implemented using a BPF map.
+
+For example, `@bytes[comm]`:
+
+```
+# bpftrace -e 'kprobe:sys_read { @bytes[comm] = sum(arg2); }'
+Attaching 1 probe...
+^C
+
+@bytes[bash]: 7
+@bytes[sleep]: 4160
+@bytes[ls]: 6208
+@bytes[snmpd]: 20480
+@bytes[snmp-pass]: 65536
+@bytes[sshd]: 262144
+```
+
+That is summing requested bytes via the sys_read() kernel function, which is one of two possible entry points for the read syscall. To see actual bytes read:
+
+```
+# bpftrace -e 'kretprobe:sys_read /retval > 0/ { @bytes[comm] = sum(retval); }'
+Attaching 1 probe...
+^C
+
+@bytes[bash]: 5
+@bytes[sshd]: 1135
+@bytes[systemd-journal]: 1699
+@bytes[sleep]: 2496
+@bytes[ls]: 4583
+@bytes[snmpd]: 35549
+@bytes[snmp-pass]: 55681
+```
+
+Now a filter is used to ensure the return value was positive before it is used in the sum(). The return value may be negative in cases of error, as is the case with other functions. Remember this whenever using sum() on a retval.
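A small Python sketch shows why the filter matters when summing return values. It is an illustration under assumed inputs, not bpftrace code: `total_bytes` and the list of return values are hypothetical.

```python
def total_bytes(return_values, only_positive):
    """Sum return values, optionally skipping errors (negative) and zero."""
    total = 0
    for retval in return_values:
        if only_positive and not retval > 0:   # the /retval > 0/ filter
            continue
        total += retval                        # sum(retval)
    return total


# Made-up read() return values: two successful reads and one -11 error
# (-11 corresponds to EAGAIN on Linux).
returns = [4096, -11, 512]
unfiltered = total_bytes(returns, only_positive=False)
filtered = total_bytes(returns, only_positive=True)
```

Without the filter, the error code is folded into the byte total and distorts it; with the filter, only the bytes actually read are summed.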
+
+## 4. `avg()`: Average
+
+Syntax: `@counter_name[optional_keys] = avg(value)`
+
+This is implemented using a BPF map.
+
+For example, `@bytes[comm]`:
+
+```
+# bpftrace -e 'kprobe:sys_read { @bytes[comm] = avg(arg2); }'
+Attaching 1 probe...
+^C
+
+@bytes[bash]: 1
+@bytes[sleep]: 832
+@bytes[ls]: 886
+@bytes[snmpd]: 1706
+@bytes[snmp-pass]: 8192
+@bytes[sshd]: 16384
+```
+
+This is averaging the requested read size.
+
+## 5. `min()`: Minimum
+
+Syntax: `@counter_name[optional_keys] = min(value)`
+
+This is implemented using a BPF map.
+
+For example, `@bytes[comm]`:
+
+```
+# bpftrace -e 'kprobe:sys_read { @bytes[comm] = min(arg2); }'
+Attaching 1 probe...
+^C
+
+@bytes[bash]: 1
+@bytes[systemd-journal]: 8
+@bytes[snmpd]: 64
+@bytes[ls]: 832
+@bytes[sleep]: 832
+@bytes[snmp-pass]: 8192
+@bytes[sshd]: 16384
+```
+
+This shows the minimum value seen.
+
+## 6. `max()`: Maximum
+
+Syntax: `@counter_name[optional_keys] = max(value)`
+
+This is implemented using a BPF map.
+
+For example, `@bytes[comm]`:
+
+```
+# bpftrace -e 'kprobe:sys_read { @bytes[comm] = max(arg2); }'
+Attaching 1 probe...
+^C
+
+@bytes[bash]: 1
+@bytes[systemd-journal]: 8
+@bytes[sleep]: 832
+@bytes[ls]: 1024
+@bytes[snmpd]: 4096
+@bytes[snmp-pass]: 8192
+@bytes[sshd]: 16384
+```
+
+This shows the maximum value seen.
+
+## 7. `stats()`: Stats
+
+Syntax: `@counter_name[optional_keys] = stats(value)`
+
+This is implemented using a BPF map.
+
+For example, `@bytes[comm]`:
+
+```
+# bpftrace -e 'kprobe:sys_read { @bytes[comm] = stats(arg2); }'
+Attaching 1 probe...
+^C
+
+@bytes[bash]: count 7, average 1, total 7
+@bytes[sleep]: count 5, average 832, total 4160
+@bytes[ls]: count 7, average 886, total 6208
+@bytes[snmpd]: count 18, average 1706, total 30718
+@bytes[snmp-pass]: count 12, average 8192, total 98304
+@bytes[sshd]: count 15, average 16384, total 245760
+```
+
+This stats() function returns three statistics: the count of events, the average for the argument value, and the total of the argument value. This is similar to using count(), avg(), and sum().
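
According to the codegen comment in this commit, avg() and stats() keep only a count and a total per key, and the average is calculated when printing. The Python sketch below mirrors that layout under two assumptions: the individual read sizes are made up (chosen to reproduce the `ls` row above), and the average uses integer division, which is consistent with the sample output.

```python
# Sketch: avg()/stats() keep a count and a total; the average is derived on output.
from collections import defaultdict

COUNT, TOTAL = 0, 1  # slot indexes, mirroring the codegen comment in this commit


def record(table, key, value):
    """Update the running count and total for `key`."""
    table[key][COUNT] += 1
    table[key][TOTAL] += value


def summarize(slots):
    """Return (count, average, total); integer division is an assumption."""
    count, total = slots[COUNT], slots[TOTAL]
    return count, total // count, total


table = defaultdict(lambda: [0, 0])
for size in (832, 832, 832, 832, 832, 1024, 1024):  # hypothetical read sizes
    record(table, "ls", size)
print(summarize(table["ls"]))
```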
+
+## 8. `hist()`: Log2 Histogram
+
+Syntax:
+
+```
+@histogram_name[optional_key] = hist(value)
+```
+
+This is implemented using a BPF map.
+
+Examples:
+
+### 8.1. Power-Of-2:
+
+```
+# bpftrace -e 'kretprobe:sys_read { @bytes = hist(retval); }'
+Attaching 1 probe...
+^C
+
+@bytes:
+[0, 1]                 7 |@@@@@@@@@@@@@                                       |
+[2, 4)                 3 |@@@@@                                               |
+[4, 8)                 8 |@@@@@@@@@@@@@@                                      |
+[8, 16)                9 |@@@@@@@@@@@@@@@@                                    |
+[16, 32)               0 |                                                    |
+[32, 64)               1 |@                                                   |
+[64, 128)              1 |@                                                   |
+[128, 256)             0 |                                                    |
+[256, 512)             3 |@@@@@                                               |
+[512, 1k)              0 |                                                    |
+[1k, 2k)              12 |@@@@@@@@@@@@@@@@@@@@@@                              |
+[2k, 4k)              28 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+```
+
+### 8.2. Power-Of-2 By Key:
+
+```
+# bpftrace -e 'kretprobe:do_sys_open { @bytes[comm] = hist(retval); }'
+Attaching 1 probe...
+^C
+
+@bytes[snmp-pass]:
+[0, 1]                 0 |                                                    |
+[2, 4)                 0 |                                                    |
+[4, 8)                 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+
+@bytes[ls]:
+[0, 1]                 0 |                                                    |
+[2, 4)                 9 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+
+@bytes[snmpd]:
+[0, 1]                 1 |@@@@                                                |
+[2, 4)                 0 |                                                    |
+[4, 8)                 0 |                                                    |
+[8, 16)                4 |@@@@@@@@@@@@@@@@@@                                  |
+[16, 32)              11 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+```
+
+## 9. `lhist()`: Linear Histogram
+
+Syntax:
+
+```
+@histogram_name[optional_key] = lhist(value, min, max, step)
+```
+
+This is implemented using a BPF map.
+
+Examples:
+
+```
+# bpftrace -e 'kretprobe:sys_read { @bytes = lhist(retval, 0, 10000, 1000); }'
+Attaching 1 probe...
+^C
+
+@bytes:
+(...,0]                0 |                                                    |
+[0, 1000)            480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+[1000, 2000)          49 |@@@@@                                               |
+[2000, 3000)          12 |@                                                   |
+[3000, 4000)          39 |@@@@                                                |
+[4000, 5000)         267 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
+[5000, 6000)           0 |                                                    |
+[6000, 7000)           0 |                                                    |
+[7000, 8000)           0 |                                                    |
+[8000, 9000)           0 |                                                    |
+[9000, 10000)          0 |                                                    |
+[10000,...)            0 |                                                    |
+```
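
To make the bucket labels concrete, here is a small Python sketch of how a value could be assigned to a linear bucket. It is not the bpftrace implementation: the function name is hypothetical, and the boundary handling (values below `min` go to the underflow row, values at or above `max` go to the overflow row) is an assumption.

```python
# Sketch of linear-histogram bucketing; boundary handling is an assumption.
def lhist_bucket(value, lo, hi, step):
    """Return the label of the bucket that `value` would fall into."""
    if value < lo:
        return "(...,%d]" % lo
    if value >= hi:
        return "[%d,...)" % hi
    start = lo + ((value - lo) // step) * step
    return "[%d, %d)" % (start, start + step)


for v in (-1, 0, 999, 4096, 10000):
    print(v, lhist_bucket(v, 0, 10000, 1000))
```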
+
+## 10. `print()`: Print Map
+
+Syntax: ```print(@map [, top [, divisor]])```
+
+The <tt>print()</tt> function will print a map, similar to the automatic printing when bpftrace ends. Two optional arguments can be provided: a top number, so that only the top number of entries are printed, and a divisor, which divides the value. A couple of examples will explain their use.
+
+As an example of top, tracing the top 5 syscalls via kprobe:SyS_*:
+
+```
+# bpftrace -e 'kprobe:SyS_* { @[func] = count(); } END { print(@, 5); clear(@); }'
+Attaching 345 probes...
+^C
+@[sys_write]: 1827
+@[sys_newfstat]: 8401
+@[sys_close]: 9608
+@[sys_open]: 17453
+@[sys_read]: 26353
+```
+
+The final <tt>clear()</tt> is used to prevent printing the map automatically on exit.
+
+As an example of divisor, summing total time in SyS_read() by process name as milliseconds:
+
+```
+# bpftrace -e 'kprobe:SyS_read { @start[tid] = nsecs; } kretprobe:SyS_read /@start[tid]/ { @ms[pid] = sum(nsecs - @start[tid]); delete(@start[tid]); } END { print(@ms, 1, 1000000); clear(@ms); }'
+```
+
+This one-liner sums the SyS_read() durations as nanoseconds, and then does the division to milliseconds when printing. Without this capability, should one try to divide to milliseconds when summing (eg, <tt>sum((nsecs - @start[tid]) / 1000000)</tt>), the value would often be rounded to zero, and not accumulate as it should.
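
The rounding problem is easy to reproduce with plain integer arithmetic. The following Python sketch uses made-up sub-millisecond durations; it only illustrates the arithmetic and is not bpftrace code.

```python
# Sketch: dividing each sample before summing loses precision with integer math.
durations_ns = [400_000, 700_000, 900_000, 650_000]  # made-up read latencies

# Each duration is under one millisecond, so every term rounds down to zero.
divide_then_sum = sum(d // 1_000_000 for d in durations_ns)

# Summing nanoseconds first and dividing once keeps the accumulated time.
sum_then_divide = sum(durations_ns) // 1_000_000

print(divide_then_sum)
print(sum_then_divide)
```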
+
+# Output
+
+## 1. `printf()`: Per-Event Output
+
+Syntax: `printf(char *format, arguments)`
+
+Per-event details can be printed using `printf()`.
+
+Examples:
+
+```
+# bpftrace -e 'kprobe:sys_nanosleep { printf("sleep by %d\n", tid); }'
+Attaching 1 probe...
+sleep by 3669
+sleep by 1396
+sleep by 3669
+sleep by 1396
+[...]
+```
+
+## 2. `interval`: Interval Output
+
+Syntax: `interval:s:duration_seconds`
+
+Examples:
+
+```
+# bpftrace -e 'kprobe:do_sys_open { @opens = @opens + 1; } interval:s:1 { printf("opens/sec: %d\n", @opens); @opens = 0; }'
+Attaching 2 probes...
+opens/sec: 16
+opens/sec: 2
+opens/sec: 3
+opens/sec: 15
+opens/sec: 8
+opens/sec: 2
+^C
+
+@opens: 2
+```
+
+## 3. `hist()`, `print()`: Histogram Printing
+
+Declared histograms are automatically printed out on program termination. See [5. Histograms](#5-histograms) for declarations.
+
+Examples:
+
+```
+# bpftrace -e 'kretprobe:sys_read { @bytes = hist(retval); }'
+Attaching 1 probe...
+^C
+
+@bytes:
+[0, 1]                 7 |@@@@@@@@@@@@@                                       |
+[2, 4)                 3 |@@@@@                                               |
+[4, 8)                 8 |@@@@@@@@@@@@@@                                      |
+[8, 16)                9 |@@@@@@@@@@@@@@@@                                    |
+[16, 32)               0 |                                                    |
+[32, 64)               1 |@                                                   |
+[64, 128)              1 |@                                                   |
+[128, 256)             0 |                                                    |
+[256, 512)             3 |@@@@@                                               |
+[512, 1k)              0 |                                                    |
+[1k, 2k)              12 |@@@@@@@@@@@@@@@@@@@@@@                              |
+[2k, 4k)              28 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+```
+
+Histograms can also be printed on-demand, using the <tt>print()</tt> function. Eg:
+
+<pre>
+# bpftrace -e 'kretprobe:sys_read { @bytes = hist(retval); } interval:s:1 { print(@bytes); clear(@bytes); }'
+
+[...]
+</pre>
+
+
+# Errors
+
+## 1. Looks like the BPF stack limit of 512 bytes is exceeded
+
+BPF programs that operate on many data items may hit this limit. There are a number of things you can try to stay within the limit:
+
+1. Find ways to reduce the size of the data used in the program. Eg, avoid strings if they are unnecessary: use `pid` instead of `comm`. Use fewer map keys.
+1. Split your program over multiple probes.
+1. Check the status of the BPF stack limit in Linux (it may be increased in the future, maybe as a tunable).
+1. (advanced): Run -d and examine the LLVM IR, and look for ways to optimize src/ast/codegen_llvm.cpp.
--- a/docs/tutorial_one_liners.md
+++ b/docs/tutorial_one_liners.md
+The bpftrace One-Liner Tutorial
+
+This teaches you bpftrace for Linux in 12 easy lessons, where each lesson is a one-liner you can try running. This series of one-liners introduces concepts which are summarized as bullet points. For a full reference to bpftrace, see docs/reference_guide.md.
+
+Contributed by Brendan Gregg, Netflix (2018), based on his FreeBSD [DTrace Tutorial](https://wiki.freebsd.org/DTrace/Tutorial).
+
+TODO: lessons 3, 5, 11, and 12, will not work until the struct work is complete (see issues #31, #32, #34).
+
+# Lesson 1. Listing Probes
+
+```
+bpftrace -l 'tracepoint:syscalls:sys_enter_*'
+```
+
+"bpftrace -l" lists all probes, and a search term can be added.
+
+- A probe is an instrumentation point for capturing event data.
+- The supplied search term will do partial matches, and wildcards (file globbing) can be used.
+- "bpftrace -l" can also be piped to grep(1) for full regular expression searching.
+
+# Lesson 2. Hello World
+
+```
+# bpftrace -e 'BEGIN { printf("hello world\n"); }'
+Attaching 1 probe...
+hello world
+^C
+```
+
+This prints a welcome message. Run it, then hit Ctrl-C to end.
+
+- The word `BEGIN` is a special probe that fires at the start of the program (like awk's BEGIN). You can use it to set variables and print headers.
+- An action can be associated with probes, in { }. This example calls printf() when the probe fires.
+
+# Lesson 3. File Opens
+
+```
+# bpftrace -e 'tracepoint:syscalls:sys_enter_open { printf("%s %s\n", comm, str(args->filename)); }'
+Attaching 1 probe...
+snmp-pass /proc/cpuinfo
+snmp-pass /proc/stat
+snmpd /proc/net/dev
+snmpd /proc/net/if_inet6
+^C
+```
+
+This traces file opens as they happen, and we're printing the process name and pathname.
+
+- It begins with the probe `tracepoint:syscalls:sys_enter_open`: this is the tracepoint probe type (kernel static tracing), and is instrumenting when the open() syscall begins (is entered). Tracepoints are preferred over kprobes (kernel dynamic tracing, introduced in lesson 6), since tracepoints have a stable API.
+- `comm` is a builtin variable that has the current process's name. Other similar builtins include pid and tid.
+- `arg0` is a builtin variable containing the first probe argument, the meaning of which is defined by the probe type. For `kprobe`, it is the first argument to the function. Other arguments can be accessed as arg1, ..., argN. The sys_open() arguments are: const char *pathname, int flags, mode_t mode (see the open(2) man page). So, arg0 is the pathname pointer.
+- `str()` turns a pointer into the string it points to.
+
+# Lesson 4. Syscall Counts By Process
+
+```
+bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
+Attaching 1 probe...
+^C
+
+@[bpftrace]: 6
+@[systemd]: 24
+@[snmp-pass]: 96
+@[sshd]: 125
+```
+
+This summarizes syscalls by process name, printing a report on Ctrl-C.
+
+- @: This denotes a special variable type called a map, which can store and summarize data in different ways. You can add an optional variable name after the @, eg "@num", either to improve readability, or to differentiate between more than one map.
+- []: The optional brackets allow a key to be set for the map, much like an associative array.
+- count(): This is a map function – the way it is populated. count() counts the number of times it is called. Since this is saved by comm, the result is a frequency count of system calls by process name.
+
+Maps are automatically printed when bpftrace ends (eg, via Ctrl-C).
+
+# Lesson 5. Distribution of read() Bytes
+
+```
+# bpftrace -e 'tracepoint:syscalls:sys_exit_read /pid == 18644/ { @bytes = hist(args->retval); }'
+Attaching 1 probe...
+^C
+
+@bytes:
+[0, 1]                12 |@@@@@@@@@@@@@@@@@@@@                                |
+[2, 4)                18 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
+[4, 8)                 0 |                                                    |
+[8, 16)                0 |                                                    |
+[16, 32)               0 |                                                    |
+[32, 64)              30 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+[64, 128)             19 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
+[128, 256)             1 |@                                                   |
+```
+
+This summarizes the return value of the sys_read() kernel function for PID 18644, printing it as a histogram.
+
+- /.../: This is a filter (aka predicate) for the action. The action is only executed if the filter expression is true; in this case, only for process ID 18644. Boolean operators are supported ("&&", "||").
+- retval: This is the return value of the function. For sys_read(), this is either -1 (error) or the number of bytes successfully read.
+- @: This is a map similar to the previous lesson, but without any keys ([]) this time, and with the name "bytes", which decorates the output.
+- hist(): This is a map function which summarizes the argument as a power-of-2 histogram. The output shows rows that begin with interval notation, where, for example `[128, 256)` means that the value is: 128 <= value < 256. The next number is the count of occurrences, and then an ASCII histogram is printed to visualize that count. The histogram can be used to study multi-modal distributions.
+- Other map functions include lhist() (linear hist), count(), sum(), avg(), min(), and max().
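
The interval notation can be illustrated with a short Python sketch that maps a value to its power-of-2 row. This is not how bpftrace computes it internally; the function name is hypothetical, negative values are left out, and the labels are not abbreviated (the real output prints 1024 as `1k`).

```python
# Sketch of power-of-2 bucketing for non-negative integers.
def hist_bucket(value):
    """Return the label of the power-of-2 bucket that `value` falls into."""
    if value < 0:
        raise ValueError("negative values are outside this sketch")
    if value <= 1:
        return "[0, 1]"
    low = 1 << (value.bit_length() - 1)  # largest power of 2 that is <= value
    return "[%d, %d)" % (low, low * 2)


for v in (0, 3, 128, 255, 256):
    print(v, hist_bucket(v))
```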
+
+# Lesson 6. Kernel Dynamic Tracing of read() Bytes
+
+```
+# bpftrace -e 'kretprobe:vfs_read { @bytes = lhist(retval, 0, 2000, 200); }'
+Attaching 1 probe...
+^C
+
+@bytes:
+(...,0]                0 |                                                    |
+[0, 200)              66 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+[200, 400)             2 |@                                                   |
+[400, 600)             3 |@@                                                  |
+[600, 800)             0 |                                                    |
+[800, 1000)            5 |@@@                                                 |
+[1000, 1200)           0 |                                                    |
+[1200, 1400)           0 |                                                    |
+[1400, 1600)           0 |                                                    |
+[1600, 1800)           0 |                                                    |
+[1800, 2000)           0 |                                                    |
+[2000,...)            39 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
+```
+
+This summarizes read() bytes as a linear histogram, traced using kernel dynamic tracing.
+
+- It begins with the probe `kretprobe:vfs_read`: this is the kretprobe probe type (kernel dynamic tracing of function returns) instrumenting the `vfs_read()` kernel function. There is also the kprobe probe type (shown in the next lesson), to instrument when functions begin execution (are entered). These are powerful probe types, letting you trace tens of thousands of different kernel functions. However, these are "unstable" probe types: since they can trace any kernel function, there is no guarantee that your kprobe/kretprobe will work between kernel versions, as the function names, arguments, return values, and roles may change. Also, since it is tracing the raw kernel, you'll need to browse the kernel source to understand what these probes, arguments, and return values, mean.
+- lhist(): this is a linear histogram, where the arguments are: value, min, max, step. The first argument to lhist() (`retval`) is the return value of vfs_read(): the number of bytes read.
+
+# Lesson 7. Timing read()s
+
+```
+# bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @ns[comm] = hist(nsecs - @start[tid]); delete(@start[tid]); }'
+Attaching 2 probes...
+
+[...]
+@ns[snmp-pass]:
+[0, 1]                 0 |                                                    |
+[2, 4)                 0 |                                                    |
+[4, 8)                 0 |                                                    |
+[8, 16)                0 |                                                    |
+[16, 32)               0 |                                                    |
+[32, 64)               0 |                                                    |
+[64, 128)              0 |                                                    |
+[128, 256)             0 |                                                    |
+[256, 512)            27 |@@@@@@@@@                                           |
+[512, 1k)            125 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
+[1k, 2k)              22 |@@@@@@@                                             |
+[2k, 4k)               1 |                                                    |
+[4k, 8k)              10 |@@@                                                 |
+[8k, 16k)              1 |                                                    |
+[16k, 32k)             3 |@                                                   |
+[32k, 64k)           144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
+[64k, 128k)            7 |@@                                                  |
+[128k, 256k)          28 |@@@@@@@@@@                                          |
+[256k, 512k)           2 |                                                    |
+[512k, 1M)             3 |@                                                   |
+[1M, 2M)               1 |                                                    |
+```
+
+Summarize the time spent in read(), in nanoseconds, as a histogram, by process name.
+
+- @start[tid]: This uses the thread ID as a key. There may be many reads in-flight, and we want to store a start timestamp for each. How? We could construct a unique identifier for each read, and use that as the key. But because a kernel thread can only be executing one syscall at a time, we can use the thread ID as the unique identifier.
+- nsecs: Nanoseconds since boot. This is a high resolution timestamp counter that can be used to time events.
+- /@start[tid]/: This filter checks that the start time was seen and recorded. Without this filter, this program may be launched during a read and only catch the end, resulting in a time calculation of now - zero, instead of now - start.
+
+- delete(@start[tid]): this frees the variable.
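
The entry/return pattern from this lesson can be sketched in Python, with a plain dict standing in for the `@start` map. The handler names and timestamps are hypothetical; the point is only the bookkeeping: record on entry, skip returns with no recorded entry, and delete the key afterwards.

```python
# Sketch of the entry/return timing pattern, with a dict standing in for a map.
start = {}      # plays the role of @start[tid]
durations = []  # collected latencies in nanoseconds


def on_entry(tid, now_ns):
    """Record the start timestamp for this thread."""
    start[tid] = now_ns


def on_return(tid, now_ns):
    """Record a duration if, and only if, the matching entry was seen."""
    # Equivalent of the /@start[tid]/ filter.
    if tid not in start:
        return
    durations.append(now_ns - start[tid])
    del start[tid]  # equivalent of delete(@start[tid])


on_return(7, 1_000)   # return with no recorded entry: ignored
on_entry(7, 2_000)
on_entry(9, 2_500)    # a second thread, in flight at the same time
on_return(9, 4_000)
on_return(7, 10_000)
print(durations)
```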
+
+# Lesson 8. Count Process-Level Events
+
+```
+# bpftrace -e 'tracepoint:sched:sched* { @[name] = count(); } interval:s:5 { exit(); }'
+Attaching 25 probes...
+@[tracepoint:sched:sched_wakeup_new]: 1
+@[tracepoint:sched:sched_process_fork]: 1
+@[tracepoint:sched:sched_process_exec]: 1
+@[tracepoint:sched:sched_process_exit]: 1
+@[tracepoint:sched:sched_process_free]: 2
+@[tracepoint:sched:sched_process_wait]: 7
+@[tracepoint:sched:sched_wake_idle_without_ipi]: 53
+@[tracepoint:sched:sched_stat_runtime]: 212
+@[tracepoint:sched:sched_wakeup]: 253
+@[tracepoint:sched:sched_waking]: 253
+@[tracepoint:sched:sched_switch]: 510
+```
+
+Count process-level events for five seconds, printing a summary.
+
+- sched: The `sched` probe category has high-level scheduler and process events, such as fork, exec, and context switch.
+- name: The full name of the probe.
+- interval:s:5: This is a probe that fires once every 5 seconds, on one CPU only. It is used for creating script-level intervals or timeouts.
+- exit(): This exits bpftrace.
+
+# Lesson 9. Profile On-CPU Kernel Stacks
+
+```
+# bpftrace -e 'profile:hz:99 { @[stack] = count(); }'
+Attaching 1 probe...
+^C
+
+[...]
+@[
+filemap_map_pages+181
+__handle_mm_fault+2905
+handle_mm_fault+250
+__do_page_fault+599
+async_page_fault+69
+]: 12
+[...]
+@[
+cpuidle_enter_state+164
+do_idle+390
+cpu_startup_entry+111
+start_secondary+423
+secondary_startup_64+165
+]: 22122
+```
+
+Profile kernel stacks at 99 Hertz, printing a frequency count.
+
+- profile:hz:99: This fires on all CPUs at 99 Hertz. Why 99 and not 100 or 1000? We want to sample frequently enough to catch both the big and small picture of execution, but not so frequently as to perturb performance. 100 Hertz is enough. But we don't want 100 exactly, as sampling may occur in lockstep with other timed activities, hence 99.
+- stack: Returns the kernel stack trace. This is used as a key for the map, so that it can be frequency counted. The output of this is ideal to be visualized as a flame graph. There is also `ustack` for the user-level stack trace.
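
The lockstep concern can be shown with a toy model in Python. It assumes a hypothetical activity that runs at exactly 100 Hertz and is busy for the first 10% of each period, then counts how many samples in one second land in the busy part. All names and numbers here are illustrative, not measurements.

```python
# Toy model: sampling a 100 Hz periodic activity at 100 Hz versus 99 Hz.
from fractions import Fraction


def busy_hits(sample_hz, activity_hz=100, busy_fraction=Fraction(1, 10)):
    """Count samples over one second that land while the activity is busy."""
    hits = 0
    for k in range(sample_hz):
        # Position of sample k within the activity's period, in [0, 1).
        phase = Fraction(k * activity_hz, sample_hz) % 1
        if phase < busy_fraction:
            hits += 1
    return hits


print(busy_hits(100))  # every sample lands on the same phase of the period
print(busy_hits(99))   # samples drift across the period
```

In this model the 100 Hertz sampler always sees the activity as busy, while the 99 Hertz sampler sees it busy in about 10% of samples, which is the true share.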
+
+# Lesson 10. Scheduler Tracing
+
+```
+# bpftrace -e 'tracepoint:sched:sched_switch { @[stack] = count(); }'
+^C
+[...]
+
+@[
+__schedule+697
+__schedule+697
+schedule+50
+schedule_timeout+365
+xfsaild+274
+kthread+248
+ret_from_fork+53
+]: 73
+@[
+__schedule+697
+__schedule+697
+schedule_idle+40
+do_idle+356
+cpu_startup_entry+111
+start_secondary+423
+secondary_startup_64+165
+]: 305
+```
+
+This counts stack traces that led to context switching (off-CPU) events. The above output has been truncated to show the last two only.
+
+- sched: The sched category has tracepoints for different kernel CPU scheduler events: sched_switch, sched_wakeup, sched_migrate_task, etc.
+- sched_switch: This probe fires when a thread leaves CPU. This will be a blocking event: eg, waiting on I/O, a timer, paging/swapping, or a lock.
+- stack: A kernel stack trace.
+- sched_switch fires in thread context, so the stack refers to the thread that is leaving. As you use other probe types, pay attention to context, as comm, pid, stack, etc, may not refer to the target of the probe.
+
+# Lesson 11. Block I/O Tracing
+
+```
+bpftrace -e 'tracepoint:block:block_rq_complete { @ = hist(args->nr_sector * 512); }'
+```
+
+Block I/O requests by size in bytes, as a histogram.
+
+- tracepoint:block: The block category of tracepoints traces various block I/O (storage) events.
+- block_rq_complete: This fires when an I/O has completed.
+- args->nr_sector: This is a member from the tracepoint block_rq_complete arguments which shows the size in sectors.
+
+The context of this probe is important: this fires when the device sends the completion interrupt. At this point, the process that submitted the I/O is not on-CPU; therefore, builtins like comm will not show what you might expect.
+
+# Lesson 12. Kernel Struct Tracing
+
+```
+bpftrace -e 'kprobe:blk_account_io_start { @bytes = hist(((struct request *)arg0)->__data_len); }'
+```
+
+Summarize kernel blk_account_io_start() calls with a histogram of the I/O size. This differs from the previous example in that it uses kernel dynamic tracing and fetches the size from a kernel struct.
+
+- kprobe: As mentioned earlier, this is the kernel dynamic tracing probe type, which traces the entry of kernel functions (use kretprobe to trace their returns). 
+- ((struct request *)arg0)->__data_len: this casts arg0 as struct request *, then dereferences the __data_len field.
+
+At this point you understand much of bpftrace, and can begin to use and write powerful one-liners. See the reference guide for more capabilities.
+
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -9,6 +9,7 @@ add_executable(bpftrace
  mapkey.cpp
  printf.cpp
  types.cpp
+  list.cpp
 )

 target_link_libraries(bpftrace arch ast parser resources)

--- a/src/ast/ast.cpp
+++ b/src/ast/ast.cpp
@@ -36,6 +36,10 @@ void Unop::accept(Visitor &v) {
  v.visit(*this);
 }

+void Ternary::accept(Visitor &v) {
+  v.visit(*this);
+}
+
 void FieldAccess::accept(Visitor &v) {
  v.visit(*this);
 }

--- a/src/ast/ast.h
+++ b/src/ast/ast.h
@@ -49,6 +49,7 @@ class Builtin : public Expression {
 public:
  explicit Builtin(std::string ident) : ident(ident) { }
  std::string ident;
+  int name_id;

  void accept(Visitor &v) override;
 };
@@ -161,6 +162,14 @@ public:
  void accept(Visitor &v) override;
 };

+class Ternary : public Expression {
+public:
+  Ternary(Expression *cond, Expression *left, Expression *right) : cond(cond), left(left), right(right) { }
+  Expression *cond, *left, *right;
+
+  void accept(Visitor &v) override;
+};
+
 class AttachPoint : public Node {
 public:
  explicit AttachPoint(const std::string &provider)
@@ -198,6 +207,7 @@ public:

  void accept(Visitor &v) override;
  std::string name() const;
+  bool need_expansion = false;	// must build a BPF program per wildcard match
 };
 using ProbeList = std::vector<Probe *>;

@@ -222,6 +232,7 @@ public:
  virtual void visit(Variable &var) = 0;
  virtual void visit(Binop &binop) = 0;
  virtual void visit(Unop &unop) = 0;
+  virtual void visit(Ternary &ternary) = 0;
  virtual void visit(FieldAccess &acc) = 0;
  virtual void visit(Cast &cast) = 0;
  virtual void visit(ExprStatement &expr) = 0;

--- a/src/ast/codegen_llvm.cpp
+++ b/src/ast/codegen_llvm.cpp
@@ -3,6 +3,7 @@
 #include "ast.h"
 #include "parser.tab.hh"
 #include "arch/arch.h"
+#include "types.h"

 #include <llvm/Support/raw_os_ostream.h>
 #include <llvm/Support/TargetRegistry.h>
@@ -36,7 +37,10 @@ void CodegenLLVM::visit(Builtin &builtin)
  }
  else if (builtin.ident == "stack" || builtin.ident == "ustack")
  {
-    expr_ = b_.CreateGetStackId(ctx_, builtin.ident == "ustack");
+    // pack uint64_t with: (uint32_t)stack_id, (uint32_t)pid
+    Value *pidhigh = b_.CreateShl(b_.CreateGetPidTgid(), 32);
+    Value *stackid = b_.CreateGetStackId(ctx_, builtin.ident == "ustack");
+    expr_ = b_.CreateOr(stackid, pidhigh);
  }
  else if (builtin.ident == "pid" || builtin.ident == "tid")
  {
@@ -66,6 +70,14 @@ void CodegenLLVM::visit(Builtin &builtin)
  {
    expr_ = b_.CreateGetCpuId();
  }
+  else if (builtin.ident == "curtask")
+  {
+    expr_ = b_.CreateGetCurrentTask();
+  }
+  else if (builtin.ident == "rand")
+  {
+    expr_ = b_.CreateGetRandom();
+  }
  else if (builtin.ident == "comm")
  {
    AllocaInst *buf = b_.CreateAllocaBPF(builtin.type, "comm");
@@ -96,6 +108,14 @@ void CodegenLLVM::visit(Builtin &builtin)
    expr_ = b_.CreateLoad(dst);
    b_.CreateLifetimeEnd(dst);
  }
+  else if (builtin.ident == "name")
+  {
+    static int name_id = 0;
+    bpftrace_.name_ids_.push_back(probefull_);
+    builtin.name_id = name_id;
+    name_id++;
+    expr_ = b_.getInt64(builtin.name_id);
+  }
  else
  {
    abort();
@@ -118,13 +138,138 @@ void CodegenLLVM::visit(Call &call)
    b_.CreateLifetimeEnd(newval);
    expr_ = nullptr;
  }
-  else if (call.func == "quantize")
+  else if (call.func == "sum")
+  {
+    Map &map = *call.map;
+    AllocaInst *key = getMapKey(map);
+    Value *oldval = b_.CreateMapLookupElem(map, key);
+    AllocaInst *newval = b_.CreateAllocaBPF(map.type, map.ident + "_val");
+
+    call.vargs->front()->accept(*this);
+    b_.CreateStore(b_.CreateAdd(expr_, oldval), newval);
+    b_.CreateMapUpdateElem(map, key, newval);
+
+    // oldval can only be an integer so won't be in memory and doesn't need lifetime end
+    b_.CreateLifetimeEnd(key);
+    b_.CreateLifetimeEnd(newval);
+    expr_ = nullptr;
+  }
+  else if (call.func == "min")
+  {
+    Map &map = *call.map;
+    AllocaInst *key = getMapKey(map);
+    Value *oldval = b_.CreateMapLookupElem(map, key);
+    AllocaInst *newval = b_.CreateAllocaBPF(map.type, map.ident + "_val");
+
+    // Store the max of (0xffffffff - val), so that our SGE comparison with uninitialized
+    // elements will always store on the first occurrence. Revert this later when printing.
+    Function *parent = b_.GetInsertBlock()->getParent();
+    call.vargs->front()->accept(*this);
+    Value *inverted = b_.CreateSub(b_.getInt64(0xffffffff), expr_);
+    BasicBlock *lt = BasicBlock::Create(module_->getContext(), "min.lt", parent);
+    BasicBlock *ge = BasicBlock::Create(module_->getContext(), "min.ge", parent);
+    b_.CreateCondBr(b_.CreateICmpSGE(inverted, oldval), ge, lt);
+
+    b_.SetInsertPoint(ge);
+    b_.CreateStore(inverted, newval);
+    b_.CreateMapUpdateElem(map, key, newval);
+    b_.CreateBr(lt);
+
+    b_.SetInsertPoint(lt);
+    b_.CreateLifetimeEnd(key);
+    b_.CreateLifetimeEnd(newval);
+    expr_ = nullptr;
+  }
+  else if (call.func == "max")
+  {
+    Map &map = *call.map;
+    AllocaInst *key = getMapKey(map);
+    Value *oldval = b_.CreateMapLookupElem(map, key);
+    AllocaInst *newval = b_.CreateAllocaBPF(map.type, map.ident + "_val");
+
+    Function *parent = b_.GetInsertBlock()->getParent();
+    call.vargs->front()->accept(*this);
+    BasicBlock *lt = BasicBlock::Create(module_->getContext(), "max.lt", parent);
+    BasicBlock *ge = BasicBlock::Create(module_->getContext(), "max.ge", parent);
+    b_.CreateCondBr(b_.CreateICmpSGE(expr_, oldval), ge, lt);
+
+    b_.SetInsertPoint(ge);
+    b_.CreateStore(expr_, newval);
+    b_.CreateMapUpdateElem(map, key, newval);
+    b_.CreateBr(lt);
+
+    b_.SetInsertPoint(lt);
+    b_.CreateLifetimeEnd(key);
+    b_.CreateLifetimeEnd(newval);
+    expr_ = nullptr;
+  }
+  else if (call.func == "avg" || call.func == "stats")
+  {
+    // avg stores the count and total in a hist map using indexes 0 and 1
+    // respectively, and the calculation is made when printing.
+    Map &map = *call.map;
+
+    AllocaInst *count_key = getHistMapKey(map, b_.getInt64(0));
+    Value *count_old = b_.CreateMapLookupElem(map, count_key);
+    AllocaInst *count_new = b_.CreateAllocaBPF(map.type, map.ident + "_num");
+    b_.CreateStore(b_.CreateAdd(count_old, b_.getInt64(1)), count_new);
+    b_.CreateMapUpdateElem(map, count_key, count_new);
+    b_.CreateLifetimeEnd(count_key);
+    b_.CreateLifetimeEnd(count_new);
+
+    AllocaInst *total_key = getHistMapKey(map, b_.getInt64(1));
+    Value *total_old = b_.CreateMapLookupElem(map, total_key);
+    AllocaInst *total_new = b_.CreateAllocaBPF(map.type, map.ident + "_val");
+    call.vargs->front()->accept(*this);
+    b_.CreateStore(b_.CreateAdd(expr_, total_old), total_new);
+    b_.CreateMapUpdateElem(map, total_key, total_new);
+    b_.CreateLifetimeEnd(total_key);
+    b_.CreateLifetimeEnd(total_new);
+
+    expr_ = nullptr;
+  }
+  else if (call.func == "hist")
  {
    Map &map = *call.map;
    call.vargs->front()->accept(*this);
    Function *log2_func = module_->getFunction("log2");
    Value *log2 = b_.CreateCall(log2_func, expr_, "log2");
-    AllocaInst *key = getQuantizeMapKey(map, log2);
+    AllocaInst *key = getHistMapKey(map, log2);
+
+    Value *oldval = b_.CreateMapLookupElem(map, key);
+    AllocaInst *newval = b_.CreateAllocaBPF(map.type, map.ident + "_val");
+    b_.CreateStore(b_.CreateAdd(oldval, b_.getInt64(1)), newval);
+    b_.CreateMapUpdateElem(map, key, newval);
+
+    // oldval can only be an integer so won't be in memory and doesn't need lifetime end
+    b_.CreateLifetimeEnd(key);
+    b_.CreateLifetimeEnd(newval);
+    expr_ = nullptr;
+  }
+  else if (call.func == "lhist")
+  {
+    Map &map = *call.map;
+    call.vargs->front()->accept(*this);
+    Function *linear_func = module_->getFunction("linear");
+
+    // prepare arguments
+    Integer &value_arg = static_cast<Integer&>(*call.vargs->at(0));
+    Integer &min_arg = static_cast<Integer&>(*call.vargs->at(1));
+    Integer &max_arg = static_cast<Integer&>(*call.vargs->at(2));
+    Integer &step_arg = static_cast<Integer&>(*call.vargs->at(3));
+    Value *value, *min, *max, *step;
+    value_arg.accept(*this);
+    value = expr_;
+    min_arg.accept(*this);
+    min = expr_;
+    max_arg.accept(*this);
+    max = expr_;
+    step_arg.accept(*this);
+    step = expr_;
+
+    Value *linear = b_.CreateCall(linear_func, {value, min, max, step} , "linear");
+
+    AllocaInst *key = getHistMapKey(map, linear);

    Value *oldval = b_.CreateMapLookupElem(map, key);
    AllocaInst *newval = b_.CreateAllocaBPF(map.type, map.ident + "_val");
@@ -153,11 +298,58 @@ void CodegenLLVM::visit(Call &call)
    b_.CreateProbeReadStr(buf, call.type.size, expr_);
    expr_ = buf;
  }
-  else if (call.func == "sym" || call.func == "usym")
+  else if (call.func == "join")
+  {
+    call.vargs->front()->accept(*this);
+    AllocaInst *first = b_.CreateAllocaBPF(SizedType(Type::integer, 8), call.func + "_first");
+    AllocaInst *second = b_.CreateAllocaBPF(b_.getInt64Ty(), call.func+"_second");
+    Value *perfdata = b_.CreateGetJoinMap(ctx_);
+    Function *parent = b_.GetInsertBlock()->getParent();
+    BasicBlock *zero = BasicBlock::Create(module_->getContext(), "joinzero", parent);
+    BasicBlock *notzero = BasicBlock::Create(module_->getContext(), "joinnotzero", parent);
+    b_.CreateCondBr(b_.CreateICmpNE(perfdata, ConstantExpr::getCast(Instruction::IntToPtr, b_.getInt64(0), b_.getInt8PtrTy()), "joinzerocond"), notzero, zero);
+
+    // arg0
+    b_.SetInsertPoint(notzero);
+    b_.CreateStore(b_.getInt64(asyncactionint(AsyncAction::join)), perfdata);
+    AllocaInst *arr = b_.CreateAllocaBPF(b_.getInt64Ty(), call.func+"_r0");
+    b_.CreateProbeRead(arr, 8, expr_);
+    b_.CreateProbeReadStr(b_.CreateAdd(perfdata, b_.getInt64(8)), bpftrace_.join_argsize_, b_.CreateLoad(arr));
+
+    for (int i = 1; i < bpftrace_.join_argnum_; i++) {
+      // argi
+      b_.CreateStore(b_.CreateAdd(expr_, b_.getInt64(8 * i)), first);
+      b_.CreateProbeRead(second, 8, b_.CreateLoad(first));
+      b_.CreateProbeReadStr(b_.CreateAdd(perfdata, b_.getInt64(8 + i * bpftrace_.join_argsize_)), bpftrace_.join_argsize_, b_.CreateLoad(second));
+    }
+
+    // emit
+    b_.CreatePerfEventOutput(ctx_, perfdata, 8 + bpftrace_.join_argnum_ * bpftrace_.join_argsize_);
+
+    b_.CreateBr(zero);
+
+    // done
+    b_.SetInsertPoint(zero);
+    expr_ = nullptr;
+  }
+  else if (call.func == "sym")
  {
    // We want expr_ to just pass through from the child node - don't set it here
    call.vargs->front()->accept(*this);
  }
+  else if (call.func == "usym")
+  {
+    // store uint64_t[2] with: [0]: (uint64_t)addr, [1]: (uint64_t)pid
+    AllocaInst *buf = b_.CreateAllocaBPF(call.type, "usym");
+    b_.CreateMemSet(buf, b_.getInt8(0), call.type.size, 1);
+    Value *pid = b_.CreateLShr(b_.CreateGetPidTgid(), 32);
+    Value *addr_offset = b_.CreateGEP(buf, b_.getInt64(0));
+    Value *pid_offset = b_.CreateGEP(buf, {b_.getInt64(0), b_.getInt64(8)});
+    call.vargs->front()->accept(*this);
+    b_.CreateStore(expr_, addr_offset);
+    b_.CreateStore(pid, pid_offset);
+    expr_ = buf;
+  }
  else if (call.func == "reg")
  {
    auto &reg_name = static_cast<String&>(*call.vargs->at(0)).str;
@@ -173,6 +365,12 @@ void CodegenLLVM::visit(Call &call)
  }
  else if (call.func == "printf")
  {
+    /*
+     * perf event output has: uint64_t printf_id, vargs
+     * The printf_id maps to bpftrace_.printf_args_, and is a way to define the
+     * types and offsets of each of the arguments, and share that between BPF and
+     * user-space for printing.
+     */
    ArrayType *string_type = ArrayType::get(b_.getInt8Ty(), STRING_SIZE);
    std::vector<llvm::Type *> elements = { b_.getInt64Ty() }; // printf ID
    String &fmt = static_cast<String&>(*call.vargs->at(0));
@@ -202,7 +400,7 @@ void CodegenLLVM::visit(Call &call)
      Expression &arg = *call.vargs->at(i);
      arg.accept(*this);
      Value *offset = b_.CreateGEP(printf_args, {b_.getInt32(0), b_.getInt32(i)});
-      if (arg.type.type == Type::string)
+      if (arg.type.type == Type::string || arg.type.type == Type::usym)
        b_.CreateMemCpy(offset, expr_, arg.type.size, 1);
      else
        b_.CreateStore(expr_, offset);
@@ -213,8 +411,108 @@ void CodegenLLVM::visit(Call &call)
    b_.CreateLifetimeEnd(printf_args);
    expr_ = nullptr;
  }
+  else if (call.func == "exit")
+  {
+    /*
+     * perf event output has: uint64_t asyncaction_id
+     * The asyncaction_id informs user-space that this is not a printf(), but is a
+     * special asynchronous action. The ID maps to exit().
+     */
+    ArrayType *perfdata_type = ArrayType::get(b_.getInt8Ty(), sizeof(uint64_t));
+    AllocaInst *perfdata = b_.CreateAllocaBPF(perfdata_type, "perfdata");
+    b_.CreateStore(b_.getInt64(asyncactionint(AsyncAction::exit)), perfdata);
+    b_.CreatePerfEventOutput(ctx_, perfdata, sizeof(uint64_t));
+    b_.CreateLifetimeEnd(perfdata);
+    expr_ = nullptr;
+  }
+  else if (call.func == "print")
+  {
+    /*
+     * perf event output has: uint64_t asyncaction_id, uint64_t top, uint64_t div, string map_ident
+     * The asyncaction_id informs user-space that this is not a printf(), but is a
+     * special asynchronous action. The ID maps to print(). The top argument is either
+     * a value for truncation, or 0 for everything. The div argument is a divisor applied to
+     * the output values (eg: for use in nanosecond -> millisecond conversions).
+     * TODO: consider stashing top & div in a printf_args_ like struct, so we don't need to pass
+     * them here via the perfdata output (which is a little more wasteful than need be: I'm using
+     * uint64_t's to avoid "misaligned stack access off" errors when juggling uint32_t's).
+     */
+    auto &arg = *call.vargs->at(0);
+    auto &map = static_cast<Map&>(arg);
+    Constant *const_str = ConstantDataArray::getString(module_->getContext(), map.ident, true);
+    AllocaInst *str_buf = b_.CreateAllocaBPF(ArrayType::get(b_.getInt8Ty(), map.ident.length()), "str");
+    b_.CreateStore(b_.CreateGEP(const_str, b_.getInt64(0)), str_buf);
+    ArrayType *perfdata_type = ArrayType::get(b_.getInt8Ty(), sizeof(uint64_t) + 2 * sizeof(uint64_t) + map.ident.length());
+    AllocaInst *perfdata = b_.CreateAllocaBPF(perfdata_type, "perfdata");
+
+    // store asyncactionid:
+    b_.CreateStore(b_.getInt64(asyncactionint(AsyncAction::print)), perfdata);
+
+    // store top:
+    if (call.vargs->size() > 1)
+    {
+      Integer &top_arg = static_cast<Integer&>(*call.vargs->at(1));
+      Value *top;
+      top_arg.accept(*this);
+      top = expr_;
+      b_.CreateStore(top, b_.CreateGEP(perfdata, {b_.getInt32(0), b_.getInt32(sizeof(uint64_t))}));
+    }
+    else
+      b_.CreateStore(b_.getInt64(0), b_.CreateGEP(perfdata, {b_.getInt64(0), b_.getInt64(sizeof(uint64_t))}));
+
+    // store div:
+    if (call.vargs->size() > 2)
+    {
+      Integer &div_arg = static_cast<Integer&>(*call.vargs->at(2));
+      Value *div;
+      div_arg.accept(*this);
+      div = expr_;
+      b_.CreateStore(div, b_.CreateGEP(perfdata, {b_.getInt64(0), b_.getInt64(sizeof(uint64_t) + sizeof(uint64_t))}));
+    }
+    else
+      b_.CreateStore(b_.getInt64(0), b_.CreateGEP(perfdata, {b_.getInt64(0), b_.getInt64(sizeof(uint64_t) + sizeof(uint64_t))}));
+
+    // store map ident:
+    b_.CreateMemCpy(b_.CreateGEP(perfdata, {b_.getInt64(0), b_.getInt64(sizeof(uint64_t) + 2 * sizeof(uint64_t))}), str_buf, map.ident.length(), 1);
+    b_.CreatePerfEventOutput(ctx_, perfdata, sizeof(uint64_t) + 2 * sizeof(uint64_t) + map.ident.length());
+    b_.CreateLifetimeEnd(perfdata);
+    expr_ = nullptr;
+  }
+  else if (call.func == "clear" || call.func == "zero")
+  {
+    auto &arg = *call.vargs->at(0);
+    auto &map = static_cast<Map&>(arg);
+    Constant *const_str = ConstantDataArray::getString(module_->getContext(), map.ident, true);
+    AllocaInst *str_buf = b_.CreateAllocaBPF(ArrayType::get(b_.getInt8Ty(), map.ident.length()), "str");
+    b_.CreateStore(b_.CreateGEP(const_str, b_.getInt64(0)), str_buf);
+    ArrayType *perfdata_type = ArrayType::get(b_.getInt8Ty(), sizeof(uint64_t) + map.ident.length());
+    AllocaInst *perfdata = b_.CreateAllocaBPF(perfdata_type, "perfdata");
+    if (call.func == "clear")
+      b_.CreateStore(b_.getInt64(asyncactionint(AsyncAction::clear)), perfdata);
+    else
+      b_.CreateStore(b_.getInt64(asyncactionint(AsyncAction::zero)), perfdata);
+    b_.CreateMemCpy(b_.CreateGEP(perfdata, {b_.getInt64(0), b_.getInt64(sizeof(uint64_t))}), str_buf, map.ident.length(), 1);
+    b_.CreatePerfEventOutput(ctx_, perfdata, sizeof(uint64_t) + map.ident.length());
+    b_.CreateLifetimeEnd(perfdata);
+    expr_ = nullptr;
+  }
+  else if (call.func == "time")
+  {
+    ArrayType *perfdata_type = ArrayType::get(b_.getInt8Ty(), sizeof(uint64_t) * 2);
+    AllocaInst *perfdata = b_.CreateAllocaBPF(perfdata_type, "perfdata");
+    b_.CreateStore(b_.getInt64(asyncactionint(AsyncAction::time)), perfdata);
+    static int time_id = 0;
+    b_.CreateStore(b_.getInt64(time_id), b_.CreateGEP(perfdata, {b_.getInt64(0), b_.getInt64(sizeof(uint64_t))}));
+
+    time_id++;
+    b_.CreatePerfEventOutput(ctx_, perfdata, sizeof(uint64_t) * 2);
+    b_.CreateLifetimeEnd(perfdata);
+    expr_ = nullptr;
+  }
+
  else
  {
+    std::cerr << "Error: missing codegen for function \"" << call.func << "\"" << std::endl;
    abort();
  }
 }
@@ -332,6 +630,53 @@ void CodegenLLVM::visit(Unop &unop)
  }
 }

+void CodegenLLVM::visit(Ternary &ternary)
+{
+  Function *parent = b_.GetInsertBlock()->getParent();
+  BasicBlock *left_block = BasicBlock::Create(module_->getContext(), "left", parent);
+  BasicBlock *right_block = BasicBlock::Create(module_->getContext(), "right", parent);
+  BasicBlock *done = BasicBlock::Create(module_->getContext(), "done", parent);
+
+  // ordering of all the following statements is important
+  Value *result = b_.CreateAllocaBPF(ternary.type, "result");
+  AllocaInst *buf = b_.CreateAllocaBPF(ternary.type, "buf");
+  Value *cond;
+  ternary.cond->accept(*this);
+  cond = expr_;
+  b_.CreateCondBr(b_.CreateICmpNE(cond, b_.getInt64(0), "true_cond"),
+                  left_block, right_block);
+
+  if (ternary.type.type == Type::integer) {
+    // fetch selected integer via CreateStore
+    b_.SetInsertPoint(left_block);
+    ternary.left->accept(*this);
+    b_.CreateStore(expr_, result);
+    b_.CreateBr(done);
+
+    b_.SetInsertPoint(right_block);
+    ternary.right->accept(*this);
+    b_.CreateStore(expr_, result);
+    b_.CreateBr(done);
+
+    b_.SetInsertPoint(done);
+    expr_ = b_.CreateLoad(result);
+  } else {
+    // copy selected string via CreateMemCpy
+    b_.SetInsertPoint(left_block);
+    ternary.left->accept(*this);
+    b_.CreateMemCpy(buf, expr_, ternary.type.size, 1);
+    b_.CreateBr(done);
+
+    b_.SetInsertPoint(right_block);
+    ternary.right->accept(*this);
+    b_.CreateMemCpy(buf, expr_, ternary.type.size, 1);
+    b_.CreateBr(done);
+
+    b_.SetInsertPoint(done);
+    expr_ = buf;
+  }
+}
+
 void CodegenLLVM::visit(FieldAccess &acc)
 {
  SizedType &type = acc.expr->type;
@@ -499,21 +844,70 @@ void CodegenLLVM::visit(Probe &probe)
      b_.getInt64Ty(),
      {b_.getInt8PtrTy()}, // struct pt_regs *ctx
      false);
-  Function *func = Function::Create(func_type, Function::ExternalLinkage, probe.name(), module_.get());
-  func->setSection("s_" + probe.name());
-  BasicBlock *entry = BasicBlock::Create(module_->getContext(), "entry", func);
-  b_.SetInsertPoint(entry);

-  ctx_ = func->arg_begin();
+  /*
+   * Most of the time, we can take a probe like kprobe:do_f* and build a
+   * single BPF program for that, called "s_kprobe:do_f*", and attach it to
+   * each wildcard match. An exception is the "name" builtin, where we need
+   * to build different BPF programs for each wildcard match that contains an
+   * ID for the match. Those programs will be called "s_kprobe:do_fcntl" etc.
+   */
+  if (probe.need_expansion == false) {
+    // build a single BPF program pre-wildcards
+    Function *func = Function::Create(func_type, Function::ExternalLinkage, probe.name(), module_.get());
+    func->setSection("s_" + probe.name());
+    BasicBlock *entry = BasicBlock::Create(module_->getContext(), "entry", func);
+    b_.SetInsertPoint(entry);

-  if (probe.pred) {
-    probe.pred->accept(*this);
-  }
-  for (Statement *stmt : *probe.stmts) {
-    stmt->accept(*this);
-  }
+    ctx_ = func->arg_begin();

-  b_.CreateRet(ConstantInt::get(module_->getContext(), APInt(64, 0)));
+    if (probe.pred) {
+      probe.pred->accept(*this);
+    }
+    for (Statement *stmt : *probe.stmts) {
+      stmt->accept(*this);
+    }
+
+    b_.CreateRet(ConstantInt::get(module_->getContext(), APInt(64, 0)));
+
+  } else {
+    // build a separate BPF program for each wildcard match
+    for (auto &attach_point : *probe.attach_points) {
+      std::string file_name;
+      switch (probetype(attach_point->provider))
+      {
+        case ProbeType::kprobe:
+        case ProbeType::kretprobe:
+          file_name = "/sys/kernel/debug/tracing/available_filter_functions";
+          break;
+        case ProbeType::tracepoint:
+          file_name = "/sys/kernel/debug/tracing/available_events";
+          break;
+        default:
+          std::cerr << "Wildcard matches aren't available on probe type '"
+                    << attach_point->provider << "'" << std::endl;
+          return;
+      }
+      auto matches = bpftrace_.find_wildcard_matches(attach_point->target, attach_point->func, file_name);
+      for (auto &match : matches) {
+        probefull_ = attach_point->name(match);
+        Function *func = Function::Create(func_type, Function::ExternalLinkage, attach_point->name(match), module_.get());
+        func->setSection("s_" + attach_point->name(match));
+        BasicBlock *entry = BasicBlock::Create(module_->getContext(), "entry", func);
+        b_.SetInsertPoint(entry);
+
+        // check: do the following 8 lines need to be in the wildcard loop?
+        ctx_ = func->arg_begin();
+        if (probe.pred) {
+          probe.pred->accept(*this);
+        }
+        for (Statement *stmt : *probe.stmts) {
+          stmt->accept(*this);
+        }
+        b_.CreateRet(ConstantInt::get(module_->getContext(), APInt(64, 0)));
+      }
+    }
+  }
 }

 void CodegenLLVM::visit(Program &program)
@@ -537,7 +931,7 @@ AllocaInst *CodegenLLVM::getMapKey(Map &map)
    for (Expression *expr : *map.vargs) {
      expr->accept(*this);
      Value *offset_val = b_.CreateGEP(key, {b_.getInt64(0), b_.getInt64(offset)});
-      if (expr->type.type == Type::string)
+      if (expr->type.type == Type::string || expr->type.type == Type::usym)
        b_.CreateMemCpy(offset_val, expr_, expr->type.size, 1);
      else
        b_.CreateStore(expr_, offset_val);
@@ -552,7 +946,7 @@ AllocaInst *CodegenLLVM::getMapKey(Map &map)
  return key;
 }

-AllocaInst *CodegenLLVM::getQuantizeMapKey(Map &map, Value *log2)
+AllocaInst *CodegenLLVM::getHistMapKey(Map &map, Value *log2)
 {
  AllocaInst *key;
  if (map.vargs) {
@@ -567,7 +961,7 @@ AllocaInst *CodegenLLVM::getQuantizeMapKey(Map &map, Value *log2)
    for (Expression *expr : *map.vargs) {
      expr->accept(*this);
      Value *offset_val = b_.CreateGEP(key, {b_.getInt64(0), b_.getInt64(offset)});
-      if (expr->type.type == Type::string)
+      if (expr->type.type == Type::string || expr->type.type == Type::usym)
        b_.CreateMemCpy(offset_val, expr_, expr->type.size, 1);
      else
        b_.CreateStore(expr_, offset_val);
@@ -702,6 +1096,75 @@ void CodegenLLVM::createLog2Function()
  b_.CreateRet(b_.CreateLoad(result));
 }

+void CodegenLLVM::createLinearFunction()
+{
+  // lhist() returns a bucket index for the given value. The first and last
+  //   bucket indexes are special: they are 0 for the less-than-range
+  //   bucket, and index max_bucket+2 for the greater-than-range bucket.
+  //   Indexes 1 to max_bucket+1 span the buckets in the range.
+  //
+  // int lhist(int value, int min, int max, int step)
+  // {
+  // 	int result;
+  //
+  // 	if (value < min)
+  // 		return 0;
+  // 	if (value > max)
+  // 		return 1 + (max - min) / step;
+  // 	result = 1 + (value - min) / step;
+  //
+  // 	return result;
+  // }
+
+  // inlined function initialization
+  FunctionType *linear_func_type = FunctionType::get(b_.getInt64Ty(), {b_.getInt64Ty(), b_.getInt64Ty(), b_.getInt64Ty(), b_.getInt64Ty()}, false);
+  Function *linear_func = Function::Create(linear_func_type, Function::InternalLinkage, "linear", module_.get());
+  linear_func->addFnAttr(Attribute::AlwaysInline);
+  linear_func->setSection("helpers");
+  BasicBlock *entry = BasicBlock::Create(module_->getContext(), "entry", linear_func);
+  b_.SetInsertPoint(entry);
+
+  // pull in arguments
+  Value *value_alloc = b_.CreateAllocaBPF(SizedType(Type::integer, 8));
+  Value *min_alloc = b_.CreateAllocaBPF(SizedType(Type::integer, 8));
+  Value *max_alloc = b_.CreateAllocaBPF(SizedType(Type::integer, 8));
+  Value *step_alloc = b_.CreateAllocaBPF(SizedType(Type::integer, 8));
+  Value *result_alloc = b_.CreateAllocaBPF(SizedType(Type::integer, 8));
+  Value *value = linear_func->arg_begin()+0;
+  Value *min = linear_func->arg_begin()+1;
+  Value *max = linear_func->arg_begin()+2;
+  Value *step = linear_func->arg_begin()+3;
+  b_.CreateStore(value, value_alloc);
+  b_.CreateStore(min, min_alloc);
+  b_.CreateStore(max, max_alloc);
+  b_.CreateStore(step, step_alloc);
+
+  // algorithm
+  Value *cmp = b_.CreateICmpSLT(b_.CreateLoad(value_alloc), b_.CreateLoad(min_alloc));
+  BasicBlock *lt_min = BasicBlock::Create(module_->getContext(), "lhist.lt_min", linear_func);
+  BasicBlock *ge_min = BasicBlock::Create(module_->getContext(), "lhist.ge_min", linear_func);
+  b_.CreateCondBr(cmp, lt_min, ge_min);
+
+  b_.SetInsertPoint(lt_min);
+  b_.CreateRet(b_.getInt64(0));
+
+  b_.SetInsertPoint(ge_min);
+  Value *cmp1 = b_.CreateICmpSGT(b_.CreateLoad(value_alloc), b_.CreateLoad(max_alloc));
+  BasicBlock *le_max = BasicBlock::Create(module_->getContext(), "lhist.le_max", linear_func);
+  BasicBlock *gt_max = BasicBlock::Create(module_->getContext(), "lhist.gt_max", linear_func);
+  b_.CreateCondBr(cmp1, gt_max, le_max);
+
+  b_.SetInsertPoint(gt_max);
+  Value *div = b_.CreateSDiv(b_.CreateSub(b_.CreateLoad(max_alloc), b_.CreateLoad(min_alloc)), b_.CreateLoad(step_alloc));
+  b_.CreateStore(b_.CreateAdd(div, b_.getInt64(1)), result_alloc);
+  b_.CreateRet(b_.CreateLoad(result_alloc));
+
+  b_.SetInsertPoint(le_max);
+  Value *div3 = b_.CreateSDiv(b_.CreateSub(b_.CreateLoad(value_alloc), b_.CreateLoad(min_alloc)), b_.CreateLoad(step_alloc));
+  b_.CreateStore(b_.CreateAdd(div3, b_.getInt64(1)), result_alloc);
+  b_.CreateRet(b_.CreateLoad(result_alloc));
+}
+
 void CodegenLLVM::createStrcmpFunction()
 {
  // Returns 1 if strings match, 0 otherwise
@@ -746,6 +1209,7 @@ void CodegenLLVM::createStrcmpFunction()
 std::unique_ptr<BpfOrc> CodegenLLVM::compile(bool debug, std::ostream &out)
 {
  createLog2Function();
+  createLinearFunction();
  createStrcmpFunction();
  root_->accept(*this);


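The pseudocode comment in createLinearFunction() describes how lhist() maps a value to a bucket index. As a rough illustration only, the same arithmetic can be written as ordinary C++. The function name linear_bucket is hypothetical and does not appear in the diff; the precondition on step reflects the check made in the semantic analyser.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone version of the bucket computation that the
// generated "linear" helper performs; linear_bucket is an illustrative name.
int64_t linear_bucket(int64_t value, int64_t min, int64_t max, int64_t step)
{
  assert(step > 0);                 // the semantic analyser rejects step <= 0
  if (value < min)
    return 0;                       // below-range bucket
  if (value > max)
    return 1 + (max - min) / step;  // above-range bucket
  return 1 + (value - min) / step;  // in-range bucket
}
```

With min 0, max 100 and step 10, a value of 25 lands in bucket 3. Note that, as written, a value equal to max and a value above max return the same index.
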
--- a/src/ast/codegen_llvm.h
+++ b/src/ast/codegen_llvm.h
@@ -35,6 +35,7 @@ public:
  void visit(Variable &var) override;
  void visit(Binop &binop) override;
  void visit(Unop &unop) override;
+  void visit(Ternary &ternary) override;
  void visit(FieldAccess &acc) override;
  void visit(Cast &cast) override;
  void visit(ExprStatement &expr) override;
@@ -45,11 +46,12 @@ public:
  void visit(Probe &probe) override;
  void visit(Program &program) override;
  AllocaInst *getMapKey(Map &map);
-  AllocaInst *getQuantizeMapKey(Map &map, Value *log2);
+  AllocaInst *getHistMapKey(Map &map, Value *log2);
  Value      *createLogicalAnd(Binop &binop);
  Value      *createLogicalOr(Binop &binop);

  void createLog2Function();
+  void createLinearFunction();
  void createStrcmpFunction();
  std::unique_ptr<BpfOrc> compile(bool debug=false, std::ostream &out=std::cerr);

@@ -63,6 +65,7 @@ private:
  Value *expr_ = nullptr;
  Value *ctx_;
  BPFtrace &bpftrace_;
+  std::string probefull_;

  std::map<std::string, Value *> variables_;
  int printf_id_ = 0;

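The print() branch of CodegenLLVM::visit(Call &call) documents its perf event payload as: uint64_t asyncaction_id, uint64_t top, uint64_t div, then the map identifier bytes. The sketch below shows one way a reader of that buffer could unpack it. It is an assumption for illustration: PrintRecord and decode_print_record are hypothetical names, and the actual user-space handling is not part of this section. It also assumes the size passed in is exactly the emitted size, with no trailing padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical decoded form of the print() record; the field order follows
// the comment in the diff, the names are illustrative.
struct PrintRecord {
  uint64_t action_id;
  uint64_t top;
  uint64_t div;
  std::string map_ident;
};

PrintRecord decode_print_record(const uint8_t *data, std::size_t size)
{
  const std::size_t header = 3 * sizeof(uint64_t);
  assert(size >= header);
  PrintRecord rec;
  // memcpy avoids unaligned reads from the raw byte buffer
  std::memcpy(&rec.action_id, data, sizeof(uint64_t));
  std::memcpy(&rec.top, data + sizeof(uint64_t), sizeof(uint64_t));
  std::memcpy(&rec.div, data + 2 * sizeof(uint64_t), sizeof(uint64_t));
  // the identifier is stored without a trailing NUL, so take the remaining bytes
  rec.map_ident.assign(reinterpret_cast<const char *>(data + header), size - header);
  return rec;
}
```
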
--- a/src/ast/irbuilderbpf.cpp
+++ b/src/ast/irbuilderbpf.cpp
@@ -25,7 +25,7 @@ IRBuilderBPF::IRBuilderBPF(LLVMContext &context,
      &module_);
 }

-AllocaInst *IRBuilderBPF::CreateAllocaBPF(llvm::Type *ty, const std::string &name)
+AllocaInst *IRBuilderBPF::CreateAllocaBPF(llvm::Type *ty, llvm::Value *arraysize, const std::string &name)
 {
  Function *parent = GetInsertBlock()->getParent();
  BasicBlock &entry_block = parent->getEntryBlock();
@@ -35,17 +35,28 @@ AllocaInst *IRBuilderBPF::CreateAllocaBPF(llvm::Type *ty, const std::string &nam
    SetInsertPoint(&entry_block);
  else
    SetInsertPoint(&entry_block.front());
-  AllocaInst *alloca = CreateAlloca(ty, nullptr, name); // TODO dodgy
+  AllocaInst *alloca = CreateAlloca(ty, arraysize, name); // TODO dodgy
  restoreIP(ip);

  CreateLifetimeStart(alloca);
  return alloca;
 }

+AllocaInst *IRBuilderBPF::CreateAllocaBPF(llvm::Type *ty, const std::string &name)
+{
+  return CreateAllocaBPF(ty, nullptr, name);
+}
+
 AllocaInst *IRBuilderBPF::CreateAllocaBPF(const SizedType &stype, const std::string &name)
 {
  llvm::Type *ty = GetType(stype);
-  return CreateAllocaBPF(ty, name);
+  return CreateAllocaBPF(ty, nullptr, name);
+}
+
+AllocaInst *IRBuilderBPF::CreateAllocaBPF(const SizedType &stype, llvm::Value *arraysize, const std::string &name)
+{
+  llvm::Type *ty = GetType(stype);
+  return CreateAllocaBPF(ty, arraysize, name);
 }

 AllocaInst *IRBuilderBPF::CreateAllocaBPF(int bytes, const std::string &name)
@@ -57,7 +68,7 @@ AllocaInst *IRBuilderBPF::CreateAllocaBPF(int bytes, const std::string &name)
 llvm::Type *IRBuilderBPF::GetType(const SizedType &stype)
 {
  llvm::Type *ty;
-  if (stype.type == Type::string || (stype.type == Type::cast && !stype.is_pointer))
+  if (stype.type == Type::string || stype.type == Type::usym || (stype.type == Type::cast && !stype.is_pointer))
  {
    ty = ArrayType::get(getInt8Ty(), stype.size);
  }
@@ -96,6 +107,26 @@ CallInst *IRBuilderBPF::CreateBpfPseudoCall(Map &map)
  return CreateBpfPseudoCall(mapfd);
 }

+CallInst *IRBuilderBPF::CreateGetJoinMap(Value *ctx)
+{
+  Value *map_ptr = CreateBpfPseudoCall(bpftrace_.join_map_->mapfd_);
+  AllocaInst *key = CreateAllocaBPF(getInt32Ty(), "key");
+  Value *keyv = getInt32(0);
+  CreateStore(keyv, key);
+
+  FunctionType *lookup_func_type = FunctionType::get(
+      getInt8PtrTy(),
+      {getInt8PtrTy(), getInt8PtrTy()},
+      false);
+  PointerType *lookup_func_ptr_type = PointerType::get(lookup_func_type, 0);
+  Constant *lookup_func = ConstantExpr::getCast(
+      Instruction::IntToPtr,
+      getInt64(BPF_FUNC_map_lookup_elem),
+      lookup_func_ptr_type);
+  CallInst *call = CreateCall(lookup_func, {map_ptr, key}, "join_elem");
+  return call;
+}
+
 Value *IRBuilderBPF::CreateMapLookupElem(Map &map, AllocaInst *key)
 {
  Value *map_ptr = CreateBpfPseudoCall(map);
@@ -201,7 +232,22 @@ void IRBuilderBPF::CreateProbeRead(AllocaInst *dst, size_t size, Value *src)
  CallInst *call = CreateCall(proberead_func, {dst, getInt64(size), src}, "probe_read");
 }

-void IRBuilderBPF::CreateProbeReadStr(AllocaInst *dst, size_t size, Value *src)
+CallInst *IRBuilderBPF::CreateProbeReadStr(AllocaInst *dst, size_t size, Value *src)
+{
+  // int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
+  FunctionType *probereadstr_func_type = FunctionType::get(
+      getInt64Ty(),
+      {getInt8PtrTy(), getInt64Ty(), getInt8PtrTy()},
+      false);
+  PointerType *probereadstr_func_ptr_type = PointerType::get(probereadstr_func_type, 0);
+  Constant *probereadstr_func = ConstantExpr::getCast(
+      Instruction::IntToPtr,
+      getInt64(BPF_FUNC_probe_read_str),
+      probereadstr_func_ptr_type);
+  return CreateCall(probereadstr_func, {dst, getInt64(size), src}, "probe_read_str");
+}
+
+CallInst *IRBuilderBPF::CreateProbeReadStr(Value *dst, size_t size, Value *src)
 {
  // int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
  FunctionType *probereadstr_func_type = FunctionType::get(
@@ -213,7 +259,7 @@ void IRBuilderBPF::CreateProbeReadStr(AllocaInst *dst, size_t size, Value *src)
      Instruction::IntToPtr,
      getInt64(BPF_FUNC_probe_read_str),
      probereadstr_func_ptr_type);
-  CallInst *call = CreateCall(probereadstr_func, {dst, getInt64(size), src}, "probe_read_str");
+  return CreateCall(probereadstr_func, {dst, getInt64(size), src}, "map_read_str");
 }

 CallInst *IRBuilderBPF::CreateGetNs()
@@ -268,6 +314,32 @@ CallInst *IRBuilderBPF::CreateGetCpuId()
  return CreateCall(getcpuid_func, {}, "get_cpu_id");
 }

+CallInst *IRBuilderBPF::CreateGetCurrentTask()
+{
+  // u64 bpf_get_current_task(void)
+  // Return: current task_struct
+  FunctionType *getcurtask_func_type = FunctionType::get(getInt64Ty(), false);
+  PointerType *getcurtask_func_ptr_type = PointerType::get(getcurtask_func_type, 0);
+  Constant *getcurtask_func = ConstantExpr::getCast(
+      Instruction::IntToPtr,
+      getInt64(BPF_FUNC_get_current_task),
+      getcurtask_func_ptr_type);
+  return CreateCall(getcurtask_func, {}, "get_cur_task");
+}
+
+CallInst *IRBuilderBPF::CreateGetRandom()
+{
+  // u64 bpf_get_prandom_u32(void)
+  // Return: random
+  FunctionType *getrandom_func_type = FunctionType::get(getInt64Ty(), false);
+  PointerType *getrandom_func_ptr_type = PointerType::get(getrandom_func_type, 0);
+  Constant *getrandom_func = ConstantExpr::getCast(
+      Instruction::IntToPtr,
+      getInt64(BPF_FUNC_get_prandom_u32),
+      getrandom_func_ptr_type);
+  return CreateCall(getrandom_func, {}, "get_random");
+}
+
 CallInst *IRBuilderBPF::CreateGetStackId(Value *ctx, bool ustack)
 {
  Value *map_ptr = CreateBpfPseudoCall(bpftrace_.stackid_map_->mapfd_);

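The usym branch in codegen_llvm.cpp stores a uint64_t[2] holding the address followed by the pid, where the pid is taken as CreateGetPidTgid() shifted right by 32 bits. A plain C++ sketch of that packing, assuming the usual bpf_get_current_pid_tgid() layout (tgid in the upper 32 bits, thread id in the lower 32), might look as follows. pack_usym is a hypothetical name used only here.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper mirroring the [0] = addr, [1] = pid layout described
// in the usym branch; not part of the diff.
std::array<uint64_t, 2> pack_usym(uint64_t addr, uint64_t pid_tgid)
{
  uint64_t pid = pid_tgid >> 32;  // upper 32 bits: tgid, the user-visible PID
  return {addr, pid};
}
```

This two-word layout is why the semantic analyser now sizes Type::usym at 16 bytes rather than 8.
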
--- a/src/ast/irbuilderbpf.h
+++ b/src/ast/irbuilderbpf.h
@@ -20,6 +20,8 @@ public:

  AllocaInst *CreateAllocaBPF(llvm::Type *ty, const std::string &name="");
  AllocaInst *CreateAllocaBPF(const SizedType &stype, const std::string &name="");
+  AllocaInst *CreateAllocaBPF(llvm::Type *ty, llvm::Value *arraysize, const std::string &name="");
+  AllocaInst *CreateAllocaBPF(const SizedType &stype, llvm::Value *arraysize, const std::string &name="");
  AllocaInst *CreateAllocaBPF(int bytes, const std::string &name="");
  llvm::Type *GetType(const SizedType &stype);
  CallInst   *CreateBpfPseudoCall(int mapfd);
@@ -28,12 +30,16 @@ public:
  void        CreateMapUpdateElem(Map &map, AllocaInst *key, Value *val);
  void        CreateMapDeleteElem(Map &map, AllocaInst *key);
  void        CreateProbeRead(AllocaInst *dst, size_t size, Value *src);
-  void        CreateProbeReadStr(AllocaInst *dst, size_t size, Value *src);
+  CallInst   *CreateProbeReadStr(AllocaInst *dst, size_t size, Value *src);
+  CallInst   *CreateProbeReadStr(Value *dst, size_t size, Value *src);
  CallInst   *CreateGetNs();
  CallInst   *CreateGetPidTgid();
  CallInst   *CreateGetUidGid();
  CallInst   *CreateGetCpuId();
+  CallInst   *CreateGetCurrentTask();
+  CallInst   *CreateGetRandom();
  CallInst   *CreateGetStackId(Value *ctx, bool ustack);
+  CallInst   *CreateGetJoinMap(Value *ctx);
  void        CreateGetCurrentComm(AllocaInst *buf, size_t size);
  void        CreatePerfEventOutput(Value *ctx, Value *data, size_t size);


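The join() codegen writes into the buffer returned by CreateGetJoinMap(): an 8-byte async action ID, followed by join_argnum_ string slots of join_argsize_ bytes each. The offset arithmetic it uses can be sketched as below. The helper names are hypothetical, and the values 16 and 64 in the note that follows are illustrative rather than taken from this diff.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers describing the join() buffer layout:
// [uint64_t action id][slot 0][slot 1]...[slot argnum-1]

// Byte offset of argument slot i within the buffer.
constexpr std::size_t join_arg_offset(std::size_t i, std::size_t argsize)
{
  return sizeof(uint64_t) + i * argsize;
}

// Total number of bytes passed to the perf event output call.
constexpr std::size_t join_record_size(std::size_t argnum, std::size_t argsize)
{
  return sizeof(uint64_t) + argnum * argsize;
}
```

For example, with 16 slots of 64 bytes, slot 2 starts at byte 136 and the whole record is 1032 bytes.
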
--- a/src/ast/printer.cpp
+++ b/src/ast/printer.cpp
@@ -86,6 +86,18 @@ void Printer::visit(Unop &unop)
  --depth_;
 }

+void Printer::visit(Ternary &ternary)
+{
+  std::string indent(depth_, ' ');
+  out_ << indent << "?:" << std::endl;
+
+  ++depth_;
+  ternary.cond->accept(*this);
+  ternary.left->accept(*this);
+  ternary.right->accept(*this);
+  --depth_;
+}
+
 void Printer::visit(FieldAccess &acc)
 {
  std::string indent(depth_, ' ');

--- a/src/ast/printer.h
+++ b/src/ast/printer.h
@@ -18,6 +18,7 @@ public:
  void visit(Variable &var) override;
  void visit(Binop &binop) override;
  void visit(Unop &unop) override;
+  void visit(Ternary &ternary) override;
  void visit(FieldAccess &acc) override;
  void visit(Cast &cast) override;
  void visit(ExprStatement &expr) override;

--- a/src/ast/semantic_analyser.cpp
+++ b/src/ast/semantic_analyser.cpp
@@ -4,6 +4,7 @@
 #include "parser.tab.hh"
 #include "printf.h"
 #include "arch/arch.h"
+#include <sys/stat.h>

 #include "libbpf.h"

@@ -31,6 +32,8 @@ void SemanticAnalyser::visit(Builtin &builtin)
      builtin.ident == "uid" ||
      builtin.ident == "gid" ||
      builtin.ident == "cpu" ||
+      builtin.ident == "curtask" ||
+      builtin.ident == "rand" ||
      builtin.ident == "retval") {
    builtin.type = SizedType(Type::integer, 8);
  }
@@ -54,7 +57,7 @@ void SemanticAnalyser::visit(Builtin &builtin)
          type == ProbeType::tracepoint)
        builtin.type = SizedType(Type::sym, 8);
      else if (type == ProbeType::uprobe || type == ProbeType::uretprobe)
-        builtin.type = SizedType(Type::usym, 8);
+        builtin.type = SizedType(Type::usym, 16);
      else
        err_ << "The func builtin can not be used with '" << attach_point->provider
             << "' probes" << std::endl;
@@ -67,6 +70,10 @@ void SemanticAnalyser::visit(Builtin &builtin)
      err_ << arch::name() << " doesn't support " << builtin.ident << std::endl;
    builtin.type = SizedType(Type::integer, 8);
  }
+  else if (builtin.ident == "name") {
+    builtin.type = SizedType(Type::name, 8);
+    probe_->need_expansion = true;
+  }
  else {
    builtin.type = SizedType(Type::none, 0);
    err_ << "Unknown builtin variable: '" << builtin.ident << "'" << std::endl;
@@ -81,12 +88,46 @@ void SemanticAnalyser::visit(Call &call)
    }
  }

-  if (call.func == "quantize") {
+  if (call.func == "hist") {
    check_assignment(call, true, false);
    check_nargs(call, 1);
    check_arg(call, Type::integer, 0);

-    call.type = SizedType(Type::quantize, 8);
+    call.type = SizedType(Type::hist, 8);
+  }
+  else if (call.func == "lhist") {
+    check_nargs(call, 4);
+    check_arg(call, Type::integer, 0);
+    check_arg(call, Type::integer, 1);
+    check_arg(call, Type::integer, 2);
+    check_arg(call, Type::integer, 3);
+
+    if (is_final_pass()) {
+      Expression &min_arg = *call.vargs->at(1);
+      Expression &max_arg = *call.vargs->at(2);
+      Expression &step_arg = *call.vargs->at(3);
+      Integer &min = static_cast<Integer&>(min_arg);
+      Integer &max = static_cast<Integer&>(max_arg);
+      Integer &step = static_cast<Integer&>(step_arg);
+      if (step.n <= 0)
+        err_ << "lhist() step must be >= 1 (" << step.n << " provided)" << std::endl;
+      else
+      {
+        int buckets = (max.n - min.n) / step.n;
+        if (buckets > 1000)
+          err_ << "lhist() too many buckets, must be <= 1000 (would need " << buckets << ")" << std::endl;
+      }
+      if (min.n > max.n)
+        err_ << "lhist() min must be less than max (provided min " << min.n << " and max " << max.n << ")" << std::endl;
+      if ((max.n - min.n) < step.n)
+        err_ << "lhist() step is too large for the given range (provided step " << step.n << " for range " << (max.n - min.n) << ")" << std::endl;
+
+      // store args for later passing to bpftrace::Map
+      auto search = map_args_.find(call.map->ident);
+      if (search == map_args_.end())
+        map_args_.insert({call.map->ident, *call.vargs});
+    }
+    call.type = SizedType(Type::lhist, 8);
  }
  else if (call.func == "count") {
    check_assignment(call, true, false);
@@ -94,6 +135,36 @@ void SemanticAnalyser::visit(Call &call)

    call.type = SizedType(Type::count, 8);
  }
+  else if (call.func == "sum") {
+    check_assignment(call, true, false);
+    check_nargs(call, 1);
+
+    call.type = SizedType(Type::sum, 8);
+  }
+  else if (call.func == "min") {
+    check_assignment(call, true, false);
+    check_nargs(call, 1);
+
+    call.type = SizedType(Type::min, 8);
+  }
+  else if (call.func == "max") {
+    check_assignment(call, true, false);
+    check_nargs(call, 1);
+
+    call.type = SizedType(Type::max, 8);
+  }
+  else if (call.func == "avg") {
+    check_assignment(call, true, false);
+    check_nargs(call, 1);
+
+    call.type = SizedType(Type::avg, 8);
+  }
+  else if (call.func == "stats") {
+    check_assignment(call, true, false);
+    check_nargs(call, 1);
+
+    call.type = SizedType(Type::stats, 8);
+  }
  else if (call.func == "delete") {
    check_assignment(call, false, false);
    if (check_nargs(call, 1)) {
@@ -113,7 +184,14 @@ void SemanticAnalyser::visit(Call &call)
    else if (call.func == "sym")
      call.type = SizedType(Type::sym, 8);
    else if (call.func == "usym")
-      call.type = SizedType(Type::usym, 8);
+      call.type = SizedType(Type::usym, 16);
+  }
+  else if (call.func == "join") {
+    check_assignment(call, false, false);
+    check_nargs(call, 1);
+    check_arg(call, Type::integer, 0);
+    call.type = SizedType(Type::none, 0);
+    needs_join_map_ = true;
  }
  else if (call.func == "reg") {
    if (check_nargs(call, 1)) {
@@ -149,6 +227,57 @@ void SemanticAnalyser::visit(Call &call)

    call.type = SizedType(Type::none, 0);
  }
+  else if (call.func == "exit") {
+    check_nargs(call, 0);
+  }
+  else if (call.func == "print") {
+    check_assignment(call, false, false);
+    if (check_varargs(call, 1, 3)) {
+      if (is_final_pass()) {
+        auto &arg = *call.vargs->at(0);
+        if (!arg.is_map)
+          err_ << "print() expects a map to be provided" << std::endl;
+        if (call.vargs->size() > 1)
+          check_arg(call, Type::integer, 1, true);
+        if (call.vargs->size() > 2)
+          check_arg(call, Type::integer, 2, true);
+      }
+    }
+  }
+  else if (call.func == "clear") {
+    check_assignment(call, false, false);
+    check_nargs(call, 1);
+    if (check_nargs(call, 1)) {
+      auto &arg = *call.vargs->at(0);
+      if (!arg.is_map)
+        err_ << "clear() expects a map to be provided" << std::endl;
+    }
+  }
+  else if (call.func == "zero") {
+    check_assignment(call, false, false);
+    check_nargs(call, 1);
+    if (check_nargs(call, 1)) {
+      auto &arg = *call.vargs->at(0);
+      if (!arg.is_map)
+        err_ << "zero() expects a map to be provided" << std::endl;
+    }
+  }
+  else if (call.func == "time") {
+    check_assignment(call, false, false);
+    if (check_varargs(call, 0, 1)) {
+      if (is_final_pass()) {
+        if (call.vargs && call.vargs->size() > 0) {
+          check_arg(call, Type::string, 0, true);
+          auto &fmt_arg = *call.vargs->at(0);
+          String &fmt = static_cast<String&>(fmt_arg);
+          bpftrace_.time_args_.push_back(fmt.str);
+        } else {
+          std::string fmt_default = "%H:%M:%S\n";
+          bpftrace_.time_args_.push_back(fmt_default.c_str());
+        }
+      }
+    }
+  }
  else {
    err_ << "Unknown function: '" << call.func << "'" << std::endl;
    call.type = SizedType(Type::none, 0);
@@ -167,6 +296,14 @@ void SemanticAnalyser::visit(Map &map)

  auto search = map_key_.find(map.ident);
  if (search != map_key_.end()) {
+    /*
+     * TODO: this code ensures that map keys are consistent, but
+     * currently prevents print() and clear() being used, since
+     * for example "@x[pid] = count(); ... print(@x)" is detected
+     * as having inconsistent keys. We need a way to do this check
+     * differently for print() and clear() calls. I've commented it
+     * out for now - Brendan.
+     *
    if (search->second != key) {
      err_ << "Argument mismatch for " << map.ident << ": ";
      err_ << "trying to access with arguments: ";
@@ -175,6 +312,7 @@ void SemanticAnalyser::visit(Map &map)
      err_ << search->second.argument_type_list();
      err_ << "\n" << std::endl;
    }
+     */
  }
  else {
    map_key_.insert({map.ident, key});
@@ -263,6 +401,29 @@ void SemanticAnalyser::visit(Unop &unop)
  }
 }

+void SemanticAnalyser::visit(Ternary &ternary)
+{
+  ternary.cond->accept(*this);
+  ternary.left->accept(*this);
+  ternary.right->accept(*this);
+  Type &lhs = ternary.left->type.type;
+  Type &rhs = ternary.right->type.type;
+  if (is_final_pass()) {
+    if (lhs != rhs) {
+      err_ << "Ternary operator must return the same type: ";
+      err_ << "have '" << lhs << "' ";
+      err_ << "and '" << rhs << "'" << std::endl;
+    }
+  }
+  if (lhs == Type::string)
+    ternary.type = SizedType(lhs, STRING_SIZE);
+  else if (lhs == Type::integer)
+    ternary.type = SizedType(lhs, 8);
+  else {
+    err_ << "Ternary return type unsupported " << lhs << std::endl;
+  }
+}
+
 void SemanticAnalyser::visit(FieldAccess &acc)
 {
  acc.expr->accept(*this);
@@ -438,6 +599,13 @@ void SemanticAnalyser::visit(AttachPoint &ap)
    if (ap.func == "")
      err_ << "uprobes should be attached to a function" << std::endl;
  }
+  else if (ap.provider == "usdt") {
+    if (ap.target == "" || ap.func == "")
+      err_ << "usdt probe must have a target" << std::endl;
+    struct stat s;
+    if (stat(ap.target.c_str(), &s) != 0)
+      err_ << "usdt target file " << ap.target << " does not exist" << std::endl;
+  }
  else if (ap.provider == "tracepoint") {
    if (ap.target == "" || ap.func == "")
      err_ << "tracepoint probe must have a target" << std::endl;
@@ -455,6 +623,53 @@ void SemanticAnalyser::visit(AttachPoint &ap)
    else if (ap.freq <= 0)
      err_ << "profile frequency should be a positive integer" << std::endl;
  }
+  else if (ap.provider == "interval") {
+    if (ap.target == "")
+      err_ << "interval probe must have unit of time" << std::endl;
+    else if (ap.target != "ms" &&
+             ap.target != "s")
+      err_ << ap.target << " is not an accepted unit of time" << std::endl;
+    if (ap.func != "")
+      err_ << "interval probe must have an integer frequency" << std::endl;
+  }
+  else if (ap.provider == "software") {
+    if (ap.target == "")
+      err_ << "software probe must have a software event name" << std::endl;
+    else if (ap.target != "cpu-clock" && ap.target != "cpu" &&
+             ap.target != "task-clock" &&
+             ap.target != "page-faults" && ap.target != "faults" &&
+             ap.target != "context-switches" && ap.target != "cs" &&
+             ap.target != "cpu-migrations" &&
+             ap.target != "minor-faults" &&
+             ap.target != "major-faults" &&
+             ap.target != "alignment-faults" &&
+             ap.target != "emulation-faults" &&
+             ap.target != "dummy" &&
+             ap.target != "bpf-output")
+      err_ << ap.target << " is not a software probe" << std::endl;
+    if (ap.func != "")
+      err_ << "software probe can only have an integer count" << std::endl;
+    else if (ap.freq < 0)
+      err_ << "software count should be a positive integer" << std::endl;
+  }
+  else if (ap.provider == "hardware") {
+    if (ap.target == "")
+      err_ << "hardware probe must have a hardware event name" << std::endl;
+    else if (ap.target != "cpu-cycles" && ap.target != "cycles" &&
+             ap.target != "instructions" &&
+             ap.target != "cache-references" &&
+             ap.target != "cache-misses" &&
+             ap.target != "branch-instructions" && ap.target != "branches" &&
+             ap.target != "bus-cycles" &&
+             ap.target != "frontend-stalls" &&
+             ap.target != "backend-stalls" &&
+             ap.target != "ref-cycles")
+      err_ << ap.target << " is not a hardware probe" << std::endl;
+    if (ap.func != "")
+      err_ << "hardware probe can only have an integer count" << std::endl;
+    else if (ap.freq < 0)
+      err_ << "hardware count should be a positive integer" << std::endl;
+  }
  else if (ap.provider == "BEGIN" || ap.provider == "END") {
    if (ap.target != "" || ap.func != "")
      err_ << "BEGIN/END probes should not have a target" << std::endl;
@@ -535,19 +750,52 @@ int SemanticAnalyser::create_maps(bool debug)
    if (debug)
      bpftrace_.maps_[map_name] = std::make_unique<bpftrace::FakeMap>(map_name, type, key);
    else
-      bpftrace_.maps_[map_name] = std::make_unique<bpftrace::Map>(map_name, type, key);
+    {
+      if (type.type == Type::lhist)
+      {
+        // store lhist args to the bpftrace::Map
+        auto map_args = map_args_.find(map_name);
+        if (map_args == map_args_.end())
+          abort();
+        Expression &min_arg = *map_args->second.at(1);
+        Expression &max_arg = *map_args->second.at(2);
+        Expression &step_arg = *map_args->second.at(3);
+        Integer &min = static_cast<Integer&>(min_arg);
+        Integer &max = static_cast<Integer&>(max_arg);
+        Integer &step = static_cast<Integer&>(step_arg);
+        bpftrace_.maps_[map_name] = std::make_unique<bpftrace::Map>(map_name, type, key, min.n, max.n, step.n);
+      }
+      else
+        bpftrace_.maps_[map_name] = std::make_unique<bpftrace::Map>(map_name, type, key);
+    }
  }

  if (debug)
  {
    if (needs_stackid_map_)
      bpftrace_.stackid_map_ = std::make_unique<bpftrace::FakeMap>(BPF_MAP_TYPE_STACK_TRACE);
+    if (needs_join_map_)
+    {
+      // join uses map storage as we'd like to process data larger than can fit on the BPF stack.
+      std::string map_ident = "join";
+      SizedType type = SizedType(Type::join, 8 + bpftrace_.join_argnum_ * bpftrace_.join_argsize_);
+      MapKey key;
+      bpftrace_.join_map_ = std::make_unique<bpftrace::FakeMap>(map_ident, type, key);
+    }
    bpftrace_.perf_event_map_ = std::make_unique<bpftrace::FakeMap>(BPF_MAP_TYPE_PERF_EVENT_ARRAY);
  }
  else
  {
    if (needs_stackid_map_)
      bpftrace_.stackid_map_ = std::make_unique<bpftrace::Map>(BPF_MAP_TYPE_STACK_TRACE);
+    if (needs_join_map_)
+    {
+      // join uses map storage as we'd like to process data larger than can fit on the BPF stack.
+      std::string map_ident = "join";
+      SizedType type = SizedType(Type::join, 8 + bpftrace_.join_argnum_ * bpftrace_.join_argsize_);
+      MapKey key;
+      bpftrace_.join_map_ = std::make_unique<bpftrace::Map>(map_ident, type, key);
+    }
    bpftrace_.perf_event_map_ = std::make_unique<bpftrace::Map>(BPF_MAP_TYPE_PERF_EVENT_ARRAY);
  }


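The lhist() argument checks added to `SemanticAnalyser::visit(Call &call)` above can be read as a small standalone rule set. The sketch below is only an illustration: `check_lhist_args` is a hypothetical helper that is not part of bpftrace, and it returns the diagnostics as strings instead of writing them to `err_`. The messages are shortened versions of the ones in the diff.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not in bpftrace): mirrors the order and conditions of
// the lhist(value, min, max, step) checks shown in the diff above.
std::vector<std::string> check_lhist_args(int64_t min, int64_t max, int64_t step)
{
  std::vector<std::string> errors;

  if (step <= 0)
    errors.push_back("lhist() step must be >= 1");
  else if ((max - min) / step > 1000)  // bucket count is only computed for a valid step
    errors.push_back("lhist() too many buckets, must be <= 1000");

  if (min > max)
    errors.push_back("lhist() min must be less than max");

  if ((max - min) < step)
    errors.push_back("lhist() step is too large for the given range");

  return errors;
}
```

As in the diff, the checks are independent, so a single bad call can report more than one problem (for example, a reversed range also trips the "step is too large" check).
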
--- a/src/ast/semantic_analyser.h
+++ b/src/ast/semantic_analyser.h
@@ -26,6 +26,7 @@ public:
  void visit(Variable &var) override;
  void visit(Binop &binop) override;
  void visit(Unop &unop) override;
+  void visit(Ternary &ternary) override;
  void visit(FieldAccess &acc) override;
  void visit(Cast &cast) override;
  void visit(ExprStatement &expr) override;
@@ -59,7 +60,9 @@ private:
  std::map<std::string, SizedType> variable_val_;
  std::map<std::string, SizedType> map_val_;
  std::map<std::string, MapKey> map_key_;
+  std::map<std::string, ExpressionList> map_args_;
  bool needs_stackid_map_ = false;
+  bool needs_join_map_ = false;
  bool has_begin_probe_ = false;
  bool has_end_probe_ = false;
 };

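The new `visit(Ternary &)` overload declared above enforces two rules in the .cpp diff: both branches must have the same type, and only string and integer results are supported. A minimal sketch of that rule follows, assuming a cut-down `Type` enum and a hypothetical `ternary_type` function (neither is the real bpftrace definition). Unlike the analyser, which reports the mismatch only on the final pass and still derives the type from the left branch, this sketch simply rejects the expression.

```cpp
// Hypothetical subset of the analyser's Type enum, for illustration only.
enum class Type { none, integer, string };

// Returns true and sets `out` when a ternary with these branch types would be
// accepted; returns false on a mismatch or an unsupported result type.
bool ternary_type(Type lhs, Type rhs, Type &out)
{
  if (lhs != rhs)
    return false;  // "Ternary operator must return the same type"
  if (lhs != Type::integer && lhs != Type::string)
    return false;  // "Ternary return type unsupported"
  out = lhs;
  return true;
}
```
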
--- a/src/attached_probe.cpp
+++ b/src/attached_probe.cpp
@@ -7,7 +7,9 @@
 #include <unistd.h>

 #include "attached_probe.h"
+#include "bpftrace.h"
 #include "bcc_syms.h"
+#include "bcc_usdt.h"
 #include "common.h"
 #include "libbpf.h"
 #include <linux/perf_event.h>
@@ -15,6 +17,8 @@

 namespace bpftrace {

+const int BPF_LOG_SIZE = 100 * 1024;
+
 bpf_probe_attach_type attachtype(ProbeType t)
 {
  switch (t)
@@ -23,6 +27,7 @@ bpf_probe_attach_type attachtype(ProbeType t)
    case ProbeType::kretprobe: return BPF_PROBE_RETURN; break;
    case ProbeType::uprobe:    return BPF_PROBE_ENTRY;  break;
    case ProbeType::uretprobe: return BPF_PROBE_RETURN; break;
+    case ProbeType::usdt:      return BPF_PROBE_ENTRY; break;
    default: abort();
  }
 }
@@ -35,8 +40,12 @@ bpf_prog_type progtype(ProbeType t)
    case ProbeType::kretprobe:  return BPF_PROG_TYPE_KPROBE; break;
    case ProbeType::uprobe:     return BPF_PROG_TYPE_KPROBE; break;
    case ProbeType::uretprobe:  return BPF_PROG_TYPE_KPROBE; break;
+    case ProbeType::usdt:       return BPF_PROG_TYPE_KPROBE; break;
    case ProbeType::tracepoint: return BPF_PROG_TYPE_TRACEPOINT; break;
    case ProbeType::profile:      return BPF_PROG_TYPE_PERF_EVENT; break;
+    case ProbeType::interval:   return BPF_PROG_TYPE_PERF_EVENT; break;
+    case ProbeType::software:   return BPF_PROG_TYPE_PERF_EVENT; break;
+    case ProbeType::hardware:   return BPF_PROG_TYPE_PERF_EVENT; break;
    default: abort();
  }
 }
@@ -46,6 +55,8 @@ AttachedProbe::AttachedProbe(Probe &probe, std::tuple<uint8_t *, uintptr_t> func
  : probe_(probe), func_(func)
 {
  load_prog();
+  if (bt_verbose)
+    std::cerr << "Attaching " << probe_.name << std::endl;
  switch (probe_.type)
  {
    case ProbeType::kprobe:
@@ -62,6 +73,29 @@ AttachedProbe::AttachedProbe(Probe &probe, std::tuple<uint8_t *, uintptr_t> func
    case ProbeType::profile:
      attach_profile();
      break;
+    case ProbeType::interval:
+      attach_interval();
+      break;
+    case ProbeType::software:
+      attach_software();
+      break;
+    case ProbeType::hardware:
+      attach_hardware();
+      break;
+    default:
+      abort();
+  }
+}
+
+AttachedProbe::AttachedProbe(Probe &probe, std::tuple<uint8_t *, uintptr_t> func, int pid)
+  : probe_(probe), func_(func)
+{
+  load_prog();
+  switch (probe_.type)
+  {
+    case ProbeType::usdt:
+      attach_usdt(pid);
+      break;
    default:
      abort();
  }
@@ -88,12 +122,16 @@ AttachedProbe::~AttachedProbe()
      break;
    case ProbeType::uprobe:
    case ProbeType::uretprobe:
+    case ProbeType::usdt:
      err = bpf_detach_uprobe(eventname().c_str());
      break;
    case ProbeType::tracepoint:
      err = bpf_detach_tracepoint(probe_.path.c_str(), eventname().c_str());
      break;
    case ProbeType::profile:
+    case ProbeType::interval:
+    case ProbeType::software:
+    case ProbeType::hardware:
      break;
    default:
      abort();
@@ -125,6 +163,7 @@ std::string AttachedProbe::eventname() const
      return eventprefix() + probe_.attach_point;
    case ProbeType::uprobe:
    case ProbeType::uretprobe:
+    case ProbeType::usdt:
      offset_str << std::hex << offset();
      return eventprefix() + sanitise(probe_.path) + "_" + offset_str.str();
    case ProbeType::tracepoint:
@@ -143,7 +182,7 @@ uint64_t AttachedProbe::offset() const
 {
  bcc_symbol sym;
  int err = bcc_resolve_symname(probe_.path.c_str(), probe_.attach_point.c_str(),
-      0, 0, nullptr, &sym);
+      probe_.loc, 0, nullptr, &sym);

  if (err)
    throw std::runtime_error("Could not resolve symbol: " + probe_.path + ":" + probe_.attach_point);
@@ -188,20 +227,32 @@ void AttachedProbe::load_prog()
  int prog_len = std::get<1>(func_);
  const char *license = "GPL";
  int log_level = 0;
-  char *log_buf = nullptr;
-  unsigned log_buf_size = 0;
+  char log_buf[BPF_LOG_SIZE];
+  char name[STRING_SIZE], *namep;
+  unsigned log_buf_size = sizeof (log_buf);

  // Redirect stderr, so we don't get error messages from BCC
  int old_stderr, new_stderr;
  fflush(stderr);
-  old_stderr = dup(2);
-  new_stderr = open("/dev/null", O_WRONLY);
-  dup2(new_stderr, 2);
-  close(new_stderr);
+  if (bt_debug)
+    log_level = 15;
+  else
+  {
+    old_stderr = dup(2);
+    new_stderr = open("/dev/null", O_WRONLY);
+    dup2(new_stderr, 2);
+    close(new_stderr);
+  }
+
+  // bpf_prog_load rejects colons in the probe name
+  snprintf(name, sizeof(name), "%s", probe_.name.c_str()); // always NUL-terminated, unlike strncpy
+  namep = name;
+  if (strrchr(name, ':') != NULL)
+    namep = strrchr(name, ':') + 1;

  for (int attempt=0; attempt<3; attempt++)
  {
-    progfd_ = bpf_prog_load(progtype(probe_.type), probe_.name.c_str(),
+    progfd_ = bpf_prog_load(progtype(probe_.type), namep,
        reinterpret_cast<struct bpf_insn*>(insns), prog_len, license,
        kernel_version(attempt), log_level, log_buf, log_buf_size);
    if (progfd_ >= 0)
@@ -209,12 +260,18 @@ void AttachedProbe::load_prog()
  }

  // Restore stderr
-  fflush(stderr);
-  dup2(old_stderr, 2);
-  close(old_stderr);
+  if (bt_debug == false)
+  {
+    fflush(stderr);
+    dup2(old_stderr, 2);
+    close(old_stderr);
+  }

-  if (progfd_ < 0)
-    throw std::runtime_error("Error loading program: " + probe_.name);
+  if (progfd_ < 0) {
+    if (bt_verbose)
+      std::cerr << std::endl << "Error log: " << std::endl << log_buf << std::endl;
+    throw std::runtime_error("Error loading program: " + probe_.name + (bt_verbose ? "" : " (try -v)"));
+  }
 }

 void AttachedProbe::attach_kprobe()
@@ -222,8 +279,16 @@ void AttachedProbe::attach_kprobe()
  int perf_event_fd = bpf_attach_kprobe(progfd_, attachtype(probe_.type),
      eventname().c_str(), probe_.attach_point.c_str(), 0);

-  if (perf_event_fd < 0)
-    throw std::runtime_error("Error attaching probe: '" + probe_.name + "'");
+  if (perf_event_fd < 0) {
+    if (probe_.orig_name != probe_.name) {
+      // a wildcard expansion couldn't probe something, just print a warning
+      // as this is normal for some kernel functions (eg, do_debug())
+      std::cerr << "Warning: could not attach probe " << probe_.name << ", skipping." << std::endl;
+    } else {
+      // an explicit match failed, so fail as the user must have wanted it
+      throw std::runtime_error("Error attaching probe: '" + probe_.name + "'");
+    }
+  }

  perf_event_fds_.push_back(perf_event_fd);
 }
@@ -241,6 +306,57 @@ void AttachedProbe::attach_uprobe()
  perf_event_fds_.push_back(perf_event_fd);
 }

+void AttachedProbe::attach_usdt(int pid)
+{
+  struct bcc_usdt_location loc = {};
+  int err, i;
+  std::ostringstream offset_str;
+  void *ctx;
+
+  if (pid)
+  {
+    ctx = bcc_usdt_new_frompid(pid, probe_.path.c_str());
+    if (!ctx)
+      throw std::runtime_error("Error initializing context for probe: " + probe_.name + ", for PID: " + std::to_string(pid));
+  }
+  else
+  {
+    ctx = bcc_usdt_new_frompath(probe_.path.c_str());
+    if (!ctx)
+      throw std::runtime_error("Error initializing context for probe: " + probe_.name);
+  }
+
+  // TODO: fn_name may need a unique suffix for each attachment on the same probe:
+  std::string fn_name = "probe_" + probe_.attach_point + "_1";
+  err = bcc_usdt_enable_probe(ctx, probe_.attach_point.c_str(), fn_name.c_str());
+  if (err)
+    throw std::runtime_error("Error finding or enabling probe: " + probe_.name);
+
+  std::string provider_name;
+  if ((i = probe_.path.rfind("/")) != std::string::npos)
+     provider_name = probe_.path.substr(i + 1);
+  else
+     provider_name = probe_.path;
+
+  err = bcc_usdt_get_location(ctx, provider_name.c_str(), probe_.attach_point.c_str(), 0, &loc);
+  if (err)
+    throw std::runtime_error("Error finding location for probe: " + probe_.name);
+  probe_.loc = loc.address;
+
+  int perf_event_fd = bpf_attach_uprobe(progfd_, attachtype(probe_.type),
+      eventname().c_str(), probe_.path.c_str(), loc.address - 0x400000, pid == 0 ? -1 : pid);
+
+  if (perf_event_fd < 0)
+  {
+    if (pid)
+      throw std::runtime_error("Error attaching probe: " + probe_.name + ", to PID: " + std::to_string(pid));
+    else
+      throw std::runtime_error("Error attaching probe: " + probe_.name);
+  }
+
+  perf_event_fds_.push_back(perf_event_fd);
+}
+
 void AttachedProbe::attach_tracepoint()
 {
  int perf_event_fd = bpf_attach_tracepoint(progfd_, probe_.path.c_str(),
@@ -296,4 +412,184 @@ void AttachedProbe::attach_profile()
  }
 }

+void AttachedProbe::attach_interval()
+{
+  int pid = -1;
+  int group_fd = -1;
+  int cpu = 0;
+
+  uint64_t period, freq;
+  if (probe_.path == "s")
+  {
+    period = probe_.freq * 1e9;
+    freq = 0;
+  }
+  else if (probe_.path == "ms")
+  {
+    period = probe_.freq * 1e6;
+    freq = 0;
+  }
+  else
+  {
+    abort();
+  }
+
+  int perf_event_fd = bpf_attach_perf_event(progfd_, PERF_TYPE_SOFTWARE,
+      PERF_COUNT_SW_CPU_CLOCK, period, freq, pid, cpu, group_fd);
+
+  if (perf_event_fd < 0)
+    throw std::runtime_error("Error attaching probe: " + probe_.name);
+
+  perf_event_fds_.push_back(perf_event_fd);
+}
+
+void AttachedProbe::attach_software()
+{
+  int pid = -1;
+  int group_fd = -1;
+
+  uint64_t period = probe_.freq;
+  uint64_t defaultp = 1;
+  uint32_t type;
+
+  // from linux/perf_event.h, with aliases from perf:
+  if (probe_.path == "cpu-clock" || probe_.path == "cpu")
+  {
+    type = PERF_COUNT_SW_CPU_CLOCK;
+    defaultp = 1000000;
+  }
+  else if (probe_.path == "task-clock")
+  {
+    type = PERF_COUNT_SW_TASK_CLOCK;
+  }
+  else if (probe_.path == "page-faults" || probe_.path == "faults")
+  {
+    type = PERF_COUNT_SW_PAGE_FAULTS;
+    defaultp = 100;
+  }
+  else if (probe_.path == "context-switches" || probe_.path == "cs")
+  {
+    type = PERF_COUNT_SW_CONTEXT_SWITCHES;
+    defaultp = 1000;
+  }
+  else if (probe_.path == "cpu-migrations")
+  {
+    type = PERF_COUNT_SW_CPU_MIGRATIONS;
+  }
+  else if (probe_.path == "minor-faults")
+  {
+    type = PERF_COUNT_SW_PAGE_FAULTS_MIN;
+    defaultp = 100;
+  }
+  else if (probe_.path == "major-faults")
+  {
+    type = PERF_COUNT_SW_PAGE_FAULTS_MAJ;
+  }
+  else if (probe_.path == "alignment-faults")
+  {
+    type = PERF_COUNT_SW_ALIGNMENT_FAULTS;
+  }
+  else if (probe_.path == "emulation-faults")
+  {
+    type = PERF_COUNT_SW_EMULATION_FAULTS;
+  }
+  else if (probe_.path == "dummy")
+  {
+    type = PERF_COUNT_SW_DUMMY;
+  }
+  else if (probe_.path == "bpf-output")
+  {
+    type = PERF_COUNT_SW_BPF_OUTPUT;
+  }
+  else
+  {
+    abort();
+  }
+
+  if (period == 0)
+    period = defaultp;
+
+  std::vector<int> cpus = ebpf::get_online_cpus();
+  for (int cpu : cpus)
+  {
+    int perf_event_fd = bpf_attach_perf_event(progfd_, PERF_TYPE_SOFTWARE,
+        type, period, 0, pid, cpu, group_fd);
+
+    if (perf_event_fd < 0)
+      throw std::runtime_error("Error attaching probe: " + probe_.name);
+
+    perf_event_fds_.push_back(perf_event_fd);
+  }
+}
+
+void AttachedProbe::attach_hardware()
+{
+  int pid = -1;
+  int group_fd = -1;
+
+  uint64_t period = probe_.freq;
+  uint64_t defaultp = 1000000;
+  uint32_t type;
+
+  // from linux/perf_event.h, with aliases from perf:
+  if (probe_.path == "cpu-cycles" || probe_.path == "cycles")
+  {
+    type = PERF_COUNT_HW_CPU_CYCLES;
+  }
+  else if (probe_.path == "instructions")
+  {
+    type = PERF_COUNT_HW_INSTRUCTIONS;
+  }
+  else if (probe_.path == "cache-references")
+  {
+    type = PERF_COUNT_HW_CACHE_REFERENCES;
+  }
+  else if (probe_.path == "cache-misses")
+  {
+    type = PERF_COUNT_HW_CACHE_MISSES;
+  }
+  else if (probe_.path == "branch-instructions" || probe_.path == "branches")
+  {
+    type = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+    defaultp = 100000;
+  }
+  else if (probe_.path == "bus-cycles")
+  {
+    type = PERF_COUNT_HW_BUS_CYCLES;
+    defaultp = 100000;
+  }
+  else if (probe_.path == "frontend-stalls")
+  {
+    type = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND;
+  }
+  else if (probe_.path == "backend-stalls")
+  {
+    type = PERF_COUNT_HW_STALLED_CYCLES_BACKEND;
+  }
+  else if (probe_.path == "ref-cycles")
+  {
+    type = PERF_COUNT_HW_REF_CPU_CYCLES;
+  }
+  // can add PERF_COUNT_HW_CACHE_... here
+  else
+  {
+    abort();
+  }
+
+  if (period == 0)
+    period = defaultp;
+
+  std::vector<int> cpus = ebpf::get_online_cpus();
+  for (int cpu : cpus)
+  {
+    int perf_event_fd = bpf_attach_perf_event(progfd_, PERF_TYPE_HARDWARE,
+        type, period, 0, pid, cpu, group_fd);
+
+    if (perf_event_fd < 0)
+      throw std::runtime_error("Error attaching probe: " + probe_.name);
+
+    perf_event_fds_.push_back(perf_event_fd);
+  }
+}
+
 } // namespace bpftrace
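
In `load_prog()` above, the name passed to `bpf_prog_load` is the part of the probe name after the last colon, because (per the comment in the diff) colons are rejected. The same trimming can be sketched with `std::string` instead of a fixed-size buffer. `prog_name_for_load` is a hypothetical name used only for this illustration.

```cpp
#include <string>

// Hypothetical helper (not in bpftrace): returns the text after the last ':'
// in a probe name, or the whole name when it contains no colon.
std::string prog_name_for_load(const std::string &probe_name)
{
  const std::string::size_type pos = probe_name.rfind(':');
  if (pos == std::string::npos)
    return probe_name;
  return probe_name.substr(pos + 1);
}
```

For example, `kprobe:sys_read` becomes `sys_read`, while `BEGIN` is passed through unchanged. Working on a `std::string` also sidesteps the truncation and termination questions that come with copying into a `char[STRING_SIZE]` buffer.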
--- a/src/attached_probe.h
+++ b/src/attached_probe.h
@@ -13,7 +13,8 @@ class AttachedProbe
 {
 public:
  AttachedProbe(Probe &probe, std::tuple<uint8_t *, uintptr_t> func);
-  virtual ~AttachedProbe();
+  AttachedProbe(Probe &probe, std::tuple<uint8_t *, uintptr_t> func, int pid);
+  ~AttachedProbe();
  AttachedProbe(const AttachedProbe &) = delete;
  AttachedProbe& operator=(const AttachedProbe &) = delete;

@@ -25,8 +26,12 @@ private:
  void load_prog();
  void attach_kprobe();
  void attach_uprobe();
+  void attach_usdt(int pid);
  void attach_tracepoint();
  void attach_profile();
+  void attach_interval();
+  void attach_software();
+  void attach_hardware();

  Probe &probe_;
  std::tuple<uint8_t *, uintptr_t> func_;

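`attach_software()`, declared above, picks a sample period per event: the count given in the probe if it is non-zero, otherwise an event-specific default. The if/else chain in attached_probe.cpp could also be expressed as a lookup table. The sketch below is one possible shape, not the committed code: `software_period` is a hypothetical function, it covers only the default-period logic (not the `PERF_COUNT_SW_*` mapping), and it assumes the event name has already been validated by the semantic analyser.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper (not in bpftrace). Default periods are copied from the
// attach_software() diff; events not listed there use a default of 1.
uint64_t software_period(const std::string &event, uint64_t requested)
{
  static const std::map<std::string, uint64_t> defaults = {
    {"cpu-clock", 1000000},   {"cpu", 1000000},
    {"page-faults", 100},     {"faults", 100},
    {"context-switches", 1000}, {"cs", 1000},
    {"minor-faults", 100},
  };

  if (requested != 0)
    return requested;  // an explicit count in the probe wins

  const auto it = defaults.find(event);
  return it == defaults.end() ? 1 : it->second;
}
```

With this table, `software:faults:` would sample every 100 events and `software:cs:50` every 50, matching the behaviour of the chain in the diff.
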
--- a/src/bpftrace.cpp
+++ b/src/bpftrace.cpp
@@ -5,6 +5,7 @@
 #include <regex>
 #include <sstream>
 #include <sys/epoll.h>
+#include <time.h>

 #include "bcc_syms.h"
 #include "perf_reader.h"
@@ -16,6 +17,9 @@

 namespace bpftrace {

+bool bt_debug = false;
+bool bt_verbose = false;
+
 int BPFtrace::add_probe(ast::Probe &p)
 {
  for (auto attach_point : *p.attach_points)
@@ -26,8 +30,9 @@ int BPFtrace::add_probe(ast::Probe &p)
      probe.path = "/proc/self/exe";
      probe.attach_point = "BEGIN_trigger";
      probe.type = probetype(attach_point->provider);
-      probe.prog_name = p.name();
+      probe.orig_name = p.name();
      probe.name = p.name();
+      probe.loc = 0;
      special_probes_.push_back(probe);
      continue;
    }
@@ -37,8 +42,9 @@ int BPFtrace::add_probe(ast::Probe &p)
      probe.path = "/proc/self/exe";
      probe.attach_point = "END_trigger";
      probe.type = probetype(attach_point->provider);
-      probe.prog_name = p.name();
+      probe.orig_name = p.name();
      probe.name = p.name();
+      probe.loc = 0;
      special_probes_.push_back(probe);
      continue;
    }
@@ -79,9 +85,10 @@ int BPFtrace::add_probe(ast::Probe &p)
      probe.path = attach_point->target;
      probe.attach_point = func;
      probe.type = probetype(attach_point->provider);
-      probe.prog_name = p.name();
+      probe.orig_name = p.name();
      probe.name = attach_point->name(func);
      probe.freq = attach_point->freq;
+      probe.loc = 0;
      probes_.push_back(probe);
    }
  }
@@ -113,7 +120,9 @@ std::set<std::string> BPFtrace::find_wildcard_matches(const std::string &prefix,
    if (std::regex_search(line, match, func_regex))
    {
      assert(match.size() == 2);
-      matches.insert(match[1]);
+      // skip the ".part.N" kprobe variants, as they can't be traced:
+      if (std::strstr(match.str(1).c_str(), ".part.") == NULL)
+        matches.insert(match[1]);
    }
  }
  return matches;
@@ -129,11 +138,75 @@ void perf_event_printer(void *cb_cookie, void *data, int size)
  auto bpftrace = static_cast<BPFtrace*>(cb_cookie);
  auto printf_id = *static_cast<uint64_t*>(data);
  auto arg_data = static_cast<uint8_t*>(data);
+  int err;
+
+  // async actions
+  if (printf_id == asyncactionint(AsyncAction::exit))
+  {
+    err = bpftrace->print_maps();
+    exit(err);
+  }
+  else if (printf_id == asyncactionint(AsyncAction::print))
+  {
+    std::string arg = (const char *)(static_cast<uint8_t*>(data) + sizeof(uint64_t) + 2 * sizeof(uint64_t));
+    uint64_t top = (uint64_t)*(static_cast<uint64_t*>(data) + sizeof(uint64_t) / sizeof(uint64_t));
+    uint64_t div = (uint64_t)*(static_cast<uint64_t*>(data) + (sizeof(uint64_t) + sizeof(uint64_t)) / sizeof(uint64_t));
+    bpftrace->print_map_ident(arg, top, div);
+    return;
+  }
+  else if (printf_id == asyncactionint(AsyncAction::clear))
+  {
+    std::string arg = (const char *)(arg_data+sizeof(uint64_t));
+    bpftrace->clear_map_ident(arg);
+    return;
+  }
+  else if (printf_id == asyncactionint(AsyncAction::zero))
+  {
+    std::string arg = (const char *)(arg_data+sizeof(uint64_t));
+    bpftrace->zero_map_ident(arg);
+    return;
+  }
+  else if (printf_id == asyncactionint(AsyncAction::time))
+  {
+    char timestr[STRING_SIZE];
+    time_t t;
+    struct tm *tmp;
+    t = time(NULL);
+    tmp = localtime(&t);
+    if (tmp == NULL) {
+      perror("localtime");
+      return;
+    }
+    uint64_t time_id = (uint64_t)*(static_cast<uint64_t*>(data) + sizeof(uint64_t) / sizeof(uint64_t));
+    auto fmt = bpftrace->time_args_[time_id].c_str();
+    if (strftime(timestr, sizeof(timestr), fmt, tmp) == 0) {
+      fprintf(stderr, "strftime returned 0\n");
+      return;
+    }
+    printf("%s", timestr);
+    return;
+  }
+  else if (printf_id == asyncactionint(AsyncAction::join))
+  {
+    const char *joinstr = " ";
+    for (int i = 0; i < bpftrace->join_argnum_; i++) {
+      auto *arg = arg_data+sizeof(uint64_t) + i * bpftrace->join_argsize_;
+      if (arg[0] == 0)
+        break;
+      if (i)
+        printf("%s", joinstr);
+      printf("%s", arg);
+    }
+    printf("\n");
+    return;
+  }

+  // printf
  auto fmt = std::get<0>(bpftrace->printf_args_[printf_id]).c_str();
  auto args = std::get<1>(bpftrace->printf_args_[printf_id]);
  std::vector<uint64_t> arg_values;
  std::vector<std::unique_ptr<char>> resolved_symbols;
+  char *name;
  for (auto arg : args)
  {
    switch (arg.type.type)
@@ -167,9 +240,13 @@ void perf_event_printer(void *cb_cookie, void *data, int size)
        break;
      case Type::usym:
        resolved_symbols.emplace_back(strdup(
-              bpftrace->resolve_usym(*(uint64_t*)(arg_data+arg.offset)).c_str()));
+              bpftrace->resolve_usym(*(uint64_t*)(arg_data+arg.offset), *(uint64_t*)(arg_data+arg.offset + 8)).c_str()));
        arg_values.push_back((uint64_t)resolved_symbols.back().get());
        break;
+      case Type::name:
+        name = strdup(bpftrace->resolve_name(*(uint64_t*)(arg_data+arg.offset)).c_str());
+        arg_values.push_back((uint64_t)name);
+        break;
      default:
        abort();
    }
@@ -213,15 +290,27 @@ void perf_event_lost(void *cb_cookie, uint64_t lost)

 std::unique_ptr<AttachedProbe> BPFtrace::attach_probe(Probe &probe, const BpfOrc &bpforc)
 {
-  auto func = bpforc.sections_.find("s_" + probe.prog_name);
+  // use the single-probe program if it exists (as is the case with wildcards
+  // and the name builtin, which must be expanded into separate programs per
+  // probe), else try to find the program based on the original probe name
+  // that includes wildcards.
+  auto func = bpforc.sections_.find("s_" + probe.name);
+  if (func == bpforc.sections_.end())
+    func = bpforc.sections_.find("s_" + probe.orig_name);
  if (func == bpforc.sections_.end())
  {
-    std::cerr << "Code not generated for probe: " << probe.name << std::endl;
+    if (probe.name != probe.orig_name)
+      std::cerr << "Code not generated for probe: " << probe.name << " from: " << probe.orig_name << std::endl;
+    else
+      std::cerr << "Code not generated for probe: " << probe.name << std::endl;
    return nullptr;
  }
  try
  {
-    return std::make_unique<AttachedProbe>(probe, func->second);
+    if (probe.type == ProbeType::usdt)
+      return std::make_unique<AttachedProbe>(probe, func->second, pid_);
+    else
+      return std::make_unique<AttachedProbe>(probe, func->second);
  }
  catch (std::runtime_error e)
  {
@@ -254,6 +343,9 @@ int BPFtrace::run(std::unique_ptr<BpfOrc> bpforc)
    attached_probes_.push_back(std::move(attached_probe));
  }

+  if (bt_verbose)
+    std::cerr << "Running..." << std::endl;
+
  poll_perf_events(epollfd);
  attached_probes_.clear();

@@ -277,7 +369,7 @@ int BPFtrace::setup_perf_events()
  online_cpus_ = cpus.size();
  for (int cpu : cpus)
  {
-    int page_cnt = 8;
+    int page_cnt = 64;
    void *reader = bpf_open_perf_buffer(&perf_event_printer, &perf_event_lost, this, -1, cpu, page_cnt);
    if (reader == nullptr)
    {
@@ -325,10 +417,12 @@ int BPFtrace::print_maps()
  {
    IMap &map = *mapmap.second.get();
    int err;
-    if (map.type_.type == Type::quantize)
-      err = print_map_quantize(map);
+    if (map.type_.type == Type::hist || map.type_.type == Type::lhist)
+      err = print_map_hist(map, 0, 0);
+    else if (map.type_.type == Type::avg || map.type_.type == Type::stats)
+      err = print_map_stats(map);
    else
-      err = print_map(map);
+      err = print_map(map, 0, 0);

    if (err)
      return err;
@@ -337,7 +431,142 @@ int BPFtrace::print_maps()
  return 0;
 }

-int BPFtrace::print_map(IMap &map)
+// print a map given an ident string
+int BPFtrace::print_map_ident(const std::string &ident, uint32_t top, uint32_t div)
+{
+  int err = 0;
+  for(auto &mapmap : maps_)
+  {
+    IMap &map = *mapmap.second.get();
+    if (map.name_ == ident) {
+      if (map.type_.type == Type::hist)
+        err = print_map_hist(map, top, div);
+      else
+        err = print_map(map, top, div);
+      return err;
+    }
+  }
+
+  return -2;
+}
+
+// clear a map (delete all keys) given an ident string
+int BPFtrace::clear_map_ident(const std::string &ident)
+{
+  int err = 0;
+  for(auto &mapmap : maps_)
+  {
+    IMap &map = *mapmap.second.get();
+    if (map.name_ == ident) {
+      err = clear_map(map);
+      return err;
+    }
+  }
+
+  return -2;
+}
+
+// zero a map (set all keys to zero) given an ident string
+int BPFtrace::zero_map_ident(const std::string &ident)
+{
+  int err = 0;
+  for(auto &mapmap : maps_)
+  {
+    IMap &map = *mapmap.second.get();
+    if (map.name_ == ident) {
+      err = zero_map(map);
+      return err;
+    }
+  }
+
+  return -2;
+}
+
+// clear a map
+int BPFtrace::clear_map(IMap &map)
+{
+  std::vector<uint8_t> old_key;
+  try
+  {
+    if (map.type_.type == Type::hist)
+      // hist maps have 8 extra bytes for the bucket number
+      old_key = find_empty_key(map, map.key_.size() + 8);
+    else
+      old_key = find_empty_key(map, map.key_.size());
+  }
+  catch (std::runtime_error &e)
+  {
+    std::cerr << "Error getting key for map '" << map.name_ << "': "
+              << e.what() << std::endl;
+    return -2;
+  }
+  auto key(old_key);
+
+  // snapshot keys, then operate on them
+  std::vector<std::vector<uint8_t>> keys;
+  while (bpf_get_next_key(map.mapfd_, old_key.data(), key.data()) == 0)
+  {
+    keys.push_back(key);
+    old_key = key;
+  }
+
+  for (auto &key : keys)
+  {
+    int err = bpf_delete_elem(map.mapfd_, key.data());
+    if (err)
+    {
+      std::cerr << "Error deleting elem: " << err << std::endl;
+      return -1;
+    }
+  }
+
+  return 0;
+}
+
+// zero a map
+int BPFtrace::zero_map(IMap &map)
+{
+  std::vector<uint8_t> old_key;
+  try
+  {
+    if (map.type_.type == Type::hist)
+      // hist maps have 8 extra bytes for the bucket number
+      old_key = find_empty_key(map, map.key_.size() + 8);
+    else
+      old_key = find_empty_key(map, map.key_.size());
+  }
+  catch (std::runtime_error &e)
+  {
+    std::cerr << "Error getting key for map '" << map.name_ << "': "
+              << e.what() << std::endl;
+    return -2;
+  }
+  auto key(old_key);
+
+  // snapshot keys, then operate on them
+  std::vector<std::vector<uint8_t>> keys;
+  while (bpf_get_next_key(map.mapfd_, old_key.data(), key.data()) == 0)
+  {
+    keys.push_back(key);
+    old_key = key;
+  }
+
+  uint64_t zero = 0;
+  for (auto &key : keys)
+  {
+    int err = bpf_update_elem(map.mapfd_, key.data(), &zero, BPF_EXIST);
+
+    if (err)
+    {
+      std::cerr << "Error updating elem: " << err << std::endl;
+      return -1;
+    }
+  }
+
+  return 0;
+}
+
+int BPFtrace::print_map(IMap &map, uint32_t top, uint32_t div)
 {
  std::vector<uint8_t> old_key;
  try
@@ -357,7 +586,8 @@ int BPFtrace::print_map(IMap &map)
  while (bpf_get_next_key(map.mapfd_, old_key.data(), key.data()) == 0)
  {
    int value_size = map.type_.size;
-    if (map.type_.type == Type::count)
+    if (map.type_.type == Type::count ||
+        map.type_.type == Type::sum || map.type_.type == Type::min || map.type_.type == Type::max)
      value_size *= ncpus_;
    auto value = std::vector<uint8_t>(value_size);
    int err = bpf_lookup_elem(map.mapfd_, key.data(), value.data());
@@ -372,39 +602,68 @@ int BPFtrace::print_map(IMap &map)
    old_key = key;
  }

-  if (map.type_.type == Type::count)
+  if (map.type_.type == Type::count || map.type_.type == Type::sum)
  {
    std::sort(values_by_key.begin(), values_by_key.end(), [&](auto &a, auto &b)
    {
      return reduce_value(a.second, ncpus_) < reduce_value(b.second, ncpus_);
    });
  }
+  else if (map.type_.type == Type::min)
+  {
+    std::sort(values_by_key.begin(), values_by_key.end(), [&](auto &a, auto &b)
+    {
+      return min_value(a.second, ncpus_) < min_value(b.second, ncpus_);
+    });
+  }
+  else if (map.type_.type == Type::max)
+  {
+    std::sort(values_by_key.begin(), values_by_key.end(), [&](auto &a, auto &b)
+    {
+      return max_value(a.second, ncpus_) < max_value(b.second, ncpus_);
+    });
+  }
  else
  {
    sort_by_key(map.key_.args_, values_by_key);
  };

+  if (div == 0)
+    div = 1;
+  uint32_t i = 0;
  for (auto &pair : values_by_key)
  {
    auto key = pair.first;
    auto value = pair.second;

+    if (top)
+    {
+      if (top < values_by_key.size() && i++ < (values_by_key.size() - top))
+        continue;
+    }
+
    std::cout << map.name_ << map.key_.argument_value_list(*this, key) << ": ";

    if (map.type_.type == Type::stack)
-      std::cout << get_stack(*(uint32_t*)value.data(), false, 8);
+      std::cout << get_stack(*(uint64_t*)value.data(), false, 8);
    else if (map.type_.type == Type::ustack)
-      std::cout << get_stack(*(uint32_t*)value.data(), true, 8);
+      std::cout << get_stack(*(uint64_t*)value.data(), true, 8);
    else if (map.type_.type == Type::sym)
      std::cout << resolve_sym(*(uintptr_t*)value.data());
    else if (map.type_.type == Type::usym)
-      std::cout << resolve_usym(*(uintptr_t*)value.data());
+      std::cout << resolve_usym(*(uintptr_t*)value.data(), *(uint64_t*)(value.data() + 8));
    else if (map.type_.type == Type::string)
      std::cout << value.data() << std::endl;
-    else if (map.type_.type == Type::count)
-      std::cout << reduce_value(value, ncpus_) << std::endl;
+    else if (map.type_.type == Type::count || map.type_.type == Type::sum)
+      std::cout << reduce_value(value, ncpus_) / div << std::endl;
+    else if (map.type_.type == Type::min)
+      std::cout << min_value(value, ncpus_) / div << std::endl;
+    else if (map.type_.type == Type::max)
+      std::cout << max_value(value, ncpus_) / div << std::endl;
+    else if (map.type_.type == Type::name)
+      std::cout << resolve_name(*(uint64_t*)value.data()) << std::endl;
    else
-      std::cout << *(int64_t*)value.data() << std::endl;
+      std::cout << *(int64_t*)value.data() / div << std::endl;
  }

  std::cout << std::endl;
@@ -412,11 +671,11 @@ int BPFtrace::print_map(IMap &map)
  return 0;
 }

-int BPFtrace::print_map_quantize(IMap &map)
+int BPFtrace::print_map_hist(IMap &map, uint32_t top, uint32_t div)
 {
-  // A quantize-map adds an extra 8 bytes onto the end of its key for storing
+  // A hist-map adds an extra 8 bytes onto the end of its key for storing
  // the bucket number.
-  // e.g. A map defined as: @x[1, 2] = @quantize(3);
+  // e.g. A map defined as: @x[1, 2] = @hist(3);
  // would actually be stored with the key: [1, 2, 3]

  std::vector<uint8_t> old_key;
@@ -454,7 +713,10 @@ int BPFtrace::print_map_quantize(IMap &map)
    if (values_by_key.find(key_prefix) == values_by_key.end())
    {
      // New key - create a list of buckets for it
-      values_by_key[key_prefix] = std::vector<uint64_t>(65);
+      if (map.type_.type == Type::hist)
+        values_by_key[key_prefix] = std::vector<uint64_t>(65);
+      else
+        values_by_key[key_prefix] = std::vector<uint64_t>(1002);
    }
    values_by_key[key_prefix].at(bucket) = reduce_value(value, ncpus_);

@@ -477,13 +739,26 @@ int BPFtrace::print_map_quantize(IMap &map)
    return a.second < b.second;
  });

+  if (div == 0)
+    div = 1;
+  uint32_t i = 0;
  for (auto &key_count : total_counts_by_key)
  {
    auto &key = key_count.first;
    auto &value = values_by_key[key];
+
+    if (top)
+    {
+      if (top < values_by_key.size() && i++ < (values_by_key.size() - top))
+        continue;
+    }
+
    std::cout << map.name_ << map.key_.argument_value_list(*this, key) << ": " << std::endl;

-    print_quantize(value);
+    if (map.type_.type == Type::hist)
+      print_hist(value, div);
+    else
+      print_lhist(value, map.lqmin, map.lqmax, map.lqstep);

    std::cout << std::endl;
  }
@@ -491,14 +766,96 @@ int BPFtrace::print_map_quantize(IMap &map)
  return 0;
 }

-int BPFtrace::print_quantize(const std::vector<uint64_t> &values) const
+int BPFtrace::print_map_stats(IMap &map)
+{
+  // An avg- or stats-map adds an extra 8 bytes onto the end of its key for
+  // storing the bucket number: bucket 0 holds the count, bucket 1 the total.
+
+  std::vector<uint8_t> old_key;
+  try
+  {
+    old_key = find_empty_key(map, map.key_.size() + 8);
+  }
+  catch (std::runtime_error &e)
+  {
+    std::cerr << "Error getting key for map '" << map.name_ << "': "
+              << e.what() << std::endl;
+    return -2;
+  }
+  auto key(old_key);
+
+  std::map<std::vector<uint8_t>, std::vector<uint64_t>> values_by_key;
+
+  while (bpf_get_next_key(map.mapfd_, old_key.data(), key.data()) == 0)
+  {
+    auto key_prefix = std::vector<uint8_t>(map.key_.size());
+    int bucket = key.at(map.key_.size());
+
+    for (size_t i=0; i<map.key_.size(); i++)
+      key_prefix.at(i) = key.at(i);
+
+    int value_size = map.type_.size * ncpus_;
+    auto value = std::vector<uint8_t>(value_size);
+    int err = bpf_lookup_elem(map.mapfd_, key.data(), value.data());
+    if (err)
+    {
+      std::cerr << "Error looking up elem: " << err << std::endl;
+      return -1;
+    }
+
+    if (values_by_key.find(key_prefix) == values_by_key.end())
+    {
+      // New key - create a list of buckets for it
+      values_by_key[key_prefix] = std::vector<uint64_t>(2);
+    }
+    values_by_key[key_prefix].at(bucket) = reduce_value(value, ncpus_);
+
+    old_key = key;
+  }
+
+  // Sort based on sum of counts in all buckets
+  std::vector<std::pair<std::vector<uint8_t>, uint64_t>> total_counts_by_key;
+  for (auto &map_elem : values_by_key)
+  {
+    assert(map_elem.second.size() == 2);
+    uint64_t count = map_elem.second.at(0);
+    uint64_t total = map_elem.second.at(1);
+    assert(count != 0);
+    total_counts_by_key.push_back({map_elem.first, total / count});
+  }
+  std::sort(total_counts_by_key.begin(), total_counts_by_key.end(), [&](auto &a, auto &b)
+  {
+    return a.second < b.second;
+  });
+
+  for (auto &key_count : total_counts_by_key)
+  {
+    auto &key = key_count.first;
+    auto &value = values_by_key[key];
+    std::cout << map.name_ << map.key_.argument_value_list(*this, key) << ": ";
+
+    uint64_t count = value.at(0);
+    uint64_t total = value.at(1);
+
+    if (map.type_.type == Type::stats)
+      std::cout << "count " << count << ", average " << total / count << ", total " << total << std::endl;
+    else
+      std::cout << total / count << std::endl;
+  }
+
+  std::cout << std::endl;
+
+  return 0;
+}
+
+int BPFtrace::print_hist(const std::vector<uint64_t> &values, uint32_t div) const
 {
  int max_index = -1;
  int max_value = 0;

  for (size_t i = 0; i < values.size(); i++)
  {
-    int v = values.at(i);
+    int v = values.at(i) / div;
    if (v != 0)
      max_index = i;
    if (v > max_value)
@@ -517,12 +874,59 @@ int BPFtrace::print_quantize(const std::vector<uint64_t> &values) const
    }
    else
    {
-      header << "[" << quantize_index_label(i);
-      header << ", " << quantize_index_label(i+1) << ")";
+      header << "[" << hist_index_label(i);
+      header << ", " << hist_index_label(i+1) << ")";
    }

+    int max_width = 52;
+    int bar_width = (values.at(i) / div) / (float)max_value * max_width;
+    std::string bar(bar_width, '@');
+
+    std::cout << std::setw(16) << std::left << header.str()
+              << std::setw(8) << std::right << (values.at(i) / div)
+              << " |" << std::setw(max_width) << std::left << bar << "|"
+              << std::endl;
+  }
+
+  return 0;
+}
+
+int BPFtrace::print_lhist(const std::vector<uint64_t> &values, int min, int max, int step) const
+{
+  int max_index = -1;
+  int max_value = 0;
+  int buckets = (max - min) / step;  // excluding lt and gt buckets
+
+  for (size_t i = 0; i < values.size(); i++)
+  {
+    int v = values.at(i);
+    if (v != 0)
+      max_index = i;
+    if (v > max_value)
+      max_value = v;
+  }
+
+  if (max_index == -1)
+    return 0;
+
+  std::ostringstream lt;
+  lt << "(...," << min << "]";
+  std::ostringstream gt;
+
+  for (int i = 0; i <= buckets + 1; i++)
+  {
    int max_width = 52;
    int bar_width = values.at(i)/(float)max_value*max_width;
+    std::ostringstream header;
+    if (i == 0) {
+      header << "(...," << min << "]";
+    } else if (i == (buckets + 1)) {
+      header << "[" << max << ",...)";
+    } else {
+      header << "[" << (i - 1) * step + min;
+      header << ", " << i * step + min << ")";
+    }
+
    std::string bar(bar_width, '@');

    std::cout << std::setw(16) << std::left << header.str()
@@ -534,7 +938,7 @@ int BPFtrace::print_quantize(const std::vector<uint64_t> &values) const
  return 0;
 }

-std::string BPFtrace::quantize_index_label(int power)
+std::string BPFtrace::hist_index_label(int power)
 {
  char suffix = '\0';
  if (power >= 40)
@@ -575,12 +979,39 @@ uint64_t BPFtrace::reduce_value(const std::vector<uint8_t> &value, int ncpus)
  return sum;
 }

+uint64_t BPFtrace::max_value(const std::vector<uint8_t> &value, int ncpus)
+{
+  uint64_t val, max = 0;
+  for (int i=0; i<ncpus; i++)
+  {
+    val = *(uint64_t*)(value.data() + i*sizeof(uint64_t));
+    if (val > max)
+      max = val;
+  }
+  return max;
+}
+
+uint64_t BPFtrace::min_value(const std::vector<uint8_t> &value, int ncpus)
+{
+  uint64_t val, max = 0;
+  for (int i=0; i<ncpus; i++)
+  {
+    val = *(uint64_t*)(value.data() + i*sizeof(uint64_t));
+    if (val > max)
+      max = val;
+  }
+  return (0xffffffff - max);
+}
+
 std::vector<uint8_t> BPFtrace::find_empty_key(IMap &map, size_t size) const
 {
  if (size == 0) size = 8;
  auto key = std::vector<uint8_t>(size);
  int value_size = map.type_.size;
-  if (map.type_.type == Type::count || map.type_.type == Type::quantize)
+  if (map.type_.type == Type::count || map.type_.type == Type::hist ||
+      map.type_.type == Type::sum || map.type_.type == Type::min ||
+      map.type_.type == Type::max || map.type_.type == Type::avg ||
+      map.type_.type == Type::stats)
    value_size *= ncpus_;
  auto value = std::vector<uint8_t>(value_size);

@@ -598,13 +1029,15 @@ std::vector<uint8_t> BPFtrace::find_empty_key(IMap &map, size_t size) const
  throw std::runtime_error("Could not find empty key");
 }

-std::string BPFtrace::get_stack(uint32_t stackid, bool ustack, int indent)
+std::string BPFtrace::get_stack(uint64_t stackidpid, bool ustack, int indent)
 {
+  uint32_t stackid = stackidpid & 0xffffffff;
+  int pid = stackidpid >> 32;
  auto stack_trace = std::vector<uint64_t>(MAX_STACK_SIZE);
  int err = bpf_lookup_elem(stackid_map_->mapfd_, &stackid, stack_trace.data());
  if (err)
  {
-    std::cerr << "Error looking up stack id " << stackid << ": " << err << std::endl;
+    std::cerr << "Error looking up stack id " << stackid << " (pid " << pid << "): " << err << std::endl;
    return "";
  }

@@ -619,7 +1052,7 @@ std::string BPFtrace::get_stack(uint32_t stackid, bool ustack, int indent)
    if (!ustack)
      stack << padding << resolve_sym(addr, true) << std::endl;
    else
-      stack << padding << resolve_usym(addr) << std::endl;
+      stack << padding << resolve_usym(addr, pid, true) << std::endl;
  }

  return stack.str();
@@ -644,14 +1077,51 @@ std::string BPFtrace::resolve_sym(uintptr_t addr, bool show_offset)
  return symbol.str();
 }

-std::string BPFtrace::resolve_usym(uintptr_t addr) const
+std::string BPFtrace::resolve_usym(uintptr_t addr, int pid, bool show_offset)
 {
-  // TODO
+  struct bcc_symbol sym;
  std::ostringstream symbol;
-  symbol << (void*)addr;
+  struct bcc_symbol_option symopts;
+  void *psyms;
+
+  // TODO: deal with these:
+  symopts = {.use_debug_file = false,
+	     .check_debug_file_crc = false,
+	     .use_symbol_type = BCC_SYM_ALL_TYPES};
+
+  if (pid_sym_.find(pid) == pid_sym_.end())
+  {
+    // not cached, create new ProcSyms cache
+    psyms = bcc_symcache_new(pid, &symopts);
+    pid_sym_[pid] = psyms;
+  }
+  else
+  {
+    psyms = pid_sym_[pid];
+  }
+
+  if (((ProcSyms *)psyms)->resolve_addr(addr, &sym))
+  {
+    symbol << sym.name;
+    if (show_offset)
+      symbol << "+" << sym.offset;
+  }
+  else
+  {
+    symbol << (void*)addr;
+  }
+
+  // TODO: deal with process exit and clearing its psyms entry
+
  return symbol.str();
 }

+std::string BPFtrace::resolve_name(uint64_t name_id)
+{
+  assert(name_id < name_ids_.size());
+  return name_ids_[name_id];
+}
+
 void BPFtrace::sort_by_key(std::vector<SizedType> key_args,
    std::vector<std::pair<std::vector<uint8_t>, std::vector<uint8_t>>> &values_by_key)
 {
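
Review note: `reduce_value`, `max_value` and `min_value` in this file all walk a per-CPU value buffer in 8-byte slots. The following is a minimal standalone sketch of that pattern, not the code in this commit. The names `read_slot`, `sum_slots`, `max_slot` and `pack_slots` are hypothetical, and the sketch uses `memcpy` instead of a pointer cast so it makes no alignment assumptions. It does not reproduce the `0xffffffff - max` step that `min_value` applies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: read the 64-bit slot that belongs to one CPU.
// Slots are laid out back to back, sizeof(uint64_t) bytes apart.
static uint64_t read_slot(const std::vector<uint8_t> &value, int cpu)
{
  assert(cpu >= 0);
  std::size_t offset = static_cast<std::size_t>(cpu) * sizeof(uint64_t);
  assert(offset + sizeof(uint64_t) <= value.size());
  uint64_t v;
  std::memcpy(&v, value.data() + offset, sizeof(v));
  return v;
}

// Sum across CPUs, as a count() or sum() map needs.
uint64_t sum_slots(const std::vector<uint8_t> &value, int ncpus)
{
  uint64_t sum = 0;
  for (int i = 0; i < ncpus; i++)
    sum += read_slot(value, i);
  return sum;
}

// Largest per-CPU slot, as a max() map needs.
uint64_t max_slot(const std::vector<uint8_t> &value, int ncpus)
{
  uint64_t max = 0;
  for (int i = 0; i < ncpus; i++)
  {
    uint64_t v = read_slot(value, i);
    if (v > max)
      max = v;
  }
  return max;
}

// Hypothetical helper for experiments: pack per-CPU values into bytes.
std::vector<uint8_t> pack_slots(const std::vector<uint64_t> &slots)
{
  std::vector<uint8_t> buf(slots.size() * sizeof(uint64_t));
  if (!slots.empty())
    std::memcpy(buf.data(), slots.data(), buf.size());
  return buf;
}
```

With four CPUs holding 3, 10, 0 and 7, `sum_slots` gives the value a `count()` or `sum()` map would print and `max_slot` gives the value a `max()` map would print. The stride is the size of the element, which is why `sizeof(uint64_t)` is the natural step between slots.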

--- a/src/bpftrace.h
+++ b/src/bpftrace.h
@@ -18,6 +18,10 @@ namespace bpftrace {

 class BpfOrc;

+// globals
+extern bool bt_debug;
+extern bool bt_verbose;
+
 class BPFtrace
 {
 public:
@@ -27,21 +31,31 @@ public:
  int num_probes() const;
  int run(std::unique_ptr<BpfOrc> bpforc);
  int print_maps();
-  std::string get_stack(uint32_t stackid, bool ustack, int indent=0);
+  int print_map_ident(const std::string &ident, uint32_t top, uint32_t div);
+  int clear_map_ident(const std::string &ident);
+  int zero_map_ident(const std::string &ident);
+  std::string get_stack(uint64_t stackidpid, bool ustack, int indent=0);
  std::string resolve_sym(uintptr_t addr, bool show_offset=false);
-  std::string resolve_usym(uintptr_t addr) const;
+  std::string resolve_usym(uintptr_t addr, int pid, bool show_offset=false);
+  std::string resolve_name(uint64_t name_id);
+  int pid_;

  std::map<std::string, std::unique_ptr<IMap>> maps_;
  std::map<std::string, Struct> structs_;
  std::vector<std::tuple<std::string, std::vector<Field>>> printf_args_;
+  std::vector<std::string> time_args_;
  std::unique_ptr<IMap> stackid_map_;
+  std::unique_ptr<IMap> join_map_;
  std::unique_ptr<IMap> perf_event_map_;
+  std::vector<std::string> name_ids_;
+  int join_argnum_;
+  int join_argsize_;

  static void sort_by_key(std::vector<SizedType> key_args,
      std::vector<std::pair<std::vector<uint8_t>, std::vector<uint8_t>>> &values_by_key);
+  virtual std::set<std::string> find_wildcard_matches(const std::string &prefix, const std::string &attach_point, const std::string &file_name);

 protected:
-  virtual std::set<std::string> find_wildcard_matches(const std::string &prefix, const std::string &attach_point, const std::string &file_name);
  std::vector<Probe> probes_;
  std::vector<Probe> special_probes_;

@@ -49,17 +63,25 @@ private:
  std::vector<std::unique_ptr<AttachedProbe>> attached_probes_;
  std::vector<std::unique_ptr<AttachedProbe>> special_attached_probes_;
  KSyms ksyms_;
+  std::map<int, void *> pid_sym_;
  int ncpus_;
  int online_cpus_;

  std::unique_ptr<AttachedProbe> attach_probe(Probe &probe, const BpfOrc &bpforc);
  int setup_perf_events();
  void poll_perf_events(int epollfd, int timeout=-1);
-  int print_map(IMap &map);
-  int print_map_quantize(IMap &map);
-  int print_quantize(const std::vector<uint64_t> &values) const;
+  int clear_map(IMap &map);
+  int zero_map(IMap &map);
+  int print_map(IMap &map, uint32_t top, uint32_t div);
+  int print_map_hist(IMap &map, uint32_t top, uint32_t div);
+  int print_map_lhist(IMap &map);
+  int print_map_stats(IMap &map);
+  int print_hist(const std::vector<uint64_t> &values, uint32_t div) const;
+  int print_lhist(const std::vector<uint64_t> &values, int min, int max, int step) const;
  static uint64_t reduce_value(const std::vector<uint8_t> &value, int ncpus);
-  static std::string quantize_index_label(int power);
+  static uint64_t min_value(const std::vector<uint8_t> &value, int ncpus);
+  static uint64_t max_value(const std::vector<uint8_t> &value, int ncpus);
+  static std::string hist_index_label(int power);
  std::vector<uint8_t> find_empty_key(IMap &map, size_t size) const;
 };
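
Review note: `get_stack` now takes a single `uint64_t stackidpid` and splits it into a 32-bit stack id (low half) and a pid (high half). A small sketch of that layout follows. `unpack_stack_ref` mirrors the split shown in `bpftrace.cpp`; `pack_stack_ref` and the `StackRef` struct are hypothetical, added only to show the inverse operation, and are an assumption about how the producer side would build the value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the two halves of a packed stack reference.
struct StackRef
{
  uint32_t stackid;
  int pid;
};

// Assumed producer side: pid in the upper 32 bits, stack id in the lower 32.
uint64_t pack_stack_ref(uint32_t stackid, uint32_t pid)
{
  return (static_cast<uint64_t>(pid) << 32) | stackid;
}

// Consumer side, matching the masks and shifts used by get_stack().
StackRef unpack_stack_ref(uint64_t stackidpid)
{
  StackRef ref;
  ref.stackid = static_cast<uint32_t>(stackidpid & 0xffffffff);
  ref.pid = static_cast<int>(stackidpid >> 32);
  return ref;
}
```

Carrying the pid alongside the stack id is what lets `resolve_usym` pick the right per-process symbol cache when a user stack is printed.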


--- a/src/imap.h
+++ b/src/imap.h
@@ -20,6 +20,11 @@ public:
  std::string name_;
  SizedType type_;
  MapKey key_;
+
+  // used by lhist(). TODO: move to separate Map object.
+  int lqmin;
+  int lqmax;
+  int lqstep;
 };

 } // namespace bpftrace
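
Review note: the `lqmin`, `lqmax` and `lqstep` fields feed `print_lhist`, which labels one underflow bucket, `(max - min) / step` regular buckets, and one overflow bucket. The sketch below isolates that labelling logic so it can be checked by hand. The function name `lhist_label` is hypothetical, and the precondition asserts are additions that the code in this commit does not have.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Label for bucket index i of a linear histogram.
// Index 0 is the underflow bucket, index buckets + 1 the overflow bucket.
std::string lhist_label(int i, int min, int max, int step)
{
  assert(step > 0 && max > min);
  int buckets = (max - min) / step; // excluding the two overflow buckets
  assert(i >= 0 && i <= buckets + 1);

  std::ostringstream header;
  if (i == 0)
    header << "(...," << min << "]";
  else if (i == buckets + 1)
    header << "[" << max << ",...)";
  else
    header << "[" << (i - 1) * step + min << ", " << i * step + min << ")";
  return header.str();
}
```

For `min = 0`, `max = 100`, `step = 10` there are 10 regular buckets, so indices run from 0 to 11. The 1002-element vector allocated in `print_map_hist` for non-`hist` maps suggests room for up to 1000 regular buckets plus the two overflow buckets.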
--- a/src/lexer.l
+++ b/src/lexer.l
@@ -28,6 +28,7 @@ space  {hspace}|{vspace}
 path   :(\\.|[_\-\./a-zA-Z0-9])*:
 %x STR
 %x STRUCT
+%x COMMENT

 %%

@@ -37,9 +38,15 @@ path   :(\\.|[_\-\./a-zA-Z0-9])*:

 {hspace}+               { loc.step(); }
 {vspace}+               { loc.lines(yyleng); loc.step(); }
-"//".*$  // Comments

-pid|tid|uid|gid|nsecs|cpu|comm|stack|ustack|arg[0-9]|retval|func {
+"//".*$                 // single-line comments
+"/*"                    BEGIN(COMMENT);   // multi-line comments
+<COMMENT>"/*"           driver.error(loc, std::string("nested comments unsupported"));
+<COMMENT>"*/"           BEGIN(INITIAL);
+<COMMENT>"EOF"          driver.error(loc, std::string("end of file during comment"));
+<COMMENT>.|"\n"         ;
+
+pid|tid|uid|gid|nsecs|cpu|comm|stack|ustack|arg[0-9]|retval|func|name|curtask|rand {
                          return Parser::make_BUILTIN(yytext, loc); }
 {path}                  { return Parser::make_PATH(yytext, loc); }
 {map}                   { return Parser::make_MAP(yytext, loc); }
@@ -77,6 +84,7 @@ pid|tid|uid|gid|nsecs|cpu|comm|stack|ustack|arg[0-9]|retval|func {
 "."                     { return Parser::make_DOT(loc); }
 "->"                    { return Parser::make_PTR(loc); }
 "#".*                   { return Parser::make_CPREPROC(yytext, loc); }
+"?"                     { return Parser::make_QUES(loc); }

 \"                      { BEGIN(STR); buffer.clear(); }
 <STR>\"                 { BEGIN(INITIAL); return Parser::make_STRING(buffer, loc); }
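
Review note: the new `COMMENT` start condition is a two-state scanner. As a way to reason about its behaviour outside flex, here is a hand-written C++ sketch of the same idea. The function name `strip_comments` is hypothetical. Unlike the lexer, which reports problems through `driver.error`, the sketch throws, and it treats end of input inside a comment as the error case, which appears to be the intent of the `"EOF"` rule. It ignores string literals entirely.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Remove // and /* */ comments from src using two states that mirror the
// lexer's INITIAL and COMMENT start conditions.
std::string strip_comments(const std::string &src)
{
  enum class State { Initial, Comment };
  State state = State::Initial;
  std::string out;

  for (std::size_t i = 0; i < src.size(); i++)
  {
    bool has_next = i + 1 < src.size();
    if (state == State::Initial)
    {
      if (has_next && src[i] == '/' && src[i + 1] == '/')
      {
        // single-line comment: skip to end of line, keep the newline
        while (i < src.size() && src[i] != '\n')
          i++;
        if (i < src.size())
          out += '\n';
      }
      else if (has_next && src[i] == '/' && src[i + 1] == '*')
      {
        state = State::Comment;
        i++; // consume '*'
      }
      else
        out += src[i];
    }
    else
    {
      if (has_next && src[i] == '/' && src[i + 1] == '*')
        throw std::runtime_error("nested comments unsupported");
      if (has_next && src[i] == '*' && src[i + 1] == '/')
      {
        state = State::Initial;
        i++; // consume '/'
      }
      // any other character inside a comment is dropped
    }
  }

  if (state == State::Comment)
    throw std::runtime_error("end of file during comment");
  return out;
}
```

The whitespace around a removed block comment is preserved, so `a /* x */ b` keeps both spaces.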

--- a/src/list.cpp
+++ b/src/list.cpp
+#include <sys/types.h>
+#include <dirent.h>
+#include <fstream>
+#include <iomanip>
+#include <iostream>
+#include <sstream>
+#include <regex>
+#include <vector>
+#include <string>
+#include <set>
+#include <algorithm>
+#include <cstring>
+#include <cerrno>
+
+#include "list.h"
+#include "bpftrace.h"
+
+namespace bpftrace {
+
+const std::string kprobe_path = "/sys/kernel/debug/tracing/available_filter_functions";
+const std::string tp_path = "/sys/kernel/debug/tracing/events";
+
+bool search_probe(const std::string &probe, const std::string search)
+{
+  std::string s = search;
+  char remove[] = "*.?";
+  unsigned int i;
+
+  // TODO: glob searching instead of discarding wildcards
+  for (i = 0; i < strlen(remove); ++i)
+  {
+    s.erase(std::remove(s.begin(), s.end(), remove[i]), s.end());
+  }
+
+  if (probe.find(s) == std::string::npos)
+    return true;
+
+  return false;
+}
+
+void list_dir(const std::string path, std::vector<std::string> &files)
+{
+  // yes, I know about std::filesystem::directory_iterator, but no, it wasn't available
+  DIR *dp;
+  struct dirent *dep;
+  if ((dp = opendir(path.c_str())) == NULL)
+    return;
+
+  while ((dep = readdir(dp)) != NULL)
+    files.push_back(std::string(dep->d_name));
+
+  closedir(dp);
+}
+
+void list_probes(const std::string &search)
+{
+  unsigned int i, j;
+  std::string line, probe;
+
+  // software
+  // TODO: add here
+
+  // hardware
+  // TODO: add here
+
+  // tracepoints
+  std::vector<std::string> cats = std::vector<std::string>();
+  list_dir(tp_path, cats);
+  for (i = 0; i < cats.size(); i++)
+  {
+    if (cats[i] == "." || cats[i] == ".." || cats[i] == "enable" || cats[i] == "filter")
+      continue;
+    std::vector<std::string> events = std::vector<std::string>();
+    list_dir(tp_path + "/" + cats[i], events);
+    for (j = 0; j < events.size(); j++)
+    {
+      if (events[j] == "." || events[j] == ".." || events[j] == "enable" || events[j] == "filter")
+        continue;
+      probe = "tracepoint:" + cats[i] + ":" + events[j];
+      if (search_probe(probe, search))
+        continue;
+      std::cout << probe << std::endl;
+    }
+  }
+
+  // kprobes
+  std::cout << std::endl;
+  std::ifstream file(kprobe_path);
+  if (file.fail())
+  {
+    std::cerr << strerror(errno) << ": " << kprobe_path << std::endl;
+    return;
+  }
+
+  std::set<std::string> matches;
+  size_t loc;
+  while (std::getline(file, line))
+  {
+    loc = line.find_first_of(" ");
+    if (loc == std::string::npos)
+      probe = "kprobe:" + line;
+    else
+      probe = "kprobe:" + line.substr(0, loc);
+
+    if (!search.empty())
+    {
+      if (search_probe(probe, search))
+        continue;
+    }
+
+    std::cout << probe << std::endl;
+  }
+
+}
+
+void list_probes()
+{
+  const std::string search = "";
+  list_probes(search);
+}
+
+} // namespace bpftrace
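
Review note: `search_probe` discards the characters `*`, `.` and `?` from the search term and then does a plain substring test, so `-l` is not yet a glob match (as the TODO says). The sketch below shows that behaviour in isolation. The name `probe_matches` is hypothetical, and it returns `true` on a match, which is the opposite of `search_probe`, where `true` means "skip this probe".

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Returns true when the probe name contains the search term once the
// wildcard characters '*', '.' and '?' have been discarded.
bool probe_matches(const std::string &probe, std::string search)
{
  const std::string wildcards = "*.?";
  search.erase(std::remove_if(search.begin(), search.end(),
                              [&](char c) { return wildcards.find(c) != std::string::npos; }),
               search.end());
  return probe.find(search) != std::string::npos;
}
```

Because the wildcards are simply removed, `*sleep*` behaves like the substring `sleep`, while a pattern with a wildcard in the middle such as `do*sleep` becomes `dosleep` and no longer matches `kprobe:do_nanosleep`. An empty search matches every probe.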
--- a/src/list.h
+++ b/src/list.h
+#include <sstream>
+
+namespace bpftrace {
+
+void list_probes(const std::string &search);
+void list_probes();
+
+} // namespace bpftrace
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -8,40 +8,85 @@
 #include "driver.h"
 #include "printer.h"
 #include "semantic_analyser.h"
+#include "list.h"

 using namespace bpftrace;

 void usage()
 {
-  std::cerr << "Usage:" << std::endl;
-  std::cerr << "  bpftrace filename" << std::endl;
-  std::cerr << "  bpftrace -e 'script'" << std::endl;
+  std::cerr << "USAGE:" << std::endl;
+  std::cerr << "    bpftrace [options] filename" << std::endl;
+  std::cerr << "    bpftrace [options] -e 'program'" << std::endl << std::endl;
+  std::cerr << "OPTIONS:" << std::endl;
+  std::cerr << "    -l [search]    list probes" << std::endl;
+  std::cerr << "    -e 'program'   execute this program" << std::endl;
+  std::cerr << "    -p PID         PID for enabling USDT probes" << std::endl;
+  std::cerr << "    -v             verbose messages" << std::endl;
+  std::cerr << "    -d             debug info dry run" << std::endl << std::endl;
+  std::cerr << "EXAMPLES:" << std::endl;
+  std::cerr << "bpftrace -l '*sleep*'" << std::endl;
+  std::cerr << "    list probes containing \"sleep\"" << std::endl;
+  std::cerr << "bpftrace -e 'kprobe:do_nanosleep { printf(\"PID %d sleeping...\\n\", pid); }'" << std::endl;
+  std::cerr << "    trace processes calling sleep" << std::endl;
+  std::cerr << "bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'" << std::endl;
+  std::cerr << "    count syscalls by process name" << std::endl;
 }

 int main(int argc, char *argv[])
 {
  int err;
  Driver driver;
+  char *pid_str = NULL;
+  bool listing = false;

-  std::string script;
-  bool debug = false;
+  std::string script, search;
  int c;
-  while ((c = getopt(argc, argv, "de:")) != -1)
+  while ((c = getopt(argc, argv, "de:lp:v")) != -1)
  {
    switch (c)
    {
      case 'd':
-        debug = true;
+        bt_debug = true;
+        break;
+      case 'v':
+        bt_verbose = true;
        break;
      case 'e':
        script = optarg;
        break;
+      case 'p':
+        pid_str = optarg;
+        break;
+      case 'l':
+        listing = true;
+        break;
      default:
        usage();
        return 1;
    }
  }

+  if (bt_verbose && bt_debug)
+  {
+    // TODO: allow both
+    std::cerr << "USAGE: Use either -v or -d." << std::endl;
+    return 1;
+  }
+
+  // Listing probes
+  if (listing)
+  {
+    if (optind == argc-1)
+      list_probes(argv[optind]);
+    else if (optind == argc)
+      list_probes();
+    else
+    {
+      usage();
+    }
+    return 0;
+  }
+
  if (script.empty())
  {
    // There should only be 1 non-option argument (the script file)
@@ -69,10 +114,22 @@ int main(int argc, char *argv[])

  BPFtrace bpftrace;

-  if (debug)
+  // defaults
+  bpftrace.join_argnum_ = 16;
+  bpftrace.join_argsize_ = 1024;
+
+  // PID is currently only used for USDT probes that need enabling. Future work:
+  // - make PID a filter for all probe types: pass to perf_event_open(), etc.
+  // - provide PID in USDT probe specification as a way to override -p.
+  bpftrace.pid_ = 0;
+  if (pid_str)
+    bpftrace.pid_ = atoi(pid_str);
+
+  if (bt_debug)
  {
    ast::Printer p(std::cout);
    driver.root_->accept(p);
+    std::cout << std::endl;
  }

  ClangParser clang;
@@ -83,14 +140,14 @@ int main(int argc, char *argv[])
  if (err)
    return err;

-  err = semantics.create_maps(debug);
+  err = semantics.create_maps(bt_debug);
  if (err)
    return err;

  ast::CodegenLLVM llvm(driver.root_, bpftrace);
-  auto bpforc = llvm.compile(debug);
+  auto bpforc = llvm.compile(bt_debug);

-  if (debug)
+  if (bt_debug)
    return 0;

  // Empty signal handler for cleanly terminating the program

--- a/src/map.cpp
+++ b/src/map.cpp
@@ -9,29 +9,42 @@

 namespace bpftrace {

-Map::Map(const std::string &name, const SizedType &type, const MapKey &key)
+Map::Map(const std::string &name, const SizedType &type, const MapKey &key, int min, int max, int step)
 {
  name_ = name;
  type_ = type;
  key_ = key;
+  // for lhist maps:
+  lqmin = min;
+  lqmax = max;
+  lqstep = step;

  int key_size = key.size();
-  if (type.type == Type::quantize)
+  if (type.type == Type::hist || type.type == Type::lhist ||
+      type.type == Type::avg || type.type == Type::stats)
    key_size += 8;
  if (key_size == 0)
    key_size = 8;

+  int max_entries = 128;
  enum bpf_map_type map_type;
-  if ((type.type == Type::quantize || type.type == Type::count) &&
+  if ((type.type == Type::hist || type.type == Type::lhist || type.type == Type::count ||
+      type.type == Type::sum || type.type == Type::min || type.type == Type::max ||
+      type.type == Type::avg || type.type == Type::stats) &&
      (LINUX_VERSION_CODE >= KERNEL_VERSION(4, 6, 0)))
  {
      map_type = BPF_MAP_TYPE_PERCPU_HASH;
  }
+  else if (type.type == Type::join)
+  {
+    map_type = BPF_MAP_TYPE_PERCPU_ARRAY;
+    max_entries = 1;
+    key_size = 4;
+  }
  else
    map_type = BPF_MAP_TYPE_HASH;

  int value_size = type.size;
-  int max_entries = 128;
  int flags = 0;
  mapfd_ = bpf_create_map(map_type, name.c_str(), key_size, value_size, max_entries, flags);
  if (mapfd_ < 0)

--- a/src/map.h
+++ b/src/map.h
@@ -6,7 +6,9 @@ namespace bpftrace {

 class Map : public IMap {
 public:
-  Map(const std::string &name, const SizedType &type, const MapKey &key);
+  Map(const std::string &name, const SizedType &type, const MapKey &key)
+    : Map(name, type, key, 0, 0, 0) {};
+  Map(const std::string &name, const SizedType &type, const MapKey &key, int min, int max, int step);
  Map(enum bpf_map_type map_type);
  virtual ~Map() override;
 };

--- a/src/mapkey.cpp
+++ b/src/mapkey.cpp
@@ -55,6 +55,7 @@ std::string MapKey::argument_value(BPFtrace &bpftrace,
    const SizedType &arg,
    const void *data)
 {
+  auto arg_data = static_cast<const uint8_t*>(data);
  switch (arg.type)
  {
    case Type::integer:
@@ -66,13 +67,15 @@ std::string MapKey::argument_value(BPFtrace &bpftrace,
          return std::to_string(*(int32_t*)data);
      }
    case Type::stack:
-      return bpftrace.get_stack(*(uint32_t*)data, false);
+      return bpftrace.get_stack(*(uint64_t*)data, false);
    case Type::ustack:
-      return bpftrace.get_stack(*(uint32_t*)data, true);
+      return bpftrace.get_stack(*(uint64_t*)data, true);
    case Type::sym:
      return bpftrace.resolve_sym(*(uint64_t*)data);
    case Type::usym:
-      return bpftrace.resolve_usym(*(uint64_t*)data);
+      return bpftrace.resolve_usym(*(uint64_t*)data, *(uint64_t*)(arg_data + 8));
+    case Type::name:
+      return bpftrace.name_ids_[*(uint64_t*)data];
    case Type::string:
      return std::string((char*)data);
  }

--- a/src/parser.yy
+++ b/src/parser.yy
@@ -44,6 +44,7 @@ void yyerror(bpftrace::Driver &driver, const char *s);
  RBRACKET "]"
  LPAREN   "("
  RPAREN   ")"
+  QUES     "?"
  ENDPRED  "end predicate"
  COMMA    ","
  ASSIGN   "="
@@ -83,6 +84,7 @@ void yyerror(bpftrace::Driver &driver, const char *s);
 %type <ast::ProbeList *> probes
 %type <ast::Probe *> probe
 %type <ast::Predicate *> pred
+%type <ast::Ternary *> ternary
 %type <ast::StatementList *> block stmts
 %type <ast::Statement *> stmt
 %type <ast::Expression *> expr
@@ -96,6 +98,7 @@ void yyerror(bpftrace::Driver &driver, const char *s);
 %type <std::string> ident

 %right ASSIGN
+%left QUES COLON
 %left LOR
 %left LAND
 %left BOR
@@ -148,6 +151,9 @@ pred : DIV expr ENDPRED { $$ = new ast::Predicate($2); }
     |                  { $$ = nullptr; }
     ;

+ternary : expr QUES expr COLON expr { $$ = new ast::Ternary($1, $3, $5); }
+     ;
+
 block : "{" stmts "}"     { $$ = $2; }
      | "{" stmts ";" "}" { $$ = $2; }
      ;
@@ -164,6 +170,7 @@ stmt : expr         { $$ = new ast::ExprStatement($1); }
 expr : INT             { $$ = new ast::Integer($1); }
     | STRING          { $$ = new ast::String($1); }
     | BUILTIN         { $$ = new ast::Builtin($1); }
+     | ternary         { $$ = $1; }
     | map             { $$ = $1; }
     | var             { $$ = $1; }
     | call            { $$ = $1; }

--- a/src/printf.cpp
+++ b/src/printf.cpp
@@ -33,7 +33,7 @@ std::string verify_format_string(const std::string &fmt, std::vector<Field> args
  for (int i=0; i<num_args; i++, token_iter++)
  {
    Type arg_type = args.at(i).type.type;
-    if (arg_type == Type::sym || arg_type == Type::usym)
+    if (arg_type == Type::sym || arg_type == Type::usym || arg_type == Type::name)
      arg_type = Type::string; // Symbols should be printed as strings
    int offset = 1;


--- a/src/types.cpp
+++ b/src/types.cpp
@@ -29,14 +29,21 @@ std::string typestr(Type t)
  {
    case Type::none:     return "none";     break;
    case Type::integer:  return "integer";  break;
-    case Type::quantize: return "quantize"; break;
+    case Type::hist:     return "hist"; break;
+    case Type::lhist:    return "lhist";    break;
    case Type::count:    return "count";    break;
+    case Type::sum:      return "sum";      break;
+    case Type::min:      return "min";      break;
+    case Type::max:      return "max";      break;
+    case Type::avg:      return "avg";      break;
+    case Type::stats:    return "stats";    break;
    case Type::stack:    return "stack";    break;
    case Type::ustack:   return "ustack";   break;
    case Type::string:   return "string";   break;
    case Type::sym:      return "sym";      break;
    case Type::usym:     return "usym";     break;
    case Type::cast:     return "cast";     break;
+    case Type::name:     return "name";     break;
    default: abort();
  }
 }
@@ -51,6 +58,8 @@ ProbeType probetype(const std::string &type)
    return ProbeType::uprobe;
  else if (type == "uretprobe")
    return ProbeType::uretprobe;
+  else if (type == "usdt")
+    return ProbeType::usdt;
  else if (type == "BEGIN")
    return ProbeType::uprobe;
  else if (type == "END")
@@ -59,7 +68,18 @@ ProbeType probetype(const std::string &type)
    return ProbeType::tracepoint;
  else if (type == "profile")
    return ProbeType::profile;
+  else if (type == "interval")
+    return ProbeType::interval;
+  else if (type == "software")
+    return ProbeType::software;
+  else if (type == "hardware")
+    return ProbeType::hardware;
  abort();
 }

+uint64_t asyncactionint(AsyncAction a)
+{
+  return (uint64_t)a;
+}
+
 } // namespace bpftrace
--- a/src/types.h
+++ b/src/types.h
@@ -15,14 +15,22 @@ enum class Type
 {
  none,
  integer,
-  quantize,
+  hist,
+  lhist,
  count,
+  sum,
+  min,
+  max,
+  avg,
+  stats,
  stack,
  ustack,
  string,
  sym,
  usym,
  cast,
+  join,
+  name,
 };

 std::ostream &operator<<(std::ostream &os, Type type);
@@ -52,8 +60,12 @@ enum class ProbeType
  kretprobe,
  uprobe,
  uretprobe,
+  usdt,
  tracepoint,
  profile,
+  interval,
+  software,
+  hardware,
 };

 std::string typestr(Type t);
@@ -63,11 +75,26 @@ class Probe
 {
 public:
  ProbeType type;
-  std::string path;
-  std::string attach_point;
-  std::string prog_name;
-  std::string name;
+  std::string path;		// file path if used
+  std::string attach_point;	// probe name (last component)
+  std::string orig_name;	// original full probe name,
+				// before wildcard expansion
+  std::string name;		// full probe name
+  uint64_t loc;			// for USDT probes
  int freq;
 };

+enum class AsyncAction
+{
+  // printf reserves 0-9999 for printf_ids
+  exit = 10000,
+  print,
+  clear,
+  zero,
+  time,
+  join,
+};
+
+uint64_t asyncactionint(AsyncAction a);
+
 } // namespace bpftrace
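
Note on `AsyncAction` above: the comment says printf reserves ids 0-9999, and the enum starts at 10000, so a single integer id can distinguish a printf from the other asynchronous actions. A minimal sketch of how a consumer could tell them apart, assuming that split. The enum and `asyncactionint` are copied from the diff; `classify_event` is a hypothetical helper that does not appear in bpftrace:

```cpp
#include <cstdint>
#include <string>

enum class AsyncAction
{
  // printf reserves 0-9999 for printf_ids
  exit = 10000,
  print,
  clear,
  zero,
  time,
  join,
};

uint64_t asyncactionint(AsyncAction a)
{
  return (uint64_t)a;
}

// Hypothetical: map a raw event id to a label.
std::string classify_event(uint64_t id)
{
  if (id < asyncactionint(AsyncAction::exit))
    return "printf";   // ids below 10000 are printf ids
  if (id > asyncactionint(AsyncAction::join))
    return "unknown";  // past the last enumerator

  switch (static_cast<AsyncAction>(id))
  {
    case AsyncAction::exit:  return "exit";
    case AsyncAction::print: return "print";
    case AsyncAction::clear: return "clear";
    case AsyncAction::zero:  return "zero";
    case AsyncAction::time:  return "time";
    case AsyncAction::join:  return "join";
  }
  return "unknown";
}
```

Because the enumerators after `exit` are implicit, they take consecutive values, so `join` is 10005.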
--- a/tests/ast.cpp
+++ b/tests/ast.cpp
@@ -48,6 +48,19 @@ TEST(ast, probe_name_uprobe)
  EXPECT_EQ(uprobe2.name(), "uprobe:/bin/sh:readline,uprobe:/bin/sh:somefunc");
 }

+TEST(ast, probe_name_usdt)
+{
+  AttachPoint ap1("usdt", "/bin/sh", "probe1");
+  AttachPointList attach_points1 = { &ap1 };
+  Probe usdt1(&attach_points1, nullptr, nullptr);
+  EXPECT_EQ(usdt1.name(), "usdt:/bin/sh:probe1");
+
+  AttachPoint ap2("usdt", "/bin/sh", "probe2");
+  AttachPointList attach_points2 = { &ap1, &ap2 };
+  Probe usdt2(&attach_points2, nullptr, nullptr);
+  EXPECT_EQ(usdt2.name(), "usdt:/bin/sh:probe1,usdt:/bin/sh:probe2");
+}
+
 TEST(ast, attach_point_name)
 {
  AttachPoint ap1("kprobe", "sys_read");

--- a/tests/bpftrace.cpp
+++ b/tests/bpftrace.cpp
@@ -27,44 +27,76 @@ using ::testing::ContainerEq;
 using ::testing::Return;
 using ::testing::StrictMock;

-void check_kprobe(Probe &p, const std::string &attach_point, const std::string &prog_name)
+void check_kprobe(Probe &p, const std::string &attach_point, const std::string &orig_name)
 {
  EXPECT_EQ(ProbeType::kprobe, p.type);
  EXPECT_EQ(attach_point, p.attach_point);
-  EXPECT_EQ(prog_name, p.prog_name);
+  EXPECT_EQ(orig_name, p.orig_name);
  EXPECT_EQ("kprobe:" + attach_point, p.name);
 }

-void check_uprobe(Probe &p, const std::string &path, const std::string &attach_point, const std::string &prog_name)
+void check_uprobe(Probe &p, const std::string &path, const std::string &attach_point, const std::string &orig_name)
 {
  EXPECT_EQ(ProbeType::uprobe, p.type);
  EXPECT_EQ(attach_point, p.attach_point);
-  EXPECT_EQ(prog_name, p.prog_name);
+  EXPECT_EQ(orig_name, p.orig_name);
  EXPECT_EQ("uprobe:" + path + ":" + attach_point, p.name);
 }

-void check_tracepoint(Probe &p, const std::string &target, const std::string &func, const std::string &prog_name)
+void check_usdt(Probe &p, const std::string &path, const std::string &attach_point, const std::string &orig_name)
+{
+  EXPECT_EQ(ProbeType::usdt, p.type);
+  EXPECT_EQ(attach_point, p.attach_point);
+  EXPECT_EQ(orig_name, p.orig_name);
+  EXPECT_EQ("usdt:" + path + ":" + attach_point, p.name);
+}
+
+void check_tracepoint(Probe &p, const std::string &target, const std::string &func, const std::string &orig_name)
 {
  EXPECT_EQ(ProbeType::tracepoint, p.type);
  EXPECT_EQ(func, p.attach_point);
-  EXPECT_EQ(prog_name, p.prog_name);
+  EXPECT_EQ(orig_name, p.orig_name);
  EXPECT_EQ("tracepoint:" + target + ":" + func, p.name);
 }

-void check_profile(Probe &p, const std::string &unit, int freq, const std::string &prog_name)
+void check_profile(Probe &p, const std::string &unit, int freq, const std::string &orig_name)
 {
  EXPECT_EQ(ProbeType::profile, p.type);
  EXPECT_EQ(freq, p.freq);
-  EXPECT_EQ(prog_name, p.prog_name);
+  EXPECT_EQ(orig_name, p.orig_name);
  EXPECT_EQ("profile:" + unit + ":" + std::to_string(freq), p.name);
 }

-void check_special_probe(Probe &p, const std::string &attach_point, const std::string &prog_name)
+void check_interval(Probe &p, const std::string &unit, int freq, const std::string &orig_name)
+{
+  EXPECT_EQ(ProbeType::interval, p.type);
+  EXPECT_EQ(freq, p.freq);
+  EXPECT_EQ(orig_name, p.orig_name);
+  EXPECT_EQ("interval:" + unit + ":" + std::to_string(freq), p.name);
+}
+
+void check_software(Probe &p, const std::string &unit, int freq, const std::string &orig_name)
+{
+  EXPECT_EQ(ProbeType::software, p.type);
+  EXPECT_EQ(freq, p.freq);
+  EXPECT_EQ(orig_name, p.orig_name);
+  EXPECT_EQ("software:" + unit + ":" + std::to_string(freq), p.name);
+}
+
+void check_hardware(Probe &p, const std::string &unit, int freq, const std::string &orig_name)
+{
+  EXPECT_EQ(ProbeType::hardware, p.type);
+  EXPECT_EQ(freq, p.freq);
+  EXPECT_EQ(orig_name, p.orig_name);
+  EXPECT_EQ("hardware:" + unit + ":" + std::to_string(freq), p.name);
+}
+
+void check_special_probe(Probe &p, const std::string &attach_point, const std::string &orig_name)
 {
  EXPECT_EQ(ProbeType::uprobe, p.type);
  EXPECT_EQ(attach_point, p.attach_point);
-  EXPECT_EQ(prog_name, p.prog_name);
-  EXPECT_EQ(prog_name, p.name);
+  EXPECT_EQ(orig_name, p.orig_name);
+  EXPECT_EQ(orig_name, p.name);
 }

 TEST(bpftrace, add_begin_probe)
@@ -121,9 +153,9 @@ TEST(bpftrace, add_probes_multiple)
  EXPECT_EQ(2, bpftrace.get_probes().size());
  EXPECT_EQ(0, bpftrace.get_special_probes().size());

-  std::string probe_prog_name = "kprobe:sys_read,kprobe:sys_write";
-  check_kprobe(bpftrace.get_probes().at(0), "sys_read", probe_prog_name);
-  check_kprobe(bpftrace.get_probes().at(1), "sys_write", probe_prog_name);
+  std::string probe_orig_name = "kprobe:sys_read,kprobe:sys_write";
+  check_kprobe(bpftrace.get_probes().at(0), "sys_read", probe_orig_name);
+  check_kprobe(bpftrace.get_probes().at(1), "sys_write", probe_orig_name);
 }

 TEST(bpftrace, add_probes_character_class)
@@ -146,10 +178,10 @@ TEST(bpftrace, add_probes_character_class)
  EXPECT_EQ(3, bpftrace.get_probes().size());
  EXPECT_EQ(0, bpftrace.get_special_probes().size());

-  std::string probe_prog_name = "kprobe:[Ss]y[Ss]_read,kprobe:sys_write";
-  check_kprobe(bpftrace.get_probes().at(0), "SyS_read", probe_prog_name);
-  check_kprobe(bpftrace.get_probes().at(1), "sys_read", probe_prog_name);
-  check_kprobe(bpftrace.get_probes().at(2), "sys_write", probe_prog_name);
+  std::string probe_orig_name = "kprobe:[Ss]y[Ss]_read,kprobe:sys_write";
+  check_kprobe(bpftrace.get_probes().at(0), "SyS_read", probe_orig_name);
+  check_kprobe(bpftrace.get_probes().at(1), "sys_read", probe_orig_name);
+  check_kprobe(bpftrace.get_probes().at(2), "sys_write", probe_orig_name);
 }

 TEST(bpftrace, add_probes_wildcard)
@@ -173,11 +205,11 @@ TEST(bpftrace, add_probes_wildcard)
  EXPECT_EQ(4, bpftrace.get_probes().size());
  EXPECT_EQ(0, bpftrace.get_special_probes().size());

-  std::string probe_prog_name = "kprobe:sys_read,kprobe:my_*,kprobe:sys_write";
-  check_kprobe(bpftrace.get_probes().at(0), "sys_read", probe_prog_name);
-  check_kprobe(bpftrace.get_probes().at(1), "my_one", probe_prog_name);
-  check_kprobe(bpftrace.get_probes().at(2), "my_two", probe_prog_name);
-  check_kprobe(bpftrace.get_probes().at(3), "sys_write", probe_prog_name);
+  std::string probe_orig_name = "kprobe:sys_read,kprobe:my_*,kprobe:sys_write";
+  check_kprobe(bpftrace.get_probes().at(0), "sys_read", probe_orig_name);
+  check_kprobe(bpftrace.get_probes().at(1), "my_one", probe_orig_name);
+  check_kprobe(bpftrace.get_probes().at(2), "my_two", probe_orig_name);
+  check_kprobe(bpftrace.get_probes().at(3), "sys_write", probe_orig_name);
 }

 TEST(bpftrace, add_probes_wildcard_no_matches)
@@ -201,9 +233,9 @@ TEST(bpftrace, add_probes_wildcard_no_matches)
  EXPECT_EQ(2, bpftrace.get_probes().size());
  EXPECT_EQ(0, bpftrace.get_special_probes().size());

-  std::string probe_prog_name = "kprobe:sys_read,kprobe:my_*,kprobe:sys_write";
-  check_kprobe(bpftrace.get_probes().at(0), "sys_read", probe_prog_name);
-  check_kprobe(bpftrace.get_probes().at(1), "sys_write", probe_prog_name);
+  std::string probe_orig_name = "kprobe:sys_read,kprobe:my_*,kprobe:sys_write";
+  check_kprobe(bpftrace.get_probes().at(0), "sys_read", probe_orig_name);
+  check_kprobe(bpftrace.get_probes().at(1), "sys_write", probe_orig_name);
 }

 TEST(bpftrace, add_probes_uprobe)
@@ -220,6 +252,20 @@ TEST(bpftrace, add_probes_uprobe)
  check_uprobe(bpftrace.get_probes().at(0), "/bin/sh", "foo", "uprobe:/bin/sh:foo");
 }

+TEST(bpftrace, add_probes_usdt)
+{
+  ast::AttachPoint a("usdt", "/bin/sh", "foo");
+  ast::AttachPointList attach_points = { &a };
+  ast::Probe probe(&attach_points, nullptr, nullptr);
+
+  StrictMock<MockBPFtrace> bpftrace;
+
+  EXPECT_EQ(0, bpftrace.add_probe(probe));
+  EXPECT_EQ(1, bpftrace.get_probes().size());
+  EXPECT_EQ(0, bpftrace.get_special_probes().size());
+  check_usdt(bpftrace.get_probes().at(0), "/bin/sh", "foo", "usdt:/bin/sh:foo");
+}
+
 TEST(bpftrace, add_probes_uprobe_wildcard)
 {
  ast::AttachPoint a("uprobe", "/bin/sh", "foo*");
@@ -245,8 +291,8 @@ TEST(bpftrace, add_probes_tracepoint)
  EXPECT_EQ(1, bpftrace.get_probes().size());
  EXPECT_EQ(0, bpftrace.get_special_probes().size());

-  std::string probe_prog_name = "tracepoint:sched:sched_switch";
-  check_tracepoint(bpftrace.get_probes().at(0), "sched", "sched_switch", probe_prog_name);
+  std::string probe_orig_name = "tracepoint:sched:sched_switch";
+  check_tracepoint(bpftrace.get_probes().at(0), "sched", "sched_switch", probe_orig_name);
 }

 TEST(bpftrace, add_probes_tracepoint_wildcard)
@@ -268,9 +314,9 @@ TEST(bpftrace, add_probes_tracepoint_wildcard)
  EXPECT_EQ(2, bpftrace.get_probes().size());
  EXPECT_EQ(0, bpftrace.get_special_probes().size());

-  std::string probe_prog_name = "tracepoint:sched:sched_*";
-  check_tracepoint(bpftrace.get_probes().at(0), "sched", "sched_one", probe_prog_name);
-  check_tracepoint(bpftrace.get_probes().at(1), "sched", "sched_two", probe_prog_name);
+  std::string probe_orig_name = "tracepoint:sched:sched_*";
+  check_tracepoint(bpftrace.get_probes().at(0), "sched", "sched_one", probe_orig_name);
+  check_tracepoint(bpftrace.get_probes().at(1), "sched", "sched_two", probe_orig_name);
 }

 TEST(bpftrace, add_probes_tracepoint_wildcard_no_matches)
@@ -305,8 +351,56 @@ TEST(bpftrace, add_probes_profile)
  EXPECT_EQ(1, bpftrace.get_probes().size());
  EXPECT_EQ(0, bpftrace.get_special_probes().size());

-  std::string probe_prog_name = "profile:ms:997";
-  check_profile(bpftrace.get_probes().at(0), "ms", 997, probe_prog_name);
+  std::string probe_orig_name = "profile:ms:997";
+  check_profile(bpftrace.get_probes().at(0), "ms", 997, probe_orig_name);
+}
+
+TEST(bpftrace, add_probes_interval)
+{
+  ast::AttachPoint a("interval", "s", 1);
+  ast::AttachPointList attach_points = { &a };
+  ast::Probe probe(&attach_points, nullptr, nullptr);
+
+  StrictMock<MockBPFtrace> bpftrace;
+
+  EXPECT_EQ(0, bpftrace.add_probe(probe));
+  EXPECT_EQ(1, bpftrace.get_probes().size());
+  EXPECT_EQ(0, bpftrace.get_special_probes().size());
+
+  std::string probe_orig_name = "interval:s:1";
+  check_interval(bpftrace.get_probes().at(0), "s", 1, probe_orig_name);
+}
+
+TEST(bpftrace, add_probes_software)
+{
+  ast::AttachPoint a("software", "faults", 1000);
+  ast::AttachPointList attach_points = { &a };
+  ast::Probe probe(&attach_points, nullptr, nullptr);
+
+  StrictMock<MockBPFtrace> bpftrace;
+
+  EXPECT_EQ(0, bpftrace.add_probe(probe));
+  EXPECT_EQ(1, bpftrace.get_probes().size());
+  EXPECT_EQ(0, bpftrace.get_special_probes().size());
+
+  std::string probe_orig_name = "software:faults:1000";
+  check_software(bpftrace.get_probes().at(0), "faults", 1000, probe_orig_name);
+}
+
+TEST(bpftrace, add_probes_hardware)
+{
+  ast::AttachPoint a("hardware", "cache-references", 1000000);
+  ast::AttachPointList attach_points = { &a };
+  ast::Probe probe(&attach_points, nullptr, nullptr);
+
+  StrictMock<MockBPFtrace> bpftrace;
+
+  EXPECT_EQ(0, bpftrace.add_probe(probe));
+  EXPECT_EQ(1, bpftrace.get_probes().size());
+  EXPECT_EQ(0, bpftrace.get_special_probes().size());
+
+  std::string probe_orig_name = "hardware:cache-references:1000000";
+  check_hardware(bpftrace.get_probes().at(0), "cache-references", 1000000, probe_orig_name);
 }

 std::pair<std::vector<uint8_t>, std::vector<uint8_t>> key_value_pair_int(std::vector<uint64_t> key, int val)
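
Note on the `orig_name` rename in the tests above: for a wildcard attach point, each expanded probe gets its own `name` (for example `kprobe:my_one`) while `orig_name` keeps the full probe name as written, before expansion. The sketch below reproduces that relationship with a toy matcher that supports only `*`. `ExpandedProbe`, `glob_match` and `expand` are hypothetical names for illustration and are not bpftrace's implementation:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the Probe fields checked by check_kprobe().
struct ExpandedProbe {
  std::string attach_point;  // probe name (last component)
  std::string orig_name;     // full probe name before wildcard expansion
  std::string name;          // full probe name after expansion
};

// Toy glob matcher: '*' matches any run of characters, everything else is literal.
bool glob_match(const std::string &pat, const std::string &s,
                std::size_t pi = 0, std::size_t si = 0)
{
  if (pi == pat.size())
    return si == s.size();
  if (pat[pi] == '*') {
    for (std::size_t k = si; k <= s.size(); k++)
      if (glob_match(pat, s, pi + 1, k))
        return true;
    return false;
  }
  return si < s.size() && pat[pi] == s[si] && glob_match(pat, s, pi + 1, si + 1);
}

// Expand one attach point pattern against a list of candidate function names.
std::vector<ExpandedProbe> expand(const std::string &type,
                                  const std::string &pattern,
                                  const std::string &orig_name,
                                  const std::vector<std::string> &candidates)
{
  std::vector<ExpandedProbe> out;
  for (const auto &c : candidates)
    if (glob_match(pattern, c))
      out.push_back({c, orig_name, type + ":" + c});
  return out;
}
```

Expanding `my_*` against `my_one` and `my_two` yields two probes that share one `orig_name`, which is the shape `add_probes_wildcard` checks for.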

--- a/tests/codegen.cpp
+++ b/tests/codegen.cpp
@@ -321,18 +321,21 @@ define i64 @"kprobe:f"(i8*) local_unnamed_addr section "s_kprobe:f" {
 entry:
  %"@x_val" = alloca i64, align 8
  %"@x_key" = alloca i64, align 8
+  %get_pid_tgid = tail call i64 inttoptr (i64 14 to i64 ()*)()
+  %1 = shl i64 %get_pid_tgid, 32
  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 2)
  %get_stackid = tail call i64 inttoptr (i64 27 to i64 (i8*, i8*, i64)*)(i8* %0, i64 %pseudo, i64 0)
-  %1 = bitcast i64* %"@x_key" to i8*
-  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  %2 = or i64 %get_stackid, %1
+  %3 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
  store i64 0, i64* %"@x_key", align 8
-  %2 = bitcast i64* %"@x_val" to i8*
-  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
-  store i64 %get_stackid, i64* %"@x_val", align 8
+  %4 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 %2, i64* %"@x_val", align 8
  %pseudo1 = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
-  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
-  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
  ret i64 0
 }

@@ -358,18 +361,21 @@ define i64 @"kprobe:f"(i8*) local_unnamed_addr section "s_kprobe:f" {
 entry:
  %"@x_val" = alloca i64, align 8
  %"@x_key" = alloca i64, align 8
+  %get_pid_tgid = tail call i64 inttoptr (i64 14 to i64 ()*)()
+  %1 = shl i64 %get_pid_tgid, 32
  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 2)
  %get_stackid = tail call i64 inttoptr (i64 27 to i64 (i8*, i8*, i64)*)(i8* %0, i64 %pseudo, i64 256)
-  %1 = bitcast i64* %"@x_key" to i8*
-  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  %2 = or i64 %get_stackid, %1
+  %3 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
  store i64 0, i64* %"@x_key", align 8
-  %2 = bitcast i64* %"@x_val" to i8*
-  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
-  store i64 %get_stackid, i64* %"@x_val", align 8
+  %4 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 %2, i64* %"@x_val", align 8
  %pseudo1 = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
-  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
-  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
  ret i64 0
 }

@@ -519,6 +525,78 @@ attributes #1 = { argmemonly nounwind }
 )EXPECTED");
 }

+TEST(codegen, builtin_curtask)
+{
+  test("kprobe:f { @x = curtask }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %get_cur_task = tail call i64 inttoptr (i64 35 to i64 ()*)()
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %2 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 %get_cur_task, i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, builtin_rand)
+{
+  test("kprobe:f { @x = rand }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %get_random = tail call i64 inttoptr (i64 7 to i64 ()*)()
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %2 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 %get_random, i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
 TEST(codegen, builtin_comm)
 {
  test("kprobe:f { @x = comm }",
@@ -745,9 +823,174 @@ attributes #1 = { argmemonly nounwind }
 )EXPECTED");
 }

-TEST(codegen, call_quantize)
+TEST(codegen, builtin_name)
 {
-  test("kprobe:f { @x = quantize(pid) }",
+  test("tracepoint:syscalls:sys_enter_nanosleep { @x = name }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"tracepoint:syscalls:sys_enter_nanosleep"(i8* nocapture readnone) local_unnamed_addr section "s_tracepoint:syscalls:sys_enter_nanosleep" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %2 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 0, i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, builtin_func_wild)
+{
+  test("tracepoint:syscalls:sys_enter_nanoslee* { @x = func }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"tracepoint:syscalls:sys_enter_nanoslee*"(i8*) local_unnamed_addr section "s_tracepoint:syscalls:sys_enter_nanoslee*" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %func = alloca i64, align 8
+  %1 = bitcast i64* %func to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  %2 = getelementptr i8, i8* %0, i64 128
+  %probe_read = call i64 inttoptr (i64 4 to i64 (i8*, i64, i8*)*)(i64* nonnull %func, i64 8, i8* %2)
+  %3 = load i64, i64* %func, align 8
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  %4 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 0, i64* %"@x_key", align 8
+  %5 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %5)
+  store i64 %3, i64* %"@x_val", align 8
+  %pseudo = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %5)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, builtin_name_wild)
+{
+  test("tracepoint:syscalls:sys_enter_nanoslee* { @x = name }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"tracepoint:syscalls:sys_enter_nanosleep"(i8* nocapture readnone) local_unnamed_addr section "s_tracepoint:syscalls:sys_enter_nanosleep" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %2 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 1, i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_usym_key)
+{
+  test("kprobe:f { @x[usym(0)] = count() }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca [16 x i8], align 8
+  %1 = getelementptr inbounds [16 x i8], [16 x i8]* %"@x_key", i64 0, i64 0
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  %get_pid_tgid = tail call i64 inttoptr (i64 14 to i64 ()*)()
+  %2 = lshr i64 %get_pid_tgid, 32
+  %usym.sroa.0.0..sroa_cast = bitcast [16 x i8]* %"@x_key" to i64*
+  store i64 0, i64* %usym.sroa.0.0..sroa_cast, align 8
+  %usym.sroa.4.0..sroa_idx = getelementptr inbounds [16 x i8], [16 x i8]* %"@x_key", i64 0, i64 8
+  %usym.sroa.4.0..sroa_cast = bitcast i8* %usym.sroa.4.0..sroa_idx to i64*
+  store i64 %2, i64* %usym.sroa.4.0..sroa_cast, align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo, [16 x i8]* nonnull %"@x_key")
+  %map_lookup_cond = icmp eq i8* %lookup_elem, null
+  br i1 %map_lookup_cond, label %lookup_merge, label %lookup_success
+
+lookup_success:                                   ; preds = %entry
+  %3 = load i64, i8* %lookup_elem, align 8
+  %phitmp = add i64 %3, 1
+  br label %lookup_merge
+
+lookup_merge:                                     ; preds = %entry, %lookup_success
+  %lookup_elem_val.0 = phi i64 [ %phitmp, %lookup_success ], [ 1, %entry ]
+  %4 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 %lookup_elem_val.0, i64* %"@x_val", align 8
+  %pseudo1 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, [16 x i8]* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_hist)
+{
+  test("kprobe:f { @x = hist(pid) }",

 R"EXPECTED(; Function Attrs: nounwind
 declare i64 @llvm.bpf.pseudo(i64, i64) #0
@@ -816,9 +1059,342 @@ attributes #1 = { argmemonly nounwind }
 )EXPECTED");
 }

-TEST(codegen, call_count)
+TEST(codegen, call_lhist)
+{
+  test("kprobe:f { @x = lhist(pid, 0, 100, 1) }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %get_pid_tgid = tail call i64 inttoptr (i64 14 to i64 ()*)()
+  %get_pid_tgid1 = tail call i64 inttoptr (i64 14 to i64 ()*)()
+  %1 = lshr i64 %get_pid_tgid1, 32
+  %2 = icmp ugt i64 %get_pid_tgid1, 433791696895
+  %3 = add nuw nsw i64 %1, 1
+  %linear3 = select i1 %2, i64 101, i64 %3
+  %4 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 %linear3, i64* %"@x_key", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo, i64* nonnull %"@x_key")
+  %map_lookup_cond = icmp eq i8* %lookup_elem, null
+  br i1 %map_lookup_cond, label %lookup_merge, label %lookup_success
+
+lookup_success:                                   ; preds = %entry
+  %5 = load i64, i8* %lookup_elem, align 8
+  %phitmp = add i64 %5, 1
+  br label %lookup_merge
+
+lookup_merge:                                     ; preds = %entry, %lookup_success
+  %lookup_elem_val.0 = phi i64 [ %phitmp, %lookup_success ], [ 1, %entry ]
+  %6 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %6)
+  store i64 %lookup_elem_val.0, i64* %"@x_val", align 8
+  %pseudo2 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo2, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %6)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_count)
+{
+  test("kprobe:f { @x = count() }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo, i64* nonnull %"@x_key")
+  %map_lookup_cond = icmp eq i8* %lookup_elem, null
+  br i1 %map_lookup_cond, label %lookup_merge, label %lookup_success
+
+lookup_success:                                   ; preds = %entry
+  %2 = load i64, i8* %lookup_elem, align 8
+  %phitmp = add i64 %2, 1
+  br label %lookup_merge
+
+lookup_merge:                                     ; preds = %entry, %lookup_success
+  %lookup_elem_val.0 = phi i64 [ %phitmp, %lookup_success ], [ 1, %entry ]
+  %3 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
+  store i64 %lookup_elem_val.0, i64* %"@x_val", align 8
+  %pseudo1 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_min)
+{
+  test("kprobe:f { @x = min(pid) }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo, i64* nonnull %"@x_key")
+  %map_lookup_cond = icmp eq i8* %lookup_elem, null
+  br i1 %map_lookup_cond, label %lookup_merge, label %lookup_success
+
+lookup_success:                                   ; preds = %entry
+  %2 = load i64, i8* %lookup_elem, align 8
+  br label %lookup_merge
+
+lookup_merge:                                     ; preds = %entry, %lookup_success
+  %lookup_elem_val.0 = phi i64 [ %2, %lookup_success ], [ 0, %entry ]
+  %3 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
+  %get_pid_tgid = call i64 inttoptr (i64 14 to i64 ()*)()
+  %4 = lshr i64 %get_pid_tgid, 32
+  %5 = xor i64 %4, 4294967295
+  %6 = icmp slt i64 %5, %lookup_elem_val.0
+  br i1 %6, label %min.lt, label %min.ge
+
+min.lt:                                           ; preds = %lookup_merge, %min.ge
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  ret i64 0
+
+min.ge:                                           ; preds = %lookup_merge
+  store i64 %5, i64* %"@x_val", align 8
+  %pseudo1 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  br label %min.lt
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_max)
+{
+  test("kprobe:f { @x = max(pid) }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo, i64* nonnull %"@x_key")
+  %map_lookup_cond = icmp eq i8* %lookup_elem, null
+  br i1 %map_lookup_cond, label %lookup_merge, label %lookup_success
+
+lookup_success:                                   ; preds = %entry
+  %2 = load i64, i8* %lookup_elem, align 8
+  br label %lookup_merge
+
+lookup_merge:                                     ; preds = %entry, %lookup_success
+  %lookup_elem_val.0 = phi i64 [ %2, %lookup_success ], [ 0, %entry ]
+  %3 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
+  %get_pid_tgid = call i64 inttoptr (i64 14 to i64 ()*)()
+  %4 = lshr i64 %get_pid_tgid, 32
+  %5 = icmp slt i64 %4, %lookup_elem_val.0
+  br i1 %5, label %min.lt, label %min.ge
+
+min.lt:                                           ; preds = %lookup_merge, %min.ge
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  ret i64 0
+
+min.ge:                                           ; preds = %lookup_merge
+  store i64 %4, i64* %"@x_val", align 8
+  %pseudo1 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  br label %min.lt
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_sum)
+{
+  test("kprobe:f { @x = sum(pid) }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo, i64* nonnull %"@x_key")
+  %map_lookup_cond = icmp eq i8* %lookup_elem, null
+  br i1 %map_lookup_cond, label %lookup_merge, label %lookup_success
+
+lookup_success:                                   ; preds = %entry
+  %2 = load i64, i8* %lookup_elem, align 8
+  br label %lookup_merge
+
+lookup_merge:                                     ; preds = %entry, %lookup_success
+  %lookup_elem_val.0 = phi i64 [ %2, %lookup_success ], [ 0, %entry ]
+  %3 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
+  %get_pid_tgid = call i64 inttoptr (i64 14 to i64 ()*)()
+  %4 = lshr i64 %get_pid_tgid, 32
+  %5 = add i64 %4, %lookup_elem_val.0
+  store i64 %5, i64* %"@x_val", align 8
+  %pseudo1 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_avg)
+{
+  test("kprobe:f { @x = avg(pid) }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key2" = alloca i64, align 8
+  %"@x_num" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo, i64* nonnull %"@x_key")
+  %map_lookup_cond = icmp eq i8* %lookup_elem, null
+  br i1 %map_lookup_cond, label %lookup_merge, label %lookup_success
+
+lookup_success:                                   ; preds = %entry
+  %2 = load i64, i8* %lookup_elem, align 8
+  %phitmp = add i64 %2, 1
+  br label %lookup_merge
+
+lookup_merge:                                     ; preds = %entry, %lookup_success
+  %lookup_elem_val.0 = phi i64 [ %phitmp, %lookup_success ], [ 1, %entry ]
+  %3 = bitcast i64* %"@x_num" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
+  store i64 %lookup_elem_val.0, i64* %"@x_num", align 8
+  %pseudo1 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_num", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  %4 = bitcast i64* %"@x_key2" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 1, i64* %"@x_key2", align 8
+  %pseudo3 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem4 = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo3, i64* nonnull %"@x_key2")
+  %map_lookup_cond9 = icmp eq i8* %lookup_elem4, null
+  br i1 %map_lookup_cond9, label %lookup_merge7, label %lookup_success5
+
+lookup_success5:                                  ; preds = %lookup_merge
+  %5 = load i64, i8* %lookup_elem4, align 8
+  br label %lookup_merge7
+
+lookup_merge7:                                    ; preds = %lookup_merge, %lookup_success5
+  %lookup_elem_val8.0 = phi i64 [ %5, %lookup_success5 ], [ 0, %lookup_merge ]
+  %6 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %6)
+  %get_pid_tgid = call i64 inttoptr (i64 14 to i64 ()*)()
+  %7 = lshr i64 %get_pid_tgid, 32
+  %8 = add i64 %7, %lookup_elem_val8.0
+  store i64 %8, i64* %"@x_val", align 8
+  %pseudo10 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem11 = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo10, i64* nonnull %"@x_key2", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %6)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_stats)
 {
-  test("kprobe:f { @x = count() }",
+  test("kprobe:f { @x = stats(pid) }",

 R"EXPECTED(; Function Attrs: nounwind
 declare i64 @llvm.bpf.pseudo(i64, i64) #0
@@ -829,6 +1405,8 @@ declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
 define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
 entry:
  %"@x_val" = alloca i64, align 8
+  %"@x_key2" = alloca i64, align 8
+  %"@x_num" = alloca i64, align 8
  %"@x_key" = alloca i64, align 8
  %1 = bitcast i64* %"@x_key" to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
@@ -845,13 +1423,37 @@ lookup_success:                                   ; preds = %entry

 lookup_merge:                                     ; preds = %entry, %lookup_success
  %lookup_elem_val.0 = phi i64 [ %phitmp, %lookup_success ], [ 1, %entry ]
-  %3 = bitcast i64* %"@x_val" to i8*
+  %3 = bitcast i64* %"@x_num" to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
-  store i64 %lookup_elem_val.0, i64* %"@x_val", align 8
+  store i64 %lookup_elem_val.0, i64* %"@x_num", align 8
  %pseudo1 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
-  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo1, i64* nonnull %"@x_key", i64* nonnull %"@x_num", i64 0)
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  %4 = bitcast i64* %"@x_key2" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 1, i64* %"@x_key2", align 8
+  %pseudo3 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %lookup_elem4 = call i8* inttoptr (i64 1 to i8* (i8*, i8*)*)(i64 %pseudo3, i64* nonnull %"@x_key2")
+  %map_lookup_cond9 = icmp eq i8* %lookup_elem4, null
+  br i1 %map_lookup_cond9, label %lookup_merge7, label %lookup_success5
+
+lookup_success5:                                  ; preds = %lookup_merge
+  %5 = load i64, i8* %lookup_elem4, align 8
+  br label %lookup_merge7
+
+lookup_merge7:                                    ; preds = %lookup_merge, %lookup_success5
+  %lookup_elem_val8.0 = phi i64 [ %5, %lookup_success5 ], [ 0, %lookup_merge ]
+  %6 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %6)
+  %get_pid_tgid = call i64 inttoptr (i64 14 to i64 ()*)()
+  %7 = lshr i64 %get_pid_tgid, 32
+  %8 = add i64 %7, %lookup_elem_val8.0
+  store i64 %8, i64* %"@x_val", align 8
+  %pseudo10 = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem11 = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo10, i64* nonnull %"@x_key2", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %6)
  ret i64 0
 }

@@ -1003,6 +1605,230 @@ attributes #1 = { argmemonly nounwind }
 )EXPECTED");
 }

+TEST(codegen, call_exit)
+{
+  test("kprobe:f { exit() }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8*) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %perfdata = alloca [8 x i8], align 8
+  %1 = getelementptr inbounds [8 x i8], [8 x i8]* %perfdata, i64 0, i64 0
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 10000, [8 x i8]* %perfdata, align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %get_cpu_id = tail call i64 inttoptr (i64 8 to i64 ()*)()
+  %perf_event_output = call i64 inttoptr (i64 25 to i64 (i8*, i8*, i64, i8*, i64)*)(i8* %0, i64 %pseudo, i64 %get_cpu_id, [8 x i8]* nonnull %perfdata, i64 8)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_print)
+{
+  test("BEGIN { @x = 1; } kprobe:f { print(@x); }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @BEGIN(i8* nocapture readnone) local_unnamed_addr section "s_BEGIN" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %2 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 1, i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8*) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %perfdata = alloca [26 x i8], align 8
+  %1 = getelementptr inbounds [26 x i8], [26 x i8]* %perfdata, i64 0, i64 0
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 10001, [26 x i8]* %perfdata, align 8
+  %2 = getelementptr inbounds [26 x i8], [26 x i8]* %perfdata, i64 0, i64 8
+  %3 = getelementptr inbounds [26 x i8], [26 x i8]* %perfdata, i64 0, i64 24
+  %4 = bitcast i8* %3 to i16*
+  call void @llvm.memset.p0i8.i64(i8* nonnull %2, i8 0, i64 16, i32 8, i1 false)
+  store i16 30784, i16* %4, align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 2)
+  %get_cpu_id = tail call i64 inttoptr (i64 8 to i64 ()*)()
+  %perf_event_output = call i64 inttoptr (i64 25 to i64 (i8*, i8*, i64, i8*, i64)*)(i8* %0, i64 %pseudo, i64 %get_cpu_id, [26 x i8]* nonnull %perfdata, i64 26)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i32, i1) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_clear)
+{
+  test("BEGIN { @x = 1; } kprobe:f { clear(@x); }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @BEGIN(i8* nocapture readnone) local_unnamed_addr section "s_BEGIN" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %2 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 1, i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8*) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %perfdata = alloca [10 x i8], align 8
+  %1 = getelementptr inbounds [10 x i8], [10 x i8]* %perfdata, i64 0, i64 0
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 10002, [10 x i8]* %perfdata, align 8
+  %2 = getelementptr inbounds [10 x i8], [10 x i8]* %perfdata, i64 0, i64 8
+  %3 = bitcast i8* %2 to i16*
+  store i16 30784, i16* %3, align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 2)
+  %get_cpu_id = tail call i64 inttoptr (i64 8 to i64 ()*)()
+  %perf_event_output = call i64 inttoptr (i64 25 to i64 (i8*, i8*, i64, i8*, i64)*)(i8* %0, i64 %pseudo, i64 %get_cpu_id, [10 x i8]* nonnull %perfdata, i64 10)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  ret i64 0
+}
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_zero)
+{
+  test("BEGIN { @x = 1; } kprobe:f { zero(@x); }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @BEGIN(i8* nocapture readnone) local_unnamed_addr section "s_BEGIN" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %1 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 0, i64* %"@x_key", align 8
+  %2 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 1, i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8*) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %perfdata = alloca [10 x i8], align 8
+  %1 = getelementptr inbounds [10 x i8], [10 x i8]* %perfdata, i64 0, i64 0
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 10003, [10 x i8]* %perfdata, align 8
+  %2 = getelementptr inbounds [10 x i8], [10 x i8]* %perfdata, i64 0, i64 8
+  %3 = bitcast i8* %2 to i16*
+  store i16 30784, i16* %3, align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 2)
+  %get_cpu_id = tail call i64 inttoptr (i64 8 to i64 ()*)()
+  %perf_event_output = call i64 inttoptr (i64 25 to i64 (i8*, i8*, i64, i8*, i64)*)(i8* %0, i64 %pseudo, i64 %get_cpu_id, [10 x i8]* nonnull %perfdata, i64 10)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  ret i64 0
+}
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, call_time)
+{
+  test("kprobe:f { time(); }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8*) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %perfdata = alloca [16 x i8], align 8
+  %1 = getelementptr inbounds [16 x i8], [16 x i8]* %perfdata, i64 0, i64 0
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  store i64 10004, [16 x i8]* %perfdata, align 8
+  %2 = getelementptr inbounds [16 x i8], [16 x i8]* %perfdata, i64 0, i64 8
+  store i64 0, i8* %2, align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %get_cpu_id = tail call i64 inttoptr (i64 8 to i64 ()*)()
+  %perf_event_output = call i64 inttoptr (i64 25 to i64 (i8*, i8*, i64, i8*, i64)*)(i8* %0, i64 %pseudo, i64 %get_cpu_id, [16 x i8]* nonnull %perfdata, i64 16)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+// TODO: add a join() test. It gets stuck in codegen.compile().
+
 TEST(codegen, int_propagation)
 {
  test("kprobe:f { @x = 1234; @y = @x }",
@@ -1377,6 +2203,104 @@ attributes #1 = { argmemonly nounwind }
 )EXPECTED");
 }

+TEST(codegen, ternary_int)
+{
+  test("kprobe:f { @x = pid < 10000 ? 1 : 2; }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_val" = alloca i64, align 8
+  %"@x_key" = alloca i64, align 8
+  %get_pid_tgid = tail call i64 inttoptr (i64 14 to i64 ()*)()
+  %1 = icmp ult i64 %get_pid_tgid, 42949672960000
+  %. = select i1 %1, i64 1, i64 2
+  %2 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
+  store i64 0, i64* %"@x_key", align 8
+  %3 = bitcast i64* %"@x_val" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %3)
+  store i64 %., i64* %"@x_val", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", i64* nonnull %"@x_val", i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %2)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %3)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
+TEST(codegen, ternary_str)
+{
+  test("kprobe:f { @x = pid < 10000 ? \"lo\" : \"hi\"; }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1
+
+define i64 @"kprobe:f"(i8* nocapture readnone) local_unnamed_addr section "s_kprobe:f" {
+entry:
+  %"@x_key" = alloca i64, align 8
+  %buf = alloca [64 x i8], align 1
+  %1 = getelementptr inbounds [64 x i8], [64 x i8]* %buf, i64 0, i64 0
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
+  %get_pid_tgid = tail call i64 inttoptr (i64 14 to i64 ()*)()
+  %2 = icmp ult i64 %get_pid_tgid, 42949672960000
+  br i1 %2, label %left, label %right
+
+left:                                             ; preds = %entry
+  store i8 108, i8* %1, align 1
+  %str.sroa.3.0..sroa_idx = getelementptr inbounds [64 x i8], [64 x i8]* %buf, i64 0, i64 1
+  store i8 111, i8* %str.sroa.3.0..sroa_idx, align 1
+  %str.sroa.4.0..sroa_idx = getelementptr inbounds [64 x i8], [64 x i8]* %buf, i64 0, i64 2
+  call void @llvm.memset.p0i8.i64(i8* nonnull %str.sroa.4.0..sroa_idx, i8 0, i64 61, i32 1, i1 false)
+  br label %done
+
+right:                                            ; preds = %entry
+  store i8 104, i8* %1, align 1
+  %str1.sroa.3.0..sroa_idx = getelementptr inbounds [64 x i8], [64 x i8]* %buf, i64 0, i64 1
+  store i8 105, i8* %str1.sroa.3.0..sroa_idx, align 1
+  %str1.sroa.4.0..sroa_idx = getelementptr inbounds [64 x i8], [64 x i8]* %buf, i64 0, i64 2
+  call void @llvm.memset.p0i8.i64(i8* nonnull %str1.sroa.4.0..sroa_idx, i8 0, i64 61, i32 1, i1 false)
+  br label %done
+
+done:                                             ; preds = %right, %left
+  %3 = getelementptr inbounds [64 x i8], [64 x i8]* %buf, i64 0, i64 63
+  store i8 0, i8* %3, align 1
+  %4 = bitcast i64* %"@x_key" to i8*
+  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
+  store i64 0, i64* %"@x_key", align 8
+  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
+  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@x_key", [64 x i8]* nonnull %buf, i64 0)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
+  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
+  ret i64 0
+}
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i32, i1) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
 TEST(codegen, struct_integers)
 {
  auto expected = R"EXPECTED(; Function Attrs: nounwind

--- a/tests/parser.cpp
+++ b/tests/parser.cpp
@@ -29,12 +29,15 @@ TEST(Parser, builtin_variables)
  test("kprobe:f { gid }", "Program\n kprobe:f\n  builtin: gid\n");
  test("kprobe:f { nsecs }", "Program\n kprobe:f\n  builtin: nsecs\n");
  test("kprobe:f { cpu }", "Program\n kprobe:f\n  builtin: cpu\n");
+  test("kprobe:f { curtask }", "Program\n kprobe:f\n  builtin: curtask\n");
+  test("kprobe:f { rand }", "Program\n kprobe:f\n  builtin: rand\n");
  test("kprobe:f { comm }", "Program\n kprobe:f\n  builtin: comm\n");
  test("kprobe:f { stack }", "Program\n kprobe:f\n  builtin: stack\n");
  test("kprobe:f { ustack }", "Program\n kprobe:f\n  builtin: ustack\n");
  test("kprobe:f { arg0 }", "Program\n kprobe:f\n  builtin: arg0\n");
  test("kprobe:f { retval }", "Program\n kprobe:f\n  builtin: retval\n");
  test("kprobe:f { func }", "Program\n kprobe:f\n  builtin: func\n");
+  test("kprobe:f { name }", "Program\n kprobe:f\n  builtin: name\n");
 }

 TEST(Parser, map_assign)
@@ -63,6 +66,41 @@ TEST(Parser, map_assign)
      "  =\n"
      "   map: @x\n"
      "   call: count\n");
+  test("kprobe:sys_read { @x = sum(arg2); }",
+      "Program\n"
+      " kprobe:sys_read\n"
+      "  =\n"
+      "   map: @x\n"
+      "   call: sum\n"
+      "    builtin: arg2\n");
+  test("kprobe:sys_read { @x = min(arg2); }",
+      "Program\n"
+      " kprobe:sys_read\n"
+      "  =\n"
+      "   map: @x\n"
+      "   call: min\n"
+      "    builtin: arg2\n");
+  test("kprobe:sys_read { @x = max(arg2); }",
+      "Program\n"
+      " kprobe:sys_read\n"
+      "  =\n"
+      "   map: @x\n"
+      "   call: max\n"
+      "    builtin: arg2\n");
+  test("kprobe:sys_read { @x = avg(arg2); }",
+      "Program\n"
+      " kprobe:sys_read\n"
+      "  =\n"
+      "   map: @x\n"
+      "   call: avg\n"
+      "    builtin: arg2\n");
+  test("kprobe:sys_read { @x = stats(arg2); }",
+      "Program\n"
+      " kprobe:sys_read\n"
+      "  =\n"
+      "   map: @x\n"
+      "   call: stats\n"
+      "    builtin: arg2\n");
  test("kprobe:sys_open { @x = \"mystring\" }",
      "Program\n"
      " kprobe:sys_open\n"
@@ -184,9 +222,59 @@ TEST(Parser, expressions)
      "  int: 1\n");
 }

+TEST(Parser, ternary_int)
+{
+  test("kprobe:sys_open { @x = pid < 10000 ? 1 : 2 }",
+       "Program\n"
+       " kprobe:sys_open\n"
+       "  =\n"
+       "   map: @x\n"
+       "   ?:\n"
+       "    <\n"
+       "     builtin: pid\n"
+       "     int: 10000\n"
+       "    int: 1\n"
+       "    int: 2\n");
+}
+
+TEST(Parser, ternary_str)
+{
+  test("kprobe:sys_open { @x = pid < 10000 ? \"lo\" : \"high\" }",
+       "Program\n"
+       " kprobe:sys_open\n"
+       "  =\n"
+       "   map: @x\n"
+       "   ?:\n"
+       "    <\n"
+       "     builtin: pid\n"
+       "     int: 10000\n"
+       "    string: lo\n"
+       "    string: high\n");
+}
+
+TEST(Parser, ternary_nested)
+{
+  test("kprobe:sys_open { @x = pid < 10000 ? pid < 5000 ? 1 : 2 : 3 }",
+       "Program\n"
+       " kprobe:sys_open\n"
+       "  =\n"
+       "   map: @x\n"
+       "   ?:\n"
+       "    <\n"
+       "     builtin: pid\n"
+       "     int: 10000\n"
+       "    ?:\n"
+       "     <\n"
+       "      builtin: pid\n"
+       "      int: 5000\n"
+       "     int: 1\n"
+       "     int: 2\n"
+       "    int: 3\n");
+}
+
 TEST(Parser, call)
 {
-  test("kprobe:sys_open { @x = count(); @y = quantize(1,2,3); delete(@x); }",
+  test("kprobe:sys_open { @x = count(); @y = hist(1,2,3); delete(@x); }",
      "Program\n"
      " kprobe:sys_open\n"
      "  =\n"
@@ -194,7 +282,7 @@ TEST(Parser, call)
      "   call: count\n"
      "  =\n"
      "   map: @y\n"
-      "   call: quantize\n"
+      "   call: hist\n"
      "    int: 1\n"
      "    int: 2\n"
      "    int: 3\n"
@@ -228,6 +316,14 @@ TEST(Parser, uprobe)
      "  int: 1\n");
 }

+TEST(Parser, usdt)
+{
+  test("usdt:/my/program:probe { 1; }",
+      "Program\n"
+      " usdt:/my/program:probe\n"
+      "  int: 1\n");
+}
+
 TEST(Parser, escape_chars)
 {
  test("kprobe:sys_open { \"newline\\nand tab\\tbackslash\\\\quote\\\"here\" }",
@@ -260,6 +356,30 @@ TEST(Parser, profile_probe)
      "  int: 1\n");
 }

+TEST(Parser, interval_probe)
+{
+  test("interval:s:1 { 1 }",
+      "Program\n"
+      " interval:s:1\n"
+      "  int: 1\n");
+}
+
+TEST(Parser, software_probe)
+{
+  test("software:faults:1000 { 1 }",
+      "Program\n"
+      " software:faults:1000\n"
+      "  int: 1\n");
+}
+
+TEST(Parser, hardware_probe)
+{
+  test("hardware:cache-references:1000000 { 1 }",
+      "Program\n"
+      " hardware:cache-references:1000000\n"
+      "  int: 1\n");
+}
+
 TEST(Parser, multiple_attach_points_kprobe)
 {
  test("BEGIN,kprobe:sys_open,uprobe:/bin/sh:foo,tracepoint:syscalls:sys_enter_* { 1 }",

--- a/tests/semantic_analyser.cpp
+++ b/tests/semantic_analyser.cpp
@@ -57,25 +57,41 @@ TEST(semantic_analyser, builtin_variables)
  test("kprobe:f { gid }", 0);
  test("kprobe:f { nsecs }", 0);
  test("kprobe:f { cpu }", 0);
+  test("kprobe:f { curtask }", 0);
+  test("kprobe:f { rand }", 0);
  test("kprobe:f { comm }", 0);
  test("kprobe:f { stack }", 0);
  test("kprobe:f { ustack }", 0);
  test("kprobe:f { arg0 }", 0);
  test("kprobe:f { retval }", 0);
  test("kprobe:f { func }", 0);
+  test("kprobe:f { name }", 0);
 //  test("kprobe:f { fake }", 1);
 }

 TEST(semantic_analyser, builtin_functions)
 {
-  test("kprobe:f { @x = quantize(123) }", 0);
+  test("kprobe:f { @x = hist(123) }", 0);
  test("kprobe:f { @x = count() }", 0);
+  test("kprobe:f { @x = sum(pid) }", 0);
+  test("kprobe:f { @x = min(pid) }", 0);
+  test("kprobe:f { @x = max(pid) }", 0);
+  test("kprobe:f { @x = avg(pid) }", 0);
+  test("kprobe:f { @x = stats(pid) }", 0);
  test("kprobe:f { @x = 1; delete(@x) }", 0);
+  test("kprobe:f { @x = 1; print(@x) }", 0);
+  test("kprobe:f { @x = 1; clear(@x) }", 0);
+  test("kprobe:f { @x = 1; zero(@x) }", 0);
+  test("kprobe:f { time() }", 0);
+  test("kprobe:f { exit() }", 0);
  test("kprobe:f { str(0xffff) }", 0);
  test("kprobe:f { printf(\"hello\\n\") }", 0);
+  test("kprobe:f { join(0) }", 0);
  test("kprobe:f { sym(0xffff) }", 0);
  test("kprobe:f { usym(0xffff) }", 0);
  test("kprobe:f { reg(\"ip\") }", 0);
+  test("kprobe:f { @x = count(pid) }", 1);
+  test("kprobe:f { @x = sum(pid, 123) }", 1);
  test("kprobe:f { fake() }", 1);
 }

@@ -102,17 +118,26 @@ TEST(semantic_analyser, predicate_expressions)
  test("kprobe:f / @mymap / { @mymap = \"str\" }", 10);
 }

+TEST(semantic_analyser, ternary_expressions)
+{
+  test("kprobe:f { @x = pid < 10000 ? 1 : 2 }", 0);
+  test("kprobe:f { @x = pid < 10000 ? \"lo\" : \"high\" }", 0);
+  test("kprobe:f { @x = pid < 10000 ? 1 : \"high\" }", 10);
+  test("kprobe:f { @x = pid < 10000 ? \"lo\" : 2 }", 10);
+}
+
 TEST(semantic_analyser, mismatched_call_types)
 {
  test("kprobe:f { @x = 1; @x = count(); }", 1);
-  test("kprobe:f { @x = 1; @x = quantize(0); }", 1);
+  test("kprobe:f { @x = count(); @x = sum(pid); }", 1);
+  test("kprobe:f { @x = 1; @x = hist(0); }", 1);
 }

-TEST(semantic_analyser, call_quantize)
+TEST(semantic_analyser, call_hist)
 {
-  test("kprobe:f { @x = quantize(1); }", 0);
-  test("kprobe:f { @x = quantize(); }", 1);
-  test("kprobe:f { quantize(); }", 1);
+  test("kprobe:f { @x = hist(1); }", 0);
+  test("kprobe:f { @x = hist(); }", 1);
+  test("kprobe:f { hist(); }", 1);
 }

 TEST(semantic_analyser, call_count)
@@ -122,6 +147,41 @@ TEST(semantic_analyser, call_count)
  test("kprobe:f { count(); }", 1);
 }

+TEST(semantic_analyser, call_sum)
+{
+  test("kprobe:f { @x = sum(); }", 1);
+  test("kprobe:f { @x = sum(123); }", 0);
+  test("kprobe:f { sum(); }", 1);
+}
+
+TEST(semantic_analyser, call_min)
+{
+  test("kprobe:f { @x = min(); }", 1);
+  test("kprobe:f { @x = min(123); }", 0);
+  test("kprobe:f { min(); }", 1);
+}
+
+TEST(semantic_analyser, call_max)
+{
+  test("kprobe:f { @x = max(); }", 1);
+  test("kprobe:f { @x = max(123); }", 0);
+  test("kprobe:f { max(); }", 1);
+}
+
+TEST(semantic_analyser, call_avg)
+{
+  test("kprobe:f { @x = avg(); }", 1);
+  test("kprobe:f { @x = avg(123); }", 0);
+  test("kprobe:f { avg(); }", 1);
+}
+
+TEST(semantic_analyser, call_stats)
+{
+  test("kprobe:f { @x = stats(); }", 1);
+  test("kprobe:f { @x = stats(123); }", 0);
+  test("kprobe:f { stats(); }", 1);
+}
+
 TEST(semantic_analyser, call_delete)
 {
  test("kprobe:f { @x = 1; delete(@x); }", 0);
@@ -131,6 +191,43 @@ TEST(semantic_analyser, call_delete)
  test("kprobe:f { $y = delete(@x); }", 1);
 }

+TEST(semantic_analyser, call_exit)
+{
+  test("kprobe:f { exit(); }", 0);
+  test("kprobe:f { exit(1); }", 1);
+}
+
+TEST(semantic_analyser, call_print)
+{
+  test("kprobe:f { @x = count(); print(@x); }", 0);
+  test("kprobe:f { @x = count(); print(@x, 5); }", 0);
+  test("kprobe:f { @x = count(); print(@x, 5, 10); }", 0);
+  test("kprobe:f { @x = count(); print(@x, 5, 10, 1); }", 1);
+  test("kprobe:f { @x = count(); @x = print(); }", 1);
+}
+
+TEST(semantic_analyser, call_clear)
+{
+  test("kprobe:f { @x = count(); clear(@x); }", 0);
+  test("kprobe:f { @x = count(); clear(@x, 1); }", 1);
+  test("kprobe:f { @x = count(); @x = clear(); }", 1);
+}
+
+TEST(semantic_analyser, call_zero)
+{
+  test("kprobe:f { @x = count(); zero(@x); }", 0);
+  test("kprobe:f { @x = count(); zero(@x, 1); }", 1);
+  test("kprobe:f { @x = count(); @x = zero(); }", 1);
+}
+
+TEST(semantic_analyser, call_time)
+{
+  test("kprobe:f { time(); }", 0);
+  test("kprobe:f { time(\"%M:%S\"); }", 0);
+  test("kprobe:f { time(\"%M:%S\", 1); }", 1);
+  test("kprobe:f { @x = time(); }", 1);
+}
+
 TEST(semantic_analyser, call_str)
 {
  test("kprobe:f { str(arg0); }", 0);
@@ -164,6 +261,24 @@ TEST(semantic_analyser, call_reg)
  test("kprobe:f { reg(123); }", 1);
 }

+TEST(semantic_analyser, call_func)
+{
+  test("kprobe:f { @[func] = count(); }", 0);
+  test("kprobe:f { printf(\"%s\", func);  }", 0);
+  test("kprobe:f { func(\"blah\"); }", 1);
+  test("kprobe:f { func(); }", 1);
+  test("kprobe:f { func(123); }", 1);
+}
+
+TEST(semantic_analyser, call_name)
+{
+  test("kprobe:f { @[name] = count(); }", 0);
+  test("kprobe:f { printf(\"%s\", name);  }", 0);
+  test("kprobe:f { name(\"blah\"); }", 1);
+  test("kprobe:f { name(); }", 1);
+  test("kprobe:f { name(123); }", 1);
+}
+
 TEST(semantic_analyser, map_reassignment)
 {
  test("kprobe:f { @x = 1; @x = 2; }", 0);
@@ -332,6 +447,16 @@ TEST(semantic_analyser, printf_format_multi)
  test("kprobe:f { printf(\"%d %s %d\", 1, 2, \"mystr\") }", 10);
 }

+TEST(semantic_analyser, join)
+{
+  test("kprobe:f { join(arg0) }", 0);
+  test("kprobe:f { printf(\"%s\", join(arg0)) }", 10);
+  test("kprobe:f { join() }", 1);
+  test("kprobe:f { $fmt = \"mystring\"; join($fmt) }", 10);
+  test("kprobe:f { @x = join(arg0) }", 1);
+  test("kprobe:f { $x = join(arg0) }", 1);
+}
+
 TEST(semantic_analyser, kprobe)
 {
  test("kprobe:f { 1 }", 0);
@@ -354,6 +479,13 @@ TEST(semantic_analyser, uprobe)
  test("uretprobe { 1 }", 1);
 }

+TEST(semantic_analyser, usdt)
+{
+  test("usdt:/bin/sh:probe { 1 }", 0);
+  test("usdt:/notexistfile:probe { 1 }", 1);
+  test("usdt { 1 }", 1);
+}
+
 TEST(semantic_analyser, begin_end_probes)
 {
  test("BEGIN { 1 }", 0);