Merge pull request #774 from goldshtn/usdt-tools

USDT tools for higher level languages

Commit b2fe72fc, authored Dec 19, 2016 by 4ast; committed by GitHub Dec 19, 2016
21 changed files
--- a/README.md
+++ b/README.md
@@ -129,6 +129,12 @@ Examples:
 - tools/[tplist](tools/tplist.py): Display kernel tracepoints or USDT probes and their formats. [Examples](tools/tplist_example.txt).
 - tools/[trace](tools/trace.py): Trace arbitrary functions, with filters. [Examples](tools/trace_example.txt)
 - tools/[ttysnoop](tools/ttysnoop.py): Watch live output from a tty or pts device. [Examples](tools/ttysnoop_example.txt)
+- tools/[ucalls](tools/ucalls.py): Summarize method calls or Linux syscalls in high-level languages. [Examples](tools/ucalls_example.txt)
+- tools/[uflow](tools/uflow.py): Print a method flow graph in high-level languages. [Examples](tools/uflow_example.txt)
+- tools/[ugc](tools/ugc.py): Trace garbage collection events in high-level languages. [Examples](tools/ugc_example.txt)
+- tools/[uobjnew](tools/uobjnew.py): Summarize object allocation events by object type and number of bytes allocated. [Examples](tools/uobjnew_example.txt)
+- tools/[ustat](tools/ustat.py): Collect events such as GCs, thread creations, object allocations, exceptions and more in high-level languages. [Examples](tools/ustat_example.txt)
+- tools/[uthreads](tools/uthreads.py): Trace thread creation events in Java and raw pthreads. [Examples](tools/uthreads_example.txt)
 - tools/[vfscount](tools/vfscount.py) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
 - tools/[vfsstat](tools/vfsstat.py) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).
 - tools/[wakeuptime](tools/wakeuptime.py): Summarize sleep to wakeup time by waker kernel stack. [Examples](tools/wakeuptime_example.txt).

--- a/man/man8/ucalls.8
+++ b/man/man8/ucalls.8
+.TH ucalls 8  "2016-11-07" "USER COMMANDS"
+.SH NAME
+ucalls \- Summarize method calls from high-level languages and Linux syscalls.
+.SH SYNOPSIS
+.B ucalls [-l {java,python,ruby}] [-h] [-T TOP] [-L] [-S] [-v] [-m] pid [interval]
+.SH DESCRIPTION
+This tool summarizes method calls from high-level languages such as Python, 
+Java, and Ruby. It can also trace Linux system calls. Whenever a method is 
+invoked, ucalls records the call count and optionally the method's execution
+time (latency) and displays a summary.
+
+This uses in-kernel eBPF maps to store per-process summaries for efficiency.
+
+This tool relies on USDT probes embedded in many high-level languages, such as
+Node, Java, Python, and Ruby. It requires a runtime instrumented with these 
+probes, which in some cases requires building from source with a USDT-specific
+flag, such as "--enable-dtrace" or "--with-dtrace". For Java, method probes are
+not enabled by default, and can be turned on by running the Java process with
+the "-XX:+ExtendedDTraceProbes" flag.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-l {java,python,ruby}
+The language to trace. If not provided, only syscalls are traced (when the \-S
+option is used).
+.TP
+\-T TOP
+Print only the top methods by frequency or latency.
+.TP
+\-L
+Collect method invocation latency (duration).
+.TP
+\-S
+Collect Linux syscalls frequency and timing.
+.TP
+\-v
+Print the resulting BPF program, for debugging purposes.
+.TP
+\-m
+Print times in milliseconds (the default is microseconds).
+.TP
+pid
+The process id to trace.
+.TP
+interval
+Print summary after this number of seconds and then exit. By default, wait for
+Ctrl+C to terminate.
+.SH EXAMPLES
+.TP
+Trace the top 10 Ruby method calls:
+#
+.B ucalls -T 10 -l ruby 1344
+.TP
+Trace Python method calls, including latency in milliseconds:
+#
+.B ucalls -l python -mL 2020
+.TP
+Trace only syscalls and print a summary after 10 seconds:
+#
+.B ucalls -S 788 10
+.SH OVERHEAD
+Tracing individual method calls produces considerable overhead in all
+high-level languages. For languages with just-in-time compilation, such as
+Java, the overhead can be even greater than for interpreted languages.
+Syscall tracing, on the other hand, is typically tolerable for most
+processes, unless they make system calls at a very unusual rate.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _example.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Sasha Goldshtein
+.SH SEE ALSO
+ustat(8), argdist(8)
--- a/man/man8/uflow.8
+++ b/man/man8/uflow.8
+.TH uflow 8  "2016-11-07" "USER COMMANDS"
+.SH NAME
+uflow \- Print a flow graph of method calls in high-level languages.
+.SH SYNOPSIS
+.B uflow [-h] [-M METHOD] [-C CLAZZ] [-v] {java,python,ruby} pid
+.SH DESCRIPTION
+uflow traces method calls and prints them in a flow graph that can facilitate
+debugging and diagnostics by following the program's execution (method flow).
+
+This tool relies on USDT probes embedded in many high-level languages, such as
+Node, Java, Python, and Ruby. It requires a runtime instrumented with these 
+probes, which in some cases requires building from source with a USDT-specific
+flag, such as "--enable-dtrace" or "--with-dtrace". For Java processes, the
+startup flag "-XX:+ExtendedDTraceProbes" is required.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-M METHOD
+Print only method calls where the method name begins with this string.
+.TP
+\-C CLAZZ
+Print only method calls where the class name begins with this string. The class
+name interpretation strongly depends on the language. For example, in Java use
+"package/subpackage/ClassName" to refer to classes.
+.TP
+\-v
+Print the resulting BPF program, for debugging purposes.
+.TP
+{java,python,ruby}
+The language to trace.
+.TP
+pid
+The process id to trace.
+.SH EXAMPLES
+.TP
+Follow method flow in a Ruby process:
+#
+.B uflow ruby 148
+.TP
+Follow method flow in a Java process for classes beginning with java/lang/Thread:
+#
+.B uflow -C java/lang/Thread java 1802
+.SH FIELDS
+.TP
+CPU
+The CPU number on which the method was invoked. This is useful to easily see
+where the output skips to a different CPU.
+.TP
+PID
+The process id.
+.TP
+TID
+The thread id.
+.TP
+TIME
+The time of the event, in seconds from the beginning of the trace.
+.TP
+METHOD
+The method name.
+.SH OVERHEAD
+This tool has extremely high overhead because it prints every method call. For
+some scenarios, you might see lost samples in the output as the tool is unable
+to keep up with the rate of data coming from the kernel. Filtering by class 
+or method prefix can help reduce the amount of data printed, but there is still
+a very high overhead in the collection mechanism. Do not use for performance-
+sensitive production scenarios, and always test first.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _example.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Sasha Goldshtein
+.SH SEE ALSO
+trace(8), ustat(8)
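
The -M and -C options described in this man page are prefix matches, not exact matches. As a rough illustration in plain Python (the helper `matches_prefix` and the sample class list are hypothetical and not part of uflow), this is why a filter of java/lang/Thread also lets java/lang/ThreadGroup through:

```python
def matches_prefix(name, prefix):
    """Return True if name begins with prefix.

    An empty or missing prefix matches everything, which corresponds to
    running without the filter option.
    """
    return not prefix or name.startswith(prefix)


# Invented sample data for illustration only.
classes = ["java/lang/Thread", "java/lang/ThreadGroup", "java/lang/String"]
selected = [c for c in classes if matches_prefix(c, "java/lang/Thread")]
print(selected)
```

To exclude ThreadGroup in such a case, the filtering would have to happen on the output side (for example with grep), since the option itself only compares the leading characters.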
--- a/man/man8/ugc.8
+++ b/man/man8/ugc.8
+.TH ugc 8  "2016-11-07" "USER COMMANDS"
+.SH NAME
+ugc \- Trace garbage collection events in high-level languages.
+.SH SYNOPSIS
+.B ugc [-h] [-v] [-m] {java,python,ruby,node} pid
+.SH DESCRIPTION
+This traces garbage collection events as they occur, including their duration
+and any additional information (such as generation collected or type of GC)
+provided by the respective language's runtime.
+
+This tool relies on USDT probes embedded in many high-level languages, such as
+Node, Java, Python, and Ruby. It requires a runtime instrumented with these 
+probes, which in some cases requires building from source with a USDT-specific
+flag, such as "--enable-dtrace" or "--with-dtrace".
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-v
+Print the resulting BPF program, for debugging purposes.
+.TP
+\-m
+Print times in milliseconds. The default is microseconds.
+.TP
+{java,python,ruby,node}
+The language to trace.
+.TP
+pid
+The process id to trace.
+.SH EXAMPLES
+.TP
+Trace garbage collections in a specific Node process:
+#
+.B ugc node 148
+.TP
+Trace garbage collections in a specific Java process, and print GC times in
+milliseconds:
+#
+.B ugc -m java 6004
+.SH FIELDS
+.TP
+START
+The start time of the GC, in seconds from the beginning of the trace.
+.TP
+DESCRIPTION
+The runtime-provided description of this garbage collection event.
+.TP
+TIME
+The duration of the garbage collection event.
+.SH OVERHEAD
+Garbage collection events, even if frequent, should not produce a considerable
+overhead when traced because they are still not very common. Even hundreds of 
+GCs per second (which is a very high rate) will still produce a fairly 
+negligible overhead.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _example.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Sasha Goldshtein
+.SH SEE ALSO
+trace(8), ustat(8), uobjnew(8)
--- a/man/man8/uobjnew.8
+++ b/man/man8/uobjnew.8
+.TH uobjnew 8  "2016-11-07" "USER COMMANDS"
+.SH NAME
+uobjnew \- Summarize object allocations in high-level languages.
+.SH SYNOPSIS
+.B uobjnew [-h] [-C TOP_COUNT] [-S TOP_SIZE] [-v] {java,ruby,c} pid [interval]
+.SH DESCRIPTION
+uobjnew traces object allocations in high-level languages (including "malloc")
+and prints summaries of the most frequently allocated types by number of 
+objects or number of bytes.
+
+This tool relies on USDT probes embedded in many high-level languages, such as
+Node, Java, Python, and Ruby. It requires a runtime instrumented with these 
+probes, which in some cases requires building from source with a USDT-specific
+flag, such as "--enable-dtrace" or "--with-dtrace". For Java, the Java process
+must be started with the "-XX:+ExtendedDTraceProbes" flag.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-C TOP_COUNT
+Print the top object types sorted by number of instances.
+.TP
+\-S TOP_SIZE
+Print the top object types sorted by size.
+.TP
+\-v
+Print the resulting BPF program, for debugging purposes.
+.TP
+{java,ruby,c}
+The language to trace.
+.TP
+pid
+The process id to trace.
+.TP
+interval
+Wait this many seconds and then print the summary and exit. By default, wait
+for Ctrl+C to exit.
+.SH EXAMPLES
+.TP
+Trace object allocations in a Ruby process:
+#
+.B uobjnew ruby 148
+.TP
+Trace object allocations from "malloc" and print the top 10 by total size:
+#
+.B uobjnew -S 10 c 1788
+.SH FIELDS
+.TP
+TYPE
+The object type being allocated. For C (malloc), this is the block size.
+.TP
+ALLOCS
+The number of objects allocated.
+.TP
+BYTES
+The number of bytes allocated.
+.SH OVERHEAD
+Object allocation events are quite frequent, and therefore the overhead from
+running this tool can be considerable. Use with caution and make sure to 
+test before using in a production environment. Nonetheless, even thousands of
+allocations per second will likely produce a reasonable overhead when 
+investigating a problem.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _example.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Sasha Goldshtein
+.SH SEE ALSO
+ustat(8), ugc(8), memleak(8)
--- a/man/man8/ustat.8
+++ b/man/man8/ustat.8
+.TH ustat 8  "2016-11-07" "USER COMMANDS"
+.SH NAME
+ustat \- Activity stats from high-level languages.
+.SH SYNOPSIS
+.B ustat [-l {java,python,ruby,node}] [-C] [-S {cload,excp,gc,method,objnew,thread}] [-r MAXROWS] [-d] [interval [count]]
+.SH DESCRIPTION
+This is "top" for high-level language events, such as garbage collections,
+exceptions, thread creations, object allocations, method calls, and more. The
+events are aggregated for each process and printed in a top-like table, which
+can be sorted by various fields.
+
+This uses in-kernel eBPF maps to store per-process summaries for efficiency.
+
+This tool relies on USDT probes embedded in many high-level languages, such as
+Node, Java, Python, and Ruby. It requires a runtime instrumented with these 
+probes, which in some cases requires building from source with a USDT-specific
+flag, such as "--enable-dtrace" or "--with-dtrace". For Java, some probes are
+not enabled by default, and can be turned on by running the Java process with
+the "-XX:+ExtendedDTraceProbes" flag.
+
+Newly-created processes will only be traced at the next interval. If you run
+this tool with a short interval (say, 1-5 seconds), this should be virtually
+unnoticeable. For longer intervals, you might miss processes that were started
+and terminated during the interval window.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-l {java,python,ruby,node}
+The language to trace. By default, all languages are traced.
+.TP
+\-C
+Do not clear the screen between updates.
+.TP
+\-S {cload,excp,gc,method,objnew,thread}
+Sort the output by the specified field.
+.TP
+\-r MAXROWS
+Do not print more than this number of rows.
+.TP
+\-d
+Print the resulting BPF program, for debugging purposes.
+.TP
+interval
+Interval between updates, seconds.
+.TP
+count
+Number of interval summaries.
+.SH EXAMPLES
+.TP
+Summarize activity in high-level languages, 1 second refresh:
+#
+.B ustat
+.TP
+Don't clear the screen, and top 8 rows only:
+#
+.B ustat -Cr 8
+.TP
+5 second summaries, 10 times only:
+#
+.B ustat 5 10
+.SH FIELDS
+.TP
+loadavg
+The contents of /proc/loadavg.
+.TP
+PID
+Process ID.
+.TP
+CMDLINE
+Process command line (often the second and following arguments will give you a
+hint as to which application is being run).
+.TP
+METHOD/s
+Count of method invocations during interval.
+.TP
+GC/s
+Count of garbage collections during interval.
+.TP
+OBJNEW/s
+Count of objects allocated during interval.
+.TP
+CLOAD/s
+Count of classes loaded during interval.
+.TP
+EXC/s
+Count of exceptions thrown during interval.
+.TP
+THR/s
+Count of threads created during interval.
+.SH OVERHEAD
+When using this tool with high-frequency events, such as method calls, a very
+significant slow-down can be expected. However, many of the high-level 
+languages covered by this tool already have a fairly high per-method invocation
+cost, especially when running in interpreted mode. For the lower-frequency 
+events, such as garbage collections or thread creations, the overhead should 
+not be significant. Specifically, when probing Java processes and not using the
+"-XX:+ExtendedDTraceProbes" flag, the most expensive probes are not emitted,
+and the overhead should be acceptable.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _example.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Sasha Goldshtein
+.SH SEE ALSO
+trace(8), argdist(8), tplist(8)
--- a/man/man8/uthreads.8
+++ b/man/man8/uthreads.8
+.TH uthreads 8  "2016-11-07" "USER COMMANDS"
+.SH NAME
+uthreads \- Trace thread creation events in Java or pthreads.
+.SH SYNOPSIS
+.B uthreads [-h] [-l {java}] [-v] pid
+.SH DESCRIPTION
+This traces thread creation events in Java processes, or pthread creation
+events in any process. When a thread is created, its name or start address
+is printed.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-l {java}
+The language to trace (currently only Java is supported). When no language is
+specified, only pthread creations are traced.
+.TP
+\-v
+Print the resulting BPF program, for debugging purposes.
+.TP
+pid
+The process id to trace.
+.SH EXAMPLES
+.TP
+Trace Java thread creations:
+#
+.B uthreads -l java 148
+.TP
+Trace pthread creations:
+#
+.B uthreads 1802
+.SH FIELDS
+.TP
+TIME
+The event's time in seconds from the beginning of the trace.
+.TP
+ID
+The thread's ID. The information in this column depends on the runtime.
+.TP
+TYPE
+Event type -- thread start, stop, or pthread event.
+.TP
+DESCRIPTION
+The thread's name, or the name of the function at its start address.
+.SH OVERHEAD
+Thread start and stop events are usually not very frequent, which makes this
+tool's overhead negligible.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _example.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Sasha Goldshtein
+.SH SEE ALSO
+ustat(8), trace(8)
--- a/src/python/bcc/__init__.py
+++ b/src/python/bcc/__init__.py
@@ -221,7 +221,7 @@ class BPF(object):
                                    "possible cause is missing pid when a " +
                                    "probe in a shared object has multiple " +
                                    "locations")
-                text = usdt_context.get_text() + text
+                text = usdt_text + text

        if text:
            self.module = lib.bpf_module_create_c_from_string(text.encode("ascii"),

--- a/src/python/bcc/usdt.py
+++ b/src/python/bcc/usdt.py
@@ -145,13 +145,15 @@ class USDT(object):
    # This is called by the BPF module's __init__ when it realizes that there
    # is a USDT context and probes need to be attached.
    def attach_uprobes(self, bpf):
+        probes = self.enumerate_active_probes()
+        for (binpath, fn_name, addr, pid) in probes:
+            bpf.attach_uprobe(name=binpath, fn_name=fn_name,
+                              addr=addr, pid=pid)
+
+    def enumerate_active_probes(self):
        probes = []
        def _add_probe(binpath, fn_name, addr, pid):
            probes.append((binpath, fn_name, addr, pid))

        lib.bcc_usdt_foreach_uprobe(self.context, _USDT_PROBE_CB(_add_probe))
-
-        for (binpath, fn_name, addr, pid) in probes:
-            bpf.attach_uprobe(name=binpath, fn_name=fn_name,
-                              addr=addr, pid=pid)
-
+        return probes
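
The usdt.py change above separates collecting probe locations from attaching them, so a caller can inspect the list without attaching anything. Below is a minimal sketch of the same collect-through-a-callback pattern that runs without BCC. The function `foreach_uprobe` is a hypothetical stand-in for `lib.bcc_usdt_foreach_uprobe`, and the probe tuples are invented sample data.

```python
def foreach_uprobe(callback):
    # Hypothetical stand-in for the native iterator: it calls the callback
    # once per probe location. The tuples below are made-up sample data.
    sample = [("/usr/bin/app", "trace_entry", 0x4005D0, 1234),
              ("/usr/bin/app", "trace_return", 0x4005F8, 1234)]
    for probe in sample:
        callback(*probe)


def enumerate_active_probes():
    """Collect every (binpath, fn_name, addr, pid) tuple into a list."""
    probes = []

    def _add_probe(binpath, fn_name, addr, pid):
        probes.append((binpath, fn_name, addr, pid))

    foreach_uprobe(_add_probe)
    return probes


def attach_uprobes(attach):
    # `attach` is any callable that accepts the keyword arguments passed
    # to bpf.attach_uprobe in the diff above.
    for (binpath, fn_name, addr, pid) in enumerate_active_probes():
        attach(name=binpath, fn_name=fn_name, addr=addr, pid=pid)


attached = []
attach_uprobes(lambda **kw: attached.append(kw["fn_name"]))
print(attached)
```

Keeping enumeration side-effect free means the same list can serve both attachment and reporting, such as counting probes before deciding whether to attach.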
--- a/tools/ucalls.py
+++ b/tools/ucalls.py
--- a/tools/ucalls_example.txt
+++ b/tools/ucalls_example.txt
+Demonstrations of ucalls.
+
+
+ucalls summarizes method calls in various high-level languages, including Java,
+Python, and Ruby, as well as Linux system calls. It displays statistics on the
+most frequently called methods, as well as the latency (duration) of these
+methods.
+
+Through the syscalls support, ucalls can provide basic information on a 
+process' interaction with the system including syscall counts and latencies. 
+This can then be used for further exploration with other BCC tools like trace,
+argdist, biotop, fileslower, and others.
+
+For example, to trace method call latency in a Java application:
+
+# ucalls -L -l java $(pidof java)
+Tracing calls in process 26877 (language: java)... Ctrl-C to quit.
+
+METHOD                                              # CALLS TIME (us)
+java/io/BufferedInputStream.getBufIfOpen                  1 7.00
+slowy/App.isSimplePrime                                8970 8858.35
+slowy/App.isDivisible                               3228196 3076985.12
+slowy/App.isPrime                                      8969 4841017.64
+^C
+
+
+To trace only syscalls in a particular process and print the top 10 most 
+frequently-invoked ones:
+
+# ucalls -ST 10 3018
+Attached 375 kernel probes for syscall tracing.
+Tracing calls in process 3018 (language: none)... Ctrl-C to quit.
+
+METHOD                                              # CALLS
+sys_rt_sigaction                                          4
+SyS_rt_sigprocmask                                        4
+sys_mprotect                                              5
+sys_read                                                 22
+SyS_write                                                39
+SyS_epoll_wait                                           42
+sys_futex                                               177
+SyS_mmap                                                180
+sys_mmap_pgoff                                          181
+sys_munmap                                              817
+^C
+Detaching kernel probes, please wait...
+
+
+To print only the top 5 methods and report times in milliseconds (the default
+is microseconds):
+
+# ucalls -l python -mT 5 $(pidof python)
+Tracing calls in process 26914 (language: python)... Ctrl-C to quit.
+
+METHOD                                              # CALLS
+<stdin>.<module>                                          1
+<stdin>.fibo                                       14190928
+^C
+
+
+USAGE message:
+
+# ./ucalls.py -h
+usage: ucalls.py [-h] [-l {java,python,ruby}] [-T TOP] [-L] [-S] [-v] [-m]
+                 pid [interval]
+
+Summarize method calls in high-level languages.
+
+positional arguments:
+  pid                   process id to attach to
+  interval              print every specified number of seconds
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -l {java,python,ruby}, --language {java,python,ruby}
+                        language to trace (if none, trace syscalls only)
+  -T TOP, --top TOP     number of most frequent/slow calls to print
+  -L, --latency         record method latency from enter to exit (except
+                        recursive calls)
+  -S, --syscalls        record syscall latency (adds overhead)
+  -v, --verbose         verbose mode: print the BPF program (for debugging
+                        purposes)
+  -m, --milliseconds    report times in milliseconds (default is microseconds)
+
+examples:
+    ./ucalls -l java 185        # trace Java calls and print statistics on ^C
+    ./ucalls -l python 2020 1   # trace Python calls and print every second
+    ./ucalls -l java 185 -S     # trace Java calls and syscalls
+    ./ucalls 6712 -S            # trace only syscall counts
+    ./ucalls -l ruby 1344 -T 10 # trace top 10 Ruby method calls
+    ./ucalls -l ruby 1344 -L    # trace Ruby calls including latency
+    ./ucalls -l ruby 1344 -LS   # trace Ruby calls and syscalls with latency
+    ./ucalls -l python 2020 -mL # trace Python calls including latency in ms
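
The sample summaries in this file are listed in ascending order, so the most frequent (or slowest) entries appear last. Here is a rough sketch of that kind of top-N selection in plain Python, using a few of the counts from the syscall example. The function name `top_calls` and the print format are assumptions for illustration, not the tool's actual code.

```python
def top_calls(counts, top=None):
    """Sort (name, count) pairs ascending by count and keep the last `top`.

    With top=None (or 0) every entry is returned.
    """
    ordered = sorted(counts.items(), key=lambda kv: kv[1])
    return ordered[-top:] if top else ordered


# Counts copied from the syscall example output in this file.
counts = {"sys_read": 22, "SyS_write": 39, "sys_futex": 177,
          "SyS_mmap": 180, "sys_munmap": 817}

for name, calls in top_calls(counts, top=3):
    print("%-50s %8d" % (name, calls))
```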
--- a/tools/uflow.py
+++ b/tools/uflow.py
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# uflow  Trace method execution flow in high-level languages.
+#        For Linux, uses BCC, eBPF.
+#
+# USAGE: uflow [-C CLASS] [-M METHOD] [-v] {java,python,ruby} pid
+#
+# Copyright 2016 Sasha Goldshtein
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 27-Oct-2016   Sasha Goldshtein   Created this.
+
+from __future__ import print_function
+import argparse
+from bcc import BPF, USDT
+import ctypes as ct
+import time
+
+examples = """examples:
+    ./uflow java 185                # trace Java method calls in process 185
+    ./uflow ruby 1344               # trace Ruby method calls in process 1344
+    ./uflow -M indexOf java 185     # trace only 'indexOf'-prefixed methods
+    ./uflow -C '<stdin>' python 180 # trace only REPL-defined methods
+"""
+parser = argparse.ArgumentParser(
+    description="Trace method execution flow in high-level languages.",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("language", choices=["java", "python", "ruby"],
+    help="language to trace")
+parser.add_argument("pid", type=int, help="process id to attach to")
+parser.add_argument("-M", "--method",
+    help="trace only calls to methods starting with this prefix")
+parser.add_argument("-C", "--class", dest="clazz",
+    help="trace only calls to classes starting with this prefix")
+parser.add_argument("-v", "--verbose", action="store_true",
+    help="verbose mode: print the BPF program (for debugging purposes)")
+args = parser.parse_args()
+
+program = """
+struct call_t {
+    u64 depth;                  // top bit is direction (0 entry, 1 return)
+    u64 pid;                    // (tgid << 32) + pid from bpf_get_current...
+    u64 timestamp;              // ns
+    char clazz[80];
+    char method[80];
+};
+
+BPF_PERF_OUTPUT(calls);
+BPF_HASH(entry, u64, u64);
+"""
+
+prefix_template = """
+static inline bool prefix_%s(char *actual) {
+    char expected[] = "%s";
+    for (int i = 0; i < sizeof(expected) - 1; ++i) {
+        if (expected[i] != actual[i]) {
+            return false;
+        }
+    }
+    return true;
+}
+"""
+
+if args.clazz:
+    program += prefix_template % ("class", args.clazz)
+if args.method:
+    program += prefix_template % ("method", args.method)
+
+trace_template = """
+int NAME(struct pt_regs *ctx) {
+    u64 *depth, zero = 0, clazz = 0, method = 0;
+    struct call_t data = {};
+
+    READ_CLASS
+    READ_METHOD
+    bpf_probe_read(&data.clazz, sizeof(data.clazz), (void *)clazz);
+    bpf_probe_read(&data.method, sizeof(data.method), (void *)method);
+
+    FILTER_CLASS
+    FILTER_METHOD
+
+    data.pid = bpf_get_current_pid_tgid();
+    data.timestamp = bpf_ktime_get_ns();
+    depth = entry.lookup_or_init(&data.pid, &zero);
+    data.depth = DEPTH;
+    UPDATE
+
+    calls.perf_submit(ctx, &data, sizeof(data));
+    return 0;
+}
+"""
+
+def enable_probe(probe_name, func_name, read_class, read_method, is_return):
+    global program, trace_template, usdt
+    depth = "*depth + 1" if not is_return else "*depth | (1ULL << 63)"
+    update = "++(*depth);" if not is_return else "if (*depth) --(*depth);"
+    filter_class = "if (!prefix_class(data.clazz)) { return 0; }" \
+                   if args.clazz else ""
+    filter_method = "if (!prefix_method(data.method)) { return 0; }" \
+                   if args.method else ""
+    program += trace_template.replace("NAME", func_name)                \
+                             .replace("READ_CLASS", read_class)         \
+                             .replace("READ_METHOD", read_method)       \
+                             .replace("FILTER_CLASS", filter_class)     \
+                             .replace("FILTER_METHOD", filter_method)   \
+                             .replace("DEPTH", depth)                   \
+                             .replace("UPDATE", update)
+    usdt.enable_probe(probe_name, func_name)
+
+usdt = USDT(pid=args.pid)
+
+if args.language == "java":
+    enable_probe("method__entry", "java_entry",
+                 "bpf_usdt_readarg(2, ctx, &clazz);",
+                 "bpf_usdt_readarg(4, ctx, &method);", is_return=False)
+    enable_probe("method__return", "java_return",
+                 "bpf_usdt_readarg(2, ctx, &clazz);",
+                 "bpf_usdt_readarg(4, ctx, &method);", is_return=True)
+elif args.language == "python":
+    enable_probe("function__entry", "python_entry",
+                 "bpf_usdt_readarg(1, ctx, &clazz);",   # filename really
+                 "bpf_usdt_readarg(2, ctx, &method);", is_return=False)
+    enable_probe("function__return", "python_return",
+                 "bpf_usdt_readarg(1, ctx, &clazz);",   # filename really
+                 "bpf_usdt_readarg(2, ctx, &method);", is_return=True)
+elif args.language == "ruby":
+    enable_probe("method__entry", "ruby_entry",
+                 "bpf_usdt_readarg(1, ctx, &clazz);",
+                 "bpf_usdt_readarg(2, ctx, &method);", is_return=False)
+    enable_probe("method__return", "ruby_return",
+                 "bpf_usdt_readarg(1, ctx, &clazz);",
+                 "bpf_usdt_readarg(2, ctx, &method);", is_return=True)
+    enable_probe("cmethod__entry", "ruby_centry",
+                 "bpf_usdt_readarg(1, ctx, &clazz);",
+                 "bpf_usdt_readarg(2, ctx, &method);", is_return=False)
+    enable_probe("cmethod__return", "ruby_creturn",
+                 "bpf_usdt_readarg(1, ctx, &clazz);",
+                 "bpf_usdt_readarg(2, ctx, &method);", is_return=True)
+
+if args.verbose:
+    print(usdt.get_text())
+    print(program)
+
+bpf = BPF(text=program, usdt_contexts=[usdt])
+print("Tracing method calls in %s process %d... Ctrl-C to quit." %
+      (args.language, args.pid))
+print("%-3s %-6s %-6s %-8s %s" % ("CPU", "PID", "TID", "TIME(us)", "METHOD"))
+
+class CallEvent(ct.Structure):
+    _fields_ = [
+        ("depth", ct.c_ulonglong),
+        ("pid", ct.c_ulonglong),
+        ("timestamp", ct.c_ulonglong),
+        ("clazz", ct.c_char * 80),
+        ("method", ct.c_char * 80)
+        ]
+
+start_ts = time.time()
+
+def print_event(cpu, data, size):
+    event = ct.cast(data, ct.POINTER(CallEvent)).contents
+    depth = event.depth & (~(1 << 63))
+    direction = "<- " if event.depth & (1 << 63) else "-> "
+    print("%-3d %-6d %-6d %-8.3f %-40s" % (cpu, event.pid >> 32,
+        event.pid & 0xFFFFFFFF, time.time() - start_ts,
+        ("  " * (depth - 1)) + direction + event.clazz + "." + event.method))
+
+bpf["calls"].open_perf_buffer(print_event)
+while 1:
+    bpf.kprobe_poll()
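
In uflow.py above, the BPF program packs the event direction into the top bit of the 64-bit depth field and stores the process id and thread id together in one 64-bit pid field; print_event then unpacks both. The standalone sketch below mirrors that bit arithmetic so it can be tried without BCC. The helper names (`pack_depth`, `unpack_depth`, `split_pid_tgid`, `format_call`) are illustrative and do not appear in uflow.py.

```python
RETURN_BIT = 1 << 63  # top bit of the u64 marks a return event


def pack_depth(depth, is_return):
    """Combine nesting depth and direction into one 64-bit value."""
    return depth | RETURN_BIT if is_return else depth


def unpack_depth(value):
    """Return (depth, arrow) for a packed value."""
    arrow = "<- " if value & RETURN_BIT else "-> "
    return value & ~RETURN_BIT, arrow


def split_pid_tgid(pid_tgid):
    """Split a (tgid << 32) + pid value into (process id, thread id)."""
    return pid_tgid >> 32, pid_tgid & 0xFFFFFFFF


def format_call(value, clazz, method):
    """Indent by depth and prefix with the direction arrow."""
    depth, arrow = unpack_depth(value)
    return ("  " * (depth - 1)) + arrow + clazz + "." + method


print(format_call(pack_depth(2, False), "NilClass", "nil?"))
print(format_call(pack_depth(2, True), "NilClass", "nil?"))
print(split_pid_tgid((27245 << 32) + 27250))
```

Sharing one field for depth and direction keeps the event structure compact, at the cost of reserving one bit of the depth range, which is not a practical limit for call nesting.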
--- a/tools/uflow_example.txt
+++ b/tools/uflow_example.txt
+Demonstrations of uflow.
+
+
+uflow traces method entry and exit events and prints a visual flow graph that
+shows how methods are entered and exited, similar to a tracing debugger with
+breakpoints. This can be useful for understanding program flow in high-level
+languages such as Java, Python, and Ruby, which provide USDT probes for method
+invocations.
+
+
+For example, trace all Ruby method calls in a specific process:
+
+# ./uflow ruby 27245
+Tracing method calls in ruby process 27245... Ctrl-C to quit.
+CPU PID    TID    TIME(us) METHOD
+3   27245  27245  4.536    <- IO.gets                              
+3   27245  27245  4.536    <- IRB::StdioInputMethod.gets           
+3   27245  27245  4.536    -> IRB::Context.verbose?                
+3   27245  27245  4.536      -> NilClass.nil?                      
+3   27245  27245  4.536      <- NilClass.nil?                      
+3   27245  27245  4.536      -> IO.tty?                            
+3   27245  27245  4.536      <- IO.tty?                            
+3   27245  27245  4.536      -> Kernel.kind_of?                    
+3   27245  27245  4.536      <- Kernel.kind_of?                    
+3   27245  27245  4.536    <- IRB::Context.verbose?                
+3   27245  27245  4.536    <- IRB::Irb.signal_status               
+3   27245  27245  4.536    -> String.chars                         
+3   27245  27245  4.536    <- String.chars                         
+^C
+
+In the preceding output, indentation indicates the depth of the flow graph,
+and the <- and -> arrows indicate the direction of the event (exit or entry).
+
+Often, the amount of output can be overwhelming. You can filter by class or
+method name prefix. For example, trace only classes whose names begin with
+java/lang/Thread:
+
+# ./uflow -C java/lang/Thread java $(pidof java)
+Tracing method calls in java process 27722... Ctrl-C to quit.
+CPU PID    TID    TIME(us) METHOD
+3   27722  27731  3.144    -> java/lang/Thread.<init>              
+3   27722  27731  3.144      -> java/lang/Thread.init              
+3   27722  27731  3.144        -> java/lang/Thread.init            
+3   27722  27731  3.144          -> java/lang/Thread.currentThread 
+3   27722  27731  3.144          <- java/lang/Thread.currentThread 
+3   27722  27731  3.144          -> java/lang/Thread.getThreadGroup
+3   27722  27731  3.144          <- java/lang/Thread.getThreadGroup
+3   27722  27731  3.144          -> java/lang/ThreadGroup.checkAccess
+3   27722  27731  3.144          <- java/lang/ThreadGroup.checkAccess
+3   27722  27731  3.144          -> java/lang/ThreadGroup.addUnstarted
+3   27722  27731  3.144          <- java/lang/ThreadGroup.addUnstarted
+3   27722  27731  3.145          -> java/lang/Thread.isDaemon     
+3   27722  27731  3.145          <- java/lang/Thread.isDaemon     
+3   27722  27731  3.145          -> java/lang/Thread.getPriority   
+3   27722  27731  3.145          <- java/lang/Thread.getPriority   
+3   27722  27731  3.145          -> java/lang/Thread.getContextClassLoader
+3   27722  27731  3.145          <- java/lang/Thread.getContextClassLoader
+3   27722  27731  3.145          -> java/lang/Thread.setPriority   
+3   27722  27731  3.145            -> java/lang/Thread.checkAccess 
+3   27722  27731  3.145            <- java/lang/Thread.checkAccess 
+3   27722  27731  3.145            -> java/lang/Thread.getThreadGroup
+3   27722  27731  3.145            <- java/lang/Thread.getThreadGroup
+3   27722  27731  3.145            -> java/lang/ThreadGroup.getMaxPriority
+3   27722  27731  3.145            <- java/lang/ThreadGroup.getMaxPriority
+3   27722  27731  3.145            -> java/lang/Thread.setPriority0
+3   27722  27731  3.145            <- java/lang/Thread.setPriority0
+3   27722  27731  3.145          <- java/lang/Thread.setPriority   
+3   27722  27731  3.145          -> java/lang/Thread.nextThreadID  
+3   27722  27731  3.145          <- java/lang/Thread.nextThreadID  
+3   27722  27731  3.145        <- java/lang/Thread.init            
+3   27722  27731  3.145      <- java/lang/Thread.init              
+3   27722  27731  3.145    <- java/lang/Thread.<init>              
+3   27722  27731  3.145    -> java/lang/Thread.start               
+3   27722  27731  3.145      -> java/lang/ThreadGroup.add          
+3   27722  27731  3.145      <- java/lang/ThreadGroup.add          
+3   27722  27731  3.145      -> java/lang/Thread.start0            
+3   27722  27731  3.145      <- java/lang/Thread.start0            
+3   27722  27731  3.146    <- java/lang/Thread.start               
+2   27722  27742  3.146    -> java/lang/Thread.run                 
+^C
+
+The CPU number is printed in the first column because events from different
+threads can be reordered when they run on different CPUs, which produces
+nonsensical output. By watching for changes in the CPU column, you can tell
+whether the events you're following belong to the same thread running on the
+same CPU.
+
+
+USAGE message:
+
+# ./uflow -h
+usage: uflow.py [-h] [-M METHOD] [-C CLAZZ] [-v] {java,python,ruby} pid
+
+Trace method execution flow in high-level languages.
+
+positional arguments:
+  {java,python,ruby}    language to trace
+  pid                   process id to attach to
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -M METHOD, --method METHOD
+                        trace only calls to methods starting with this prefix
+  -C CLAZZ, --class CLAZZ
+                        trace only calls to classes starting with this prefix
+  -v, --verbose         verbose mode: print the BPF program (for debugging
+                        purposes)
+
+examples:
+    ./uflow java 185                # trace Java method calls in process 185
+    ./uflow ruby 1344               # trace Ruby method calls in process 1344
+    ./uflow -M indexOf java 185     # trace only 'indexOf'-prefixed methods
+    ./uflow -C '<stdin>' python 180 # trace only REPL-defined methods
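Reviewer note on tools/uflow_example.txt: the indentation and prefix-filter behavior described in the example file can be modeled in plain Python. This is a minimal sketch, not uflow's implementation. The `render_flow` helper and its tuple layout are hypothetical, and it tracks a single depth counter, whereas a real tracer would need one per thread.

```python
def render_flow(events, class_prefix=None):
    """Return uflow-style lines for (direction, cls, method) tuples.

    direction is "->" for method entry and "<-" for method exit.
    """
    depth = 0
    lines = []
    for direction, cls, method in events:
        if class_prefix and not cls.startswith(class_prefix):
            continue  # prefix match, like the -C option described above
        if direction == "<-":
            depth = max(depth - 1, 0)  # exit: step back out before printing
        lines.append("%s%s %s.%s" % ("  " * depth, direction, cls, method))
        if direction == "->":
            depth += 1  # entry: nested calls are indented one level deeper
    return lines


# Hypothetical event list shaped like the Ruby output in the example file.
events = [
    ("->", "IRB::Context", "verbose?"),
    ("->", "NilClass", "nil?"),
    ("<-", "NilClass", "nil?"),
    ("<-", "IRB::Context", "verbose?"),
]
for line in render_flow(events):
    print(line)
```

Because the filter is a prefix match, passing `class_prefix="java/lang/Thread"` would also keep `java/lang/ThreadGroup` events, which is what the Java output in the example file shows.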
--- a/tools/ugc.py
+++ b/tools/ugc.py
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# ugc  Summarize garbage collection events in high-level languages.
+#      For Linux, uses BCC, eBPF.
+#
+# USAGE: ugc [-v] [-m] {java,python,ruby,node} pid
+#
+# Copyright 2016 Sasha Goldshtein
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 19-Oct-2016   Sasha Goldshtein   Created this.
+
+from __future__ import print_function
+import argparse
+from bcc import BPF, USDT
+import ctypes as ct
+import time
+
+examples = """examples:
+    ./ugc java 185           # trace Java GCs in process 185
+    ./ugc ruby 1344 -m       # trace Ruby GCs reporting in ms
+"""
+parser = argparse.ArgumentParser(
+    description="Summarize garbage collection events in high-level languages.",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("language", choices=["java", "python", "ruby", "node"],
+    help="language to trace")
+parser.add_argument("pid", type=int, help="process id to attach to")
+parser.add_argument("-v", "--verbose", action="store_true",
+    help="verbose mode: print the BPF program (for debugging purposes)")
+parser.add_argument("-m", "--milliseconds", action="store_true",
+    help="report times in milliseconds (default is microseconds)")
+args = parser.parse_args()
+
+usdt = USDT(pid=args.pid)
+
+program = """
+struct gc_event_t {
+    u64 probe_index;
+    u64 elapsed_ns;
+    u64 field1;
+    u64 field2;
+    u64 field3;
+    u64 field4;
+    char string1[32];
+    char string2[32];
+};
+struct entry_t {
+    u64 start_ns;
+    u64 field1;
+    u64 field2;
+};
+
+BPF_PERF_OUTPUT(gcs);
+BPF_HASH(entry, u64, struct entry_t);
+"""
+
+class Probe(object):
+    def __init__(self, begin, end, begin_save, end_save, formatter):
+        self.begin = begin
+        self.end = end
+        self.begin_save = begin_save
+        self.end_save = end_save
+        self.formatter = formatter
+
+    def generate(self):
+        text = """
+int trace_%s(struct pt_regs *ctx) {
+    u64 pid = bpf_get_current_pid_tgid();
+    struct entry_t e = {};
+    e.start_ns = bpf_ktime_get_ns();
+    %s
+    entry.update(&pid, &e);
+    return 0;
+}
+int trace_%s(struct pt_regs *ctx) {
+    u64 elapsed;
+    struct entry_t *e;
+    struct gc_event_t event = {};
+    u64 pid = bpf_get_current_pid_tgid();
+    e = entry.lookup(&pid);
+    if (!e) {
+        return 0;   // missed the entry event on this thread
+    }
+    elapsed = bpf_ktime_get_ns() - e->start_ns;
+    event.probe_index = %d;     // lets print_event pick the right formatter
+    event.elapsed_ns = elapsed;
+    %s
+    gcs.perf_submit(ctx, &event, sizeof(event));
+    return 0;
+}
+        """ % (self.begin, self.begin_save, self.end,
+               probes.index(self), self.end_save)
+        return text
+
+    def attach(self):
+        usdt.enable_probe(self.begin, "trace_%s" % self.begin)
+        usdt.enable_probe(self.end, "trace_%s" % self.end)
+
+    def format(self, data):
+        return self.formatter(data)
+
+probes = []
+
+#
+# Java
+#
+if args.language == "java":
+    # Oddly, the gc__begin/gc__end probes don't really have any useful
+    # information, while the mem__pool* ones do. There's also a bunch of
+    # probes described in the hotspot_gc*.stp file which aren't there
+    # when looking at a live Java process.
+    begin_save = """
+    bpf_usdt_readarg(6, ctx, &e.field1);    // used bytes
+    bpf_usdt_readarg(8, ctx, &e.field2);    // max bytes
+    """
+    end_save = """
+    event.field1 = e->field1;                  // used bytes at start
+    event.field2 = e->field2;                  // max bytes at start
+    bpf_usdt_readarg(6, ctx, &event.field3);   // used bytes at end
+    bpf_usdt_readarg(8, ctx, &event.field4);   // max bytes at end
+    u64 manager = 0, pool = 0;
+    bpf_usdt_readarg(1, ctx, &manager);        // ptr to manager name
+    bpf_usdt_readarg(3, ctx, &pool);           // ptr to pool name
+    bpf_probe_read(&event.string1, sizeof(event.string1), (void *)manager);
+    bpf_probe_read(&event.string2, sizeof(event.string2), (void *)pool);
+    """
+    formatter = lambda e: "%s %s used=%d->%d max=%d->%d" % \
+                (e.string1, e.string2, e.field1, e.field3, e.field2, e.field4)
+    probes.append(Probe("mem__pool__gc__begin", "mem__pool__gc__end",
+                        begin_save, end_save, formatter))
+    probes.append(Probe("gc__begin", "gc__end",
+                        "", "", lambda _: "no additional info available"))
+#
+# Python
+#
+elif args.language == "python":
+    begin_save = """
+    int gen = 0;
+    bpf_usdt_readarg(1, ctx, &gen);
+    e.field1 = gen;
+    """
+    end_save = """
+    long objs = 0;
+    bpf_usdt_readarg(1, ctx, &objs);
+    event.field1 = e->field1;
+    event.field2 = objs;
+    """
+    formatter = lambda event: "gen %d GC collected %d objects" % \
+                              (event.field1, event.field2)
+    probes.append(Probe("gc__start", "gc__done",
+                        begin_save, end_save, formatter))
+#
+# Ruby
+#
+elif args.language == "ruby":
+    # Ruby GC probes do not have any additional information available.
+    probes.append(Probe("gc__mark__begin", "gc__mark__end",
+                        "", "", lambda _: "GC mark stage"))
+    probes.append(Probe("gc__sweep__begin", "gc__sweep__end",
+                        "", "", lambda _: "GC sweep stage"))
+#
+# Node
+#
+elif args.language == "node":
+    end_save = """
+    u32 gc_type = 0;
+    bpf_usdt_readarg(1, ctx, &gc_type);
+    event.field1 = gc_type;
+    """
+    descs = {"GC scavenge": 1, "GC mark-sweep-compact": 2,
+             "GC incremental mark": 4, "GC weak callbacks": 8}
+    probes.append(Probe("gc__start", "gc__done", "", end_save,
+                  lambda e: str.join(", ",
+                                     [desc for desc, val in descs.items()
+                                      if e.field1 & val != 0])))
+
+for probe in probes:
+    program += probe.generate()
+    probe.attach()
+
+if args.verbose:
+    print(usdt.get_text())
+    print(program)
+
+bpf = BPF(text=program, usdt_contexts=[usdt])
+print("Tracing garbage collections in %s process %d... Ctrl-C to quit." %
+      (args.language, args.pid))
+time_col = "TIME (ms)" if args.milliseconds else "TIME (us)"
+print("%-8s %-40s %-8s" % ("START", "DESCRIPTION", time_col))
+
+class GCEvent(ct.Structure):
+    _fields_ = [
+        ("probe_index", ct.c_ulonglong),
+        ("elapsed_ns", ct.c_ulonglong),
+        ("field1", ct.c_ulonglong),
+        ("field2", ct.c_ulonglong),
+        ("field3", ct.c_ulonglong),
+        ("field4", ct.c_ulonglong),
+        ("string1", ct.c_char * 32),
+        ("string2", ct.c_char * 32)
+        ]
+
+start_ts = time.time()
+
+def print_event(cpu, data, size):
+    event = ct.cast(data, ct.POINTER(GCEvent)).contents
+    elapsed = event.elapsed_ns/1000000 if args.milliseconds else \
+              event.elapsed_ns/1000
+    print("%-8.3f %-40s %-8.2f" % (time.time() - start_ts,
+                                   probes[event.probe_index].format(event),
+                                   elapsed))
+
+bpf["gcs"].open_perf_buffer(print_event)
+while 1:
+    bpf.kprobe_poll()
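Reviewer note on tools/ugc.py: the begin/end pairing that the generated BPF code performs with the `entry` hash can be sketched in user-space Python. This is an illustrative model only. The `GCTimer` class is hypothetical, and unlike the BPF code shown above it removes the entry once the end event arrives.

```python
class GCTimer(object):
    """Pair begin/end events per thread and report the elapsed time."""

    def __init__(self):
        self.entry = {}  # thread key -> start timestamp in nanoseconds

    def begin(self, thread_key, now_ns):
        # Mirrors entry.update(&pid, &e) in the begin probe.
        self.entry[thread_key] = now_ns

    def end(self, thread_key, now_ns):
        # Mirrors entry.lookup(&pid) in the end probe.
        start_ns = self.entry.pop(thread_key, None)
        if start_ns is None:
            return None  # missed the begin event on this thread
        return now_ns - start_ns


timer = GCTimer()
timer.begin(1, 100)
print(timer.end(1, 350))
```

Keying by thread means that collections starting on different threads do not overwrite each other's start timestamps.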
--- a/tools/ugc_example.txt
+++ b/tools/ugc_example.txt
+Demonstrations of ugc.
+
+
+ugc traces garbage collection events in high-level languages, including Java,
+Python, Ruby, and Node. Each GC event is printed with some additional 
+information provided by that language's runtime, if available. The duration of
+the GC event is also provided.
+
+For example, to trace all garbage collection events in a specific Node process:
+
+# ./ugc node $(pidof node)
+Tracing garbage collections in node process 3018... Ctrl-C to quit.
+START    DESCRIPTION                              TIME (us)
+3.864    GC mark-sweep-compact                    3189.00 
+4.937    GC scavenge                              1254.00 
+4.940    GC scavenge                              1657.00 
+4.943    GC scavenge                              1171.00 
+4.949    GC scavenge                              2216.00 
+4.954    GC scavenge                              2515.00 
+4.960    GC scavenge                              2243.00 
+4.966    GC scavenge                              2410.00 
+4.976    GC scavenge                              3003.00 
+4.986    GC scavenge                              4174.00 
+4.994    GC scavenge                              1508.00 
+5.003    GC scavenge                              1966.00 
+5.010    GC scavenge                              1636.00 
+5.022    GC scavenge                              3564.00 
+5.035    GC scavenge                              3275.00 
+5.045    GC incremental mark                      157.00  
+5.049    GC mark-sweep-compact                    3248.00 
+5.060    GC scavenge                              4785.00 
+5.081    GC scavenge                              6616.00 
+5.094    GC scavenge                              8570.00 
+5.144    GC scavenge                              456.00  
+7.188    GC scavenge                              2345.00 
+7.227    GC scavenge                              12054.00
+7.253    GC scavenge                              15626.00
+7.304    GC scavenge                              15329.00
+7.384    GC scavenge                              7168.00 
+7.411    GC scavenge                              3794.00 
+7.414    GC incremental mark                      123.00  
+7.430    GC mark-sweep-compact                    7110.00 
+^C
+
+
+USAGE message:
+
+# ./ugc -h
+usage: ugc.py [-h] [-v] [-m] {java,python,ruby,node} pid
+
+Summarize garbage collection events in high-level languages.
+
+positional arguments:
+  {java,python,ruby,node}
+                        language to trace
+  pid                   process id to attach to
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -v, --verbose         verbose mode: print the BPF program (for debugging
+                        purposes)
+  -m, --milliseconds    report times in milliseconds (default is microseconds)
+
+examples:
+    ./ugc java 185           # trace Java GCs in process 185
+    ./ugc ruby 1344 -m       # trace Ruby GCs reporting in ms
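Reviewer note on tools/ugc_example.txt: the DESCRIPTION and TIME columns for Node can be reproduced with a small pure-Python sketch. The bit values are copied from the `descs` dictionary in ugc.py. The helper names are hypothetical, and the sketch uses float division, whereas ugc.py as written relies on Python 2 integer division.

```python
# Bit values copied from the "descs" dictionary in ugc.py.
GC_TYPES = [
    (1, "GC scavenge"),
    (2, "GC mark-sweep-compact"),
    (4, "GC incremental mark"),
    (8, "GC weak callbacks"),
]


def describe_gc(gc_type):
    """Join the names of all GC kinds whose bit is set in gc_type."""
    names = [name for bit, name in GC_TYPES if gc_type & bit]
    return ", ".join(names)


def scale_elapsed(elapsed_ns, milliseconds=False):
    """Convert nanoseconds to microseconds (default) or milliseconds (-m)."""
    divisor = 1000000.0 if milliseconds else 1000.0
    return elapsed_ns / divisor


print(describe_gc(2))
print(describe_gc(1 | 4))
print("%.2f" % scale_elapsed(3189000))
```

A list of pairs is used instead of a dictionary so that the order of names is deterministic when several bits are set.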
--- a/tools/uobjnew.py
+++ b/tools/uobjnew.py
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# uobjnew  Summarize object allocations in high-level languages.
+#          For Linux, uses BCC, eBPF.
+#
+# USAGE: uobjnew [-h] [-C TOP_COUNT] [-S TOP_SIZE] [-v]
+#                {java,ruby,c} pid [interval]
+#
+# Copyright 2016 Sasha Goldshtein
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 25-Oct-2016   Sasha Goldshtein   Created this.
+
+from __future__ import print_function
+import argparse
+from bcc import BPF, USDT
+from time import sleep
+
+examples = """examples:
+    ./uobjnew java 145         # summarize Java allocations in process 145
+    ./uobjnew c 2020 1         # grab malloc() sizes and print every second
+    ./uobjnew ruby 6712 -C 10  # top 10 Ruby types by number of allocations
+    ./uobjnew ruby 6712 -S 10  # top 10 Ruby types by total size
+"""
+parser = argparse.ArgumentParser(
+    description="Summarize object allocations in high-level languages.",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("language", choices=["java", "ruby", "c"],
+    help="language to trace")
+parser.add_argument("pid", type=int, help="process id to attach to")
+parser.add_argument("interval", type=int, nargs='?',
+    help="print every specified number of seconds")
+parser.add_argument("-C", "--top-count", type=int,
+    help="number of most frequently allocated types to print")
+parser.add_argument("-S", "--top-size", type=int,
+    help="number of largest types by allocated bytes to print")
+parser.add_argument("-v", "--verbose", action="store_true",
+    help="verbose mode: print the BPF program (for debugging purposes)")
+args = parser.parse_args()
+
+program = """
+#include <linux/ptrace.h>
+
+struct key_t {
+#if MALLOC_TRACING
+    u64 size;
+#else
+    char name[50];
+#endif
+};
+
+struct val_t {
+    u64 total_size;
+    u64 num_allocs;
+};
+
+BPF_HASH(allocs, struct key_t, struct val_t);
+""".replace("MALLOC_TRACING", "1" if args.language == "c" else "0")
+
+usdt = USDT(pid=args.pid)
+
+#
+# Java
+#
+if args.language == "java":
+    program += """
+int alloc_entry(struct pt_regs *ctx) {
+    struct key_t key = {};
+    struct val_t *valp, zero = {};
+    u64 classptr = 0, size = 0;
+    bpf_usdt_readarg(2, ctx, &classptr);
+    bpf_usdt_readarg(4, ctx, &size);
+    bpf_probe_read(&key.name, sizeof(key.name), (void *)classptr);
+    valp = allocs.lookup_or_init(&key, &zero);
+    valp->total_size += size;
+    valp->num_allocs += 1;
+    return 0;
+}
+    """
+    usdt.enable_probe("object__alloc", "alloc_entry")
+#
+# Ruby
+#
+elif args.language == "ruby":
+    create_template = """
+int THETHING_alloc_entry(struct pt_regs *ctx) {
+    struct key_t key = { .name = "THETHING" };
+    struct val_t *valp, zero = {};
+    u64 size = 0;
+    bpf_usdt_readarg(1, ctx, &size);
+    valp = allocs.lookup_or_init(&key, &zero);
+    valp->total_size += size;
+    valp->num_allocs += 1;
+    return 0;
+}
+    """
+    program += """
+int object_alloc_entry(struct pt_regs *ctx) {
+    struct key_t key = {};
+    struct val_t *valp, zero = {};
+    u64 classptr = 0;
+    bpf_usdt_readarg(1, ctx, &classptr);
+    bpf_probe_read(&key.name, sizeof(key.name), (void *)classptr);
+    valp = allocs.lookup_or_init(&key, &zero);
+    valp->num_allocs += 1;  // We don't know the size, unfortunately
+    return 0;
+}
+    """
+    usdt.enable_probe("object__create", "object_alloc_entry")
+    for thing in ["string", "hash", "array"]:
+        program += create_template.replace("THETHING", thing)
+        usdt.enable_probe("%s__create" % thing, "%s_alloc_entry" % thing)
+#
+# C
+#
+elif args.language == "c":
+    program += """
+int alloc_entry(struct pt_regs *ctx, size_t size) {
+    struct key_t key = {};
+    struct val_t *valp, zero = {};
+    key.size = size;
+    valp = allocs.lookup_or_init(&key, &zero);
+    valp->total_size += size;
+    valp->num_allocs += 1;
+    return 0;
+}
+    """
+
+if args.verbose:
+    print(usdt.get_text())
+    print(program)
+
+bpf = BPF(text=program, usdt_contexts=[usdt])
+if args.language == "c":
+    bpf.attach_uprobe(name="c", sym="malloc", fn_name="alloc_entry",
+                      pid=args.pid)
+
+exit_signaled = False
+print("Tracing allocations in process %d (language: %s)... Ctrl-C to quit." %
+      (args.pid, args.language or "none"))
+while True:
+    try:
+        sleep(args.interval or 99999999)
+    except KeyboardInterrupt:
+        exit_signaled = True
+    print()
+    data = bpf["allocs"]
+    if args.top_count:
+        data = sorted(data.items(), key=lambda kv: kv[1].num_allocs)
+        data = data[-args.top_count:]
+    elif args.top_size:
+        data = sorted(data.items(), key=lambda kv: kv[1].total_size)
+        data = data[-args.top_size:]
+    else:
+        data = sorted(data.items(), key=lambda kv: kv[1].total_size)
+    print("%-30s %8s %12s" % ("TYPE", "# ALLOCS", "# BYTES"))
+    for key, value in data:
+        if args.language == "c":
+            obj_type = "block size %d" % key.size
+        else:
+            obj_type = key.name
+        print("%-30s %8d %12d" %
+              (obj_type, value.num_allocs, value.total_size))
+    if args.interval and not exit_signaled:
+        bpf["allocs"].clear()
+    else:
+        exit()
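Reviewer note on tools/uobjnew.py: the per-key accumulation that the BPF programs do with `allocs.lookup_or_init` can be modeled with an ordinary dictionary. This is a sketch of the malloc ("c") case only. The `summarize` helper and its input list are hypothetical.

```python
def summarize(sizes):
    """Aggregate allocation sizes into per-size totals and counts."""
    allocs = {}
    for size in sizes:
        # Same idea as lookup_or_init: start from zero on first sight.
        val = allocs.setdefault(size, {"total_size": 0, "num_allocs": 0})
        val["total_size"] += size
        val["num_allocs"] += 1
    return allocs


# 22 allocations of 64 bytes and 2 of 992 bytes.
stats = summarize([64] * 22 + [992] * 2)
for size in sorted(stats):
    print("block size %d" % size, stats[size])
```

For Java and Ruby the key would be the class name rather than the size, but the accumulation step is the same.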
--- a/tools/uobjnew_example.txt
+++ b/tools/uobjnew_example.txt
+Demonstrations of uobjnew.
+
+
+uobjnew summarizes new object allocation events and prints out statistics on
+which object type has been allocated frequently, and how many bytes of that
+type have been allocated. This helps diagnose common allocation paths, which
+can in turn cause heavy garbage collection.
+
+For example, trace Ruby object allocations when running some simple commands
+in irb (the Ruby REPL):
+
+# ./uobjnew ruby 27245
+Tracing allocations in process 27245 (language: ruby)... Ctrl-C to quit.
+
+TYPE                           # ALLOCS      # BYTES
+NameError                             1            0
+RubyToken::TkSPACE                    1            0
+RubyToken::TkSTRING                   1            0
+String                                7            0
+RubyToken::TkNL                       2            0
+RubyToken::TkIDENTIFIER               2            0
+array                                55          129
+string                              344         1348
+^C
+
+
+Plain C/C++ allocations (through "malloc") are also supported. The type being
+allocated can't be reported, but the allocation sizes can. The following
+example also prints only the top 10 rows by number of bytes allocated:
+
+# ./uobjnew -S 10 c 27245
+Tracing allocations in process 27245 (language: c)... Ctrl-C to quit.
+
+TYPE                           # ALLOCS      # BYTES
+block size 64                        22         1408
+block size 992                        2         1984
+block size 32                        68         2176
+block size 48                        48         2304
+block size 944                        4         3776
+block size 1104                       4         4416
+block size 160                       32         5120
+block size 535                       15         8025
+block size 128                      112        14336
+block size 80                       569        45520
+^C
+
+
+USAGE message:
+
+# ./uobjnew -h
+usage: uobjnew.py [-h] [-C TOP_COUNT] [-S TOP_SIZE] [-v]
+                  {java,ruby,c} pid [interval]
+
+Summarize object allocations in high-level languages.
+
+positional arguments:
+  {java,ruby,c}         language to trace
+  pid                   process id to attach to
+  interval              print every specified number of seconds
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -C TOP_COUNT, --top-count TOP_COUNT
+                        number of most frequently allocated types to print
+  -S TOP_SIZE, --top-size TOP_SIZE
+                        number of largest types by allocated bytes to print
+  -v, --verbose         verbose mode: print the BPF program (for debugging
+                        purposes)
+
+examples:
+    ./uobjnew java 145         # summarize Java allocations in process 145
+    ./uobjnew c 2020 1         # grab malloc() sizes and print every second
+    ./uobjnew ruby 6712 -C 10  # top 10 Ruby types by number of allocations
+    ./uobjnew ruby 6712 -S 10  # top 10 Ruby types by total size
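Reviewer note on tools/uobjnew_example.txt: the row selection behind `-C` and `-S` can be sketched as a sort followed by a tail slice. The `top_rows` helper and the tuple layout are hypothetical. The sample rows are taken from the Ruby output in the example file.

```python
def top_rows(rows, top_count=None, top_size=None):
    """Select rows the way -C / -S do.

    rows is a list of (type_name, num_allocs, total_size) tuples.
    """
    if top_count:
        ordered = sorted(rows, key=lambda row: row[1])  # by allocation count
        return ordered[-top_count:]
    ordered = sorted(rows, key=lambda row: row[2])      # by total bytes
    if top_size:
        return ordered[-top_size:]
    return ordered


rows = [
    ("String", 7, 0),
    ("array", 55, 129),
    ("string", 344, 1348),
    ("NameError", 1, 0),
]
for name, count, size in top_rows(rows, top_count=2):
    print("%-30s %8d %12d" % (name, count, size))
```

Sorting in ascending order and slicing from the end puts the largest entries at the bottom of the table, next to the prompt.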
--- a/tools/ustat.py
+++ b/tools/ustat.py
--- a/tools/ustat_example.txt
+++ b/tools/ustat_example.txt
+Demonstrations of ustat.
+
+
+ustat is a "top"-like tool for monitoring events in high-level languages. It 
+prints statistics about garbage collections, method calls, object allocations,
+and various other events for every process that it recognizes with a Java,
+Python, Ruby, or Node runtime.
+
+For example:
+
+# ./ustat.py
+Tracing... Output every 10 secs. Hit Ctrl-C to end
+12:17:17 loadavg: 0.33 0.08 0.02 5/211 26284
+
+PID    CMDLINE              METHOD/s   GC/s   OBJNEW/s   CLOAD/s  EXC/s  THR/s 
+3018   node/node            0          3      0          0        0      0     
+^C
+Detaching...
+
+
+If desired, you can instruct ustat to print a certain number of entries and
+exit, which can be useful to get a quick picture of what's happening on the
+system over a short time interval. Here, we ask ustat to print 5-second
+summaries 12 times (for a total time of 1 minute), without clearing the
+screen between summaries (-C):
+
+# ./ustat.py -C 5 12
+Tracing... Output every 5 secs. Hit Ctrl-C to end
+12:18:26 loadavg: 0.27 0.11 0.04 2/336 26455
+
+PID    CMDLINE              METHOD/s   GC/s   OBJNEW/s   CLOAD/s  EXC/s  THR/s 
+3018   node/node            0          1      0          0        0      0     
+
+12:18:31 loadavg: 0.33 0.12 0.04 2/336 26456
+
+PID    CMDLINE              METHOD/s   GC/s   OBJNEW/s   CLOAD/s  EXC/s  THR/s 
+3018   node/node            0          0      0          0        0      0     
+26439  java -XX:+ExtendedDT 2776045    0      0          0        0      0     
+
+12:18:37 loadavg: 0.38 0.14 0.05 2/336 26457
+
+PID    CMDLINE              METHOD/s   GC/s   OBJNEW/s   CLOAD/s  EXC/s  THR/s 
+3018   node/node            0          0      0          0        0      0     
+26439  java -XX:+ExtendedDT 2804378    0      0          0        0      0     
+
+(...more output omitted for brevity)
+
+
+USAGE message:
+
+# ./ustat.py -h
+usage: ustat.py [-h] [-l {java,python,ruby,node}] [-C]
+                [-S {cload,excp,gc,method,objnew,thread}] [-r MAXROWS] [-d]
+                [interval] [count]
+
+Activity stats from high-level languages.
+
+positional arguments:
+  interval              output interval, in seconds
+  count                 number of outputs
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -l {java,python,ruby,node}, --language {java,python,ruby,node}
+                        language to trace (default: all languages)
+  -C, --noclear         don't clear the screen
+  -S {cload,excp,gc,method,objnew,thread}, --sort {cload,excp,gc,method,objnew,thread}
+                        sort by this field (descending order)
+  -r MAXROWS, --maxrows MAXROWS
+                        maximum rows to print, default 20
+  -d, --debug           Print the resulting BPF program (for debugging
+                        purposes)
+
+examples:
+  ./ustat              # stats for all languages, 1 second refresh
+  ./ustat -C           # don't clear the screen
+  ./ustat -l java      # Java processes only
+  ./ustat 5            # 5 second summaries
+  ./ustat 5 10         # 5 second summaries, 10 times only 
--- a/tools/uthreads.py
+++ b/tools/uthreads.py
+#!/usr/bin/python
+# @lint-avoid-python-3-compatibility-imports
+#
+# uthreads  Trace thread creation/destruction events in high-level languages.
+#           For Linux, uses BCC, eBPF.
+#
+# USAGE: uthreads [-l {java}] [-v] pid
+#
+# Copyright 2016 Sasha Goldshtein
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 25-Oct-2016   Sasha Goldshtein   Created this.
+
+from __future__ import print_function
+import argparse
+from bcc import BPF, USDT
+import ctypes as ct
+import time
+
+examples = """examples:
+    ./uthreads -l java 185   # trace Java threads in process 185
+    ./uthreads 12245         # trace only pthreads in process 12245
+"""
+parser = argparse.ArgumentParser(
+    description="Trace thread creation/destruction events in " +
+                "high-level languages.",
+    formatter_class=argparse.RawDescriptionHelpFormatter,
+    epilog=examples)
+parser.add_argument("-l", "--language", choices=["java"],
+    help="language to trace (none for pthreads only)")
+parser.add_argument("pid", type=int, help="process id to attach to")
+parser.add_argument("-v", "--verbose", action="store_true",
+    help="verbose mode: print the BPF program (for debugging purposes)")
+args = parser.parse_args()
+
+usdt = USDT(pid=args.pid)
+
+program = """
+struct thread_event_t {
+    u64 runtime_id;
+    u64 native_id;
+    char type[8];
+    char name[80];
+};
+
+BPF_PERF_OUTPUT(threads);
+
+int trace_pthread(struct pt_regs *ctx) {
+    struct thread_event_t te = {};
+    u64 start_routine = 0;
+    char type[] = "pthread";
+    te.native_id = bpf_get_current_pid_tgid() & 0xFFFFFFFF;
+    bpf_usdt_readarg(2, ctx, &start_routine);
+    te.runtime_id = start_routine;  // This is really a function pointer
+    __builtin_memcpy(&te.type, type, sizeof(te.type));
+    threads.perf_submit(ctx, &te, sizeof(te));
+    return 0;
+}
+"""
+usdt.enable_probe("pthread_start", "trace_pthread")
+
+if args.language == "java":
+    template = """
+int %s(struct pt_regs *ctx) {
+    char type[] = "%s";
+    struct thread_event_t te = {};
+    u64 nameptr = 0, id = 0, native_id = 0;
+    bpf_usdt_readarg(1, ctx, &nameptr);
+    bpf_usdt_readarg(3, ctx, &id);
+    bpf_usdt_readarg(4, ctx, &native_id);
+    bpf_probe_read(&te.name, sizeof(te.name), (void *)nameptr);
+    te.runtime_id = id;
+    te.native_id = native_id;
+    __builtin_memcpy(&te.type, type, sizeof(type));  // type may be < 8 bytes
+    threads.perf_submit(ctx, &te, sizeof(te));
+    return 0;
+}
+    """
+    program += template % ("trace_start", "start")
+    program += template % ("trace_stop", "stop")
+    usdt.enable_probe("thread__start", "trace_start")
+    usdt.enable_probe("thread__stop", "trace_stop")
+
+if args.verbose:
+    print(usdt.get_text())
+    print(program)
+
+bpf = BPF(text=program, usdt_contexts=[usdt])
+print("Tracing thread events in process %d (language: %s)... Ctrl-C to quit." %
+      (args.pid, args.language or "none"))
+print("%-8s %-16s %-8s %-30s" % ("TIME", "ID", "TYPE", "DESCRIPTION"))
+
+class ThreadEvent(ct.Structure):
+    _fields_ = [
+        ("runtime_id", ct.c_ulonglong),
+        ("native_id", ct.c_ulonglong),
+        ("type", ct.c_char * 8),
+        ("name", ct.c_char * 80),
+        ]
+
+start_ts = time.time()
+
+def print_event(cpu, data, size):
+    event = ct.cast(data, ct.POINTER(ThreadEvent)).contents
+    name = event.name
+    if event.type == "pthread":
+        name = bpf.sym(event.runtime_id, args.pid)
+        tid = event.native_id
+    else:
+        tid = "R=%s/N=%s" % (event.runtime_id, event.native_id)
+    print("%-8.3f %-16s %-8s %-30s" % (
+        time.time() - start_ts, tid, event.type, name))
+
+bpf["threads"].open_perf_buffer(print_event)
+while 1:
+    bpf.kprobe_poll()
--- a/tools/uthreads_example.txt
+++ b/tools/uthreads_example.txt
+Demonstrations of uthreads.
+
+
+uthreads traces Java thread start and stop events and raw pthread creation
+events, and prints details about each thread. For Java threads, the thread
+name is printed; for pthreads, the thread's start function is printed, if
+there is symbol information to resolve it.
+
+For example, trace all Java thread start and stop events:
+
+# ./uthreads -l java 27420
+Tracing thread events in process 27420 (language: java)... Ctrl-C to quit.
+TIME     ID               TYPE     DESCRIPTION                   
+18.596   R=9/N=0          start    SIGINT handler                
+18.596   R=4/N=0          stop     Signal Dispatcher             
+^C
+
+The ID column in the preceding output shows the thread's runtime ID and native
+ID, when available. The accuracy of this information depends on the Java 
+runtime.
+
+
+Next, trace only pthread creation events in some native application:
+
+# ./uthreads 27450
+Tracing thread events in process 27450 (language: none)... Ctrl-C to quit.
+TIME     ID               TYPE     DESCRIPTION                   
+0.924    27462            pthread  primes_thread                 
+0.927    27463            pthread  primes_thread                 
+0.928    27464            pthread  primes_thread                 
+0.928    27465            pthread  primes_thread                 
+^C
+
+The name of the thread's start function ("primes_thread" in this example) is
+resolved from symbol information. If symbol information is not present, the
+start function's address is printed instead.
+
+
+USAGE message:
+
+# ./uthreads -h
+usage: uthreads.py [-h] [-l {java}] [-v] pid
+
+Trace thread creation/destruction events in high-level languages.
+
+positional arguments:
+  pid                   process id to attach to
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -l {java}, --language {java}
+                        language to trace (none for pthreads only)
+  -v, --verbose         verbose mode: print the BPF program (for debugging
+                        purposes)
+
+examples:
+    ./uthreads -l java 185   # trace Java threads in process 185
+    ./uthreads 12245         # trace only pthreads in process 12245
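Reviewer note on tools/uthreads_example.txt: the ID column is built from a fixed-layout record that user space decodes with ctypes. The sketch below mirrors the `ThreadEvent` structure from uthreads.py and decodes a hand-built record. The sample record, `format_id`, and `split_pid_tgid` are illustrative assumptions, and the byte-string comparison assumes Python 3, where `c_char` arrays read back as `bytes`.

```python
import ctypes as ct


class ThreadEvent(ct.Structure):
    # Same layout as struct thread_event_t in the BPF program.
    _fields_ = [
        ("runtime_id", ct.c_ulonglong),
        ("native_id", ct.c_ulonglong),
        ("type", ct.c_char * 8),
        ("name", ct.c_char * 80),
    ]


def split_pid_tgid(pid_tgid):
    """Upper 32 bits are the process ID, lower 32 bits the thread ID."""
    return pid_tgid >> 32, pid_tgid & 0xFFFFFFFF


def format_id(event):
    """Build the ID column the way print_event does."""
    if event.type == b"pthread":
        return str(event.native_id)
    return "R=%d/N=%d" % (event.runtime_id, event.native_id)


# Hand-built record standing in for the bytes delivered by the perf buffer.
raw = bytes(ThreadEvent(9, 0, b"start", b"SIGINT handler"))
event = ThreadEvent.from_buffer_copy(raw)
print(format_id(event), event.type.decode(), event.name.decode())
```

The structure must list its fields in the same order and with the same sizes as the C struct, otherwise the decoded values are shifted.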