Commit e663541c authored by Brenden Blanco

python: Add 2/3 compat wrappers for byte strings

Introduce some helpers for managing bytes/unicode objects in a way that
bridges the gap from python2 to 3.

1. Add printb() helper for writing bytes output directly to stdout. This
avoids complaints from print() in python3, which expects a unicode
str(). Since python 3.5, `b"" % bytes()` style format strings should
work and we can write tools with common code, once we convert format
strings to bytes.
http://legacy.python.org/dev/peps/pep-0461/

2. Add a class for wrapping command line arguments that are intended for
comparing to debugged memory, for instance running process COMM or
kernel pathname data. The approach takes some of the discussion from
http://legacy.python.org/dev/peps/pep-0383/ into account, though
unfortunately the python2-future implementation of "surrogateescape" is
buggy, therefore this iteration is partial.

The object instance should only ever be coerced into a bytes object.
This silently invokes encode(sys.getfilesystemencoding()), which if it
fails implies that the tool was passed junk characters on the command
line. Thereafter the tool should implement only bytes-bytes comparisons
(for instance re.search(b"", b"")) and bytes stdout printing (see
printb).

3. Add an _assert_is_bytes helper to check for improper usage of str
objects in python arguments. The behavior of the assertion can be
tweaked by changing the bcc.utils._strict_bytes bool.
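
As a rough illustration of item 1, the sketch below uses a local copy of
printb() (taken from this commit so the snippet runs without bcc
installed) together with a bytes format string, which needs Python 3.5
or later. The `comm` and `pid` values and the `io.BytesIO` stand-in for
stdout are made up for the example.

```python
import io
import sys


def printb(s, file=sys.stdout):
    """Local copy of the helper from this commit: write bytes, a newline, then flush."""
    buf = file.buffer if hasattr(file, "buffer") else file
    buf.write(s)
    buf.write(b"\n")
    file.flush()


# Hypothetical values standing in for data read from a BPF table.
comm = b"bash"
pid = 1234

# bytes %-formatting (PEP 461): %s accepts bytes, %d accepts ints.
line = b"%-16s %6d" % (comm, pid)

# BytesIO has no .buffer attribute, so printb() writes to it directly.
# With the default file=sys.stdout, the write goes to sys.stdout.buffer.
out = io.BytesIO()
printb(line, file=out)
captured = out.getvalue()
```

Because both the format string and its arguments are bytes, no encode()
or decode() call is needed anywhere on the output path.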

Going forward, one should never invoke decode() on a bpf data stream,
e.g. the result of a table lookup or perf ring output. Leave that data
in the native bytes() representation.
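
A minimal sketch of that rule, assuming Python 3: the command line
argument is encoded to bytes once, and every later comparison stays
bytes-to-bytes. The helper name `arg_to_bytes` and the sample `events`
list are hypothetical; the helper only mirrors what
ArgString.__bytes__() does in this commit.

```python
import re
import sys


def arg_to_bytes(arg):
    """Hypothetical helper: encode a command line str once, like ArgString.__bytes__()."""
    return arg.encode(sys.getfilesystemencoding())


# Made-up COMM values as they might arrive from a perf buffer.
# The second one is not valid UTF-8.
events = [b"nginx", b"work\xff\xfeer", b"nginx-debug"]

# Coerce the user-supplied filter to bytes exactly once.
pattern = arg_to_bytes("nginx")

# bytes pattern against bytes data: nothing is decoded.
matches = [comm for comm in events if re.search(re.escape(pattern), comm)]

# Decoding the raw data instead can fail on arbitrary kernel bytes.
try:
    events[1].decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

The failing decode() at the end is the case the commit message warns
about: kernel data is not guaranteed to be valid text in any encoding.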
Signed-off-by: Brenden Blanco <bblanco@gmail.com>
parent c28f6e86
@@ -12,6 +12,9 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

 import ctypes as ct
+import sys
+import traceback
+import warnings

 from .libbcc import lib
@@ -39,3 +42,57 @@ def detect_language(candidates, pid):
     res = lib.bcc_procutils_language(pid)
     language = ct.cast(res, ct.c_char_p).value.decode()
     return language if language in candidates else None
+
+FILESYSTEMENCODING = sys.getfilesystemencoding()
+
+def printb(s, file=sys.stdout):
+    """
+    printb(s)
+
+    print a bytes object to stdout and flush
+    """
+    buf = file.buffer if hasattr(file, "buffer") else file
+
+    buf.write(s)
+    buf.write(b"\n")
+    file.flush()
+
+class ArgString(object):
+    """
+    ArgString(arg)
+
+    encapsulate a system argument that can be easily coerced to a bytes()
+    object, which is better for comparing to kernel or probe data (which should
+    never be en/decode()'ed).
+    """
+    def __init__(self, arg):
+        if sys.version_info[0] >= 3:
+            self.s = arg
+        else:
+            self.s = arg.decode(FILESYSTEMENCODING)
+
+    def __bytes__(self):
+        return self.s.encode(FILESYSTEMENCODING)
+
+    def __str__(self):
+        return self.__bytes__()
+
+def warn_with_traceback(message, category, filename, lineno, file=None, line=None):
+    log = file if hasattr(file, "write") else sys.stderr
+    traceback.print_stack(f=sys._getframe(2), file=log)
+    log.write(warnings.formatwarning(message, category, filename, lineno, line))
+
+# uncomment to get full tracebacks for invalid uses of python3+str in arguments
+#warnings.showwarning = warn_with_traceback
+
+_strict_bytes = False
+
+def _assert_is_bytes(arg):
+    if arg is None:
+        return arg
+    if _strict_bytes:
+        assert type(arg) is bytes, "not a bytes object: %r" % arg
+    elif type(arg) is not bytes:
+        warnings.warn("not a bytes object: %r" % arg, DeprecationWarning, 2)
+        return ArgString(arg).__bytes__()
+    return arg