Experimental: speed up calling of capi code

The main capi calling convention is to box all the positional arguments into a tuple, and then pass the tuple to PyArg_ParseTuple along with a format string that describes how to parse out the arguments. This ends up being pretty wasteful and misses all of the fast argument-rearrangement that we are able to JIT out. These unicode functions are particularly egregious, since they use a helper function that ends up having to dynamically generate the format string to include the function name.

This commit is a very simple change that gets some of the common cases: in addition to the existing METH_O calling convention ('self' plus one positional arg), add the METH_O2 and METH_O3 calling conventions. Plus add METH_D1/D2/D3 as additional flags that can be or'd into the calling convention flags, which specify that there should be some number of default arguments.

This is pretty limited:
- only handles up to 3 arguments / defaults
- only handles "O" type specifiers (ie no unboxing of ints)
- only allows NULL as the default value
- doesn't give as much diagnostic info on error

The first two could be handled by passing the format string as part of the function metadata instead of using it in the function body, though this would mean having to add the ability to understand the format strings. The last two issues are tricky from an API perspective since they would require a larger change to pass through variable-length data structures. So anyway, punt on those issues for now, and just use the simple flag approach.

This cuts the function call overhead by about 4x for the functions that it's applied to, which are some common ones: string.count, unicode.count, unicode.startswith. (endswith, [r]find, and [r]index should all get updated as well)
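
As a rough illustration of how the flag bits combine, here is a minimal C sketch of the decoding that the descr.cpp hunk performs. The METH_O2/O3/D1/D2/D3 values are copied from the methodobject.h hunk, and METH_O is CPython's existing 0x0008. The helper names (is_fixed_arity, num_positional, num_defaults) are hypothetical and are not part of the commit.

```c
#include <assert.h>

/* Existing CPython flag value. */
#define METH_O   0x0008
/* Values added by this commit (see the methodobject.h hunk). */
#define METH_O2  0x0080
#define METH_O3  (METH_O | METH_O2)
#define METH_D1  0x0200
#define METH_D2  0x0400
#define METH_D3  (METH_D1 | METH_D2)

/* Hypothetical helper: nonzero if only the O/D bits are set, which is
 * the same mask test the commit uses to select the new code path. */
static int is_fixed_arity(int flags) {
    return (flags & ~(METH_O3 | METH_D3)) == 0;
}

/* Hypothetical helper: positional arguments, not counting 'self'.
 * METH_O contributes 1 and METH_O2 contributes 2, so O3 decodes to 3. */
static int num_positional(int flags) {
    int n = 0;
    if (flags & METH_O)
        n += 1;
    if (flags & METH_O2)
        n += 2;
    return n;
}

/* Hypothetical helper: how many trailing arguments may be omitted.
 * METH_D1 contributes 1 and METH_D2 contributes 2, so D3 decodes to 3. */
static int num_defaults(int flags) {
    int n = 0;
    if (flags & METH_D1)
        n += 1;
    if (flags & METH_D2)
        n += 2;
    assert(n <= 3);
    return n;
}
```

For example, count is registered with METH_O3 | METH_D2, which decodes to three positional arguments of which the last two may be omitted. The runtime then receives 1 + 3 values including 'self'.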

Commit caa5000a authored Jul 17, 2015 by Kevin Modzelewski
5 changed files
--- a/from_cpython/Include/methodobject.h
+++ b/from_cpython/Include/methodobject.h
@@ -83,6 +83,14 @@ PyAPI_FUNC(PyObject *) PyCFunction_NewEx(PyMethodDef *, PyObject *,

 #define METH_COEXIST   0x0040

+/* Pyston additions: */
+#define METH_O2        0x0080
+#define METH_O3        (METH_O | METH_O2)
+// The number of defaults:
+#define METH_D1        0x0200
+#define METH_D2        0x0400
+#define METH_D3        (METH_D1 | METH_D2)
+
 typedef struct PyMethodChain {
    PyMethodDef *methods;		/* Methods of this type */
    struct PyMethodChain *link;	/* NULL or base type */

--- a/from_cpython/Objects/stringobject.c
+++ b/from_cpython/Objects/stringobject.c
@@ -24,15 +24,25 @@ PyObject * _do_string_format(PyObject *self, PyObject *args, PyObject *kwargs) {
 }

 PyObject *
-string_count(PyStringObject *self, PyObject *args)
+string_count(PyStringObject *self,
+             PyObject *sub_obj, PyObject* obj_start, PyObject** args)
 {
-    PyObject *sub_obj;
    const char *str = PyString_AS_STRING(self), *sub;
    Py_ssize_t sub_len;
    Py_ssize_t start = 0, end = PY_SSIZE_T_MAX;
+    PyObject* obj_end = args[0];

+    /*
    if (!stringlib_parse_args_finds("count", args, &sub_obj, &start, &end))
        return NULL;
+    */
+
+    if (obj_start && obj_start != Py_None)
+        if (!_PyEval_SliceIndex(obj_start, &start))
+            return 0;
+    if (obj_end && obj_end != Py_None)
+        if (!_PyEval_SliceIndex(obj_end, &end))
+            return 0;

    if (PyString_Check(sub_obj)) {
        sub = PyString_AS_STRING(sub_obj);

--- a/from_cpython/Objects/unicodeobject.c
+++ b/from_cpython/Objects/unicodeobject.c
@@ -6300,16 +6300,31 @@ Unicode string S[start:end].  Optional arguments start and end are\n\
 interpreted as in slice notation.");

 static PyObject *
-unicode_count(PyUnicodeObject *self, PyObject *args)
+unicode_count(PyUnicodeObject *self,
+              PyObject *subobj, PyObject* obj_start, PyObject** args)
 {
    PyUnicodeObject *substring;
    Py_ssize_t start = 0;
    Py_ssize_t end = PY_SSIZE_T_MAX;
    PyObject *result;
+    PyObject* obj_end = args[0];

+    /*
    if (!stringlib_parse_args_finds_unicode("count", args, &substring,
                                            &start, &end))
        return NULL;
+    */
+
+    if (obj_start && obj_start != Py_None)
+        if (!_PyEval_SliceIndex(obj_start, &start))
+            return 0;
+    if (obj_end && obj_end != Py_None)
+        if (!_PyEval_SliceIndex(obj_end, &end))
+            return 0;
+
+    substring = (PyUnicodeObject*)PyUnicode_FromObject(subobj);
+    if (!substring)
+        return 0;

    ADJUST_INDICES(start, end, self->length);
    result = PyInt_FromSsize_t(
@@ -7656,16 +7671,26 @@ prefix can also be a tuple of strings to try.");

 static PyObject *
 unicode_startswith(PyUnicodeObject *self,
-                   PyObject *args)
+                   PyObject *subobj, PyObject* obj_start, PyObject** args)
 {
-    PyObject *subobj;
    PyUnicodeObject *substring;
    Py_ssize_t start = 0;
    Py_ssize_t end = PY_SSIZE_T_MAX;
    int result;
+    PyObject* obj_end = args[0];

+    /*
    if (!stringlib_parse_args_finds("startswith", args, &subobj, &start, &end))
        return NULL;
+    */
+
+    if (obj_start && obj_start != Py_None)
+        if (!_PyEval_SliceIndex(obj_start, &start))
+            return 0;
+    if (obj_end && obj_end != Py_None)
+        if (!_PyEval_SliceIndex(obj_end, &end))
+            return 0;
+
    if (PyTuple_Check(subobj)) {
        Py_ssize_t i;
        for (i = 0; i < PyTuple_GET_SIZE(subobj); i++) {
@@ -7814,7 +7839,7 @@ static PyMethodDef unicode_methods[] = {
    {"capitalize", (PyCFunction) unicode_capitalize, METH_NOARGS, capitalize__doc__},
    {"title", (PyCFunction) unicode_title, METH_NOARGS, title__doc__},
    {"center", (PyCFunction) unicode_center, METH_VARARGS, center__doc__},
-    {"count", (PyCFunction) unicode_count, METH_VARARGS, count__doc__},
+    {"count", (PyCFunction) unicode_count, METH_O3 | METH_D2, count__doc__},
    {"expandtabs", (PyCFunction) unicode_expandtabs, METH_VARARGS, expandtabs__doc__},
    {"find", (PyCFunction) unicode_find, METH_VARARGS, find__doc__},
    {"partition", (PyCFunction) unicode_partition, METH_O, partition__doc__},
@@ -7834,7 +7859,7 @@ static PyMethodDef unicode_methods[] = {
    {"swapcase", (PyCFunction) unicode_swapcase, METH_NOARGS, swapcase__doc__},
    {"translate", (PyCFunction) unicode_translate, METH_O, translate__doc__},
    {"upper", (PyCFunction) unicode_upper, METH_NOARGS, upper__doc__},
-    {"startswith", (PyCFunction) unicode_startswith, METH_VARARGS, startswith__doc__},
+    {"startswith", (PyCFunction) unicode_startswith, METH_O3 | METH_D2, startswith__doc__},
    {"endswith", (PyCFunction) unicode_endswith, METH_VARARGS, endswith__doc__},
    {"islower", (PyCFunction) unicode_islower, METH_NOARGS, islower__doc__},
    {"isupper", (PyCFunction) unicode_isupper, METH_NOARGS, isupper__doc__},

--- a/src/runtime/descr.cpp
+++ b/src/runtime/descr.cpp
@@ -247,6 +247,7 @@ Box* BoxedMethodDescriptor::tppCall(Box* _self, CallRewriteArgs* rewrite_args, A
    }

    ParamReceiveSpec paramspec(0, 0, false, false);
+    Box** defaults = NULL;
    if (call_flags == METH_NOARGS) {
        paramspec = ParamReceiveSpec(1, 0, false, false);
    } else if (call_flags == METH_VARARGS) {
@@ -255,6 +256,25 @@ Box* BoxedMethodDescriptor::tppCall(Box* _self, CallRewriteArgs* rewrite_args, A
        paramspec = ParamReceiveSpec(1, 0, true, true);
    } else if (call_flags == METH_O) {
        paramspec = ParamReceiveSpec(2, 0, false, false);
+    } else if ((call_flags & ~(METH_O3 | METH_D3)) == 0) {
+        int num_args = 0;
+        if (call_flags & METH_O)
+            num_args++;
+        if (call_flags & METH_O2)
+            num_args += 2;
+
+        int num_defaults = 0;
+        if (call_flags & METH_D1)
+            num_defaults++;
+        if (call_flags & METH_D2)
+            num_defaults += 2;
+
+        paramspec = ParamReceiveSpec(1 + num_args, num_defaults, false, false);
+        if (num_defaults) {
+            static Box* _defaults[] = { NULL, NULL, NULL };
+            assert(num_defaults <= 3);
+            defaults = _defaults;
+        }
    } else {
        RELEASE_ASSERT(0, "0x%x", call_flags);
    }
@@ -264,9 +284,15 @@ Box* BoxedMethodDescriptor::tppCall(Box* _self, CallRewriteArgs* rewrite_args, A
    Box* oarg3 = NULL;
    Box** oargs = NULL;

+    Box* oargs_array[1];
+    if (paramspec.totalReceived() >= 3) {
+        assert((paramspec.totalReceived() - 3) <= sizeof(oargs_array) / sizeof(oargs_array[0]));
+        oargs = oargs_array;
+    }
+
    bool rewrite_success = false;
-    rearrangeArguments(paramspec, NULL, self->method->ml_name, NULL, rewrite_args, rewrite_success, argspec, arg1, arg2,
-                       arg3, args, keyword_names, oarg1, oarg2, oarg3, args);
+    rearrangeArguments(paramspec, NULL, self->method->ml_name, defaults, rewrite_args, rewrite_success, argspec, arg1,
+                       arg2, arg3, args, keyword_names, oarg1, oarg2, oarg3, oargs);

    if (!rewrite_success)
        rewrite_args = NULL;
@@ -321,6 +347,25 @@ Box* BoxedMethodDescriptor::tppCall(Box* _self, CallRewriteArgs* rewrite_args, A
        if (rewrite_args)
            rewrite_args->out_rtn = rewrite_args->rewriter->call(true, (void*)self->method->ml_meth, rewrite_args->arg1,
                                                                 rewrite_args->arg2);
+    } else if ((call_flags & ~(METH_O3 | METH_D3)) == 0) {
+        {
+            UNAVOIDABLE_STAT_TIMER(t0, "us_timer_in_builtins");
+            rtn = ((Box * (*)(Box*, Box*, Box*, Box**))self->method->ml_meth)(oarg1, oarg2, oarg3, oargs);
+        }
+        if (rewrite_args) {
+            if (paramspec.totalReceived() == 2)
+                rewrite_args->out_rtn = rewrite_args->rewriter->call(true, (void*)self->method->ml_meth,
+                                                                     rewrite_args->arg1, rewrite_args->arg2);
+            else if (paramspec.totalReceived() == 3)
+                rewrite_args->out_rtn = rewrite_args->rewriter->call(
+                    true, (void*)self->method->ml_meth, rewrite_args->arg1, rewrite_args->arg2, rewrite_args->arg3);
+            else if (paramspec.totalReceived() > 3)
+                rewrite_args->out_rtn
+                    = rewrite_args->rewriter->call(true, (void*)self->method->ml_meth, rewrite_args->arg1,
+                                                   rewrite_args->arg2, rewrite_args->arg3, rewrite_args->args);
+            else
+                abort();
+        }
    } else {
        RELEASE_ASSERT(0, "0x%x", call_flags);
    }

--- a/src/runtime/str.cpp
+++ b/src/runtime/str.cpp
@@ -2655,7 +2655,7 @@ static PyBufferProcs string_as_buffer = {
 };

 static PyMethodDef string_methods[] = {
-    { "count", (PyCFunction)string_count, METH_VARARGS, NULL },
+    { "count", (PyCFunction)string_count, METH_O3 | METH_D2, NULL },
    { "join", (PyCFunction)string_join, METH_O, NULL },
    { "split", (PyCFunction)string_split, METH_VARARGS, NULL },
    { "rsplit", (PyCFunction)string_rsplit, METH_VARARGS, NULL },
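
The callee side of the new convention can be sketched in plain C without the Python headers. This is a simplified model under stated assumptions: Obj is a hypothetical stand-in for PyObject, count_char and call_count are made-up names, and the Py_None check and negative-index handling from the real functions are left out. What it does mirror from the diff is the signature shape (self, two direct arguments, then a pointer to the remaining ones) and the use of NULL to mean "argument omitted".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PyObject: carries either a string or an integer. */
typedef struct {
    const char* s;
    long n;
} Obj;

/* Modeled on the METH_O3 | METH_D2 signature in the diff:
 * self, first arg, second arg, then an array holding the third arg.
 * A NULL pointer means the caller did not supply that argument. */
static long count_char(Obj* self, Obj* sub, Obj* obj_start, Obj** rest) {
    Obj* obj_end = rest[0];
    long len = (long)strlen(self->s);
    long start = obj_start ? obj_start->n : 0;
    long end = obj_end ? obj_end->n : len;
    long hits = 0;
    long i;

    /* Clamp to the string bounds (no negative-index support in this sketch). */
    if (start < 0)
        start = 0;
    if (end > len)
        end = len;

    for (i = start; i < end; i++)
        if (self->s[i] == sub->s[0])
            hits++;
    return hits;
}

/* Hypothetical caller playing the role of the runtime: it fills omitted
 * trailing arguments with NULL, the only default the flags allow. */
static long call_count(const char* text, const char* needle,
                       const long* start, const long* end) {
    Obj self = { text, 0 };
    Obj sub = { needle, 0 };
    Obj start_obj = { NULL, start ? *start : 0 };
    Obj end_obj = { NULL, end ? *end : 0 };
    Obj* rest[1];

    rest[0] = end ? &end_obj : NULL;
    return count_char(&self, &sub, start ? &start_obj : NULL, rest);
}
```

Because the callee receives its arguments directly, no tuple is built and no format string is parsed on the call path. The cost is that each function has to convert and validate its optional arguments itself, as the rewritten string_count and unicode_count do with _PyEval_SliceIndex.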