- 20 Feb, 2015 1 commit
-
-
Kevin Modzelewski authored
add tp_getattr support, PyCFunction_NewEx, and most of PyNumber_Int's behavior
-
- 19 Feb, 2015 11 commits
-
-
Kevin Modzelewski authored
In typeCallInternal, we used to expand out any starargs in order to take a look at the first arg (and change it when passing it). We had a bug in this code, and rather than make that code more complicated to fix it, we now just call back into callFunc to resolve it. This is kind of tricky since callFunc will call typeCall, and we don't want typeCall to duplicate the typeCallInternal behavior (that's not any better than duplicating the arg behavior), so we want typeCall to call into typeCallInternal. But typeCall receives varargs, which typeCallInternal doesn't support, so typeCall has to do some (simpler) arg handling to expand out the varargs. In the end, it simplifies the code a little bit but causes a bunch of extra calls in the varargs case, so it's less of a win than I thought, but at least it fixes the bug.
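A rough Python model of the call layering described above, as a sketch only. The names type_call and type_call_internal mirror typeCall and typeCallInternal, but their bodies and signatures here are assumptions for illustration, not the runtime's C++ code: the varargs entry point unpacks its packed arguments once, peels off the first one (the class), and defers everything else to the internal function.

```python
def type_call_internal(cls, *args, **kwargs):
    """Hypothetical stand-in for typeCallInternal: takes already-expanded args."""
    obj = cls.__new__(cls, *args, **kwargs)
    # Like type.__call__, only run __init__ if __new__ returned an instance of cls.
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj


def type_call(packed_args, packed_kwargs):
    """Hypothetical stand-in for typeCall: receives packed varargs."""
    # The only argument handling done here is the (simple) expansion:
    # the first packed argument is the class being called.
    cls, *rest = packed_args
    return type_call_internal(cls, *rest, **packed_kwargs)
```

The point of the split is that the object-construction logic lives in one place, at the cost of an extra call in the varargs case.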
-
Kevin Modzelewski authored
We could make the typeGCHandler support these half-constructed classes, but let's just turn off the GC for this area.
-
Chris Toshok authored
-
Kevin Modzelewski authored
implement setting __name__ for functions
-
Kevin Modzelewski authored
Update gcc-4.8.2 tarball url to generic gnu ftpmirror.
-
Kevin Modzelewski authored
compvar: add int <op> float handling
-
Marius Wachtler authored
Convert the integer to a float and then let the float code handle the operation. With this change the type analysis is also able to comprehend that e.g. '1 - <float>' will return a float. This means that the math operations in the 'linear_combination' function in chaos.py get completely inlined. Improves chaos.py by 5%.
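The promotion rule can be modelled in a few lines of Python. This is a toy sketch, not the compvar code: the helper names int_op_float and result_type and the OPS table are made up for illustration, and only +, - and * are covered.

```python
import operator

# Hypothetical table of the arithmetic operators covered by this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def int_op_float(op_name, lhs, rhs):
    """Handle `int <op> float` by promoting the int and reusing the float path."""
    if not (isinstance(lhs, int) and isinstance(rhs, float)):
        raise TypeError("expected int <op> float")
    return OPS[op_name](float(lhs), rhs)


def result_type(lhs_type, rhs_type):
    """Toy type-analysis rule: mixing int and float yields float."""
    if {lhs_type, rhs_type} == {int, float}:
        return float
    if lhs_type is rhs_type:
        return lhs_type
    return object  # unknown: no specialization possible
```

Once the analysis can report float for a mixed expression, later float operations on the result can be specialized as well, which is presumably what lets the chaos.py math get inlined.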
-
Kaiwen Xu authored
-
Travis Hance authored
-
Kevin Modzelewski authored
We seem to be spending a fair amount of time doing unnecessary work for simple calls like boxInt and createList, which are generated by irgen and reduce to calling new BoxedInt / BoxedList. The operator new calls tp_alloc, so we get some indirect function calls, and then tp_alloc does some checking about its caller, and then we check to see what size object to create, and how to initialize it. I created a DEFAULT_CLASS_SIMPLE macro to go with DEFAULT_CLASS, that should help with these things. I (manually) inlined all of those functions into the operator new. I also moved the small arena bucket selection function (SmallArena::alloc) into the header file so that it can get inlined, since the allocation size is often known at compile time and we can statically resolve to a bucket. Putting these together means that boxInt and createList are much tighter.
-
Kevin Modzelewski authored
__thread seems quite a bit faster than pthread_getspecific, so if we give up on having multiple Heap objects, then we can store a reference to the current thread's ThreadBlockCache in a static __thread variable. It looks like this ends up mattering (5% average speedup) since SmallArena::_alloc() is so hot.
-
- 18 Feb, 2015 21 commits
-
-
Kevin Modzelewski authored
Teach len() how to rewrite itself
-
Marius Wachtler authored
-15% for fasta.py
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
At some point I'm sure we'll start paying for our 2KB+ inline caches, but it doesn't seem to be now!
-
Kevin Modzelewski authored
It's a pretty crude heuristic, but it stops us from endlessly rewriting "megamorphic" IC sites.

pyston interp2.py        :  6.7s  baseline:  6.5 (+3.0%)
pyston raytrace.py       :  8.3s  baseline:  7.9 (+4.3%)
pyston nbody.py          : 10.6s  baseline: 10.3 (+3.1%)
pyston fannkuch.py       :  7.4s  baseline:  7.4 (+0.8%)
pyston chaos.py          : 24.2s  baseline: 24.6 (-1.5%)
pyston spectral_norm.py  : 22.7s  baseline: 30.4 (-25.4%)
pyston fasta.py          :  9.0s  baseline:  8.4 (+7.6%)
pyston pidigits.py       :  4.4s  baseline:  4.3 (+1.7%)
pyston richards.py       :  2.7s  baseline: 12.5 (-78.7%)
pyston deltablue.py      :  2.7s  baseline:  2.6 (+0.9%)
pyston (geomean-0b9f)    :  7.6s  baseline:  9.0 (-15.2%)

There are a number of regressions; I feel like this is something we'll be tuning a lot.
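One way such a heuristic could look, sketched in Python under stated assumptions: the CallSite class, its max_rewrites budget of 4, and the per-type fast-path table are all hypothetical and only illustrate the idea of giving up on a site that keeps seeing new types. The commit does not say what threshold or bookkeeping the real inline caches use.

```python
import operator


class CallSite:
    """Toy inline-cache site for one attribute lookup."""

    def __init__(self, attr, max_rewrites=4):
        self.attr = attr
        self.max_rewrites = max_rewrites  # hypothetical rewrite budget
        self.rewrites = 0
        self.fast_paths = {}  # type -> specialized getter

    def get(self, obj):
        fast = self.fast_paths.get(type(obj))
        if fast is not None:
            return fast(obj)  # cache hit: no rewrite needed
        if self.rewrites < self.max_rewrites:
            # Still within budget: "rewrite" the site with a new fast path.
            self.rewrites += 1
            self.fast_paths[type(obj)] = operator.attrgetter(self.attr)
        # Over budget, the site is treated as megamorphic and stays on the slow path.
        return getattr(obj, self.attr)
```

A site that sees ten different types stops specializing after the fourth and simply falls back to the generic lookup, which bounds the rewriting cost at the expense of slower hits for the remaining types.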
-
Kevin Modzelewski authored
Limit the number of generator stacks that we save, and register them as additional GC pressure.
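A minimal Python sketch of a bounded stack pool, assuming a free list with a fixed cap and a callback that tells the collector about memory it cannot see. StackPool, max_saved, and register_pressure are hypothetical names, and reporting pressure at allocation time is an assumption about what "register" means here.

```python
class StackPool:
    """Toy pool that keeps at most `max_saved` generator stacks for reuse."""

    def __init__(self, max_saved, stack_size, register_pressure):
        self.max_saved = max_saved
        self.stack_size = stack_size
        self.register_pressure = register_pressure  # hypothetical GC hook
        self.free = []

    def acquire(self):
        if self.free:
            return self.free.pop()  # reuse a saved stack
        # A fresh stack lives outside the GC heap, so report its size
        # to the collector as extra allocation pressure.
        self.register_pressure(self.stack_size)
        return bytearray(self.stack_size)

    def release(self, stack):
        # Save the stack only while under the cap; otherwise drop it.
        if len(self.free) < self.max_saved:
            self.free.append(stack)
```

Capping the free list bounds the memory held by idle stacks, and the pressure hook lets the collector run sooner when many generators are alive.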
-
https://github.com/toshok/pyston
Kevin Modzelewski authored
Conflicts: src/runtime/generator.cpp Closes #307
-
Kevin Modzelewski authored
New context switching code for generators
-
Kevin Modzelewski authored
Smaller performance improvements for fasta
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
remove the larger buckets, and hoist some math out of loops.
-
Kevin Modzelewski authored
Python exposes the finding part through the 'imp' module.
-
Chris Toshok authored
-
Kevin Modzelewski authored
It uses the buffer protocol, so make str support that better.
-
Chris Toshok authored
-
Chris Toshok authored
For some reason the larger bucket sizes are causing a large perf hit in spectral_norm. It's unclear exactly why this is happening, but theories are legion. More investigation is warranted, but this gets us back from the perf regression. Also hoist the atom_idx calculation out of a couple of loops that were iterating over object indices.
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
Kevin Modzelewski authored
-
- 17 Feb, 2015 3 commits
-
-
Kevin Modzelewski authored
Only gets hit when there are >=3 !is_defined names also set (other fake names might also count towards this).
-
Kevin Modzelewski authored
Don't emit duplicate attr guards
-
Marius Wachtler authored
pyston (calibration)     :  0.8s  stock2:  0.8 (+2.5%)
pyston interp2.py        :  5.9s  stock2:  6.2 (-4.5%)
pyston raytrace.py       :  6.9s  stock2:  7.0 (-1.6%)
pyston nbody.py          :  9.8s  stock2:  9.6 (+1.9%)
pyston fannkuch.py       :  7.0s  stock2:  6.9 (+2.6%)
pyston chaos.py          : 20.6s  stock2: 21.6 (-4.6%)
pyston spectral_norm.py  : 27.9s  stock2: 34.2 (-18.6%)
pyston fasta.py          : 17.1s  stock2: 17.8 (-4.5%)
pyston pidigits.py       :  4.4s  stock2:  4.5 (-1.0%)
pyston richards.py       : 10.4s  stock2: 10.2 (+2.2%)
pyston deltablue.py      :  2.2s  stock2:  2.2 (-1.9%)
pyston (geomean-0b9f)    :  8.8s  stock2:  9.1 (-3.2%)
-
- 16 Feb, 2015 3 commits
-
-
Marius Wachtler authored
reduces the generator yield overhead
-
Marius Wachtler authored
This is a huge speed improvement for generators: fasta.py now takes 8 seconds instead of 18.
-
Marius Wachtler authored
reduces strJoin runtime from 0.8 sec to 0.5 sec when executing fasta.py
-
- 14 Feb, 2015 1 commit
-
-
Kevin Modzelewski authored
We should do a more comprehensive investigation. Removing t2 caused regressions on a number of benchmarks since we lost chances to do speculations, but making t3 easier to get to caused regressions due to the cost of our LLVM optimization set (which is pretty hefty since it's supposed to be hard to activate).
-