Commits · 466b5204d238ea053bee9e2e2d20f2fe484b4ff2 · Boxiang Sun / Pyston

20 Feb, 2015 1 commit
- Merge pull request #295 from toshok/tp_getattr · 466b5204
  Kevin Modzelewski authored Feb 19, 2015
```
add tp_getattr support, PyCFunction_NewEx, and most of PyNumber_Int's behavior
```
  466b5204
19 Feb, 2015 11 commits

Fix an arg-handling bug in typeCallInternal · 0e10126d

Kevin Modzelewski authored Feb 19, 2015

In typeCallInternal, we used to expand out any starargs in order to take a look
at the first arg (and change it when passing it).

We had a bug in this code, and rather than make that code more complicated
to fix it, just call back into callFunc to resolve it. This is kind of tricky
since callFunc will call typeCall, and we don't want typeCall to duplicate
the typeCallInternal behavior (that's not any better than duplicating the
arg behavior), so we want typeCall to call into typeCallInternal. But
typeCall receives varargs, which typeCallInternal doesn't support. So typeCall
has to do some (simpler) arg handling to expand out the varargs.

In the end, it simplifies the code a little bit but causes a bunch of extra calls
in the varargs case, so it's less of a win than I thought, but at least it
fixes the bug.

0e10126d
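
The delegation described in the commit above can be sketched in Python. This is only an illustration under stated assumptions: `type_call` and `type_call_internal` are hypothetical stand-ins for the C++ functions named in the message, and the sketch ignores keyword arguments and Pyston's real calling convention. It shows the varargs entry point doing the simple unpacking (peeling off the first argument, the class) and handing the already-expanded arguments to the internal routine.

```python
# Hypothetical sketch; these are not Pyston's actual functions.

def type_call_internal(cls, args):
    """Construct an instance from already-expanded positional args."""
    obj = cls.__new__(cls, *args)
    # Like type.__call__, only run __init__ if __new__ returned an instance of cls.
    if isinstance(obj, cls):
        obj.__init__(*args)
    return obj


def type_call(*varargs):
    """Varargs entry point: unpack, look at the first arg, then delegate."""
    if not varargs:
        raise TypeError("type_call() needs a class as its first argument")
    cls, rest = varargs[0], varargs[1:]
    return type_call_internal(cls, rest)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


# The class itself arrives inside the packed arguments.
packed = (Point, 3, 4)
p = type_call(*packed)
```

Keeping the unpacking in one small wrapper means the construction logic lives only in `type_call_internal`, which is the duplication the commit message says it wants to avoid.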

Minor: if a GC is triggered in this section it will crash · bac77762

Kevin Modzelewski authored Feb 19, 2015

We could make the typeGCHandler support these half-constructed classes,
but let's just turn off the GC for this area.

bac77762

add tp_getattr support, and most of PyNumber_Int's behavior · 9aac061c
Chris Toshok authored Feb 11, 2015

9aac061c
Merge pull request #292 from tjhance/set-function-name · dbac3625
Kevin Modzelewski authored Feb 19, 2015
```
implement setting __name__ for functions
```
dbac3625
Merge pull request #312 from kevinxucs/docs_gcc-url · efeaab75
Kevin Modzelewski authored Feb 19, 2015
```
Update gcc-4.8.2 tarball url to generic gnu ftpmirror.
```
efeaab75
Merge pull request #313 from undingen/perf_chaos · f2e68e79
Kevin Modzelewski authored Feb 19, 2015
```
compvar: add int <op> float handling
```
f2e68e79

compvar: add int <op> float handling · 15541f28

Marius Wachtler authored Feb 19, 2015

Convert the integer to a float and then let the float code handle the operation.
With this change the type analysis is also able to infer that,
e.g., '1 - <float>' will return a float.

This means that the math operations in the 'linear_combination' function in chaos.py
get completely inlined.

Improves chaos.py by 5%.

15541f28
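
The promotion described in the commit above can be illustrated with a minimal Python sketch. The names `result_type`, `float_binop`, and `binop` are hypothetical and do not come from Pyston's compvar code; the sketch only assumes that an int left operand is converted to a float and the operation is then delegated to the float path, which is also what lets a type analysis conclude that the result is a float.

```python
import operator

# Hypothetical illustration; none of these names come from Pyston.
ARITH_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def result_type(left_type, op, right_type):
    """Static result type of `left <op> right` for int/float operands."""
    if op not in ARITH_OPS:
        raise ValueError("unsupported operator: %r" % op)
    # Any float operand makes the whole expression a float.
    return "float" if "float" in (left_type, right_type) else "int"


def float_binop(op, left, right):
    """The float code path: both operands are already floats."""
    return ARITH_OPS[op](left, right)


def binop(op, left, right):
    """Handle int <op> float by promoting the int, then reuse the float path."""
    if isinstance(left, int) and isinstance(right, float):
        return float_binop(op, float(left), right)
    return ARITH_OPS[op](left, right)


inferred = result_type("int", "-", "float")
value = binop("-", 1, 0.25)
```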

Update gcc-4.8.2 tarball url to generic gnu ftpmirror. · c43f92a5
Kaiwen Xu authored Feb 19, 2015

c43f92a5
implement setting __name__ for functions · 4093d5a8
Travis Hance authored Feb 09, 2015

4093d5a8

Rearrange things to improve our ability to inline common cases · 2c4ab499

Kevin Modzelewski authored Feb 18, 2015

We seem to be spending a fair amount of time doing unnecessary work
for simple calls like boxInt and createList, which are generated
by irgen and reduce to calling new BoxedInt / BoxedList. The
operator new calls tp_alloc, so we get some indirect function calls,
and then tp_alloc does some checking about its caller, and then we
check to see what size object to create, and how to initialize it.

I created a DEFAULT_CLASS_SIMPLE macro to go with DEFAULT_CLASS,
which should help with these things. I (manually) inlined all of those
functions into the operator new.

I also moved the small arena bucket selection function (SmallArena::alloc)
into the header file so that it can get inlined, since the allocation size
is often known at compile time and we can statically resolve to a bucket.

Putting these together means that boxInt and createList are much tighter.

2c4ab499

Use a __thread cache for the GC's thread-local ThreadBlockCache · a2e51e4f

Kevin Modzelewski authored Feb 18, 2015

__thread seems quite a bit faster than pthread_getspecific, so
if we give up on having multiple Heap objects, then we can store
a reference to the current thread's ThreadBlockCache in a static
__thread variable. It looks like this ends up mattering (5% average
speedup) since SmallArena::_alloc() is so hot.

a2e51e4f

18 Feb, 2015 21 commits

Merge pull request #309 from undingen/len_rewriting · 243781f7
Kevin Modzelewski authored Feb 18, 2015
```
Teach len() how to rewrite itself
```
243781f7
Teach len() how to rewrite itself · ee7cf48d
Marius Wachtler authored Feb 17, 2015
```
-15% for fasta.py
```
ee7cf48d
Allow rewriting 1-arg calls to type() · a4722ed0
Kevin Modzelewski authored Feb 17, 2015

a4722ed0
max->min · 12f29135
Kevin Modzelewski authored Feb 18, 2015

12f29135
Change from "never retry ICs" to exponential backoff · af59d5ae
Kevin Modzelewski authored Feb 17, 2015

af59d5ae
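
The commit above has no message, so the following Python sketch is only a generic illustration of exponential backoff applied to inline-cache (IC) rewrites, not Pyston's implementation. The class `BackoffCache`, its fields, and the choice of waiting `2 ** k` misses after the k-th rewrite are all assumptions made for the example.

```python
# Hypothetical sketch of exponential backoff for IC rewrites.

class BackoffCache:
    """Toy call-site cache whose rewrite attempts are spaced out exponentially."""

    def __init__(self):
        self.cached_type = None
        self.cached_handler = None
        self.rewrites = 0        # rewrites performed so far
        self.misses_to_skip = 0  # misses to let through before the next rewrite

    def call(self, value, slow_lookup):
        t = type(value)
        if t is self.cached_type:
            return self.cached_handler(value)  # fast path: cache hit
        handler = slow_lookup(t)               # slow path: generic lookup
        if self.misses_to_skip > 0:
            self.misses_to_skip -= 1           # still backing off, do not rewrite
        else:
            self.cached_type, self.cached_handler = t, handler
            self.rewrites += 1
            self.misses_to_skip = 2 ** self.rewrites  # wait longer each time
        return handler(value)


def slow_lookup(t):
    return lambda v: t.__name__


# Fourteen values of fourteen different types, so every call is a miss.
values = [1, "a", 2.0, [], {}, (), set(), b"", None, True,
          frozenset(), 1j, bytearray(), range(1)]
ic = BackoffCache()
history = []
for v in values:
    ic.call(v, slow_lookup)
    history.append(ic.rewrites)
```

In this run the cache is rewritten on the 1st, 4th, and 9th misses, so a site that keeps missing costs progressively less rewriting work while still getting another chance to stabilize.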

Increase callsite IC sizes · 49a830b6

Kevin Modzelewski authored Feb 17, 2015

At some point I'm sure we'll start paying for our 2KB+ inline caches,
but it doesn't seem to be now!

49a830b6

Stop rewriting ICs after a certain number of rewrites · 1a1afc8c

Kevin Modzelewski authored Feb 17, 2015

It's a pretty crude heuristic, but it stops us from endlessly
rewriting "megamorphic" IC sites.

pyston interp2.py : 6.7s baseline: 6.5 (+3.0%)
pyston raytrace.py : 8.3s baseline: 7.9 (+4.3%)
pyston nbody.py : 10.6s baseline: 10.3 (+3.1%)
pyston fannkuch.py : 7.4s baseline: 7.4 (+0.8%)
pyston chaos.py : 24.2s baseline: 24.6 (-1.5%)
pyston spectral_norm.py : 22.7s baseline: 30.4 (-25.4%)
pyston fasta.py : 9.0s baseline: 8.4 (+7.6%)
pyston pidigits.py : 4.4s baseline: 4.3 (+1.7%)
pyston richards.py : 2.7s baseline: 12.5 (-78.7%)
pyston deltablue.py : 2.7s baseline: 2.6 (+0.9%)
pyston (geomean-0b9f) : 7.6s baseline: 9.0 (-15.2%)

There are a number of regressions; I feel like this is something
we'll be tuning a lot.

1a1afc8c
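
The heuristic in the commit above (stop rewriting an IC after a certain number of rewrites) can be sketched in Python. This is a minimal illustration under assumptions: the class `InlineCache`, the single-entry cache, and the limit of 3 are hypothetical and are not taken from Pyston.

```python
# Hypothetical sketch of a rewrite cap on an inline cache.

class InlineCache:
    """Toy call-site cache that is no longer rewritten after max_rewrites."""

    def __init__(self, max_rewrites):
        self.max_rewrites = max_rewrites
        self.rewrites = 0
        self.cached_type = None
        self.cached_handler = None

    def call(self, value, slow_lookup):
        t = type(value)
        if t is self.cached_type:
            return self.cached_handler(value)  # fast path: cache hit
        handler = slow_lookup(t)               # slow path: generic lookup
        if self.rewrites < self.max_rewrites:
            # Rewrite the cache for the newly seen type.
            self.cached_type, self.cached_handler = t, handler
            self.rewrites += 1
        # Past the cap, a "megamorphic" site just keeps taking the slow path.
        return handler(value)


def slow_lookup(t):
    return lambda v: t.__name__


ic = InlineCache(max_rewrites=3)
results = [ic.call(v, slow_lookup) for v in [1, "a", 2.0, [], 1, "b"]]
```

After three rewrites the cache stays specialized for `float`; later calls with other types still return correct results but no longer trigger rewrites.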

Reduce generators memory usage · 58d587ba

Kevin Modzelewski authored Feb 17, 2015

Limit the number of generator stacks that we save, and register them as
additional GC pressure.

58d587ba
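
The idea in the commit above, keeping only a bounded number of saved generator stacks and accounting for their memory, can be sketched in Python. Everything here is an assumption for illustration: `StackPool`, the use of a `bytearray` as a stand-in for a stack, and the `gc_pressure_bytes` counter (a stand-in for whatever Pyston reports to its collector) are hypothetical.

```python
# Hypothetical sketch of a bounded pool of reusable generator stacks.

class StackPool:
    def __init__(self, stack_size, max_saved):
        self.stack_size = stack_size
        self.max_saved = max_saved
        self.free = []               # saved stacks available for reuse
        self.gc_pressure_bytes = 0   # stack memory currently accounted for

    def acquire(self):
        """Reuse a saved stack if one exists, otherwise allocate a new one."""
        if self.free:
            return self.free.pop()
        self.gc_pressure_bytes += self.stack_size
        return bytearray(self.stack_size)

    def release(self, stack):
        """Save the stack for reuse unless the pool is already full."""
        if len(self.free) < self.max_saved:
            self.free.append(stack)
        else:
            # Over the limit: drop the stack and stop accounting for it.
            self.gc_pressure_bytes -= self.stack_size


pool = StackPool(stack_size=1024, max_saved=2)
stacks = [pool.acquire() for _ in range(3)]
for s in stacks:
    pool.release(s)
reused = pool.acquire()
```

Capping the free list bounds the memory held by idle stacks, while the counter keeps that memory visible to whatever decides when to collect.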

Merge branch 'generator-simple-destructor' of https://github.com/toshok/pyston · 1a58c87d
Kevin Modzelewski authored Feb 17, 2015
```
Conflicts:
	src/runtime/generator.cpp

Closes #307
```
1a58c87d
Merge pull request #302 from undingen/ctxswitching · 76057292
Kevin Modzelewski authored Feb 17, 2015
```
New context switching code for generators
```
76057292
Merge pull request #303 from undingen/perf_fasta · 81c004af
Kevin Modzelewski authored Feb 17, 2015
```
Smaller performance improvements for fasta
```
81c004af
Make find_module support packages · f5d262cc
Kevin Modzelewski authored Feb 17, 2015

f5d262cc
Merge pull request #306 from toshok/fix-spectral-norm-gc-regression · 14dd8e7d
Kevin Modzelewski authored Feb 17, 2015
```
remove the larger buckets, and hoist some math out of loops.
```
14dd8e7d
Separate the code to find then import modules · 01b8b9b7
Kevin Modzelewski authored Feb 17, 2015
```
Python exposes the finding part through the 'imp' module.
```
01b8b9b7
add bm_ai.py · 330b378c
Chris Toshok authored Feb 18, 2015

330b378c
Fix some issues with file.write · e4767851
Kevin Modzelewski authored Feb 17, 2015
```
It uses the buffer protocol, so make str support that better.
```
e4767851
reuse generator stacks · cf9487cd
Chris Toshok authored Feb 18, 2015

cf9487cd

remove the larger buckets, and hoist some math out of loops. · a9426101

Chris Toshok authored Feb 18, 2015

For some reason the larger bucket sizes are causing a large perf hit
in spectral_norm.  It's unclear exactly why this is happening, but
theories are legion.  More investigation is warranted, but this gets us
back from the perf regression.

Also hoist the atom_idx calculation out of a couple of loops that were
iterating over object indices.

a9426101

Merge branch 'zlib' · 5ba655be
Kevin Modzelewski authored Feb 17, 2015

5ba655be
Enable the zlib module · aea9ef2d
Kevin Modzelewski authored Feb 17, 2015

aea9ef2d
Switch to CPython's pthread library · 1a20fdce
Kevin Modzelewski authored Feb 17, 2015

1a20fdce

17 Feb, 2015 3 commits

Support passing generator objects through the args array in OSR · bff16616
Kevin Modzelewski authored Feb 17, 2015
```
Only gets hit when there are >=3 !is_defined names also set (other
fake names might also count towards this).
```
bff16616
Merge pull request #305 from undingen/dup_guards · 0b650c38
Kevin Modzelewski authored Feb 17, 2015
```
Don't emit duplicate attr guards
```
0b650c38

Don't emit duplicate attr guards · 6509deb8

Marius Wachtler authored Feb 17, 2015

pyston (calibration) : 0.8s stock2: 0.8 (+2.5%)
pyston interp2.py : 5.9s stock2: 6.2 (-4.5%)
pyston raytrace.py : 6.9s stock2: 7.0 (-1.6%)
pyston nbody.py : 9.8s stock2: 9.6 (+1.9%)
pyston fannkuch.py : 7.0s stock2: 6.9 (+2.6%)
pyston chaos.py : 20.6s stock2: 21.6 (-4.6%)
pyston spectral_norm.py : 27.9s stock2: 34.2 (-18.6%)
pyston fasta.py : 17.1s stock2: 17.8 (-4.5%)
pyston pidigits.py : 4.4s stock2: 4.5 (-1.0%)
pyston richards.py : 10.4s stock2: 10.2 (+2.2%)
pyston deltablue.py : 2.2s stock2: 2.2 (-1.9%)
pyston (geomean-0b9f) : 8.8s stock2: 9.1 (-3.2%)

6509deb8

16 Feb, 2015 3 commits
- Save current internal thread state in TLS · 7f864131
  Marius Wachtler authored Feb 16, 2015
```
reduces the generator yield overhead
```
  7f864131
- New context switching code for generators · 2537d743
  Marius Wachtler authored Feb 16, 2015
```
This is a huge speed improvement for generators,
fasta.py takes 8 seconds now instead of 18 seconds
```
  2537d743
- strJoin: use llvm::raw_string_ostream · babe2f69
  Marius Wachtler authored Feb 13, 2015
```
reduces strJoin runtime from 0.8 seconds to 0.5 seconds when executing fasta.py
```
  babe2f69
14 Feb, 2015 1 commit

Reenable tier 2 for now · a3a12bb6

Kevin Modzelewski authored Feb 14, 2015

We should do a more comprehensive investigation. Removing tier 2 caused
regressions on a number of benchmarks since we lost chances to do
speculations, but making tier 3 easier to get to caused regressions
due to the cost of our LLVM optimization set (which is pretty hefty
since it's supposed to be hard to activate).

a3a12bb6