Kevin Modzelewski authored
Should hopefully cut down on allocations used to pass around 'const std::string&' objects (since we don't always store things as std::strings anymore), and on calls to strlen when we pass around const char*s. Haven't looked yet at the calls that we embed in the LLVM IR.

Here are the perf results:

  pyston django_migrate.py  :  2.3s  baseline: 2.3  (-1.7%)
  pyston django-template.py : 15.1s  baseline: 15.4 (-1.6%)
  pyston interp2.py         :  5.3s  baseline: 6.3  (-15.1%)
  pyston raytrace.py        :  6.1s  baseline: 6.2  (-0.7%)
  pyston nbody.py           :  8.4s  baseline: 8.1  (+4.1%)
  pyston fannkuch.py        :  7.5s  baseline: 7.5  (+0.2%)
  pyston chaos.py           : 20.2s  baseline: 20.0 (+0.7%)
  pyston fasta.py           :  5.4s  baseline: 5.4  (+0.3%)
  pyston pidigits.py        :  5.7s  baseline: 5.7  (+0.0%)
  pyston richards.py        :  2.5s  baseline: 2.7  (-6.2%)
  pyston deltablue.py       :  1.8s  baseline: 1.8  (-0.0%)
  pyston (geomean-3424)     :  5.7s  baseline: 5.8  (-2.0%)

I looked into the regression in nbody.py, and it is in an unrelated piece of code (list unpacking) that has the same assembly and gets called the same number of times. Maybe there's some weird cache collision. It's an extremely small benchmark (a single 13-line loop), so I'm happy to write it off as microbenchmark sensitivity. We could also optimize that path if we wanted to: speculate on the type we are unpacking and inline the parts of the unpacking code we need.
ef27d6cb