1. 19 Feb, 2015 4 commits
    • Merge pull request #313 from undingen/perf_chaos · f2e68e79
      Kevin Modzelewski authored
      compvar: add int <op> float handling
    • compvar: add int <op> float handling · 15541f28
      Marius Wachtler authored
      Convert the integer to a float and then let the float code handle the operation.
      With this change, the type analysis is also able to infer that
      e.g. '1 - <float>' will return a float.

      This means that the math operations in the 'linear_combination' function in chaos.py
      get completely inlined.

      Improves chaos.py by 5%.
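
      A minimal sketch of the promotion described in the commit above, using made-up standalone functions (BinOp, floatBinop, and intFloatBinop are illustrative, not Pyston's actual compvar/irgen code): the int operand is converted to a double and the existing float path does the work, which is why the result type is statically a float.

      #include <cassert>

      enum class BinOp { Add, Sub, Mul, Div };

      // Hypothetical float/float handler that the int case delegates to.
      static double floatBinop(double lhs, double rhs, BinOp op) {
          switch (op) {
              case BinOp::Add: return lhs + rhs;
              case BinOp::Sub: return lhs - rhs;
              case BinOp::Mul: return lhs * rhs;
              case BinOp::Div: assert(rhs != 0.0); return lhs / rhs;
          }
          return 0.0; // unreachable
      }

      // int <op> float: convert the integer and let the float code handle it.
      // A type analysis can then conclude that e.g. '1 - <float>' yields a float.
      static double intFloatBinop(long lhs, double rhs, BinOp op) {
          return floatBinop(static_cast<double>(lhs), rhs, op);
      }
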
    • Rearrange things to improve our ability to inline common cases · 2c4ab499
      Kevin Modzelewski authored
      We seem to be spending a fair amount of time doing unnecessary work
      for simple calls like boxInt and createList, which are generated
      by irgen and reduce to calling new BoxedInt / BoxedList.  The
      operator new calls tp_alloc, so we get some indirect function calls;
      tp_alloc then does some checking about its caller, and then we
      check what size of object to create and how to initialize it.

      I created a DEFAULT_CLASS_SIMPLE macro to go with DEFAULT_CLASS,
      which should help with these things.  I (manually) inlined all of those
      functions into the operator new.
      
      I also moved the small arena bucket selection function (SmallArena::alloc)
      into the header file so that it can get inlined, since the allocation size
      is often known at compile time and we can statically resolve to a bucket.
      
      Putting these together means that boxInt and createList are much tighter.
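
      A minimal sketch of the two ideas above, with made-up names (bucketIndexFor, smallArenaAlloc, DEFAULT_CLASS_SIMPLE_SKETCH, and BoxedIntSketch are illustrative, not the real Pyston definitions): an inline bucket-selection helper that the compiler can fold when the size is a compile-time constant, plus a class-specific inline operator new with the object size baked in.

      #include <cstddef>
      #include <cstdlib>

      // Inline bucket selection: with a constant allocation size this folds to a
      // constant, which is the point of moving it into a header.
      constexpr size_t bucketIndexFor(size_t nbytes) {
          return (nbytes + 15) / 16; // hypothetical 16-byte size-class granularity
      }

      inline void* smallArenaAlloc(size_t nbytes) {
          size_t bucket = bucketIndexFor(nbytes);
          (void)bucket;               // a real arena would index a per-bucket free list here
          return std::malloc(nbytes); // stand-in for the arena allocation itself
      }

      // DEFAULT_CLASS_SIMPLE-style macro (sketch): an inline operator new/delete pair
      // with the size known statically, so simple constructors like boxInt reduce to
      // a tight allocate-and-initialize sequence instead of going through tp_alloc.
      #define DEFAULT_CLASS_SIMPLE_SKETCH(ClassName)                                \
          void* operator new(size_t) { return smallArenaAlloc(sizeof(ClassName)); } \
          void operator delete(void* p) { std::free(p); }

      struct BoxedIntSketch {
          long n;
          explicit BoxedIntSketch(long n) : n(n) {}
          DEFAULT_CLASS_SIMPLE_SKETCH(BoxedIntSketch)
      };
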
    • Use a __thread cache for the GC's thread-local ThreadBlockCache · a2e51e4f
      Kevin Modzelewski authored
      __thread seems quite a bit faster than pthread_getspecific, so
      if we give up on having multiple Heap objects, then we can store
      a reference to the current thread's ThreadBlockCache in a static
      __thread variable.  It looks like this ends up mattering (5% average
      speedup) since SmallArena::_alloc() is so hot.
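
      A minimal sketch of the caching pattern, under the assumption of a single global heap (ThreadBlockCacheSketch and getThreadCache are illustrative names, not the real code): the hot path is a single __thread load, and pthread_getspecific is only hit on the once-per-thread slow path.

      #include <pthread.h>

      struct ThreadBlockCacheSketch {
          // per-thread free-block lists would live here
      };

      static pthread_key_t cache_key;                             // slower, general TLS
      static pthread_once_t cache_key_once = PTHREAD_ONCE_INIT;
      static __thread ThreadBlockCacheSketch* tl_cache = nullptr; // fast __thread cache

      static void makeCacheKey() {
          pthread_key_create(&cache_key, nullptr);
      }

      static ThreadBlockCacheSketch* getThreadCache() {
          if (tl_cache)                  // hot path: one TLS load, no function call
              return tl_cache;
          // Cold path, hit once per thread: fall back to pthread TLS and fill the cache.
          pthread_once(&cache_key_once, makeCacheKey);
          ThreadBlockCacheSketch* c
              = static_cast<ThreadBlockCacheSketch*>(pthread_getspecific(cache_key));
          if (!c) {
              c = new ThreadBlockCacheSketch();
              pthread_setspecific(cache_key, c);
          }
          tl_cache = c;
          return c;
      }
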
  2. 18 Feb, 2015 21 commits
  3. 17 Feb, 2015 3 commits
    • Support passing generator objects through the args array in OSR · bff16616
      Kevin Modzelewski authored
      This only gets hit when there are >=3 !is_defined names also set (other
      fake names might also count towards this).
    • Merge pull request #305 from undingen/dup_guards · 0b650c38
      Kevin Modzelewski authored
      Don't emit duplicate attr guards
    • Don't emit duplicate attr guards · 6509deb8
      Marius Wachtler authored
      pyston (calibration)                      :    0.8s stock2: 0.8 (+2.5%)
      pyston interp2.py                         :    5.9s stock2: 6.2 (-4.5%)
      pyston raytrace.py                        :    6.9s stock2: 7.0 (-1.6%)
      pyston nbody.py                           :    9.8s stock2: 9.6 (+1.9%)
      pyston fannkuch.py                        :    7.0s stock2: 6.9 (+2.6%)
      pyston chaos.py                           :   20.6s stock2: 21.6 (-4.6%)
      pyston spectral_norm.py                   :   27.9s stock2: 34.2 (-18.6%)
      pyston fasta.py                           :   17.1s stock2: 17.8 (-4.5%)
      pyston pidigits.py                        :    4.4s stock2: 4.5 (-1.0%)
      pyston richards.py                        :   10.4s stock2: 10.2 (+2.2%)
      pyston deltablue.py                       :    2.2s stock2: 2.2 (-1.9%)
      pyston (geomean-0b9f)                     :    8.8s stock2: 9.1 (-3.2%)
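
      A minimal sketch of the deduplication idea, with made-up names (GuardEmitterSketch and emitAttrGuard are illustrative, not Pyston's rewriter API): remember which (object, attribute) pairs already have a guard in the current trace and skip emitting identical ones.

      #include <set>
      #include <string>
      #include <utility>

      struct GuardEmitterSketch {
          // (guarded object, attribute name) pairs already covered by a guard.
          std::set<std::pair<const void*, std::string>> emitted;

          // Returns true if a new guard was emitted, false if it was a duplicate.
          bool guardAttr(const void* obj, const std::string& attr) {
              if (!emitted.insert({obj, attr}).second)
                  return false;          // identical guard already present: skip it
              emitAttrGuard(obj, attr);  // placeholder for emitting the real guard
              return true;
          }

          void emitAttrGuard(const void*, const std::string&) {
              // the real code would record the guard in the generated IC here
          }
      };
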
  4. 16 Feb, 2015 3 commits
  5. 14 Feb, 2015 7 commits
    • Reenable tier 2 for now · a3a12bb6
      Kevin Modzelewski authored
      We should do a more comprehensive investigation.  Removing t2 caused
      regressions on a number of benchmarks since we lost chances to
      speculate, but making t3 easier to get to caused regressions
      due to the cost of our LLVM optimization set (which is pretty hefty,
      since it's supposed to be hard to activate).
    • fb70753e
      Kevin Modzelewski authored
    • Merge branch 'deopt' · 2e030372
      Kevin Modzelewski authored
      Fully switch to the new deopt system, and clean up a lot of stuff.
    • Further distinguish OSR and non-OSR compiles · 1bfb56e8
      Kevin Modzelewski authored
      A "FunctionSpecialization" object really makes no sense in the context of
      an OSR compile, since the FunctionSpecialization describes the types
      of the input arguments, which no longer matter for OSR compiles.
      Now, an OSR compile's type information comes (almost) entirely from the
      OSREntryDescriptor, so in most places we assert that we get exactly one
      or the other.
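
      A minimal sketch of that invariant, with illustrative types (FunctionSpecializationSketch, OSREntryDescriptorSketch, and CompileRequestSketch are made up for this example): a compile carries either argument types or OSR-entry types, and asserts it received exactly one of the two.

      #include <cassert>

      struct FunctionSpecializationSketch { /* types of the input arguments */ };
      struct OSREntryDescriptorSketch     { /* types of the values live at the OSR edge */ };

      struct CompileRequestSketch {
          FunctionSpecializationSketch* spec;
          OSREntryDescriptorSketch* entry_descriptor;

          CompileRequestSketch(FunctionSpecializationSketch* spec,
                               OSREntryDescriptorSketch* entry_descriptor)
              : spec(spec), entry_descriptor(entry_descriptor) {
              // Exactly one source of type information: normal compiles get a
              // specialization, OSR compiles get an entry descriptor.
              assert((spec != nullptr) ^ (entry_descriptor != nullptr));
          }

          bool isOSR() const { return entry_descriptor != nullptr; }
      };
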
    • Can kill all notion of partial-block-compilation · 0e60f0d3
      Kevin Modzelewski authored
      We only needed that to support the old deopt system.
    • Nuke the old "block guards" and the rest of the old deopt system · 8feae20e
      Kevin Modzelewski authored
      Long live new-deopt!
    • For OSRs, do type analysis starting from OSR edge · ea673dfd
      Kevin Modzelewski authored
      Before, we would do type analysis starting from the function entry
      (using the specialization of the previous function).  This makes things
      pretty complicated because we can infer different types than we are OSRing
      with!  For example, if the type analysis determines that we should speculate
      in an earlier BB, the types we have now might not reflect that speculation.

      So instead, start the type analysis from the BB that the OSR starts at.
      This should also have the side benefit of requiring less type analysis work.

      But this should let us get rid of the OSR-entry guarding, and the rest of
      the old deopt system!
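
      A minimal sketch of the change in starting point, with invented names (TypeSketch, BlockSketch, OSREntrySketch, and analyzeFrom are illustrative, not the real analysis code): the analysis is seeded at the OSR block with the types being OSR'd in, instead of at the function entry with the previous specialization's argument types.

      #include <map>
      #include <string>

      enum class TypeSketch { Unknown, Int, Float, Object };
      using TypeMap = std::map<std::string, TypeSketch>;

      struct BlockSketch { /* CFG basic block; details omitted */ };

      struct OSREntrySketch {
          BlockSketch* entry_block;   // the block the OSR jumps into
          TypeMap known_types;        // types of the live values being passed in
      };

      // Stand-in for the forward type propagation; a real analysis would walk the
      // CFG from `start`, joining types at merge points.
      inline TypeMap analyzeFrom(BlockSketch* start, const TypeMap& initial_types) {
          (void)start;
          return initial_types;
      }

      // OSR compile: start the analysis at the OSR edge with the OSR'd-in types,
      // rather than at the function entry, so the inferred types always match
      // the state we are actually entering with.
      inline TypeMap analyzeForOSR(const OSREntrySketch& e) {
          return analyzeFrom(e.entry_block, e.known_types);
      }
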
  6. 13 Feb, 2015 2 commits