1. 12 Aug, 2002 20 commits
    • k_mul(): Heh -- I checked in two fixes for the last problem.  Only keep · 8c914587
      Tim Peters authored
      the good one <wink>.  Also checked in a test-aid by mistake.
    • k_mul(): White-box testing turned up that (ah+al)*(bh+bl) can, in rare · 5a8539f5
      Tim Peters authored
      cases, overflow the allocated result object by 1 bit.  In such cases,
      it would have been brought back into range if we subtracted al*bl and
      ah*bh from it first, but I don't want to do that because it hurts cache
      behavior.  Instead we just ignore the excess bit when it appears -- in
      effect, this is forcing unsigned mod BASE**(asize + bsize) arithmetic
      in a case where that doesn't happen all by itself.
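The arithmetic being described can be sketched with Python ints (a hypothetical model; the real k_mul works on vectors of 15-bit C digits, and `mul_mod_check` is an illustrative name, not a CPython function):

```python
BASE = 2 ** 15   # CPython's digit base at the time

def mul_mod_check(a, b, asize, bsize):
    # Multiply via the Karatsuba identity, reducing mod BASE**(asize+bsize)
    # so that a carry out of the top digit is simply dropped -- the
    # "ignore the excess bit" trick from the commit.
    mod = BASE ** (asize + bsize)
    shift = asize // 2
    ah, al = divmod(a, BASE ** shift)
    bh, bl = divmod(b, BASE ** shift)
    t1 = ah * bh
    t2 = al * bl
    t3 = (ah + al) * (bh + bl)        # this sum can run 1 bit "too wide"
    result = (t1 * BASE ** (2 * shift)
              + (t3 - t1 - t2) * BASE ** shift
              + t2) % mod             # the final value always fits, so the
    return result                     # wrap-around is harmless

a = b = BASE ** 4 - 1                 # worst case: all digits maximal
assert mul_mod_check(a, b, 4, 4) == a * b
```

Since a < BASE**asize and b < BASE**bsize, the true product always fits in asize + bsize digits; only the intermediate t3 term can momentarily exceed that width.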
    • Fix MSVC warnings. · 8b54f1ee
      Guido van Rossum authored
    • Refactor how __dict__ and __weakref__ interact with __slots__. · ab5e2d06
      Guido van Rossum authored
      1. You can now have __dict__ and/or __weakref__ in your __slots__
         (before only __weakref__ was supported).  This is treated
         differently than before: it merely sets a flag that the object
         should support the corresponding magic.
      
      2. Dynamic types now always have descriptors __dict__ and __weakref__
         thrust upon them.  If the type in fact does not support one or the
         other, that descriptor's __get__ method will raise AttributeError.
      
      3. (This is the reason for all this; it fixes SF bug 575229, reported
         by Cesar Douady.)  Given this code:
            class A(object): __slots__ = []
            class B(object): pass
            class C(A, B): __slots__ = []
         the class object for C was broken; its size was less than that of
         B, and some descriptors on B could cause a segfault.  C now
          correctly inherits __weakref__ and __dict__ from B, even though A
         is the "primary" base (C.__base__ is A).
      
      4. Some code cleanup, and a few comments added.
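Run against a current CPython, the behavior described in points 1 and 3 can be seen directly (class names here are illustrative):

```python
import weakref

# Point 1: __dict__ and __weakref__ may appear in __slots__; each
# merely flags that instances should support the corresponding magic.
class Slotted(object):
    __slots__ = ['x', '__dict__', '__weakref__']

s = Slotted()
s.x = 1                      # stored in the 'x' slot
s.y = 2                      # spills into the per-instance __dict__
assert s.__dict__ == {'y': 2}
assert weakref.ref(s)() is s

# Point 3: the SF bug 575229 shape -- C must inherit __dict__ and
# __weakref__ from B even though A is the "primary" base.
class A(object): __slots__ = []
class B(object): pass
class C(A, B): __slots__ = []

c = C()
c.attr = 'ok'                # works: __dict__ comes from B, no segfault
assert weakref.ref(c)() is c
```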
    • x_mul(): Made life easier for C optimizers in the "grade school" · 523a2539
      Tim Peters authored
      algorithm.  MSVC 6 wasn't impressed <wink>.
      
      Something odd:  the x_mul algorithm appears to get substantially worse
      than quadratic time as the inputs grow larger:
      
      bits in each input   x_mul time   k_mul time
      ------------------   ----------   ----------
                   15360         0.01         0.00
                   30720         0.04         0.01
                   61440         0.16         0.04
                  122880         0.64         0.14
                  245760         2.56         0.40
                  491520        10.76         1.23
                  983040        71.28         3.69
                 1966080       459.31        11.07
      
      That is, x_mul is perfectly quadratic-time until a little burp at
      2.56->10.76, and after that goes to hell in a hurry.  Under Karatsuba,
      doubling the input size "should take" 3 times longer instead of 4, and
       that remains the case throughout this range.  I conclude that my "be nice
       to the cache" reworkings of k_mul() are paying off.
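The conclusion can be checked against the table itself: per input doubling, a quadratic algorithm should cost about 4x, while Karatsuba's O(n**log2(3)) should cost about 3x:

```python
# Times copied from the table above (seconds).
x_mul = [0.01, 0.04, 0.16, 0.64, 2.56, 10.76, 71.28, 459.31]
k_mul = [0.01, 0.04, 0.14, 0.40, 1.23, 3.69, 11.07]   # 0.00 row omitted

x_ratios = [b / a for a, b in zip(x_mul, x_mul[1:])]
k_ratios = [b / a for a, b in zip(k_mul, k_mul[1:])]

# x_mul is cleanly quadratic (ratio ~4) up to the 2.56 -> 10.76 burp...
assert all(abs(r - 4.0) < 0.25 for r in x_ratios[:4])
# ...and after that goes well beyond quadratic (cache effects).
assert min(x_ratios[5:]) > 6.0
# k_mul stays near the theoretical 3x per doubling throughout
# (the first ratio is noisy: 0.01 s is at timer granularity).
assert all(2.5 < r < 3.6 for r in k_ratios[1:])
```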
    • k_mul() and long_mul(): I'm confident that the Karatsuba algorithm is · d880fd68
      Tim Peters authored
      correct now, so added some final comments, did some cleanup, and enabled
      it for all long-int multiplies.  The KARAT envar no longer matters,
      although I left some #if 0'ed code in there for my own use (temporary).
      k_mul() is still much slower than x_mul() if the inputs have very
       different sizes, and that still needs to be addressed.
    • k_mul: Rearranged computation for better cache use. Ignored overflow · d177ece2
      Tim Peters authored
      (it's possible, but should be harmless -- this requires more thought,
      and allocating enough space in advance to prevent it requires exactly
      as much thought, to know exactly how much that is -- the end result
      certainly fits in the allocated space -- hmm, but that's really all
      the thought it needs!  borrows/carries out of the high digits really
      are harmless).
    • 73f4c9b4
      Mark Hammond authored
    • Add name mangling for new PyUnicode_FromOrdinal() and fix declaration · 9171bbf6
      Marc-André Lemburg authored
      to use new extern macro.
    • Excise DL_EXPORT from Include. · 81ef4c69
      Mark Hammond authored
      Thanks to Skip Montanaro and Kalle Svensson for the patches.
    • x_mul(): This failed to normalize its result. · ad331860
      Tim Peters authored
      k_mul():  This didn't allocate enough result space when one input had
      more than twice as many bits as the other.  This was partly hidden by
      that x_mul() didn't normalize its result.
      
      The Karatsuba recurrence is pretty much hosed if the inputs aren't
      roughly the same size.  If one has at least twice as many bits as the
      other, we get a degenerate case where the "high half" of the smaller
       input is 0.  Added a special case for that, for speed; while that helped,
       this can still be much slower than the "grade school" method.
       It seems to take a really wild imbalance to trigger that; e.g., a
       2**22-bit input times a 1000-bit input on my box runs about twice as
       slowly under k_mul as under x_mul.  This still needs to be addressed.
      
      I'm also not sure that allocating a->ob_size + b->ob_size digits is
      enough, given that this is computing k = (ah+al)*(bh+bl) instead of
      k = (ah-al)*(bl-bh); i.e., it's certainly enough for the final result,
      but it's vaguely possible that adding in the "artificially" large k may
      overflow that temporarily.  If so, an assert will trigger in the debug
      build, but we'll probably compute the right result anyway(!).
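The recurrence these commits describe, including the (ah+al)*(bh+bl) variant of k and the degenerate unbalanced case, can be sketched with Python ints (the cutoff and splitting point are illustrative; the real k_mul works on 15-bit digit vectors in C):

```python
def karatsuba(a, b, cutoff=64):
    # Sketch of the Karatsuba recurrence over nonnegative Python ints.
    if a.bit_length() <= cutoff or b.bit_length() <= cutoff:
        return a * b                        # "grade school" base case
    shift = max(a.bit_length(), b.bit_length()) // 2
    ah, al = a >> shift, a & ((1 << shift) - 1)
    bh, bl = b >> shift, b & ((1 << shift) - 1)
    # Degenerate case from the commit: if b has at most `shift` bits,
    # bh == 0, so t1 is 0 and t3 collapses to (ah+al)*bl -- the recursion
    # does little useful work, which is why a special case pays off.
    t1 = karatsuba(ah, bh)
    t2 = karatsuba(al, bl)
    t3 = karatsuba(ah + al, bh + bl)        # the "artificially" large k
    return (t1 << (2 * shift)) + ((t3 - t1 - t2) << shift) + t2

# A wildly unbalanced pair like the commit's example still multiplies
# correctly -- just not quickly.
a, b = 2 ** 4096 - 1, 2 ** 1000 - 3
assert karatsuba(a, b) == a * b
```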
    • Introduced helper functions v_iadd and v_isub, for in-place digit-vector · fa09d4c8
      Tim Peters authored
      addition and subtraction.  Reworked the tail end of k_mul() to use them.
      This saves oodles of one-shot longobject allocations (this is a triply-
      recursive routine, so saving one allocation in the body saves 3**n
      allocations at depth n; we actually save 2 allocations in the body).
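A Python model of what such a helper does (the real v_iadd/v_isub are C functions over 15-bit digit arrays; this list-based version is only a sketch):

```python
BASE = 2 ** 15          # CPython's digit base at the time
MASK = BASE - 1

def v_iadd(x, m, y, n):
    """Add the n-digit vector y into the first m digits of x, in place
    (digit 0 is least significant); return the carry out of digit m-1.
    No temporary long object is allocated -- the point of the helper."""
    assert m >= n
    carry = 0
    for i in range(n):
        carry += x[i] + y[i]
        x[i] = carry & MASK
        carry >>= 15
    for i in range(n, m):       # propagate any carry through the rest of x
        if not carry:
            break
        carry += x[i]
        x[i] = carry & MASK
        carry >>= 15
    return carry

x = [MASK, MASK, 0]             # the 3-digit value BASE**2 - 1
assert v_iadd(x, 3, [1], 1) == 0 and x == [0, 0, 1]
```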
    • New news about __class__ assignment restrictions and speed-up of · a9148c34
      Guido van Rossum authored
      new-style object creation/deallocation.
      
       Moved all news about type/class unification and new-style classes to a
      separate section at the top.
    • bd2299b2
      Neal Norwitz authored
    • 632c5009
      Tim Peters authored
    • k_mul(): Repaired typo in comment. · 9c9dbf0a
      Tim Peters authored
    • Cautious introduction of a patch that started from · a0f082fa
      Tim Peters authored
      SF 560379:  Karatsuba multiplication.
      Lots of things were changed from that.  This needs a lot more testing,
      for correctness and speed, the latter especially when bit lengths are
      unbalanced.  For now, the Karatsuba code gets invoked if and only if
      envar KARAT exists.
  2. 11 Aug, 2002 8 commits
  3. 10 Aug, 2002 10 commits
  4. 09 Aug, 2002 2 commits