  1. 29 Apr, 2002 1 commit
    • Mostly in SequenceMatcher.{__chain_b, find_longest_match}: · 81b9251d
      Tim Peters authored
      This now does a dynamic analysis of which elements are so frequently
      repeated as to constitute noise.  The primary benefit is an enormous
      speedup in find_longest_match, as the innermost loop can have hundreds
      of times fewer potential matches to worry about, in cases where the
      sequences have many duplicate elements.  In effect, this zooms in on
      sequences of non-ubiquitous elements now.
      
      While I like what I've seen of the effects so far, I still consider
      this experimental.  Please give it a try!
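As a rough illustration of the frequency analysis this commit message describes, here is a minimal C sketch that flags byte values occurring so often in a sequence that they behave like noise. The function name `find_popular`, the 200-element minimum, and the 1% cutoff are assumptions made for this example; the message does not state which thresholds difflib uses, and difflib itself works on arbitrary hashable Python elements rather than bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both thresholds are illustrative assumptions, not values from the commit. */
#define MIN_LEN_FOR_ANALYSIS 200   /* shorter sequences are left alone */
#define POPULAR_PERCENT      1     /* "popular" = more than 1% of elements */

/*
 * Hypothetical helper: sets popular[v] = 1 for every byte value v that
 * makes up more than POPULAR_PERCENT percent of b[0..n-1], and 0 for all
 * other values.  Returns the number of values flagged.
 */
size_t find_popular(const unsigned char *b, size_t n,
                    unsigned char popular[256])
{
    size_t count[256] = {0};
    size_t flagged = 0;

    assert(b != NULL || n == 0);
    memset(popular, 0, 256);

    /* Too short for the frequency statistics to mean much. */
    if (n < MIN_LEN_FOR_ANALYSIS)
        return 0;

    for (size_t i = 0; i < n; i++)
        count[b[i]]++;

    for (int v = 0; v < 256; v++) {
        if (count[v] * 100 > n * POPULAR_PERCENT) {
            popular[v] = 1;
            flagged++;
        }
    }
    return flagged;
}
```

A matcher could then leave flagged values out of its index of candidate match positions, which is where the reduction in innermost-loop work would come from.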
  2. 28 Apr, 2002 4 commits
  3. 27 Apr, 2002 2 commits
    • Repair widespread misuse of _PyString_Resize. · 5de9842b
      Tim Peters authored
      Since it's clear people don't understand how this function works, also
      beefed up the docs.  The most common usage error is of this form (often
      spread out across gotos):
      
      	if (_PyString_Resize(&s, n) < 0) {
      		Py_DECREF(s);
      		s = NULL;
      		goto outtahere;
      	}
      
      The error is that if _PyString_Resize runs out of memory, it automatically
      decrefs the input string object s (which also deallocates it, since its
      refcount must be 1 upon entry), and sets s to NULL.  So if the "if"
      branch ever triggers, it's an error to call Py_DECREF(s):  s is already
      NULL!  A correct way to write the above is the simpler (and intended)
      
      	if (_PyString_Resize(&s, n) < 0)
      		goto outtahere;
      
      Bugfix candidate.
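The ownership contract described in this commit message can be tried out without CPython by using a small stand-in. The sketch below is not the CPython API: `Buf`, `buf_new`, `buf_free`, `buf_resize`, `buf_prefix`, and the `fail_next_resize` test hook are all hypothetical names invented for this example. The only thing it borrows from the message is the rule that, on failure, the resize function releases the object and sets the caller's pointer to NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a resizable string object. */
typedef struct {
    size_t len;
    char *data;              /* always NUL-terminated */
} Buf;

/* Test hook (an assumption of this sketch): force the next resize to fail. */
int fail_next_resize = 0;

Buf *buf_new(size_t n)
{
    Buf *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->data = calloc(n + 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->len = n;
    return b;
}

void buf_free(Buf *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}

/*
 * Mirrors the contract from the commit message: on failure the object is
 * released, *pb is set to NULL, and -1 is returned.
 */
int buf_resize(Buf **pb, size_t n)
{
    Buf *b = *pb;
    char *data = NULL;

    if (fail_next_resize)
        fail_next_resize = 0;            /* simulate running out of memory */
    else
        data = realloc(b->data, n + 1);

    if (data == NULL) {
        buf_free(b);
        *pb = NULL;
        return -1;
    }
    data[n] = '\0';
    b->data = data;
    b->len = n;
    return 0;
}

/* Correct caller: returns a copy of the first n bytes of s, or NULL. */
Buf *buf_prefix(const char *s, size_t n)
{
    Buf *b;

    assert(n <= strlen(s));
    b = buf_new(strlen(s));
    if (b == NULL)
        return NULL;
    memcpy(b->data, s, b->len);

    if (buf_resize(&b, n) < 0)
        goto outtahere;      /* b is already NULL: do not release it again */
    /* ... further work on b would go here ... */
outtahere:
    return b;
}
```

Calling `buf_free(b)` inside the failure branch happens to be harmless here only because `buf_free` tolerates NULL; the message above says the equivalent `Py_DECREF(s)` is an error, so the simpler form is the one to copy.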
    • SF patch 549375: Compromise PyUnicode_EncodeUTF8 · 602f740b
      Tim Peters authored
      This implements ideas from Marc-Andre, Martin, Guido and me on Python-Dev.
      
      "Short" Unicode strings are encoded into a "big enough" stack buffer,
      then exactly as much string space as they turn out to need is allocated
      at the end.  This should have speed benefits akin to Martin's "measure
      once, allocate once" strategy, but without needing a distinct measuring
      pass.
      
      "Long" Unicode strings allocate as much heap space as they could possibly
      need (4 bytes per Unicode character), and do a realloc at the end to return the
      untouched excess.  Since the overallocation is likely to be substantial,
      this shouldn't burden the platform realloc with unusably small excess
      blocks.
      
      Also simplified uses of the PyString_xyz functions, and added a
      release-build check that 4*size doesn't overflow a C int.  Sooner or later,
      that overflow is going to happen.
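The two-tier allocation strategy in this commit message might look roughly like the following standalone sketch. It is not the code from the patch: the names `utf8_encode` and `encode_into`, the 300-character cutoff for "short" input, and the use of `uint32_t` code points with `size_t` lengths are all assumptions of this example. It also assumes every input value is a valid scalar value up to U+10FFFF and does no surrogate handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed cutoff between "short" and "long" inputs; not from the patch. */
#define MAX_SHORT_CHARS 300

/* Encode n code points into out (needs room for 4*n bytes); returns length. */
static size_t encode_into(const uint32_t *cp, size_t n, unsigned char *out)
{
    unsigned char *p = out;

    for (size_t i = 0; i < n; i++) {
        uint32_t c = cp[i];
        assert(c <= 0x10FFFF);
        if (c < 0x80) {
            *p++ = (unsigned char)c;
        } else if (c < 0x800) {
            *p++ = (unsigned char)(0xC0 | (c >> 6));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *p++ = (unsigned char)(0xE0 | (c >> 12));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            *p++ = (unsigned char)(0xF0 | (c >> 18));
            *p++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return (size_t)(p - out);
}

/* Returns a malloc'ed buffer of *out_len bytes, or NULL on failure. */
unsigned char *utf8_encode(const uint32_t *cp, size_t n, size_t *out_len)
{
    unsigned char stackbuf[4 * MAX_SHORT_CHARS];
    unsigned char *result;
    size_t len;

    if (n <= MAX_SHORT_CHARS) {
        /* Short: encode on the stack, then allocate exactly what is needed. */
        len = encode_into(cp, n, stackbuf);
        result = malloc(len ? len : 1);
        if (result == NULL)
            return NULL;
        memcpy(result, stackbuf, len);
    } else {
        unsigned char *big;

        /* Long: refuse sizes where the worst case 4*n would overflow. */
        if (n > SIZE_MAX / 4)
            return NULL;
        big = malloc(4 * n);
        if (big == NULL)
            return NULL;
        len = encode_into(cp, n, big);
        /* Hand back the untouched excess; len >= n > 0 on this path. */
        result = realloc(big, len);
        if (result == NULL)
            result = big;    /* shrink failed; the larger block is still valid */
    }
    *out_len = len;
    return result;
}
```

For example, the four code points U+0041, U+00E9, U+20AC, and U+1F600 take the short path and encode to 1 + 2 + 3 + 4 = 10 bytes, so only 10 bytes end up on the heap even though the stack buffer reserved the worst case.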
  4. 26 Apr, 2002 11 commits
  5. 25 Apr, 2002 9 commits
  6. 24 Apr, 2002 3 commits
  7. 23 Apr, 2002 10 commits