1. 18 May, 2001 1 commit
    • Guido van Rossum authored · 263c2ec1
      A much improved HTML parser -- a replacement for sgmllib. The API is
      derived from but not quite compatible with that of sgmllib, so it's a
      new file.  I suppose it needs documentation, and htmllib needs to be
      changed to use this instead of sgmllib, and sgmllib needs to be
      declared obsolete.  But that can all be done later.
      
      This code was first published as part of TAL (part of Zope Page
      Templates), but that was strongly based on sgmllib anyway.  Authors
      are Fred Drake and Guido van Rossum.
  2. 17 May, 2001 9 commits
  3. 16 May, 2001 1 commit
  4. 15 May, 2001 10 commits
  5. 14 May, 2001 13 commits
  6. 13 May, 2001 4 commits
    • Mark Hammond authored · 62673723
      Add support for Windows using "mbcs" as the default Unicode encoding when dealing with the file system.  As discussed on python-dev and in patch 410465.
    • Tim Peters authored · 07539686
      Aggressive reordering of dict comparisons. In case of collision, it stands
      to reason that me_key is much more likely to match the key we're looking
      for than to match dummy, and if the key is absent me_key is much more
      likely to be NULL than dummy:  most dicts don't even have a dummy entry.
      Running instrumented dict code over the test suite and some apps confirmed
      that matching dummy was 200-300x less frequent than matching key in
      practice.  So this reorders the tests to try the common case first.
      It can lose if a large dict with many collisions is mostly deleted, not
      resized, and then frequently searched, but that's hardly a case we
      should be favoring.
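
The reordering described above can be illustrated with a toy slot classifier. This is only a sketch in Python, not the C code from the commit: `classify_slot` and `DUMMY` are hypothetical names, `None` stands in for NULL, and the hash and equality comparisons that a real lookup also performs are left out. It shows one test order consistent with the frequencies the message reports (key match first, then NULL, then dummy).

```python
DUMMY = object()  # hypothetical stand-in for the marker left behind by deletions


def classify_slot(me_key, key):
    """Classify one probed slot, testing the likeliest outcomes first.

    Returns (outcome, tests), where tests is the number of identity
    checks performed before the outcome was decided.
    """
    if me_key is key:      # assumed most common: the slot holds our key
        return "match", 1
    if me_key is None:     # None models NULL, a slot that was never used
        return "empty", 2
    if me_key is DUMMY:    # rare: most dicts have no dummy entries at all
        return "dummy", 3
    return "other", 3      # some different key: a genuine collision


k = "spam"
print(classify_slot(k, k))      # decided after one test
print(classify_slot(DUMMY, k))  # the rare case pays for all three tests
```

The cost of the ordering falls on the dummy case, which matches the caveat in the message about large, mostly deleted dicts that are never resized.
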
    • Tim Peters authored · 5770625e
      Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask".
      The comment following used to say:
      	/* We use ~hash instead of hash, as degenerate hash functions, such
      	   as for ints <sigh>, can have lots of leading zeros. It's not
      	   really a performance risk, but better safe than sorry.
      	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
      	   what's the gain? */
      That is, there was never a good reason for doing it.  And to the contrary,
      as explained on Python-Dev last December, it tended to make the *sum*
      (i + incr) & mask (which is the first table index examined in case of
      collision) the same "too often" across distinct hashes.
      
      Changing to the simpler "i = hash & mask" reduced the number of string-dict
      collisions (== number of times we go around the lookup for-loop) from about
      6 million to 5 million during a full run of the test suite (these are
      approximate because the test suite does some random stuff from run to run).
      The number of collisions in non-string dicts also decreased, but not as
      dramatically.
      
      Note that this may, for a given dict, change the order (wrt previous
      releases) of entries exposed by .keys(), .values() and .items().  A number
      of std tests suffered bogus failures as a result.  For dicts keyed by
      small ints, or (less so) by characters, the order is much more likely to be
      in increasing order of key now; e.g.,
      
      >>> d = {}
      >>> for i in range(10):
      ...    d[i] = i
      ...
      >>> d
      {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
      >>>
      
      Unfortunately, people may latch on to that in small examples and draw a
      bogus conclusion.
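
Why small-int keys tend to come out in increasing order can be seen in a toy model of the initial slot computation. The sketch below is an illustration under stated assumptions, not the dict implementation: `slot_order` is a hypothetical helper, the table is assumed to have 16 slots (mask 15), and collisions, probing and resizing are ignored. It also does not describe current CPython, where dicts preserve insertion order.

```python
def slot_order(keys, mask, invert):
    """Return keys in the order a slot-by-slot scan of a toy table sees them.

    invert=True models the old "i = (~hash) & mask";
    invert=False models the new "i = hash & mask".
    """
    table = {}
    for k in keys:
        h = hash(k)
        i = (~h if invert else h) & mask
        if i in table:
            raise ValueError("collision; this toy model does not probe")
        table[i] = k
    # Scanning slots from 0 upward is how the toy table is "iterated".
    return [table[i] for i in sorted(table)]


print(slot_order(range(10), 15, invert=False))  # increasing key order
print(slot_order(range(10), 15, invert=True))   # reversed key order
```

With `hash & mask`, key `i` lands in slot `i`, so a scan returns the keys in increasing order. With `~hash & mask`, key `i` lands in slot `15 - i`, so the same scan returns them reversed.
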
      
      test_support.py
          Moved test_extcall's sortdict() into test_support, made it stronger,
          and imported sortdict into other std tests that needed it.
      test_unicode.py
          Excluded cp875 from the "roundtrip over range(128)" test, because
          cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
          See Python-Dev for excruciating details.
      Cookie.py
          Changed various output functions to sort dicts before building
          strings from them.
      test_extcall
          Fiddled the expected-result file.  This remains sensitive to native
          dict ordering, because, e.g., if there are multiple errors in a
          keyword-arg dict (and test_extcall sets up many cases like that), the
          specific error Python complains about first depends on native dict
          ordering.
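
A helper like the sortdict() mentioned above makes test output independent of native dict ordering by rendering the items in sorted key order. The version below is a minimal sketch of that idea, assuming keys that can be compared with each other; it is not the code from test_support.

```python
def sortdict(d):
    """Render a dict like repr() would, but with items sorted by key."""
    items = sorted(d.items())
    body = ", ".join("%r: %r" % (key, value) for key, value in items)
    return "{%s}" % body


print(sortdict({"b": 2, "a": 1, "c": 3}))
```

Tests that compare printed dicts against an expected-output file can print `sortdict(d)` instead of `d`, so a change in hashing no longer changes the expected text.
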
  7. 12 May, 2001 2 commits