1. 14 May, 2001 7 commits
  2. 13 May, 2001 4 commits
    • Add support for Windows using "mbcs" as the default Unicode encoding when dealing with the file system. · ef8b654b
      Mark Hammond authored
      As discussed on python-dev and in patch 410465.
    • Aggressive reordering of dict comparisons. · 342c65e1
      Tim Peters authored
      In case of collision, it stands to reason that me_key is much more
      likely to match the key we're looking for than to match dummy, and if
      the key is absent me_key is much more likely to be NULL than dummy:
      most dicts don't even have a dummy entry.  Running instrumented dict
      code over the test suite and some apps confirmed that matching dummy
      was 200-300x less frequent than matching key in practice.  So this
      reorders the tests to try the common case first.
      It can lose if a large dict with many collisions is mostly deleted, not
      resized, and then frequently searched, but that's hardly a case we
      should be favoring.
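
The test ordering described in this message can be modelled in a few lines. This is only a sketch in Python, not the C code the commit touched: `EMPTY`, `DUMMY`, and `probe_step` are hypothetical names, with `EMPTY` standing in for a NULL me_key and `DUMMY` for the marker left behind by a deletion.

```python
EMPTY = object()   # models a NULL me_key (slot never used)
DUMMY = object()   # models the dummy marker left by a deleted entry


def probe_step(slot_key, key):
    """Classify one slot, testing the likeliest outcomes first.

    Returns (outcome, number_of_identity_tests_performed).
    """
    if slot_key is key:          # most common on a hit
        return "match", 1
    if slot_key is EMPTY:        # most common when the key is absent
        return "absent", 2
    if slot_key is DUMMY:        # rare: most dicts have no dummy entry
        return "dummy", 3
    return "other", 3            # a different live key; keep probing


k = "spam"
results = [probe_step(s, k) for s in (k, EMPTY, DUMMY, "eggs")]
```

With this ordering a hit costs one test and a miss on an unused slot costs two; the dummy check, which the message reports as the rarest outcome, is paid for only when the first two tests fail.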
    • Get rid of the superstitious "~" in dict hashing's "i = (~hash) & mask". · 2f228e75
      Tim Peters authored
      The comment following used to say:
      	/* We use ~hash instead of hash, as degenerate hash functions, such
      	   as for ints <sigh>, can have lots of leading zeros. It's not
      	   really a performance risk, but better safe than sorry.
      	   12-Dec-00 tim:  so ~hash produces lots of leading ones instead --
      	   what's the gain? */
      That is, there was never a good reason for doing it.  And to the contrary,
      as explained on Python-Dev last December, it tended to make the *sum*
      (i + incr) & mask (which is the first table index examined in case of
      collision) the same "too often" across distinct hashes.
      
      Changing to the simpler "i = hash & mask" reduced the number of string-dict
      collisions (== number of times we go around the lookup for-loop) from about
      6 million to 5 million during a full run of the test suite (these are
      approximate because the test suite does some random stuff from run to run).
      The number of collisions in non-string dicts also decreased, but not as
      dramatically.
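
The remark about the sum (i + incr) & mask can be illustrated with a small model. This is a sketch under a loud assumption: the message does not say how incr is computed, so the formula below (low bits of the hash mixed with a shifted copy) is chosen purely for illustration, and `second_probe` is a hypothetical helper, not a function from the dict implementation.

```python
def second_probe(h, mask, invert):
    """Return the slot examined after a collision on the first slot."""
    i = (~h if invert else h) & mask
    # ASSUMPTION: this increment formula is illustrative only; the
    # commit message does not state how incr is derived.
    incr = (h ^ (h >> 3)) & mask
    if incr == 0:
        incr = mask  # a zero increment would never leave the first slot
    return (i + incr) & mask


mask = 7            # an 8-slot table, picked to keep the output small
hashes = range(8)
old_slots = [second_probe(h, mask, invert=True) for h in hashes]
new_slots = [second_probe(h, mask, invert=False) for h in hashes]
```

Under that assumption, (~h & mask) and (h & mask) always add up to mask, so with the "~" almost every small hash lands on the same second slot, while without it the second slots spread out over more of the table.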
      
      Note that this may, for a given dict, change the order (wrt previous
      releases) of entries exposed by .keys(), .values() and .items().  A number
      of std tests suffered bogus failures as a result.  For dicts keyed by
      small ints, or (less so) by characters, the order is much more likely to be
      in increasing order of key now; e.g.,
      
      >>> d = {}
      >>> for i in range(10):
      ...    d[i] = i
      ...
      >>> d
      {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
      >>>
      
      Unfortunately, people may latch on to that in small examples and draw a
      bogus conclusion.
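
Why small int keys now tend to come out in increasing order can be seen from the first-slot computation alone. The following is a rough model, assuming no collisions and assuming iteration simply walks the table from slot 0 upward; `first_slot` and the table size are hypothetical. Modern CPython dicts preserve insertion order, so this models the 2001 table layout rather than current behavior.

```python
def first_slot(h, mask, invert=False):
    """First table index tried for hash h, with or without the '~'."""
    return (~h if invert else h) & mask


mask = 15          # a 16-slot table, chosen for illustration
keys = range(10)   # small non-negative ints hash to themselves in CPython

# Order in which a slot-by-slot walk of the table would meet the keys.
old_order = sorted(keys, key=lambda k: first_slot(hash(k), mask, invert=True))
new_order = sorted(keys, key=lambda k: first_slot(hash(k), mask))
```

In this model the "~" places key k in slot 15 - k, so the walk meets the keys in decreasing order, while the plain "hash & mask" places key k in slot k and the walk meets them in increasing order.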
      
      test_support.py
          Moved test_extcall's sortdict() into test_support, made it stronger,
          and imported sortdict into other std tests that needed it.
      test_unicode.py
          Excluded cp875 from the "roundtrip over range(128)" test, because
          cp875 doesn't have a well-defined inverse for unicode("?", "cp875").
          See Python-Dev for excruciating details.
      Cookie.py
          Changed various output functions to sort dicts before building
          strings from them.
      test_extcall
          Fiddled the expected-result file.  This remains sensitive to native
          dict ordering, because, e.g., if there are multiple errors in a
          keyword-arg dict (and test_extcall sets up many cases like that), the
          specific error Python complains about first depends on native dict
          ordering.
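
The sort-before-printing approach mentioned for test_support.py and Cookie.py can be sketched as follows. This is a minimal illustration written in present-day Python, not the code the commit added; only the name sortdict comes from the message, and the body is an assumption about what such a helper needs to do.

```python
def sortdict(d):
    """Return a repr-like string for d with items in sorted key order.

    Output built this way does not depend on the dict's native ordering,
    so expected-output comparisons in tests stay stable.
    """
    pairs = ["%r: %r" % (key, value) for key, value in sorted(d.items())]
    return "{%s}" % ", ".join(pairs)


text = sortdict({"b": 2, "c": 3, "a": 1})
```

A test that compares against such a string keeps passing when the hashing scheme changes, which is the property the affected std tests were missing.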
  3. 12 May, 2001 7 commits
  4. 11 May, 2001 22 commits