1. 09 Sep, 2019 14 commits
  2. 08 Sep, 2019 4 commits
  3. 07 Sep, 2019 2 commits
  4. 06 Sep, 2019 4 commits
  5. 05 Sep, 2019 12 commits
  6. 04 Sep, 2019 4 commits
    • Steve Dower's avatar
    • Zackery Spytz's avatar
    • Roger Iyengar's avatar
      Fix grammar in asyncio-dev.rst (GH-15672) · 675d17ce
      Roger Iyengar authored
      
      
      Automerge-Triggered-By: @ned-deily
      675d17ce
    • Greg Price's avatar
      closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558) · 2f094139
      Greg Price authored
      The purpose of the `unicodedata.is_normalized` function is to answer
      the question `str == unicodedata.normalized(form, str)` more
      efficiently than writing just that, by using the "quick check"
      optimization described in the Unicode standard in UAX #15.
      
      However, it turns out the code doesn't implement the full algorithm
      from the standard, and as a result we often miss the optimization and
      end up having to compute the whole normalized string after all.
      
      Implement the standard's algorithm.  This greatly speeds up
      `unicodedata.is_normalized` in many cases where our partial variant
      of quick-check had been returning MAYBE and the standard algorithm
      returns NO.
      
      At a quick test on my desktop, the existing code takes about 4.4 ms/MB
      (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it
      has to do the slow normalize-and-compare:
      
        $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
            -- 'unicodedata.is_normalized("NFD", s)'
        50 loops, best of 5: 4.39 msec per loop
      
      With this patch, it gets the answer instantly (58 ns) on the same 1 MB
      string:
      
        $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \
            -- 'unicodedata.is_normalized("NFD", s)'
        5000000 loops, best of 5: 58.2 nsec per loop
      
      This restores a small optimization that the original version of this
      code had for the `unicodedata.normalize` use case.
      
      With this, that case is actually faster than in master!
      
      $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
          -- 'unicodedata.normalize("NFD", s)'
      500 loops, best of 5: 561 usec per loop
      
      $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \
          -- 'unicodedata.normalize("NFD", s)'
      500 loops, best of 5: 512 usec per loop
      2f094139