1. 21 Apr, 2015 1 commit
      #17445: difflib: add diff_bytes(), to compare bytes rather than str · 4d9d2563
      Greg Ward authored
      Some applications (e.g. traditional Unix diff, version control
      systems) neither know nor care about the encodings of the files they
      are comparing. They are textual, but to the diff utility they are just
      bytes. This worked fine under Python 2, because all of the hardcoded
      strings in difflib.py are ASCII, and so could safely be combined with
      old-style u'' strings. But it stopped working in 3.x, where bytes and
      str cannot be mixed.
      
      The solution is to use surrogate escapes for a lossless
      bytes->str->bytes roundtrip. That means {unified,context}_diff() can
      continue to just handle strings without worrying about bytes. Callers
      who have to deal with bytes will need to switch to diff_bytes().
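The sketch below illustrates the surrogate-escape roundtrip described above. It assumes Python 3.5 or later. The helper name `roundtrip_diff` and the sample lines are hypothetical. The steps are: decode with `surrogateescape`, diff as str with the unchanged `unified_diff()`, then re-encode. For `unified_diff` and the same arguments, `difflib.diff_bytes()` is expected to yield the same bytes.

```python
import difflib


def roundtrip_diff(a_lines, b_lines):
    """Diff two lists of bytes lines by routing them through str.

    Hypothetical sketch: decode -> unified_diff() on str -> re-encode.
    """
    # Every byte >= 0x80 (0xNN) decodes to the lone surrogate U+DCNN, so no
    # information is lost and no real encoding has to be guessed.
    a = [line.decode("ascii", "surrogateescape") for line in a_lines]
    b = [line.decode("ascii", "surrogateescape") for line in b_lines]
    for line in difflib.unified_diff(a, b, "old", "new"):
        # Encoding with the same error handler restores the original bytes.
        yield line.encode("ascii", "surrogateescape")


# The roundtrip itself is lossless, even for bytes that are not valid UTF-8.
raw = b"caf\xe9 \xff\n"
assert raw.decode("ascii", "surrogateescape").encode("ascii", "surrogateescape") == raw

old = [b"hello\n", b"caf\xe9\n"]      # Latin-1 encoding of "café"
new = [b"hello\n", b"caf\xc3\xa9\n"]  # UTF-8 encoding of "café"
for line in roundtrip_diff(old, new):
    print(line)
```

Because the str-based `unified_diff()` never sees bytes, it needs no changes. Only the thin bytes wrapper knows about the roundtrip.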
      
      Use case: Mercurial's test runner uses difflib to compare current hg
      output with known good output. But Mercurial's output is just bytes,
      since it can contain:
        * file contents (arbitrary unknown encoding)
        * filenames (arbitrary unknown encoding)
        * usernames and commit messages (usually UTF-8, but not guaranteed
          because old versions of Mercurial did not enforce it)
        * user messages (locale encoding)
      
      Since the output of any given hg command can include text in multiple
      encodings, it is hopeless to try to treat it as decodable Unicode
      text. It's just bytes, all the way down.
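The next example sketches how a test runner in this situation might use `difflib.diff_bytes()` to compare expected and actual output without ever choosing an encoding. The helper `compare_output` and the sample output are hypothetical; this is not Mercurial's actual code.

```python
import difflib


def compare_output(expected: bytes, actual: bytes) -> bytes:
    """Return a unified diff of two raw outputs, or b"" if they match.

    Hypothetical test-runner helper: the outputs are never decoded to a real
    encoding, so mixed or unknown encodings pass through untouched.
    """
    diff = difflib.diff_bytes(
        difflib.unified_diff,
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=b"expected",
        tofile=b"actual",
    )
    return b"".join(diff)


# One output mixing encodings: a UTF-8 username and a Latin-1 filename.
expected = b"user: J\xc3\xbcrgen\nadding caf\xe9.txt\n"
actual = b"user: J\xc3\xbcrgen\nadding caf\xe9.TXT\n"
print(compare_output(expected, actual))
```

The UTF-8 and Latin-1 bytes appear byte-for-byte in the resulting diff. Identical outputs produce an empty diff, which a runner could treat as a pass.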
      
      This is an elaboration of a patch by Terry Reedy.
  2. 20 Apr, 2015 20 commits
  3. 19 Apr, 2015 14 commits
  4. 18 Apr, 2015 3 commits
  5. 17 Apr, 2015 2 commits