1. 04 May, 2000 6 commits
    • Guido van Rossum · 69529ad0
      When the UTF-8 conversion to Unicode fails, return an 8-bit string
      instead.  This seems more robust than returning a Unicode string with
      some unconverted characters in it.
      
      This still doesn't support getting truly binary data out of Tcl, since
      we look for the trailing null byte; but the old (pre-Unicode) code did
      this too, so apparently there's no need.  (Plus, I really don't feel
      like finding out how Tcl deals with this in each version.)
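
      The fallback described in this message can be sketched in modern Python. This is only an illustration, not the C code touched by the commit: the function name `tcl_result_to_python` is hypothetical, `bytes` stands in for the "8-bit string", and truncating at the first null byte is an assumption that mimics the C-string handling the message mentions.

```python
def tcl_result_to_python(raw: bytes):
    """Hypothetical helper: convert a Tcl result buffer to a Python value.

    Returns str when the data contains non-ASCII bytes that form valid
    UTF-8; otherwise returns the bytes unchanged (the "8-bit string").
    """
    # Mimic C-string semantics: everything after the first null byte is
    # lost, which is why truly binary data cannot be retrieved this way.
    raw = raw.split(b"\0", 1)[0]

    # Pure ASCII needs no conversion.
    if all(byte < 0x80 for byte in raw):
        return raw

    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Conversion failed: hand back the raw bytes rather than a
        # partially converted Unicode string.
        return raw
```

      With this sketch, `b"caf\xc3\xa9"` comes back as a Unicode string, while `b"caf\xe9"` (not valid UTF-8) comes back unchanged as bytes.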
    • Guido van Rossum · 03e29f1a
      Mark Hammond should get his act into gear (his words :-). Zero length
      strings _are_ valid!
    • Jack Jansen · 301f3f6b
    • Guido van Rossum · 990f5c6c
      Two changes to improve (I hope) Unicode support.

      1. In Tcl 8.2 and later, use Tcl_NewUnicodeObj() when passing a Python
      Unicode object rather than going through UTF-8.  (This function
      doesn't exist in Tcl 8.1, so there the original UTF-8 code is still
      used; in Tcl 8.0 there is no support for Unicode.)  This assumes that
      Tcl_UniChar is the same thing as Py_UNICODE; a run-time error is
      issued if this is not the case.
      
      2. In Tcl 8.1 and later (i.e., whenever Tcl supports Unicode), when a
      string returned from Tcl contains bytes with the top bit set, we
      assume it is encoded in UTF-8, and decode it into a Unicode string
      object.
      
      Notes:
      
      - Passing Unicode strings to Tcl 8.0 does not do the right thing; this
      isn't worth fixing.
      
      - When passing an 8-bit string to Tcl 8.1 or later that has bytes with
      the top bit set, Tcl tries to interpret it as UTF-8; it seems to fall
      back on Latin-1 for non-UTF-8 bytes.  I'm not sure what to do about
      this besides telling the user to disambiguate such strings by
      converting them to Unicode (forcing the user to be explicit about the
      encoding).
      
      - Obviously it won't be possible to get binary data out of Tk this
      way.  Do we need that ability?  How to do it?
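
      The ambiguity in the second note can be illustrated in modern Python. This is a simplified model, not Tcl's actual algorithm: the function `interpret_like_tcl` is hypothetical and falls back to Latin-1 for the whole string, whereas the message only says Tcl "seems to" fall back for non-UTF-8 bytes.

```python
def interpret_like_tcl(raw: bytes) -> str:
    """Hypothetical model: read bytes as UTF-8, else fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# A lone 0xE9 byte is not valid UTF-8, so the fallback reads it as Latin-1.
single = interpret_like_tcl(b"\xe9")

# The bytes C3 A9 are two Latin-1 characters, but they are also valid
# UTF-8 for one character, so the guess silently picks UTF-8.
guessed = interpret_like_tcl(b"\xc3\xa9")

# Decoding explicitly states the encoding and removes the guesswork.
explicit = b"\xc3\xa9".decode("latin-1")

print(len(single), len(guessed), len(explicit))
```

      The guessed and explicit results differ for the same bytes, which is why the message recommends that the caller convert such strings to Unicode before passing them to Tcl.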
    • Guido van Rossum · cc229ea7
    • Guido van Rossum · 49517821
  2. 03 May, 2000 14 commits
  3. 02 May, 2000 19 commits
  4. 01 May, 2000 1 commit
    • Guido van Rossum · 0e4f657a
      Marc-Andre Lemburg:
      Fixed \OOO interpretation for Unicode objects. \777 now
      correctly produces the Unicode character with ordinal 511.
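
      The intended interpretation can be sketched in modern Python. This is a simplified illustration, not the patched C code: the helper `expand_octal_escapes` is hypothetical, it handles only `\OOO` escapes, and it does not treat an escaped backslash specially.

```python
import re

# A backslash followed by one to three octal digits.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def expand_octal_escapes(text: str) -> str:
    """Replace each \\OOO escape with the character of that ordinal."""
    return _OCTAL_ESCAPE.sub(lambda match: chr(int(match.group(1), 8)), text)


# Octal 777 is 7*64 + 7*8 + 7 = 511, which does not fit in 8 bits,
# so the result must be a Unicode character rather than a byte.
char = expand_octal_escapes(r"\777")
print(ord(char))
```

      A raw string is used for the input so that Python's own escape processing does not interfere with the sketch.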