    Two changes to improve (I hope) Unicode support. · afa13e07
    Guido van Rossum authored
    1. In Tcl 8.2 and later, use Tcl_NewUnicodeObj() when passing a Python
    Unicode object rather than going through UTF-8.  (This function
    doesn't exist in Tcl 8.1, so there the original UTF-8 code is still
    used; in Tcl 8.0 there is no support for Unicode.)  This assumes that
    Tcl_UniChar is the same thing as Py_UNICODE; a run-time error is
    issued if this is not the case.
    
    2. In Tcl 8.1 and later (i.e., whenever Tcl supports Unicode), when a
    string returned from Tcl contains bytes with the top bit set, we
    assume it is encoded in UTF-8, and decode it into a Unicode string
    object.
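    The test in point 2 amounts to scanning the bytes Tcl returns for any
    byte >= 0x80; pure 7-bit ASCII can stay an 8-bit string.  A minimal
    sketch of that check follows.  The name has_high_bit is hypothetical
    and is not the function used in _tkinter.c.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if any of the n bytes in s has its top
   bit (0x80) set, i.e. the data is not pure 7-bit ASCII and would need
   to be decoded as UTF-8; otherwise return 0. */
int has_high_bit(const char *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if ((unsigned char)s[i] & 0x80)
            return 1;
    }
    return 0;
}
```

    Only when this returns 1 does the result need to be turned into a
    Unicode string object; otherwise no decoding step is required.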
    
    Notes:
    
    - Passing Unicode strings to Tcl 8.0 does not do the right thing; this
    isn't worth fixing.
    
    - When passing an 8-bit string that has bytes with the top bit set to
    Tcl 8.1 or later, Tcl tries to interpret it as UTF-8; it seems to fall
    back on Latin-1 for non-UTF-8 bytes.  I'm not sure what to do about
    this besides telling the user to disambiguate such strings by
    converting them to Unicode (forcing the user to be explicit about the
    encoding).
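    The fallback behaviour observed above can be illustrated with a small
    decoder that accepts well-formed UTF-8 sequences and treats any other
    byte as Latin-1 (byte value == code point).  This is a hypothetical
    sketch of the observed behaviour, not Tcl's actual conversion code;
    the function name decode_utf8_latin1_fallback is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: writes code points to out (room for at least n
   entries, one per input byte) and returns how many were written. */
size_t decode_utf8_latin1_fallback(const unsigned char *s, size_t n,
                                   uint32_t *out)
{
    size_t i = 0, k = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t len, j;
        uint32_t cp, min;

        if (c < 0x80) {                 /* plain ASCII */
            out[k++] = c;
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else                         { len = 0; cp = 0; min = 0; } /* not a lead byte */

        /* Accumulate continuation bytes (10xxxxxx) if the sequence fits. */
        j = 1;
        if (len != 0 && i + len <= n) {
            for (; j < len; j++) {
                if ((s[i + j] & 0xC0) != 0x80)
                    break;
                cp = (cp << 6) | (s[i + j] & 0x3F);
            }
        }

        /* Accept only complete, non-overlong, non-surrogate sequences. */
        if (len != 0 && i + len <= n && j == len && cp >= min
            && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF)) {
            out[k++] = cp;
            i += len;
        } else {
            out[k++] = c;               /* Latin-1 fallback for this byte */
            i++;
        }
    }
    return k;
}
```

    With this rule both "caf\xC3\xA9" (UTF-8) and "caf\xE9" (Latin-1)
    decode to the same four code points ending in U+00E9, which is exactly
    why an 8-bit string cannot tell Tcl which encoding was meant.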
    
    - Obviously it won't be possible to get binary data out of Tk this
    way.  Do we need that ability?  How to do it?
_tkinter.c