Commit f5fec3c8, authored Jul 19, 2001 by Andrew M. Kuchling
Fill out the Unicode section, somewhat uncertainly
Parent: 8cfa9055

Showing 1 changed file with 24 additions and 7 deletions.

Doc/whatsnew/whatsnew22.tex (+24 -7)
@@ -340,11 +340,21 @@ and Tim Peters, with other fixes from the Python Labs crew.}
 Python's Unicode support has been enhanced a bit in 2.2. Unicode
 strings are usually stored as UCS-2, as 16-bit unsigned integers.
-Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned integers
-by supplying \longprogramopt{enable-unicode=ucs4} to the configure script.
-
-XXX explain surrogates? I have to figure out what the changes mean to users.
+Python 2.2 can also be compiled to use UCS-4, 32-bit unsigned
+integers, as its internal encoding by supplying
+\longprogramopt{enable-unicode=ucs4} to the configure script. When
+built to use UCS-4, in theory Python could handle Unicode characters
+from U-00000000 to U-7FFFFFFF. Being able to use UCS-4 internally is
+a necessary step to do that, but it's not the only step, and in Python
+2.2alpha1 the work isn't complete yet. For example, the
+\function{unichr()} function still only accepts values from 0 to
+65535, and there's no \code{\e U} notation for embedding characters
+greater than 65535 in a Unicode string literal. All this is the
+province of the still-unimplemented PEP 261, ``Support for `wide'
+Unicode characters''; consult it for further details, and please offer
+comments and suggestions on the proposal it describes.

+Another change is much simpler to explain.
 Since their introduction, Unicode strings have supported an
 \method{encode()} method to convert the string to a selected encoding
 such as UTF-8 or Latin-1. A symmetric
@@ -375,9 +385,16 @@ end
 'furrfu'
 \end{verbatim}

-References: http://mail.python.org/pipermail/i18n-sig/2001-June/001107.html
-and following thread.
+\method{encode()} and \method{decode()} were implemented by
+Marc-Andr\'e Lemburg. The changes to support using UCS-4 internally
+were implemented by Fredrik Lundh and Martin von L\"owis.
+
+\begin{seealso}
+
+\seepep{261}{Support for `wide' Unicode characters}{PEP written by
+Paul Prescod. Not yet accepted or fully implemented.}
+
+\end{seealso}

 %======================================================================
 \section{PEP 227: Nested Scopes}
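
Note on the 65535 limit mentioned in the added text: the sketch below is an illustration only, not part of the commit. It is written for Python 3, where `chr()` plays the role that `unichr()` played in Python 2.2, and it uses the UTF-16 and UTF-32 codecs as a stand-in for 16-bit and 32-bit internal storage; that mapping is an assumption made for illustration, and the helper names are hypothetical. It shows why a character above 65535 does not fit in a single 16-bit unit but does fit in a single 32-bit unit.

```python
# Python 3 illustration (assumption: UTF-16/UTF-32 codecs stand in for
# 16-bit and 32-bit internal storage; helper names are hypothetical).

def utf16_code_units(ch):
    """Return the 16-bit code units needed to encode ch as UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def utf32_code_units(ch):
    """Return the 32-bit code units needed to encode ch as UTF-32."""
    data = ch.encode("utf-32-be")
    return [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data), 4)]


narrow = chr(0x20AC)   # EURO SIGN, below 65536
wide = chr(0x10400)    # a character above 65535

print([hex(u) for u in utf16_code_units(narrow)])  # ['0x20ac']
print([hex(u) for u in utf16_code_units(wide)])    # ['0xd801', '0xdc00'] (a surrogate pair)
print([hex(u) for u in utf32_code_units(wide)])    # ['0x10400']
```

With 16-bit units, the character above 65535 needs two units (a surrogate pair), which is the topic the removed "XXX explain surrogates?" line pointed at. With 32-bit units, one unit is enough.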