Kirill Smelkov / cpython
Commit 9d93b00a, authored Jan 14, 2015 by Georg Brandl
Closes #23181: codepoint -> code point
Parent: 604d49bf
Showing 5 changed files with 12 additions and 12 deletions
Doc/c-api/unicode.rst (+2 -2)
Doc/library/codecs.rst (+6 -6)
Doc/library/htmllib.rst (+2 -2)
Doc/library/json.rst (+1 -1)
Doc/tutorial/interpreter.rst (+1 -1)
Doc/c-api/unicode.rst
@@ -547,7 +547,7 @@ These are the UTF-32 codec APIs:
 After completion, *\*byteorder* is set to the current byte order at the end
 of input data.
-In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
+In a narrow build code points outside the BMP will be decoded as surrogate pairs.
 If *byteorder* is *NULL*, the codec starts in native order mode.
@@ -580,7 +580,7 @@ These are the UTF-32 codec APIs:
 mark (U+FEFF). In the other two modes, no BOM mark is prepended.
 If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
-as a single codepoint.
+as a single code point.
 Return *NULL* if an exception was raised by the codec.
Doc/library/codecs.rst
@@ -787,7 +787,7 @@ methods and attributes from the underlying stream.
 Encodings and Unicode
 ---------------------
-Unicode strings are stored internally as sequences of codepoints (to be precise
+Unicode strings are stored internally as sequences of code points (to be precise
 as :c:type:`Py_UNICODE` arrays). Depending on the way Python is compiled (either
 via ``--enable-unicode=ucs2`` or ``--enable-unicode=ucs4``, with the
 former being the default) :c:type:`Py_UNICODE` is either a 16-bit or 32-bit data
@@ -796,24 +796,24 @@ and how these arrays are stored as bytes become an issue. Transforming a
 unicode object into a sequence of bytes is called encoding and recreating the
 unicode object from the sequence of bytes is known as decoding. There are many
 different methods for how this transformation can be done (these methods are
-also called encodings). The simplest method is to map the codepoints 0-255 to
+also called encodings). The simplest method is to map the code points 0-255 to
 the bytes ``0x0``-``0xff``. This means that a unicode object that contains
-codepoints above ``U+00FF`` can't be encoded with this method (which is called
+code points above ``U+00FF`` can't be encoded with this method (which is called
 ``'latin-1'`` or ``'iso-8859-1'``). :func:`unicode.encode` will raise a
 :exc:`UnicodeEncodeError` that looks like this: ``UnicodeEncodeError: 'latin-1'
 codec can't encode character u'\u1234' in position 3: ordinal not in
 range(256)``.
 There's another group of encodings (the so called charmap encodings) that choose
-a different subset of all unicode code points and how these codepoints are
+a different subset of all unicode code points and how these code points are
 mapped to the bytes ``0x0``-``0xff``. To see how this is done simply open
 e.g. :file:`encodings/cp1252.py` (which is an encoding that is used primarily on
 Windows). There's a string constant with 256 characters that shows you which
 character is mapped to which byte value.
-All of these encodings can only encode 256 of the 1114112 codepoints
+All of these encodings can only encode 256 of the 1114112 code points
 defined in unicode. A simple and straightforward way that can store each Unicode
-code point, is to store each codepoint as four consecutive bytes. There are two
+code point, is to store each code point as four consecutive bytes. There are two
 possibilities: store the bytes in big endian or in little endian order. These
 two encodings are called ``UTF-32-BE`` and ``UTF-32-LE`` respectively. Their
 disadvantage is that if e.g. you use ``UTF-32-BE`` on a little endian machine you
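Reviewer note, not part of the commit: the behaviour described in the codecs.rst paragraphs above can be checked with a minimal sketch. It assumes Python 3, where the text type is ``str`` rather than the ``unicode`` type the diffed documentation refers to, and the sample strings are made up for illustration.

```python
# Python 3 sketch; the diffed text describes Python 2's ``unicode`` type.
sample = "abc\u1234"  # hypothetical sample; U+1234 sits at index 3

# Latin-1 maps code points 0-255 straight onto bytes 0x00-0xff.
latin1_bytes = "caf\u00e9".encode("latin-1")

# A code point above U+00FF cannot be represented in Latin-1.
try:
    sample.encode("latin-1")
except UnicodeEncodeError as exc:
    failure = (exc.encoding, exc.start, exc.reason)

# A charmap encoding such as cp1252 picks a different 256-entry subset:
# the Euro sign (U+20AC) is absent from Latin-1 but has a byte in cp1252.
euro_cp1252 = "\u20ac".encode("cp1252")

# UTF-32 stores every code point as four bytes, in either byte order.
utf32_be = "\u1234".encode("utf-32-be")
utf32_le = "\u1234".encode("utf-32-le")
```

The two UTF-32 results hold the same four bytes in opposite order, which is the byte-order difference the paragraph describes.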
Doc/library/htmllib.rst
@@ -185,14 +185,14 @@ can be handled using simple textual substitution in the Latin-1 character set
 .. data:: name2codepoint
-A dictionary that maps HTML entity names to the Unicode codepoints.
+A dictionary that maps HTML entity names to the Unicode code points.
 .. versionadded:: 2.3
 .. data:: codepoint2name
-A dictionary that maps Unicode codepoints to HTML entity names.
+A dictionary that maps Unicode code points to HTML entity names.
 .. versionadded:: 2.3
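Reviewer note, not part of the commit: a small sketch of the two dictionaries documented above. It assumes Python 3, where the same names are exposed by ``html.entities``. The diffed file documents the Python 2 module, and the entity chosen here is only an example.

```python
# Python 3 location of the mappings; an assumption relative to the
# Python 2 documentation in this diff.
from html.entities import codepoint2name, name2codepoint

# Entity name -> Unicode code point (an integer).
eacute_code_point = name2codepoint["eacute"]

# Unicode code point -> entity name, the inverse direction.
entity_name = codepoint2name[0xE9]

# chr() turns the code point into the character itself.
character = chr(eacute_code_point)
```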
Doc/library/json.rst
@@ -533,7 +533,7 @@ The RFC does not explicitly forbid JSON strings which contain byte sequences
 that don't correspond to valid Unicode characters (e.g. unpaired UTF-16
 surrogates), but it does note that they may cause interoperability problems.
 By default, this module accepts and outputs (when present in the original
-:class:`str`) codepoints for such sequences.
+:class:`str`) code points for such sequences.
 Infinite and NaN Number Values
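Reviewer note, not part of the commit: the default behaviour the json.rst sentence describes can be sketched as follows, assuming Python 3 and the default ``ensure_ascii=True``. The lone surrogate is an arbitrary example value.

```python
import json

# An unpaired UTF-16 surrogate, which is not a valid Unicode character.
lone_surrogate = "\ud800"

# With default settings the code point is written as an escape sequence...
encoded = json.dumps(lone_surrogate)

# ...and accepted again on input without raising an error.
decoded = json.loads(encoded)
```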
Doc/tutorial/interpreter.rst
@@ -140,7 +140,7 @@ encodings can be found in the Python Library Reference, in the section on
 For example, to write Unicode literals including the Euro currency symbol, the
 ISO-8859-15 encoding can be used, with the Euro symbol having the ordinal value
 164. This script, when saved in the ISO-8859-15 encoding, will print the value
-8364 (the Unicode codepoint corresponding to the Euro symbol) and then exit::
+8364 (the Unicode code point corresponding to the Euro symbol) and then exit::
 # -*- coding: iso-8859-15 -*-
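Reviewer note, not part of the commit: the two numbers quoted in the tutorial paragraph (164 and 8364) can be reproduced without saving a file in ISO-8859-15. This is a Python 3 sketch, not the elided tutorial script, which the diff truncates.

```python
# Byte 164 (0xA4) is the Euro sign's position in ISO-8859-15.
euro_byte = bytes([164])

# Decoding yields the Euro sign as a one-character string.
euro = euro_byte.decode("iso-8859-15")

# ord() reports the Unicode code point of that character.
euro_code_point = ord(euro)
```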