Commit dcc56f8b authored by Georg Brandl

Add bytes/remove unicode from the data model.

parent 85eb8c10
@@ -289,52 +289,21 @@ Sequences
.. index::
builtin: chr
builtin: ord
object: string
single: character
single: byte
single: ASCII@ASCII
The items of a string are characters. There is no separate character type; a
character is represented by a string of one item. Characters represent (at
least) 8-bit bytes. The built-in functions :func:`chr` and :func:`ord` convert
between characters and nonnegative integers representing the byte values. Bytes
with the values 0-127 usually represent the corresponding ASCII values, but the
interpretation of values is up to the program. The string data type is also
used to represent arrays of bytes, e.g., to hold data read from a file.
.. index::
single: ASCII@ASCII
single: EBCDIC
single: character set
pair: string; comparison
builtin: chr
builtin: ord
(On systems whose native character set is not ASCII, strings may use EBCDIC in
their internal representation, provided the functions :func:`chr` and
:func:`ord` implement a mapping between ASCII and EBCDIC, and string comparison
preserves the ASCII order. Or perhaps someone can propose a better rule?)
Unicode
.. index::
builtin: unichr
builtin: ord
builtin: unicode
object: unicode
builtin: str
single: character
single: integer
single: Unicode
The items of a Unicode object are Unicode code units. A Unicode code unit is
represented by a Unicode object of one item and can hold either a 16-bit or
32-bit value representing a Unicode ordinal (the maximum value for the ordinal
is given in ``sys.maxunicode``, and depends on how Python is configured at
compile time). Surrogate pairs may be present in the Unicode object, and will
be reported as two separate items. The built-in functions :func:`unichr` and
:func:`ord` convert between code units and nonnegative integers representing the
Unicode ordinals as defined in the Unicode Standard 3.0. Conversion to and from
other encodings is possible through the Unicode method :meth:`encode` and the
built-in function :func:`unicode`.
The items of a string object are Unicode code units. A Unicode code
unit is represented by a string object of one item and can hold either
a 16-bit or 32-bit value representing a Unicode ordinal (the maximum
value for the ordinal is given in ``sys.maxunicode``, and depends on
how Python is configured at compile time). Surrogate pairs may be
present in the string object, and will be reported as two separate
items. The built-in functions :func:`chr` and :func:`ord` convert
between code units and nonnegative integers representing the Unicode
ordinals as defined in the Unicode Standard 3.0. Conversion to and from
other encodings is possible through the string method :meth:`encode`.
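The following is a minimal sketch of the string behaviour described above. It assumes CPython 3.3 or later. Under PEP 393, strings there store code points rather than 16- or 32-bit code units, so ``sys.maxunicode`` is always ``0x10FFFF`` and a character outside the Basic Multilingual Plane is one item rather than a surrogate pair. The helper ``round_trip`` is hypothetical and is used only for illustration.

```python
import sys


def round_trip(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: encode text to bytes, then decode it back."""
    data = text.encode(encoding)  # str -> bytes via str.encode
    return data.decode(encoding)  # bytes -> str via bytes.decode


# There is no separate character type: a character is a one-item string.
letter = "é"
assert len(letter) == 1

# chr() and ord() convert between one-item strings and integer ordinals.
assert ord(letter) == 0xE9
assert chr(0xE9) == letter

# Assumption: Python 3.3+, where the largest ordinal is fixed.
assert sys.maxunicode == 0x10FFFF

# On 3.3+ a non-BMP character is a single item, not a surrogate pair.
assert len("\U0001F600") == 1

assert round_trip("héllo") == "héllo"
```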
Tuples
.. index::
@@ -342,11 +311,12 @@ Sequences
pair: singleton; tuple
pair: empty; tuple
The items of a tuple are arbitrary Python objects. Tuples of two or more items
are formed by comma-separated lists of expressions. A tuple of one item (a
'singleton') can be formed by affixing a comma to an expression (an expression
by itself does not create a tuple, since parentheses must be usable for grouping
of expressions). An empty tuple can be formed by an empty pair of parentheses.
The items of a tuple are arbitrary Python objects. Tuples of two or
more items are formed by comma-separated lists of expressions. A tuple
of one item (a 'singleton') can be formed by affixing a comma to an
expression (an expression by itself does not create a tuple, since
parentheses must be usable for grouping of expressions). An empty
tuple can be formed by an empty pair of parentheses.
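The tuple-forming rules above can be checked directly. This sketch assumes any Python 3 interpreter. It shows that a trailing comma, not the parentheses, is what makes a singleton tuple.

```python
pair = 1, 2        # two or more items: comma-separated expressions
singleton = 1,     # one item: the trailing comma creates the tuple
grouped = (1)      # parentheses alone only group; this is the int 1
empty = ()         # an empty pair of parentheses is the empty tuple

for value in (pair, singleton, grouped, empty):
    print(repr(value), type(value).__name__)
# (1, 2) tuple
# (1,) tuple
# 1 int
# () tuple
```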
.. % Immutable sequences
@@ -369,14 +339,23 @@ Sequences
Lists
.. index:: object: list
The items of a list are arbitrary Python objects. Lists are formed by placing a
comma-separated list of expressions in square brackets. (Note that there are no
special cases needed to form lists of length 0 or 1.)
The items of a list are arbitrary Python objects. Lists are formed by
placing a comma-separated list of expressions in square brackets. (Note
that there are no special cases needed to form lists of length 0 or 1.)
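As a small illustration, assuming Python 3, lists of length 0 and 1 use the same bracket syntax as longer lists. This contrasts with the trailing comma needed for one-item tuples.

```python
empty = []                         # length 0: no special syntax needed
one = [42]                         # length 1: no trailing comma required
several = [1, "two", 3.0, None]    # items may be arbitrary objects

one.append(43)                     # lists are mutable sequences
assert one == [42, 43]
```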
Bytes
.. index:: bytes, byte
A bytes object is a mutable array. The items are 8-bit bytes,
represented by integers in the range ``0 <= x < 256``. Bytes literals
(like ``b'abc'``) and the built-in function :func:`bytes` can be used to
construct bytes objects. Also, bytes objects can be decoded to strings
via the :meth:`decode` method.
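The sketch below shows bytes literals, the :func:`bytes` constructor and :meth:`decode`. One caveat applies: it targets released Python 3, where (following PEP 3137) ``bytes`` is immutable and ``bytearray`` is the mutable byte array. The text above describes ``bytes`` itself as mutable, so mutation is demonstrated on a ``bytearray`` here.

```python
raw = b"abc"                         # bytes literal
assert list(raw) == [97, 98, 99]     # items are integers in range(256)
assert all(0 <= item < 256 for item in raw)

built = bytes([104, 105])            # bytes() from an iterable of ints
assert built == b"hi"

encoded = "héllo".encode("utf-8")
assert encoded.decode("utf-8") == "héllo"  # decode() gives back a str

buf = bytearray(raw)                 # mutable counterpart in released Python 3
buf[0] = ord("x")
assert bytes(buf) == b"xbc"

try:
    raw[0] = 120                     # released-Python-3 bytes rejects item assignment
except TypeError:
    immutable = True
else:
    immutable = False
```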
.. index:: module: array
The extension module :mod:`array` provides an additional example of a mutable
sequence type.
The extension module :mod:`array` provides an additional example of a
mutable sequence type.
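For comparison, here is a short sketch of :mod:`array` as a mutable sequence. The typecode ``'i'`` (signed C int) is chosen only for illustration. Unlike a list, an array only accepts items matching its typecode.

```python
from array import array

numbers = array("i", [1, 2, 3])
numbers.append(4)       # mutable: grows in place
numbers[0] = 10         # item assignment works
del numbers[1]          # so does deletion
assert numbers.tolist() == [10, 3, 4]

try:
    numbers.append("x")  # items must match the 'i' typecode
except TypeError:
    rejected = True
else:
    rejected = False
```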
.. % Mutable sequences
@@ -1230,12 +1209,14 @@ Basic customization
builtin: str
builtin: print
Called by the :func:`str` built-in function and by the :func:`print`
function to compute the "informal" string representation of an object. This
differs from :meth:`__repr__` in that it does not have to be a valid Python
Called by the :func:`str` built-in function and by the :func:`print` function
to compute the "informal" string representation of an object. This differs
from :meth:`__repr__` in that it does not have to be a valid Python
expression: a more convenient or concise representation may be used instead.
The return value must be a string object.
.. XXX what about subclasses of string?
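A minimal sketch of the ``__str__``/``__repr__`` distinction, assuming Python 3.6+ for f-strings. The ``Point`` and ``Bad`` classes are hypothetical. The second class shows that returning a non-string from ``__str__`` makes :func:`str` raise :exc:`TypeError`.

```python
class Point:
    """Hypothetical example class."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        # "Official" form: ideally a valid Python expression.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self) -> str:
        # "Informal" form used by str() and print(); must return a str.
        return f"({self.x}, {self.y})"


class Bad:
    """Hypothetical class whose __str__ breaks the str-return rule."""

    def __str__(self):
        return 42  # not a string object


p = Point(1, 2)
print(p)        # prints (1, 2)
print(repr(p))  # prints Point(1, 2)

error = None
try:
    str(Bad())
except TypeError as exc:
    error = exc
```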
.. method:: object.__format__(self, format_spec)
@@ -1355,15 +1336,6 @@ Basic customization
:meth:`__bool__`, all its instances are considered true.
.. method:: object.__unicode__(self)
.. index:: builtin: unicode
Called to implement :func:`unicode` builtin; should return a Unicode object.
When this method is not defined, string conversion is attempted, and the result
of string conversion is converted to Unicode using the system default encoding.
.. _attribute-access:
Customizing attribute access
......