Commit dcc56f8b authored by Georg Brandl

Add bytes/remove unicode from the data model.

parent 85eb8c10
@@ -289,52 +289,21 @@ Sequences
 .. index::
 builtin: chr
 builtin: ord
-object: string
-single: character
-single: byte
-single: ASCII@ASCII
-The items of a string are characters. There is no separate character type; a
-character is represented by a string of one item. Characters represent (at
-least) 8-bit bytes. The built-in functions :func:`chr` and :func:`ord` convert
-between characters and nonnegative integers representing the byte values. Bytes
-with the values 0-127 usually represent the corresponding ASCII values, but the
-interpretation of values is up to the program. The string data type is also
-used to represent arrays of bytes, e.g., to hold data read from a file.
-.. index::
-single: ASCII@ASCII
-single: EBCDIC
-single: character set
-pair: string; comparison
-builtin: chr
-builtin: ord
-(On systems whose native character set is not ASCII, strings may use EBCDIC in
-their internal representation, provided the functions :func:`chr` and
-:func:`ord` implement a mapping between ASCII and EBCDIC, and string comparison
-preserves the ASCII order. Or perhaps someone can propose a better rule?)
-Unicode
-.. index::
-builtin: unichr
-builtin: ord
-builtin: unicode
-object: unicode
+builtin: str
 single: character
 single: integer
 single: Unicode
-The items of a Unicode object are Unicode code units. A Unicode code unit is
-represented by a Unicode object of one item and can hold either a 16-bit or
-32-bit value representing a Unicode ordinal (the maximum value for the ordinal
-is given in ``sys.maxunicode``, and depends on how Python is configured at
-compile time). Surrogate pairs may be present in the Unicode object, and will
-be reported as two separate items. The built-in functions :func:`unichr` and
-:func:`ord` convert between code units and nonnegative integers representing the
-Unicode ordinals as defined in the Unicode Standard 3.0. Conversion from and to
-other encodings are possible through the Unicode method :meth:`encode` and the
-built-in function :func:`unicode`.
+The items of a string object are Unicode code units. A Unicode code
+unit is represented by a string object of one item and can hold either
+a 16-bit or 32-bit value representing a Unicode ordinal (the maximum
+value for the ordinal is given in ``sys.maxunicode``, and depends on
+how Python is configured at compile time). Surrogate pairs may be
+present in the Unicode object, and will be reported as two separate
+items. The built-in functions :func:`chr` and :func:`ord` convert
+between code units and nonnegative integers representing the Unicode
+ordinals as defined in the Unicode Standard 3.0. Conversion from and to
+other encodings are possible through the string method :meth:`encode`.
 Tuples
 .. index::
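Reviewer's sketch for the hunk above (not part of the commit): a minimal illustration of the new string wording, assuming a Python 3 interpreter. The variable names are hypothetical and chosen only for this example.

```python
# chr() and ord() convert between one-item strings and Unicode ordinals.
code_point = ord("\u00e9")      # LATIN SMALL LETTER E WITH ACUTE
round_trip = chr(code_point)

# There is no separate character type: indexing a string yields a string.
word = "h\u00e9llo"
second_item = word[1]

# str.encode() converts to another encoding and returns a bytes object.
encoded = word.encode("utf-8")

print(code_point, round_trip == "\u00e9", type(second_item).__name__, encoded)
```

The example relies only on the built-ins named in the hunk (`chr`, `ord`, `str.encode`); the choice of UTF-8 as the target encoding is an assumption made for illustration.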
@@ -342,11 +311,12 @@ Sequences
 pair: singleton; tuple
 pair: empty; tuple
-The items of a tuple are arbitrary Python objects. Tuples of two or more items
-are formed by comma-separated lists of expressions. A tuple of one item (a
-'singleton') can be formed by affixing a comma to an expression (an expression
-by itself does not create a tuple, since parentheses must be usable for grouping
-of expressions). An empty tuple can be formed by an empty pair of parentheses.
+The items of a tuple are arbitrary Python objects. Tuples of two or
+more items are formed by comma-separated lists of expressions. A tuple
+of one item (a 'singleton') can be formed by affixing a comma to an
+expression (an expression by itself does not create a tuple, since
+parentheses must be usable for grouping of expressions). An empty
+tuple can be formed by an empty pair of parentheses.
 .. % Immutable sequences
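Reviewer's sketch for the tuple paragraph in the hunk above (not part of the commit): the four forms it describes, side by side. The names are hypothetical.

```python
pair = 1, 2          # two or more items: a comma-separated list of expressions
singleton = 1,       # one item: a trailing comma is what makes the tuple
grouped = (1)        # parentheses alone only group; this is the int 1
empty = ()           # the empty tuple is an empty pair of parentheses

print(type(pair).__name__, type(singleton).__name__,
      type(grouped).__name__, type(empty).__name__)
```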
@@ -369,14 +339,23 @@ Sequences
 Lists
 .. index:: object: list
-The items of a list are arbitrary Python objects. Lists are formed by placing a
-comma-separated list of expressions in square brackets. (Note that there are no
-special cases needed to form lists of length 0 or 1.)
+The items of a list are arbitrary Python objects. Lists are formed by
+placing a comma-separated list of expressions in square brackets. (Note
+that there are no special cases needed to form lists of length 0 or 1.)
+Bytes
+.. index:: bytes, byte
+A bytes object is a mutable array. The items are 8-bit bytes,
+represented by integers in the range 0 <= x < 256. Bytes literals
+(like ``b'abc'`` and the built-in function :func:`bytes` can be used to
+construct bytes objects. Also, bytes objects can be decoded to strings
+via the :meth:`decode` method.
 .. index:: module: array
-The extension module :mod:`array` provides an additional example of a mutable
-sequence type.
+The extension module :mod:`array` provides an additional example of a
+mutable sequence type.
 .. % Mutable sequences
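Reviewer's sketch for the new Bytes entry in the hunk above (not part of the commit). The hunk describes bytes as a mutable array; in released Python 3 versions `bytes` is immutable and `bytearray` is the mutable counterpart, so this example assumes a released interpreter and uses `bytearray` for the in-place update. The names are hypothetical.

```python
data = b"abc"                       # bytes literal
items = list(data)                  # items are integers in range(256)
from_ints = bytes([97, 98, 99])     # the bytes() built-in accepts an iterable of ints
text = data.decode("ascii")         # decode() turns bytes into a string

# In-place mutation needs bytearray on released Python 3 versions.
buf = bytearray(data)
buf[0] = 65                         # 65 is ASCII "A"

print(items, from_ints == data, text, buf)
```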
@@ -1230,12 +1209,14 @@ Basic customization
 builtin: str
 builtin: print
-Called by the :func:`str` built-in function and by the :func:`print`
-function to compute the "informal" string representation of an object. This
-differs from :meth:`__repr__` in that it does not have to be a valid Python
+Called by the :func:`str` built-in function and by the :func:`print` function
+to compute the "informal" string representation of an object. This differs
+from :meth:`__repr__` in that it does not have to be a valid Python
 expression: a more convenient or concise representation may be used instead.
 The return value must be a string object.
+.. XXX what about subclasses of string?
 .. method:: object.__format__(self, format_spec)
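Reviewer's sketch for the `__str__` paragraph in the hunk above (not part of the commit): a hypothetical `Point` class showing the "informal" versus `__repr__` distinction the text describes.

```python
class Point:
    """Hypothetical class used only to contrast __str__ and __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Looks like a valid Python expression that would recreate the object.
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        # Informal form: concise, not required to be valid Python.
        return f"({self.x}, {self.y})"


p = Point(1, 2)
informal = str(p)     # str() and print() call __str__
formal = repr(p)      # repr() calls __repr__
print(informal, formal)
```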
@@ -1355,15 +1336,6 @@ Basic customization
 :meth:`__bool__`, all its instances are considered true.
-.. method:: object.__unicode__(self)
-.. index:: builtin: unicode
-Called to implement :func:`unicode` builtin; should return a Unicode object.
-When this method is not defined, string conversion is attempted, and the result
-of string conversion is converted to Unicode using the system default encoding.
 .. _attribute-access:
 Customizing attribute access