Commit #1068: new docs for PEP 3101. Also document the old string formatting as "old", and begin documenting str/unicode unification.

4b49131f · Georg Brandl · 20594ccf · 4b49131f · 4b49131f · 4b49131f
Commit 4b49131f authored Aug 31, 2007 by Georg Brandl
9 changed files
--- a/Doc/library/fpformat.rst
+++ b/Doc/library/fpformat.rst
@@ -12,8 +12,8 @@ numbers representations in 100% pure Python.

 .. note::

-   This module is unnecessary: everything here can be done using the ``%`` string
-   interpolation operator described in the :ref:`string-formatting` section.
+   This module is unnecessary: everything here can be done using the string
+   formatting functions described in the :ref:`string-formatting` section.

 The :mod:`fpformat` module defines the following functions and an exception:


--- a/Doc/library/functions.rst
+++ b/Doc/library/functions.rst
@@ -449,6 +449,22 @@ available.  They are listed here in alphabetical order.

   The float type is described in :ref:`typesnumeric`.

+.. function:: format(value[, format_spec])
+
+   .. index::
+      pair: str; format
+      single: __format__
+   
+   Convert a string or a number to a "formatted" representation, as controlled
+   by *format_spec*.  The interpretation of *format_spec* will depend on the
+   type of the *value* argument; however, there is a standard formatting syntax
+   that is used by most built-in types: :ref:`formatspec`.
+   
+   .. note::
+
+      ``format(value, format_spec)`` merely calls ``value.__format__(format_spec)``.
+
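A minimal sketch of the delegation described in the note above, assuming Python 3. The ``Celsius`` class and its unit suffix are hypothetical and exist only to illustrate the ``__format__`` hook.

```python
class Celsius:
    """Hypothetical type used only to illustrate the __format__ hook."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # Delegate the numeric part to float's own formatting, then add a unit.
        return format(self.degrees, format_spec) + " C"


temp = Celsius(21.456)
via_builtin = format(temp, ".1f")      # goes through Celsius.__format__
via_method = temp.__format__(".1f")    # the call the built-in delegates to
print(via_builtin)                     # 21.5 C
```

Both calls produce the same string here, which is the behavior the note describes.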
+
 .. function:: frozenset([iterable])
   :noindex:

@@ -990,10 +1006,9 @@ available.  They are listed here in alphabetical order.

   For more information on strings see :ref:`typesseq` which describes sequence
   functionality (strings are sequences), and also the string-specific methods
-   described in the :ref:`string-methods` section. To output formatted strings
-   use template strings or the ``%`` operator described in the
-   :ref:`string-formatting` section. In addition see the :ref:`stringservices`
-   section. See also :func:`unicode`.
+   described in the :ref:`string-methods` section. To output formatted strings,
+   see the :ref:`string-formatting` section. In addition see the
+   :ref:`stringservices` section.


 .. function:: sum(iterable[, start])

--- a/Doc/library/logging.rst
+++ b/Doc/library/logging.rst
@@ -611,8 +611,10 @@ This time, all messages with a severity of DEBUG or above were handled, and the
 format of the messages was also changed, and output went to the specified file
 rather than the console.

-Formatting uses standard Python string formatting - see section
-:ref:`string-formatting`. The format string takes the following common
+.. XXX logging should probably be updated!
+
+Formatting uses the old Python string formatting - see section
+:ref:`old-string-formatting`. The format string takes the following common
 specifiers. For a complete list of specifiers, consult the :class:`Formatter`
 documentation.

@@ -1483,7 +1485,7 @@ A Formatter can be initialized with a format string which makes use of knowledge
 of the :class:`LogRecord` attributes - such as the default value mentioned above
 making use of the fact that the user's message and arguments are pre-formatted
 into a :class:`LogRecord`'s *message* attribute.  This format string contains
-standard python %-style mapping keys. See section :ref:`string-formatting`
+standard python %-style mapping keys. See section :ref:`old-string-formatting`
 for more information on string formatting.

 Currently, the useful mapping keys in a :class:`LogRecord` are:

--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -480,19 +480,18 @@ object) supplying the :meth:`__iter__` and :meth:`__next__` methods.

 .. _typesseq:

-Sequence Types --- :class:`str`, :class:`unicode`, :class:`list`, :class:`tuple`, :class:`buffer`, :class:`range`
-=================================================================================================================
-
-There are six sequence types: strings, Unicode strings, lists, tuples, buffers,
-and range objects.
-(For other containers see the built in :class:`dict`, :class:`list`,
-:class:`set`, and :class:`tuple` classes, and the :mod:`collections`
-module.)
+Sequence Types --- :class:`str`, :class:`bytes`, :class:`list`, :class:`tuple`, :class:`buffer`, :class:`range`
+===============================================================================================================

+There are six sequence types: strings, byte sequences, lists, tuples, buffers,
+and range objects.  (For other containers see the built in :class:`dict`,
+:class:`list`, :class:`set`, and :class:`tuple` classes, and the
+:mod:`collections` module.)

 .. index::
   object: sequence
   object: string
+   object: bytes
   object: tuple
   object: list
   object: buffer
@@ -501,21 +500,32 @@ module.)
 String literals are written in single or double quotes: ``'xyzzy'``,
 ``"frobozz"``.  See :ref:`strings` for more about string literals.  In addition
 to the functionality described here, there are also string-specific methods
-described in the :ref:`string-methods` section.  Lists are constructed with
-square brackets, separating items with commas: ``[a, b, c]``.  Tuples are
-constructed by the comma operator (not within square brackets), with or without
-enclosing parentheses, but an empty tuple must have the enclosing parentheses,
-such as ``a, b, c`` or ``()``.  A single item tuple must have a trailing comma,
-such as ``(d,)``.
+described in the :ref:`string-methods` section.  Bytes objects can be
+constructed from literals too; use a ``b`` prefix with normal string syntax:
+``b'xyzzy'``.
+
+.. caveat::
+
+   While string objects are sequences of characters (represented by strings of
+   length 1), bytes objects are sequences of *integers* (between 0 and 255),
+   each representing the value of a single byte.  That means that for a bytes
+   object *b*, ``b[0]`` will be an integer, while ``b[0:1]`` will be a bytes
+   object of length 1.
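The difference between indexing and slicing can be checked directly. This is a small sketch assuming Python 3 semantics for ``bytes``; the variable names are illustrative only.

```python
data = b"xyzzy"
text = "xyzzy"

first_byte = data[0]      # indexing bytes yields an integer
first_slice = data[0:1]   # slicing bytes yields a bytes object of length 1
first_char = text[0]      # indexing str yields a str of length 1

print(first_byte)         # 120
print(first_slice)        # b'x'
print(first_char)         # x
```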
+
+Lists are constructed with square brackets, separating items with commas: ``[a,
+b, c]``.  Tuples are constructed by the comma operator (not within square
+brackets), with or without enclosing parentheses, but an empty tuple must have
+the enclosing parentheses, such as ``a, b, c`` or ``()``.  A single item tuple
+must have a trailing comma, such as ``(d,)``.

 Buffer objects are not directly supported by Python syntax, but can be created
 by calling the builtin function :func:`buffer`.  They don't support
 concatenation or repetition.

-Objects of type range are similar to buffers in that there is no specific syntax to
-create them, but they are created using the :func:`range` function.  They don't
-support slicing, concatenation or repetition, and using ``in``, ``not in``,
-:func:`min` or :func:`max` on them is inefficient.
+Objects of type range are similar to buffers in that there is no specific syntax
+to create them, but they are created using the :func:`range` function.  They
+don't support slicing, concatenation or repetition, and using ``in``, ``not
+in``, :func:`min` or :func:`max` on them is inefficient.

 Most sequence types support the following operations.  The ``in`` and ``not in``
 operations have the same priorities as the comparison operations.  The ``+`` and
@@ -555,12 +565,11 @@ are sequences of the same type; *n*, *i* and *j* are integers:
 | ``max(s)``       | largest item of *s*            |          |
 +------------------+--------------------------------+----------+

-Sequence types also support comparisons. In particular, tuples and lists
-are compared lexicographically by comparing corresponding
-elements. This means that to compare equal, every element must compare
-equal and the two sequences must be of the same type and have the same
-length. (For full details see :ref:`comparisons` in the language
-reference.)
+Sequence types also support comparisons. In particular, tuples and lists are
+compared lexicographically by comparing corresponding elements. This means that
+to compare equal, every element must compare equal and the two sequences must be
+of the same type and have the same length. (For full details see
+:ref:`comparisons` in the language reference.)

 .. index::
   triple: operations on; sequence; types
@@ -578,10 +587,8 @@ reference.)
 Notes:

 (1)
-   When *s* is a string or Unicode string object the ``in`` and ``not in``
-   operations act like a substring test.  In Python versions before 2.3, *x* had to
-   be a string of length 1. In Python 2.3 and beyond, *x* may be a string of any
-   length.
+   When *s* is a string object, the ``in`` and ``not in`` operations act like a
+   substring test.

 (2)
   Values of *n* less than ``0`` are treated as ``0`` (which yields an empty
@@ -642,6 +649,8 @@ Notes:
      Formerly, string concatenation never occurred in-place.


+.. XXX add bytes methods
+
 .. _string-methods:

 String Methods
@@ -649,19 +658,15 @@ String Methods

 .. index:: pair: string; methods

-Below are listed the string methods which both 8-bit strings and Unicode objects
-support. In addition, Python's strings support the sequence type methods
-described in the :ref:`typesseq` section. To output formatted strings
-use template strings or the ``%`` operator described in the
-:ref:`string-formatting` section. Also, see the :mod:`re` module for
-string functions based on regular expressions.
+String objects support the methods listed below.  In addition, Python's strings
+support the sequence type methods described in the :ref:`typesseq` section. To
+output formatted strings, see the :ref:`string-formatting` section. Also, see
+the :mod:`re` module for string functions based on regular expressions.

 .. method:: str.capitalize()

   Return a copy of the string with only its first character capitalized.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.center(width[, fillchar])

@@ -679,6 +684,7 @@ string functions based on regular expressions.
   slice notation.


+.. XXX what about str.decode???
 .. method:: str.decode([encoding[, errors]])

   Decodes the string using the codec registered for *encoding*. *encoding*
@@ -737,6 +743,24 @@ string functions based on regular expressions.
   found.


+.. method:: str.format(*args, **kwargs)
+
+   Perform a string formatting operation.  The string on which this method is
+   called can contain literal text or replacement fields delimited by braces
+   ``{}``.  Each replacement field contains either the numeric index of a
+   positional argument, or the name of a keyword argument.  Returns a copy of
+   the string where each replacement field is replaced with the string value of
+   the corresponding argument.
+
+      >>> "The sum of 1 + 2 is {0}".format(1+2)
+      'The sum of 1 + 2 is 3'
+
+   See :ref:`formatstrings` for a description of the various formatting options
+   that can be specified in format strings.
+
+   .. versionadded:: 3.0
+
+
 .. method:: str.index(sub[, start[, end]])

   Like :meth:`find`, but raise :exc:`ValueError` when the substring is not found.
@@ -747,31 +771,23 @@ string functions based on regular expressions.
   Return true if all characters in the string are alphanumeric and there is at
   least one character, false otherwise.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.isalpha()

   Return true if all characters in the string are alphabetic and there is at least
   one character, false otherwise.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.isdigit()

   Return true if all characters in the string are digits and there is at least one
   character, false otherwise.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.isidentifier()

   Return true if the string is a valid identifier according to the language
-   definition.
-
-   .. XXX link to the definition?
+   definition, section :ref:`identifiers`.


 .. method:: str.islower()
@@ -779,16 +795,12 @@ string functions based on regular expressions.
   Return true if all cased characters in the string are lowercase and there is at
   least one cased character, false otherwise.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.isspace()

   Return true if there are only whitespace characters in the string and there is
   at least one character, false otherwise.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.istitle()

@@ -796,16 +808,12 @@ string functions based on regular expressions.
   character, for example uppercase characters may only follow uncased characters
   and lowercase characters only cased ones.  Return false otherwise.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.isupper()

   Return true if all cased characters in the string are uppercase and there is at
   least one cased character, false otherwise.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.join(seq)

@@ -827,8 +835,6 @@ string functions based on regular expressions.

   Return a copy of the string converted to lowercase.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.lstrip([chars])

@@ -984,41 +990,24 @@ string functions based on regular expressions.
   Return a copy of the string with uppercase characters converted to lowercase and
   vice versa.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.title()

   Return a titlecased version of the string: words start with uppercase
   characters, all remaining cased characters are lowercase.

-   For 8-bit strings, this method is locale-dependent.
-
-
-.. method:: str.translate(table[, deletechars])

-   Return a copy of the string where all characters occurring in the optional
-   argument *deletechars* are removed, and the remaining characters have been
-   mapped through the given translation table, which must be a string of length
-   256.
+.. method:: str.translate(map)

-   You can use the :func:`maketrans` helper function in the :mod:`string` module to
-   create a translation table. For string objects, set the *table* argument to
-   ``None`` for translations that only delete characters::
+   Return a copy of the string where all characters have been mapped through the
+   *map*, which must be a mapping of Unicode ordinals (integers) to Unicode
+   ordinals, strings or ``None``.  Unmapped characters are left
+   untouched. Characters mapped to ``None`` are deleted.

-      >>> 'read this short text'.translate(None, 'aeiou')
-      'rd ths shrt txt'
-
-   .. versionadded:: 2.6
-      Support for a ``None`` *table* argument.
+   .. note::

-   For Unicode objects, the :meth:`translate` method does not accept the optional
-   *deletechars* argument.  Instead, it returns a copy of the *s* where all
-   characters have been mapped through the given translation table which must be a
-   mapping of Unicode ordinals to Unicode ordinals, Unicode strings or ``None``.
-   Unmapped characters are left untouched. Characters mapped to ``None`` are
-   deleted.  Note, a more flexible approach is to create a custom character mapping
-   codec using the :mod:`codecs` module (see :mod:`encodings.cp1251` for an
+      A more flexible approach is to create a custom character mapping codec
+      using the :mod:`codecs` module (see :mod:`encodings.cp1251` for an
      example).
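A short sketch of the three kinds of mapping values accepted by :meth:`str.translate`, assuming Python 3. The ``table`` contents are arbitrary and chosen only for illustration.

```python
# Keys are Unicode ordinals; values may be an ordinal, a string, or None.
table = {
    ord("a"): ord("4"),   # ordinal -> ordinal
    ord("e"): "3",        # ordinal -> string
    ord("s"): None,       # ordinal -> None, so the character is deleted
}

result = "read this short text".translate(table)
print(result)             # r34d thi hort t3xt
```

Characters without an entry in ``table`` (such as ``r`` or the spaces) pass through unchanged.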


@@ -1026,8 +1015,6 @@ string functions based on regular expressions.

   Return a copy of the string converted to uppercase.

-   For 8-bit strings, this method is locale-dependent.
-

 .. method:: str.zfill(width)

@@ -1037,10 +1024,10 @@ string functions based on regular expressions.
   .. versionadded:: 2.2.2


-.. _string-formatting:
+.. _old-string-formatting:

-String Formatting Operations
----------------------------
+Old String Formatting Operations
+--------------------------------

 .. index::
   single: formatting, string (%)
@@ -1052,14 +1039,18 @@ String Formatting Operations
   single: % formatting
   single: % interpolation

-String and Unicode objects have one unique built-in operation: the ``%``
-operator (modulo).  This is also known as the string *formatting* or
-*interpolation* operator.  Given ``format % values`` (where *format* is a string
-or Unicode object), ``%`` conversion specifications in *format* are replaced
-with zero or more elements of *values*.  The effect is similar to the using
-:cfunc:`sprintf` in the C language.  If *format* is a Unicode object, or if any
-of the objects being converted using the ``%s`` conversion are Unicode objects,
-the result will also be a Unicode object.
+.. XXX better?
+
+.. note::
+
+   The formatting operations described here are obsolete and may go away in future
+   versions of Python.  Use the new :ref:`string-formatting` in new code.
+
+String objects have one unique built-in operation: the ``%`` operator (modulo).
+This is also known as the string *formatting* or *interpolation* operator.
+Given ``format % values`` (where *format* is a string), ``%`` conversion
+specifications in *format* are replaced with zero or more elements of *values*.
+The effect is similar to using :cfunc:`sprintf` in the C language.

 If *format* requires a single argument, *values* may be a single non-tuple
 object. [#]_  Otherwise, *values* must be a tuple with exactly the number of
@@ -1164,7 +1155,7 @@ The conversion types are:
 | ``'r'``    | String (converts any python object using            | \(5)  |
 |            | :func:`repr`).                                      |       |
 +------------+-----------------------------------------------------+-------+
-| ``'s'``    | String (converts any python object using            | \(6)  |
+| ``'s'``    | String (converts any python object using            |       |
 |            | :func:`str`).                                       |       |
 +------------+-----------------------------------------------------+-------+
 | ``'%'``    | No argument is converted, results in a ``'%'``      |       |
@@ -1203,9 +1194,6 @@ Notes:

   The precision determines the maximal number of characters used.

-(6)
-   If the object or format provided is a :class:`unicode` string, the resulting
-   string will also be :class:`unicode`.

   The precision determines the maximal number of characters used.

@@ -2019,6 +2007,7 @@ the particular object.
   on all file-like objects.


+.. XXX does this still apply?
 .. attribute:: file.encoding

   The encoding that this file uses. When Unicode strings are written to a file,

--- a/Doc/library/string.rst
+++ b/Doc/library/string.rst
@@ -8,15 +8,13 @@

 .. index:: module: re

-The :mod:`string` module contains a number of useful constants and
-classes, as well as some deprecated legacy functions that are also
-available as methods on strings. In addition, Python's built-in string
-classes support the sequence type methods described in the
-:ref:`typesseq` section, and also the string-specific methods described
-in the :ref:`string-methods` section. To output formatted strings use
-template strings or the ``%`` operator described in the
-:ref:`string-formatting` section. Also, see the :mod:`re` module for
-string functions based on regular expressions.
+The :mod:`string` module contains a number of useful constants and classes, as
+well as some deprecated legacy functions that are also available as methods on
+strings. In addition, Python's built-in string classes support the sequence type
+methods described in the :ref:`typesseq` section, and also the string-specific
+methods described in the :ref:`string-methods` section. To output formatted
+strings, see the :ref:`string-formatting` section. Also, see the :mod:`re`
+module for string functions based on regular expressions.


 String constants
@@ -78,6 +76,354 @@ The constants defined in this module are:
   vertical tab.


+.. _string-formatting:
+
+String Formatting
+-----------------
+
+Starting in Python 3.0, the built-in string class provides the ability to do
+complex variable substitutions and value formatting via the :func:`format`
+method described in :pep:`3101`.  The :class:`Formatter` class in the
+:mod:`string` module allows you to create and customize your own string
+formatting behaviors using the same implementation as the built-in
+:meth:`format` method.
+
+.. class:: Formatter
+
+   The :class:`Formatter` class has the following public methods:
+
+   .. method:: format(format_string, *args, **kwargs)
+
+      :meth:`format` is the primary API method.  It takes a format template
+      string, and an arbitrary set of positional and keyword arguments.
+      :meth:`format` is just a wrapper that calls :meth:`vformat`.
+
+   .. method:: vformat(format_string, args, kwargs)
+   
+      This function does the actual work of formatting.  It is exposed as a
+      separate function for cases where you want to pass in a predefined
+      dictionary of arguments, rather than unpacking and repacking the
+      dictionary as individual arguments using the ``*args`` and ``**kwds``
+      syntax.  :meth:`vformat` does the work of breaking up the format template
+      string into character data and replacement fields.  It calls the various
+      methods described below.
+
+   In addition, the :class:`Formatter` defines a number of methods that are
+   intended to be replaced by subclasses:
+
+   .. method:: parse(format_string)
+   
+      Loop over the *format_string* and return an iterable of tuples
+      (*literal_text*, *field_name*, *format_spec*, *conversion*).  This is used
+      by :meth:`vformat` to break the string into either literal text or
+      replacement fields.
+      
+      The values in the tuple conceptually represent a span of literal text
+      followed by a single replacement field.  If there is no literal text
+      (which can happen if two replacement fields occur consecutively), then
+      *literal_text* will be a zero-length string.  If there is no replacement
+      field, then the values of *field_name*, *format_spec* and *conversion*
+      will be ``None``.
+
+   .. method:: get_field(field_name, args, kwargs, used_args)
+
+      Given *field_name* as returned by :meth:`parse` (see above), convert it to
+      an object to be formatted.  The default version takes strings of the form
+      defined in :pep:`3101`, such as "0[name]" or "label.title".  It records
+      which args have been used in *used_args*. *args* and *kwargs* are as
+      passed in to :meth:`vformat`.
+
+   .. method:: get_value(key, args, kwargs)
+   
+      Retrieve a given field value.  The *key* argument will be either an
+      integer or a string.  If it is an integer, it represents the index of the
+      positional argument in *args*; if it is a string, then it represents a
+      named argument in *kwargs*.
+
+      The *args* parameter is set to the list of positional arguments to
+      :meth:`vformat`, and the *kwargs* parameter is set to the dictionary of
+      keyword arguments.
+
+      For compound field names, these functions are only called for the first
+      component of the field name; subsequent components are handled through
+      normal attribute and indexing operations.
+
+      So for example, the field expression '0.name' would cause
+      :meth:`get_value` to be called with a *key* argument of 0.  The ``name``
+      attribute will be looked up after :meth:`get_value` returns by calling the
+      built-in :func:`getattr` function.
+
+      If the index or keyword refers to an item that does not exist, then an
+      :exc:`IndexError` or :exc:`KeyError` should be raised.
+
+   .. method:: check_unused_args(used_args, args, kwargs)
+
+      Implement checking for unused arguments if desired.  The arguments to this
+      function are the set of all argument keys that were actually referred to in
+      the format string (integers for positional arguments, and strings for
+      named arguments), and a reference to the *args* and *kwargs* that were
+      passed to :meth:`vformat`.  The set of unused args can be calculated from
+      these parameters.  :meth:`check_unused_args` is expected to raise an
+      exception if the check fails.
+
+   .. method:: format_field(value, format_spec)
+
+      :meth:`format_field` simply calls the global :func:`format` built-in.  The
+      method is provided so that subclasses can override it.
+
+   .. method:: convert_field(value, conversion)
+   
+      Converts the value (returned by :meth:`get_field`) given a conversion type
+      (as in the tuple returned by the :meth:`parse` method).  The default
+      version understands 'r' (repr) and 's' (str) conversion types.
+
+   .. versionadded:: 3.0
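The overridable methods can be exercised with a small subclass. This is a sketch under assumptions, not part of the documented API: ``DefaultingFormatter`` and its ``"<missing>"`` placeholder are hypothetical, and the code assumes the :class:`string.Formatter` behavior of current Python 3 releases.

```python
import string


class DefaultingFormatter(string.Formatter):
    """Hypothetical subclass: unknown keyword fields render as a placeholder."""

    def get_value(self, key, args, kwargs):
        # Keyword fields arrive as strings, positional fields as integers.
        if isinstance(key, str):
            return kwargs.get(key, "<missing>")
        return super().get_value(key, args, kwargs)


fmt = DefaultingFormatter()
out = fmt.format("{0} seeks the {grail}", "Arthur")
print(out)  # Arthur seeks the <missing>

# parse() yields (literal_text, field_name, format_spec, conversion) tuples.
fields = list(string.Formatter().parse("a{0}b{name!r:>5}"))
print(fields)
```

Overriding only :meth:`get_value` leaves parsing, conversion and field formatting to the base class, which is the division of labor the method descriptions above outline.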
+
+.. _formatstrings:
+
+Format String Syntax
+--------------------
+
+The :meth:`str.format` method and the :class:`Formatter` class share the same
+syntax for format strings (although in the case of :class:`Formatter`,
+subclasses can define their own format string syntax).
+
+Format strings contain "replacement fields" surrounded by curly braces ``{}``.
+Anything that is not contained in braces is considered literal text, which is
+copied unchanged to the output.  If you need to include a brace character in the
+literal text, it can be escaped by doubling: ``{{`` and ``}}``.
+
+The grammar for a replacement field is as follows:
+
+   .. productionlist:: sf
+      replacement_field: "{" `field_name` ["!" `conversion`] [":" `format_spec`] "}"
+      field_name: (`identifier` | `integer`) ("." `attribute_name` | "[" `element_index` "]")*
+      attribute_name: `identifier`
+      element_index: `integer`
+      conversion: "r" | "s"
+      format_spec: <described in the next section>
+      
+In less formal terms, the replacement field starts with a *field_name*, which
+can either be a number (for a positional argument), or an identifier (for
+keyword arguments).  Following this is an optional *conversion* field, which is
+preceded by an exclamation point ``'!'``, and a *format_spec*, which is preceded
+by a colon ``':'``.
+
+The *field_name* itself begins with either a number or a keyword.  If it's a
+number, it refers to a positional argument, and if it's a keyword it refers to a
+named keyword argument.  This can be followed by any number of index or
+attribute expressions. An expression of the form ``'.name'`` selects the named
+attribute using :func:`getattr`, while an expression of the form ``'[index]'``
+does an index lookup using :meth:`__getitem__`.
+
+Some simple format string examples::
+
+   "First, thou shalt count to {0}" # References first positional argument
+   "My quest is {name}"             # References keyword argument 'name'
+   "Weight in tons {0.weight}"      # 'weight' attribute of first positional arg
+   "Units destroyed: {players[0]}"  # First element of keyword argument 'players'.
+   
+The *conversion* field causes a type coercion before formatting.  Normally, the
+job of formatting a value is done by the :meth:`__format__` method of the value
+itself.  However, in some cases it is desirable to force a type to be formatted
+as a string, overriding its own definition of formatting.  By converting the
+value to a string before calling :meth:`__format__`, the normal formatting logic
+is bypassed.
+
+Two conversion flags are currently supported: ``'!s'`` which calls :func:`str()`
+on the value, and ``'!r'`` which calls :func:`repr()`.
+
+Some examples::
+
+   "Harold's a clever {0!s}"        # Calls str() on the argument first
+   "Bring out the holy {name!r}"    # Calls repr() on the argument first
+
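The field-name and conversion examples above can be run as follows. This is a sketch assuming Python 3; the argument values and the use of ``SimpleNamespace`` to supply a ``weight`` attribute are illustrative choices, not part of the original examples.

```python
from types import SimpleNamespace

swallow = SimpleNamespace(weight=0.02)   # any object with a 'weight' attribute

s1 = "First, thou shalt count to {0}".format(3)
s2 = "My quest is {name}".format(name="the Grail")
s3 = "Weight in tons {0.weight}".format(swallow)
s4 = "Units destroyed: {players[0]}".format(players=[7, 2])
s5 = "Harold's a clever {0!s}".format("sheep")                   # str() first
s6 = "Bring out the holy {name!r}".format(name="hand grenade")   # repr() first

print(s3)   # Weight in tons 0.02
print(s6)   # Bring out the holy 'hand grenade'
```

Note that ``!r`` keeps the quotes produced by :func:`repr`, while ``!s`` leaves a string argument unchanged.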
+The *format_spec* field contains a specification of how the value should be
+presented, including such details as field width, alignment, padding, decimal
+precision and so on.  Each value type can define its own "formatting
+mini-language" or interpretation of the *format_spec*.
+
+Most built-in types support a common formatting mini-language, which is
+described in the next section.
+
+A *format_spec* field can also include nested replacement fields within it.
+These nested replacement fields can contain only a field name; conversion flags
+and format specifications are not allowed.  The replacement fields within the
+*format_spec* are substituted before the *format_spec* string is interpreted.
+This allows the formatting of a value to be dynamically specified.
+
+For example, suppose you wanted to have a replacement field whose field width is
+determined by another variable::
+
+   "A man with two {0:{1}}".format("noses", 10)
+
+This would first evaluate the inner replacement field, making the format string
+effectively::
+
+   "A man with two {0:10}"
+
+Then the outer replacement field would be evaluated, producing::
+
+   "noses     "
+   
+Which is substituted into the string, yielding::
+   
+   "A man with two noses     "
+   
+(The extra space is because we specified a field width of 10, and because left
+alignment is the default for strings.)
+
+.. versionadded:: 3.0
+
+.. _formatspec:
+
+Format Specification Mini-Language
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+"Format specifications" are used within replacement fields contained within a
+format string to define how individual values are presented (see
+:ref:`formatstrings`).  They can also be passed directly to the builtin
+:func:`format` function.  Each formattable type may define how the format
+specification is to be interpreted.
+
+Most built-in types implement the following options for format specifications,
+although some of the formatting options are only supported by the numeric types.
+
+A general convention is that an empty format string (``""``) produces the same
+result as if you had called :func:`str()` on the value.
+
+The general form of a *standard format specifier* is:
+
+.. productionlist:: sf
+   format_spec: [[`fill`]`align`][`sign`][0][`width`][.`precision`][`type`]
+   fill: <a character other than '}'>
+   align: "<" | ">" | "=" | "^"
+   sign: "+" | "-" | " "
+   width: `integer`
+   precision: `integer`
+   type: "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "x" | "X" | "%"
+   
+The *fill* character can be any character other than '}' (which signifies the
+end of the field).  The presence of a fill character is signaled by the *next*
+character, which must be one of the alignment options. If the second character
+of *format_spec* is not a valid alignment option, then it is assumed that both
+the fill character and the alignment option are absent.
+
+The meaning of the various alignment options is as follows:
+
+   +---------+----------------------------------------------------------+
+   | Option  | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'<'`` | Forces the field to be left-aligned within the available |
+   |         | space (This is the default.)                             |
+   +---------+----------------------------------------------------------+
+   | ``'>'`` | Forces the field to be right-aligned within the          |
+   |         | available space.                                         |
+   +---------+----------------------------------------------------------+
+   | ``'='`` | Forces the padding to be placed after the sign (if any)  |
+   |         | but before the digits.  This is used for printing fields |
+   |         | in the form '+000000120'. This alignment option is only  |
+   |         | valid for numeric types.                                 |
+   +---------+----------------------------------------------------------+
+   | ``'^'`` | Forces the field to be centered within the available     |
+   |         | space.                                                   |
+   +---------+----------------------------------------------------------+
+
+Note that unless a minimum field width is defined, the field width will always
+be the same size as the data to fill it, so that the alignment option has no
+meaning in this case.
+
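The alignment options above can be tried out with the built-in :func:`format` function. The following is a minimal sketch, assuming a current Python 3 interpreter (whose behaviour may differ in details from this draft); the variable names are illustrative only.

```python
# Alignment options applied through the built-in format() function.
left = format("abc", "<7")        # '<' pads on the right
right = format("abc", ">7")       # '>' pads on the left
centered = format("abc", "*^9")   # '*' is the fill character, '^' centers
signed = format(120, "0=+10")     # '=' puts the '0' fill between sign and digits

print(repr(left), repr(right), repr(centered), repr(signed))
```

Each format specifier here includes a width, because without one the alignment option has no visible effect.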
+The *sign* option is only valid for numeric types, and can be one of the
+following:
+
+   +---------+----------------------------------------------------------+
+   | Option  | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'+'`` | indicates that a sign should be used for both            |
+   |         | positive and negative numbers.                           |
+   +---------+----------------------------------------------------------+
+   | ``'-'`` | indicates that a sign should be used only for negative   |
+   |         | numbers (this is the default behavior).                  |
+   +---------+----------------------------------------------------------+
+   | space   | indicates that a leading space should be used on         |
+   |         | positive numbers, and a minus sign on negative numbers.  |
+   +---------+----------------------------------------------------------+
+
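As a small illustration of the three sign options, the sketch below formats one positive and one negative integer with each of them. It assumes a current Python 3 interpreter; the names ``values``, ``plus``, ``minus`` and ``space`` are hypothetical.

```python
# Compare the three sign options on a positive and a negative integer.
values = (5, -5)

plus = [format(v, "+d") for v in values]    # sign shown for both numbers
minus = [format(v, "-d") for v in values]   # sign shown only for negatives
space = [format(v, " d") for v in values]   # leading space for positives

print(plus, minus, space)
```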
+*width* is a decimal integer defining the minimum field width.  If not
+specified, then the field width will be determined by the content.
+
+If the *width* field is preceded by a zero (``'0'``) character, this enables
+zero-padding.  This is equivalent to an *alignment* type of ``'='`` and a *fill*
+character of ``'0'``.
+
+The *precision* is a decimal number indicating how many digits should be
+displayed after the decimal point for a floating point value.  For non-number
+types the field indicates the maximum field size - in other words, how many
+characters will be used from the field content. The *precision* is ignored for
+integer values.
+
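The interaction of *width*, zero-padding and *precision* can be sketched as follows. This is only an illustration, assuming a current Python 3 interpreter; the variable names are not part of the documented API.

```python
# A leading '0' before the width is equivalent to fill '0' with '=' alignment.
zero_padded = format(-42, "06d")
explicit = format(-42, "0=6d")

# For floats, the precision is the number of digits after the decimal point.
two_places = format(3.14159, ".2f")

# For strings, the precision is the maximum number of characters used.
truncated = format("abcdef", ".3")

print(zero_padded, explicit, two_places, truncated)
```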
+Finally, the *type* determines how the data should be presented.
+
+The available integer presentation types are:
+
+   +---------+----------------------------------------------------------+
+   | Type    | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'b'`` | Binary. Outputs the number in base 2.                    |
+   +---------+----------------------------------------------------------+
+   | ``'c'`` | Character. Converts the integer to the corresponding     |
+   |         | Unicode character before printing.                       |
+   +---------+----------------------------------------------------------+
+   | ``'d'`` | Decimal Integer. Outputs the number in base 10.          |
+   +---------+----------------------------------------------------------+
+   | ``'o'`` | Octal format. Outputs the number in base 8.              |
+   +---------+----------------------------------------------------------+
+   | ``'x'`` | Hex format. Outputs the number in base 16, using lower-  |
+   |         | case letters for the digits above 9.                     |
+   +---------+----------------------------------------------------------+
+   | ``'X'`` | Hex format. Outputs the number in base 16, using upper-  |
+   |         | case letters for the digits above 9.                     |
+   +---------+----------------------------------------------------------+
+   | None    | the same as ``'d'``                                      |
+   +---------+----------------------------------------------------------+
+                                                                         
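The integer presentation types in the table above can be compared side by side. A minimal sketch, assuming a current Python 3 interpreter; the variable names are illustrative.

```python
# The same integer rendered with each integer presentation type.
n = 255

as_binary = format(n, "b")
as_decimal = format(n, "d")
as_octal = format(n, "o")
as_hex_lower = format(n, "x")
as_hex_upper = format(n, "X")
as_char = format(65, "c")       # code point 65 is the letter 'A'
as_default = format(n, "")      # no type given: same as 'd'

print(as_binary, as_decimal, as_octal, as_hex_lower, as_hex_upper, as_char)
```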
+The available presentation types for floating point and decimal values are:
+                                                                         
+   +---------+----------------------------------------------------------+
+   | Type    | Meaning                                                  |
+   +=========+==========================================================+
+   | ``'e'`` | Exponent notation. Prints the number in scientific       |
+   |         | notation using the letter 'e' to indicate the exponent.  |
+   +---------+----------------------------------------------------------+
+   | ``'E'`` | Exponent notation. Same as ``'e'`` except it uses an     |
+   |         | upper case 'E' as the separator character.               |
+   +---------+----------------------------------------------------------+
+   | ``'f'`` | Fixed point. Displays the number as a fixed-point        |
+   |         | number.                                                  |
+   +---------+----------------------------------------------------------+
+   | ``'F'`` | Fixed point. Same as ``'f'``.                            |
+   +---------+----------------------------------------------------------+
+   | ``'g'`` | General format. This prints the number as a fixed-point  |
+   |         | number, unless the number is too large, in which case    |
+   |         | it switches to ``'e'`` exponent notation.                |
+   +---------+----------------------------------------------------------+
+   | ``'G'`` | General format. Same as ``'g'`` except switches to       |
+   |         | ``'E'`` if the number gets too large.                    |
+   +---------+----------------------------------------------------------+
+   | ``'n'`` | Number. This is the same as ``'g'``, except that it uses |
+   |         | the current locale setting to insert the appropriate     |
+   |         | number separator characters.                             |
+   +---------+----------------------------------------------------------+
+   | ``'%'`` | Percentage. Multiplies the number by 100 and displays    |
+   |         | in fixed (``'f'``) format, followed by a percent sign.   |
+   +---------+----------------------------------------------------------+
+   | None    | similar to ``'g'``, except that it prints at least one   |
+   |         | digit after the decimal point.                           |
+   +---------+----------------------------------------------------------+
+
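The floating point presentation types can be illustrated in the same way. The sketch below assumes a current Python 3 interpreter, where the default precision for these types is six digits; the locale-dependent ``'n'`` type is left out because its output varies between systems.

```python
# One float rendered with several floating point presentation types.
x = 1234.5

exp_lower = format(x, "e")          # exponent notation with 'e'
exp_upper = format(x, "E")          # exponent notation with 'E'
fixed = format(x, "f")              # fixed point
general_small = format(x, "g")      # stays in fixed-point form
general_large = format(1e20, "g")   # too large: switches to exponent form
percent = format(0.25, ".1%")       # multiplied by 100, then a percent sign

print(exp_lower, exp_upper, fixed, general_small, general_large, percent)
```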
+
+.. _template-strings:
+
 Template strings
 ----------------

@@ -208,6 +554,7 @@ They are not available as string methods.
   leading and trailing whitespace.


+.. XXX is obsolete with unicode.translate
 .. function:: maketrans(from, to)

   Return a translation table suitable for passing to :func:`translate`, that will
@@ -219,250 +566,3 @@ They are not available as string methods.
      Don't use strings derived from :const:`lowercase` and :const:`uppercase` as
      arguments; in some locales, these don't have the same length.  For case
      conversions, always use :func:`lower` and :func:`upper`.
-
-
-Deprecated string functions
---------------------------
-
-The following list of functions are also defined as methods of string and
-Unicode objects; see section :ref:`string-methods` for more information on
-those.  You should consider these functions as deprecated, although they will
-not be removed until Python 3.0.  The functions defined in this module are:
-
-
-.. function:: atof(s)
-
-   .. deprecated:: 2.0
-      Use the :func:`float` built-in function.
-
-   .. index:: builtin: float
-
-   Convert a string to a floating point number.  The string must have the standard
-   syntax for a floating point literal in Python, optionally preceded by a sign
-   (``+`` or ``-``).  Note that this behaves identical to the built-in function
-   :func:`float` when passed a string.
-
-   .. note::
-
-      .. index::
-         single: NaN
-         single: Infinity
-
-      When passing in a string, values for NaN and Infinity may be returned, depending
-      on the underlying C library.  The specific set of strings accepted which cause
-      these values to be returned depends entirely on the C library and is known to
-      vary.
-
-
-.. function:: atoi(s[, base])
-
-   .. deprecated:: 2.0
-      Use the :func:`int` built-in function.
-
-   .. index:: builtin: eval
-
-   Convert string *s* to an integer in the given *base*.  The string must consist
-   of one or more digits, optionally preceded by a sign (``+`` or ``-``).  The
-   *base* defaults to 10.  If it is 0, a default base is chosen depending on the
-   leading characters of the string (after stripping the sign): ``0x`` or ``0X``
-   means 16, ``0`` means 8, anything else means 10.  If *base* is 16, a leading
-   ``0x`` or ``0X`` is always accepted, though not required.  This behaves
-   identically to the built-in function :func:`int` when passed a string.  (Also
-   note: for a more flexible interpretation of numeric literals, use the built-in
-   function :func:`eval`.)
-
-
-.. function:: atol(s[, base])
-
-   .. deprecated:: 2.0
-      Use the :func:`long` built-in function.
-
-   .. index:: builtin: long
-
-   Convert string *s* to a long integer in the given *base*. The string must
-   consist of one or more digits, optionally preceded by a sign (``+`` or ``-``).
-   The *base* argument has the same meaning as for :func:`atoi`.  A trailing ``l``
-   or ``L`` is not allowed, except if the base is 0.  Note that when invoked
-   without *base* or with *base* set to 10, this behaves identical to the built-in
-   function :func:`long` when passed a string.
-
-
-.. function:: capitalize(word)
-
-   Return a copy of *word* with only its first character capitalized.
-
-
-.. function:: expandtabs(s[, tabsize])
-
-   Expand tabs in a string replacing them by one or more spaces, depending on the
-   current column and the given tab size.  The column number is reset to zero after
-   each newline occurring in the string. This doesn't understand other non-printing
-   characters or escape sequences.  The tab size defaults to 8.
-
-
-.. function:: find(s, sub[, start[,end]])
-
-   Return the lowest index in *s* where the substring *sub* is found such that
-   *sub* is wholly contained in ``s[start:end]``.  Return ``-1`` on failure.
-   Defaults for *start* and *end* and interpretation of negative values is the same
-   as for slices.
-
-
-.. function:: rfind(s, sub[, start[, end]])
-
-   Like :func:`find` but find the highest index.
-
-
-.. function:: index(s, sub[, start[, end]])
-
-   Like :func:`find` but raise :exc:`ValueError` when the substring is not found.
-
-
-.. function:: rindex(s, sub[, start[, end]])
-
-   Like :func:`rfind` but raise :exc:`ValueError` when the substring is not found.
-
-
-.. function:: count(s, sub[, start[, end]])
-
-   Return the number of (non-overlapping) occurrences of substring *sub* in string
-   ``s[start:end]``. Defaults for *start* and *end* and interpretation of negative
-   values are the same as for slices.
-
-
-.. function:: lower(s)
-
-   Return a copy of *s*, but with upper case letters converted to lower case.
-
-
-.. function:: split(s[, sep[, maxsplit]])
-
-   Return a list of the words of the string *s*.  If the optional second argument
-   *sep* is absent or ``None``, the words are separated by arbitrary strings of
-   whitespace characters (space, tab,  newline, return, formfeed).  If the second
-   argument *sep* is present and not ``None``, it specifies a string to be used as
-   the  word separator.  The returned list will then have one more item than the
-   number of non-overlapping occurrences of the separator in the string.  The
-   optional third argument *maxsplit* defaults to 0.  If it is nonzero, at most
-   *maxsplit* number of splits occur, and the remainder of the string is returned
-   as the final element of the list (thus, the list will have at most
-   ``maxsplit+1`` elements).
-
-   The behavior of split on an empty string depends on the value of *sep*. If *sep*
-   is not specified, or specified as ``None``, the result will be an empty list.
-   If *sep* is specified as any string, the result will be a list containing one
-   element which is an empty string.
-
-
-.. function:: rsplit(s[, sep[, maxsplit]])
-
-   Return a list of the words of the string *s*, scanning *s* from the end.  To all
-   intents and purposes, the resulting list of words is the same as returned by
-   :func:`split`, except when the optional third argument *maxsplit* is explicitly
-   specified and nonzero.  When *maxsplit* is nonzero, at most *maxsplit* number of
-   splits -- the *rightmost* ones -- occur, and the remainder of the string is
-   returned as the first element of the list (thus, the list will have at most
-   ``maxsplit+1`` elements).
-
-   .. versionadded:: 2.4
-
-
-.. function:: splitfields(s[, sep[, maxsplit]])
-
-   This function behaves identically to :func:`split`.  (In the past, :func:`split`
-   was only used with one argument, while :func:`splitfields` was only used with
-   two arguments.)
-
-
-.. function:: join(words[, sep])
-
-   Concatenate a list or tuple of words with intervening occurrences of  *sep*.
-   The default value for *sep* is a single space character.  It is always true that
-   ``string.join(string.split(s, sep), sep)`` equals *s*.
-
-
-.. function:: joinfields(words[, sep])
-
-   This function behaves identically to :func:`join`.  (In the past,  :func:`join`
-   was only used with one argument, while :func:`joinfields` was only used with two
-   arguments.) Note that there is no :meth:`joinfields` method on string objects;
-   use the :meth:`join` method instead.
-
-
-.. function:: lstrip(s[, chars])
-
-   Return a copy of the string with leading characters removed.  If *chars* is
-   omitted or ``None``, whitespace characters are removed.  If given and not
-   ``None``, *chars* must be a string; the characters in the string will be
-   stripped from the beginning of the string this method is called on.
-
-   .. versionchanged:: 2.2.3
-      The *chars* parameter was added.  The *chars* parameter cannot be passed in
-      earlier 2.2 versions.
-
-
-.. function:: rstrip(s[, chars])
-
-   Return a copy of the string with trailing characters removed.  If *chars* is
-   omitted or ``None``, whitespace characters are removed.  If given and not
-   ``None``, *chars* must be a string; the characters in the string will be
-   stripped from the end of the string this method is called on.
-
-   .. versionchanged:: 2.2.3
-      The *chars* parameter was added.  The *chars* parameter cannot be passed in
-      earlier 2.2 versions.
-
-
-.. function:: strip(s[, chars])
-
-   Return a copy of the string with leading and trailing characters removed.  If
-   *chars* is omitted or ``None``, whitespace characters are removed.  If given and
-   not ``None``, *chars* must be a string; the characters in the string will be
-   stripped from the both ends of the string this method is called on.
-
-   .. versionchanged:: 2.2.3
-      The *chars* parameter was added.  The *chars* parameter cannot be passed in
-      earlier 2.2 versions.
-
-
-.. function:: swapcase(s)
-
-   Return a copy of *s*, but with lower case letters converted to upper case and
-   vice versa.
-
-
-.. function:: translate(s, table[, deletechars])
-
-   Delete all characters from *s* that are in *deletechars* (if  present), and then
-   translate the characters using *table*, which  must be a 256-character string
-   giving the translation for each character value, indexed by its ordinal.  If
-   *table* is ``None``, then only the character deletion step is performed.
-
-
-.. function:: upper(s)
-
-   Return a copy of *s*, but with lower case letters converted to upper case.
-
-
-.. function:: ljust(s, width)
-              rjust(s, width)
-              center(s, width)
-
-   These functions respectively left-justify, right-justify and center a string in
-   a field of given width.  They return a string that is at least *width*
-   characters wide, created by padding the string *s* with spaces until the given
-   width on the right, left or both sides.  The string is never truncated.
-
-
-.. function:: zfill(s, width)
-
-   Pad a numeric string on the left with zero digits until the given width is
-   reached.  Strings starting with a sign are handled correctly.
-
-
-.. function:: replace(str, old, new[, maxreplace])
-
-   Return a copy of string *str* with all occurrences of substring *old* replaced
-   by *new*.  If the optional argument *maxreplace* is given, the first
-   *maxreplace* occurrences are replaced.
-
--- a/Doc/library/strings.rst
+++ b/Doc/library/strings.rst
@@ -8,12 +8,11 @@ String Services
 The modules described in this chapter provide a wide range of string
 manipulation operations.

-In addition, Python's built-in string classes support the sequence type
-methods described in the :ref:`typesseq` section, and also the
-string-specific methods described in the :ref:`string-methods` section.
-To output formatted strings use template strings or the ``%`` operator
-described in the :ref:`string-formatting` section. Also, see the
-:mod:`re` module for string functions based on regular expressions.
+In addition, Python's built-in string classes support the sequence type methods
+described in the :ref:`typesseq` section, and also the string-specific methods
+described in the :ref:`string-methods` section.  To output formatted strings,
+see the :ref:`string-formatting` section. Also, see the :mod:`re` module for
+string functions based on regular expressions.


 .. toctree::

--- a/Doc/reference/datamodel.rst
+++ b/Doc/reference/datamodel.rst
@@ -1279,15 +1279,36 @@ Basic customization

   .. index::
      builtin: str
-      statement: print
+      builtin: print

-   Called by the :func:`str` built-in function and by the :keyword:`print`
-   statement to compute the "informal" string representation of an object.  This
+   Called by the :func:`str` built-in function and by the :func:`print`
+   function to compute the "informal" string representation of an object.  This
   differs from :meth:`__repr__` in that it does not have to be a valid Python
   expression: a more convenient or concise representation may be used instead.
   The return value must be a string object.


+.. method:: object.__format__(self, format_spec)
+
+   .. index::
+      pair: string; conversion
+      builtin: str
+      builtin: print
+
+   Called by the :func:`format` built-in function (and by extension, the
+   :meth:`format` method of class :class:`str`) to produce a "formatted"
+   string representation of an object. The ``format_spec`` argument is
+   a string that contains a description of the formatting options desired.
+   The interpretation of the ``format_spec`` argument is up to the type
+   implementing :meth:`__format__`; however, most classes will either
+   delegate formatting to one of the built-in types, or use a similar
+   formatting option syntax.
+   
+   See :ref:`formatspec` for a description of the standard formatting syntax.
+
+   The return value must be a string object.
+
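A class that delegates formatting to a built-in type, as described above, might look like the following sketch. The ``Celsius`` class and its ``degrees`` attribute are hypothetical names chosen for illustration, and the example assumes a current Python 3 interpreter.

```python
class Celsius:
    """Hypothetical temperature wrapper used only for this illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # Delegate to float's formatting, then append a unit suffix.
        return format(self.degrees, format_spec) + " C"


reading = Celsius(21.5)
via_builtin = format(reading, ".1f")        # format() calls __format__
via_method = "{0:.2f}".format(reading)      # so does str.format()

print(via_builtin, via_method)
```

Because the format specifier is passed through unchanged, the class accepts the same standard syntax as :class:`float` without parsing it itself.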
+
 .. method:: object.__lt__(self, other)
            object.__le__(self, other)
            object.__eq__(self, other)

--- a/Doc/reference/expressions.rst
+++ b/Doc/reference/expressions.rst
@@ -5,12 +5,10 @@
 Expressions
 ***********

-.. index:: single: expression
+.. index:: expression, BNF

 This chapter explains the meaning of the elements of expressions in Python.

-.. index:: single: BNF
-
 **Syntax Notes:** In this and the following chapters, extended BNF notation will
 be used to describe syntax, not lexical analysis.  When (one alternative of) a
 syntax rule has the form
@@ -18,8 +16,6 @@ syntax rule has the form
 .. productionlist:: *
   name: `othername`

-.. index:: single: syntax
-
 and no semantics are given, the semantics of this form of ``name`` are the same
 as for ``othername``.

@@ -852,9 +848,9 @@ identities hold approximately where ``x/y`` is replaced by ``floor(x/y)`` or
 ``floor(x/y) - 1`` [#]_.

 In addition to performing the modulo operation on numbers, the ``%`` operator is
-also overloaded by string and unicode objects to perform string formatting (also
+also overloaded by string objects to perform string formatting (also
 known as interpolation). The syntax for string formatting is described in the
-Python Library Reference, section :ref:`string-formatting`.
+Python Library Reference, section :ref:`old-string-formatting`.

 The floor division operator, the modulo operator, and the :func:`divmod`
 function are not defined for complex numbers.  Instead, convert to a
@@ -985,9 +981,12 @@ Comparison of objects of the same type depends on the type:

 * Numbers are compared arithmetically.

+* Bytes objects are compared lexicographically using the numeric values of
+  their elements.
+
 * Strings are compared lexicographically using the numeric equivalents (the
-  result of the built-in function :func:`ord`) of their characters.  Unicode and
-  8-bit strings are fully interoperable in this behavior. [#]_
+  result of the built-in function :func:`ord`) of their characters. [#]_
+  String and bytes objects can't be compared!

 * Tuples and lists are compared lexicographically using comparison of
  corresponding elements.  This means that to compare equal, each element must
@@ -1020,11 +1019,10 @@ particular, dictionaries support membership testing as a nicer way of spelling
 For the list and tuple types, ``x in y`` is true if and only if there exists an
 index *i* such that ``x == y[i]`` is true.

-For the Unicode and string types, ``x in y`` is true if and only if *x* is a
-substring of *y*.  An equivalent test is ``y.find(x) != -1``.  Note, *x* and *y*
-need not be the same type; consequently, ``u'ab' in 'abc'`` will return
-``True``. Empty strings are always considered to be a substring of any other
-string, so ``"" in "abc"`` will return ``True``.
+For the string and bytes types, ``x in y`` is true if and only if *x* is a
+substring of *y*.  An equivalent test is ``y.find(x) != -1``.  Empty strings are
+always considered to be a substring of any other string, so ``"" in "abc"`` will
+return ``True``.

 .. versionchanged:: 2.3
   Previously, *x* was required to be a string of length ``1``.
@@ -1272,7 +1270,7 @@ groups from right to left).
   cases, Python returns the latter result, in order to preserve that
   ``divmod(x,y)[0] * y + x % y`` be very close to ``x``.

-.. [#] While comparisons between unicode strings make sense at the byte
+.. [#] While comparisons between strings make sense at the byte
   level, they may be counter-intuitive to users. For example, the
   strings ``u"\u00C7"`` and ``u"\u0327\u0043"`` compare differently,
   even though they both represent the same unicode character (LATIN

--- a/Doc/tutorial/introduction.rst
+++ b/Doc/tutorial/introduction.rst
@@ -399,8 +399,8 @@ The built-in function :func:`len` returns the length of a string::
      basic transformations and searching.

   :ref:`string-formatting`
-      The formatting operations invoked when strings are the
-      left operand of the ``%`` operator are described in more detail here.
+      The formatting operations invoked by the :meth:`format` string method are
+      described in more detail here.


 .. _tut-unicodestrings: