Commit a2f837f7 authored by Benjamin Peterson

Document the fact that '\U' and '\u' escapes are not treated specially in 3.0 (see issue 2541)

parent a288faef
@@ -423,8 +423,9 @@ characters that otherwise have a special meaning, such as newline, backslash
 itself, or the quote character.
 String literals may optionally be prefixed with a letter ``'r'`` or ``'R'``;
-such strings are called :dfn:`raw strings` and use different rules for
-interpreting backslash escape sequences.
+such strings are called :dfn:`raw strings` and treat backslashes as literal
+characters. As a result, ``'\U'`` and ``'\u'`` escapes in raw strings are not
+treated specially.
 Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
 instance of the :class:`bytes` type instead of the :class:`str` type. They
@@ -520,15 +521,6 @@ is more easily recognized as broken.) It is also important to note that the
 escape sequences only recognized in string literals fall into the category of
 unrecognized escapes for bytes literals.
-When an ``'r'`` or ``'R'`` prefix is used in a string literal, then the
-``\uXXXX`` and ``\UXXXXXXXX`` escape sequences are processed while *all other
-backslashes are left in the string*. For example, the string literal
-``r"\u0062\n"`` consists of three Unicode characters: 'LATIN SMALL LETTER B',
-'REVERSE SOLIDUS', and 'LATIN SMALL LETTER N'. Backslashes can be escaped with a
-preceding backslash; however, both remain in the string. As a result,
-``\uXXXX`` escape sequences are only recognized when there is an odd number of
-backslashes.
 Even in a raw string, string quotes can be escaped with a backslash, but the
 backslash remains in the string; for example, ``r"\""`` is a valid string
 literal consisting of two characters: a backslash and a double quote; ``r"\"``
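The behaviour documented by the hunks above can be checked with a short sketch. It assumes a Python 3 interpreter; the variable names are illustrative only and do not come from the commit. It reuses the ``r"\u0062\n"`` literal from the removed paragraph to show that, under the new rule, every backslash in a raw string is kept as an ordinary character.

```python
# Raw string: backslashes are literal, so \u0062 and \n are NOT decoded.
raw = r"\u0062\n"

# Ordinary string: \u0062 becomes 'b' and \n becomes a newline.
cooked = "\u0062\n"

# A quote can still be "escaped" in a raw string, but the backslash stays.
quote = r"\""

print(len(raw), list(raw))        # eight characters, starting with a backslash
print(len(cooked), repr(cooked))  # two characters: 'b' and a newline
print(len(quote), list(quote))    # two characters: a backslash and a double quote
```

Under the old rule described in the removed text, the same raw literal would have started with the single character ``b``.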
......
@@ -167,6 +167,9 @@ Strings and Bytes
 explicitly convert between them, using the :meth:`str.encode` (str -> bytes)
 or :meth:`bytes.decode` (bytes -> str) methods.
+* All backslashes in raw strings are interpreted literally. This means that
+Unicode escapes are not treated specially.
 .. XXX add bytearray
 * PEP 3112: Bytes literals, e.g. ``b"abc"``, create :class:`bytes` instances.
@@ -183,6 +186,8 @@ Strings and Bytes
 * The :mod:`StringIO` and :mod:`cStringIO` modules are gone. Instead, import
 :class:`io.StringIO` or :class:`io.BytesIO`.
+* ``'\U'`` and ``'\u'`` escapes in raw strings are not treated specially.
 PEP 3101: A New Approach to String Formatting
 =============================================
......
@@ -26,6 +26,9 @@ Core and Builtins
 through as unmodified as possible; as a consequence, the C API
 related to command line arguments was changed to use wchar_t.
+- All backslashes in raw strings are interpreted literally. This means that
+'\u' and '\U' escapes are not treated specially.
 Extension Modules
 -----------------
......