Commit ddbdc9a6 authored by Georg Brandl

Closes #15956: improve documentation of named groups and how to reference them.

parent bcc55d69
@@ -237,21 +237,32 @@ The special characters are:
 
 ``(?P<name>...)``
    Similar to regular parentheses, but the substring matched by the group is
-   accessible within the rest of the regular expression via the symbolic group
-   name *name*. Group names must be valid Python identifiers, and each group
-   name must be defined only once within a regular expression. A symbolic group
-   is also a numbered group, just as if the group were not named. So the group
-   named ``id`` in the example below can also be referenced as the numbered group
-   ``1``.
+   accessible via the symbolic group name *name*. Group names must be valid
+   Python identifiers, and each group name must be defined only once within a
+   regular expression. A symbolic group is also a numbered group, just as if
+   the group were not named.
 
-   For example, if the pattern is ``(?P<id>[a-zA-Z_]\w*)``, the group can be
-   referenced by its name in arguments to methods of match objects, such as
-   ``m.group('id')`` or ``m.end('id')``, and also by name in the regular
-   expression itself (using ``(?P=id)``) and replacement text given to
-   ``.sub()`` (using ``\g<id>``).
+   Named groups can be referenced in three contexts. If the pattern is
+   ``(?P<quote>['"]).*?(?P=quote)`` (i.e. matching a string quoted with either
+   single or double quotes):
+
+   +---------------------------------------+----------------------------------+
+   | Context of reference to group "quote" | Ways to reference it             |
+   +=======================================+==================================+
+   | in the same pattern itself            | * ``(?P=quote)`` (as shown)      |
+   |                                       | * ``\1``                         |
+   +---------------------------------------+----------------------------------+
+   | when processing match object ``m``    | * ``m.group('quote')``           |
+   |                                       | * ``m.end('quote')`` (etc.)      |
+   +---------------------------------------+----------------------------------+
+   | in a string passed to the ``repl``    | * ``\g<quote>``                  |
+   | argument of ``re.sub()``              | * ``\g<1>``                      |
+   |                                       | * ``\1``                         |
+   +---------------------------------------+----------------------------------+
 
 ``(?P=name)``
-   Matches whatever text was matched by the earlier group named *name*.
+   A backreference to a named group; it matches whatever text was matched by the
+   earlier group named *name*.
 
 ``(?#...)``
    A comment; the contents of the parentheses are simply ignored.
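The three reference contexts listed in the new table can be exercised with a short Python 3 script. This is a minimal sketch: the sample text and the variable names (``named``, ``numbered``, ``text``, ``masked``) are made up for illustration, and it only checks that the forms listed in each table row behave the same way.

```python
import re

# Pattern from the table: a string quoted with either single or double
# quotes. The named group "quote" is also numbered group 1.
named = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")
# The same pattern, with a numbered backreference instead of (?P=quote).
numbered = re.compile(r"""(?P<quote>['"]).*?\1""")

# Hypothetical sample input.
text = """He said "hi" and 'bye'."""

# Context 1: inside the pattern, (?P=quote) and \1 match the same substrings.
named_matches = [m.group() for m in named.finditer(text)]
numbered_matches = [m.group() for m in numbered.finditer(text)]
print(named_matches)  # ['"hi"', "'bye'"]

# Context 2: on a match object, the group is available by name or by number.
m = named.search(text)
print(m.group('quote'), m.group(1), m.end('quote'))  # " " 9

# Context 3: in the repl argument of re.sub(), \g<quote>, \g<1> and \1 all
# insert the text matched by the group; here they mask the quoted words.
masked = [named.sub(repl, text)
          for repl in (r'\g<quote>...\g<quote>', r'\g<1>...\g<1>', r'\1...\1')]
print(masked[0])  # He said "..." and '...'.
```

Because ``.*?`` is non-greedy, each match stops at the first closing quote of the same kind as the opening one.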
@@ -331,7 +342,8 @@ the second character. For example, ``\$`` matches the character ``'$'``.
    depends on the values of the ``UNICODE`` and ``LOCALE`` flags.
    For example, ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``,
    ``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``.
-   Inside a character range, ``\b`` represents the backspace character, for compatibility with Python's string literals.
+   Inside a character range, ``\b`` represents the backspace character, for
+   compatibility with Python's string literals.
 
 ``\B``
    Matches the empty string, but only when it is *not* at the beginning or end of a
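The two meanings of ``\b`` described in this hunk can be checked side by side. A sketch for Python 3's ``re`` module; the candidate strings are the ones from the documentation example, while the backspace test strings are made up.

```python
import re

# Outside a character class, \b matches the empty string at a word boundary,
# so r'\bfoo\b' finds 'foo' only when it is not part of a longer word.
word_foo = re.compile(r'\bfoo\b')
candidates = ['foo', 'foo.', '(foo)', 'bar foo baz', 'foobar', 'foo3']
matching = [s for s in candidates if word_foo.search(s)]
print(matching)  # ['foo', 'foo.', '(foo)', 'bar foo baz']

# Inside a character class, \b is the backspace character (\x08), just as
# '\b' is in an ordinary (non-raw) Python string literal.
backspace = re.compile(r'[\b]')
has_backspace = backspace.search('a\bb') is not None  # '\b' here is \x08
no_backspace = backspace.search('a b') is None
print(has_backspace, no_backspace)  # True True
```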
@@ -642,7 +654,8 @@ form.
    when not adjacent to a previous match, so ``sub('x*', '-', 'abc')`` returns
    ``'-a-b-c-'``.
 
-   In addition to character escapes and backreferences as described above,
+   In string-type *repl* arguments, in addition to the character escapes and
+   backreferences described above,
    ``\g<name>`` will use the substring matched by the group named ``name``, as
    defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding
    group number; ``\g<2>`` is therefore equivalent to ``\2``, but isn't ambiguous
...
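The ``\g<name>`` and ``\g<number>`` forms in string-type *repl* arguments can be illustrated as follows. This is a sketch for Python 3; the pattern, the group names ``last`` and ``first``, and the sample strings are hypothetical.

```python
import re

# Hypothetical pattern with two named groups; they are also groups 1 and 2.
name_re = re.compile(r'(?P<last>\w+), (?P<first>\w+)')

# \g<name> inserts the substring matched by the named group.
by_name = name_re.sub(r'\g<first> \g<last>', 'Doe, Jane')
print(by_name)  # Jane Doe

# \g<number> refers to a group by number: \g<2> is equivalent to \2.
by_number = name_re.sub(r'\g<2> \g<1>', 'Doe, Jane')
by_backslash = name_re.sub(r'\2 \1', 'Doe, Jane')
print(by_number == by_backslash == by_name)  # True

# With \g<1> the group number is delimited, so a following digit is literal:
# each digit below is replaced by itself followed by '0'.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')
print(padded)  # a10b20
```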