Commit 15c1fe50 (cpython)
Authored Jan 29, 2007 by Andrew M. Kuchling
Parent: 5781dd2d
1 changed file with 27 additions and 27 deletions

Doc/howto/regex.tex (+27 -27)
@@ -927,15 +927,15 @@ Now that we've looked at the general extension syntax, we can return
 to the features that simplify working with groups in complex REs.
 Since groups are numbered from left to right and a complex expression
 may use many groups, it can become difficult to keep track of the
-correct numbering, and modifying such a complex RE is annoying.
-Insert a new group near the beginning, and you change the numbers of
+correct numbering. Modifying such a complex RE is annoying, too:
+insert a new group near the beginning and you change the numbers of
 everything that follows it.
 
-First, sometimes you'll want to use a group to collect a part of a
-regular expression, but aren't interested in retrieving the group's
-contents. You can make this fact explicit by using a non-capturing
-group: \regexp{(?:...)}, where you can put any other regular
-expression inside the parentheses.
+Sometimes you'll want to use a group to collect a part of a regular
+expression, but aren't interested in retrieving the group's contents.
+You can make this fact explicit by using a non-capturing group:
+\regexp{(?:...)}, where you can replace the \regexp{...}
+with any other regular expression.
 
 \begin{verbatim}
 >>> m = re.match("([abc])+", "abc")
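The numbering problem described in this hunk can be seen directly in a short Python 3 sketch. The `key=` pattern and the `pre-key=value` string are invented for illustration. The calls used (`re.compile`, `search`, `group`, `groups`, `re.match`) are standard `re` module functions and methods.

```python
import re

text = "pre-key=value"

# Baseline: group 1 holds the value.
original = re.compile(r"key=(\w+)")

# Adding an optional prefix as a *capturing* group pushes the value to group 2.
capturing = re.compile(r"(pre-)?key=(\w+)")

# Adding it as a non-capturing group (?:...) leaves the value in group 1.
non_capturing = re.compile(r"(?:pre-)?key=(\w+)")

print(original.search(text).group(1))       # value
print(capturing.search(text).group(2))      # value
print(non_capturing.search(text).group(1))  # value

# A repeated capturing group keeps only its last match;
# a non-capturing group reports nothing at all.
print(re.match(r"([abc])+", "abc").groups())    # ('c',)
print(re.match(r"(?:[abc])+", "abc").groups())  # ()
```

Because `(?:...)` creates no group, code that already reads `group(1)` keeps working after the edit.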
@@ -951,23 +951,23 @@ group matched, a non-capturing group behaves exactly the same as a
 capturing group; you can put anything inside it, repeat it with a
 repetition metacharacter such as \samp{*}, and nest it within other
 groups (capturing or non-capturing). \regexp{(?:...)} is particularly
-useful when modifying an existing group, since you can add new groups
+useful when modifying an existing pattern, since you can add new groups
 without changing how all the other groups are numbered. It should be
 mentioned that there's no performance difference in searching between
 capturing and non-capturing groups; neither form is any faster than
 the other.
 
-The second, and more significant, feature is named groups; instead of
+A more significant feature is named groups: instead of
 referring to them by numbers, groups can be referenced by a name.
 
 The syntax for a named group is one of the Python-specific extensions:
 \regexp{(?P<\var{name}>...)}. \var{name} is, obviously, the name of
-the group. Except for associating a name with a group, named groups
-also behave identically to capturing groups. The \class{MatchObject}
-methods that deal with capturing groups all accept either integers, to
-refer to groups by number, or a string containing the group name.
-Named groups are still given numbers, so you can retrieve information
-about a group in two ways:
+the group. Named groups also behave exactly like capturing groups,
+and additionally associate a name with a group. The \class{MatchObject}
+methods that deal with capturing groups all accept
+either integers that refer to the group by number or strings that
+contain the desired group's name. Named groups are still given
+numbers, so you can retrieve information about a group in two ways:
 
 \begin{verbatim}
 >>> p = re.compile(r'(?P<word>\b\w+\b)')
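Here is a Python 3 sketch of the two lookup styles. It reuses the pattern from the `verbatim` example that begins at the end of this hunk. The sample string is an assumption, since the rest of that example is not shown. The second pattern `q` is a hypothetical edit that shows a name stays stable when a new leading group shifts the numbers. The object returned by `search()` is what this text calls a MatchObject.

```python
import re

# Pattern from the verbatim example in this hunk; the sample string is assumed.
p = re.compile(r"(?P<word>\b\w+\b)")
m = p.search("(((( Lots of punctuation )))")

# A named group is still numbered, so both lookups return the same text.
print(m.group("word"))  # Lots
print(m.group(1))       # Lots

# groupdict() maps each named group to the text it matched.
print(m.groupdict())    # {'word': 'Lots'}

# Hypothetical edit: a new leading group shifts "word" to number 2,
# but code that looks the group up by name keeps working.
q = re.compile(r"(\()*\s*(?P<word>\b\w+\b)")
n = q.search("(((( Lots of punctuation )))")
print(n.group("word"), n.group(2))  # Lots Lots
```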
@@ -994,11 +994,11 @@ InternalDate = re.compile(r'INTERNALDATE "'
 It's obviously much easier to retrieve \code{m.group('zonem')},
 instead of having to remember to retrieve group 9.
 
-Since the syntax for backreferences, in an expression like
-\regexp{(...)\e 1}, refers to the number of the group there's
+The syntax for backreferences in an expression such as
+\regexp{(...)\e 1} refers to the number of the group. There's
 naturally a variant that uses the group name instead of the number.
-This is also a Python extension: \regexp{(?P=\var{name})} indicates
-that the contents of the group called \var{name} should again be found
+This is another Python extension: \regexp{(?P=\var{name})} indicates
+that the contents of the group called \var{name} should again be matched
 at the current point. The regular expression for finding doubled
 words, \regexp{(\e b\e w+)\e s+\e 1} can also be written as
 \regexp{(?P<word>\e b\e w+)\e s+(?P=word)}:
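This short Python 3 check confirms that the numbered and named forms of the doubled-word pattern from this hunk behave the same. The sample sentence is an assumption, chosen to contain one doubled word.

```python
import re

text = "Paris in the the spring"

# Numbered backreference: \1 repeats whatever group 1 matched.
numbered = re.compile(r"(\b\w+)\s+\1")

# Named backreference: (?P=word) repeats whatever group "word" matched.
named = re.compile(r"(?P<word>\b\w+)\s+(?P=word)")

print(numbered.search(text).group())      # the the
print(named.search(text).group())         # the the
print(named.search(text).group("word"))   # the
```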
@@ -1028,11 +1028,11 @@ opposite of the positive assertion; it succeeds if the contained expression
 \emph{doesn't} match at the current position in the string.
 \end{itemize}
 
-An example will help make this concrete by demonstrating a case
-where a lookahead is useful. Consider a simple pattern to match a
-filename and split it apart into a base name and an extension,
-separated by a \samp{.}. For example, in \samp{news.rc}, \samp{news}
-is the base name, and \samp{rc} is the filename's extension.
+To make this concrete, let's look at a case where a lookahead is
+useful. Consider a simple pattern to match a filename and split it
+apart into a base name and an extension, separated by a \samp{.}. For
+example, in \samp{news.rc}, \samp{news} is the base name, and
+\samp{rc} is the filename's extension.
 
 The pattern to match this is quite simple:
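The hunk stops before the pattern itself appears, so the following is only a hedged Python 3 sketch of one way to split a filename into a base name and an extension. It is not necessarily the pattern the HOWTO goes on to show. The group names `base` and `ext` are hypothetical.

```python
import re

# Hypothetical pattern: everything up to the last "." is the base name and
# everything after it is the extension (the greedy .* settles on the last dot).
filename_re = re.compile(r"(?P<base>.*)[.](?P<ext>.*)$")

m = filename_re.match("news.rc")
print(m.group("base"), m.group("ext"))  # news rc

m = filename_re.match("archive.tar.gz")
print(m.group("base"), m.group("ext"))  # archive.tar gz

# No dot at all means no match.
print(filename_re.match("README"))  # None
```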
@@ -1079,12 +1079,12 @@ read and understand. Worse, if the problem changes and you want to
 exclude both \samp{bat} and \samp{exe} as extensions, the pattern
 would get even more complicated and confusing.
 
-A negative lookahead cuts through all this:
+A negative lookahead cuts through all this confusion:
 
 \regexp{.*[.](?!bat\$).*\$}
 % $
 
-The lookahead means: if the expression \regexp{bat} doesn't match at
+The negative lookahead means: if the expression \regexp{bat} doesn't match at
 this point, try the rest of the pattern; if \regexp{bat\$} does match,
 the whole pattern will fail. The trailing \regexp{\$} is required to
 ensure that something like \samp{sample.batch}, where the extension
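The negative lookahead from this hunk can be exercised in Python 3 as follows. The filenames are illustrative. The second pattern, which also rules out `exe`, is a separate sketch of adding an alternative inside the lookahead and is not quoted from the text.

```python
import re

# The negative-lookahead pattern discussed in this hunk.
not_bat = re.compile(r".*[.](?!bat$).*$")

for name in ["news.rc", "sample.batch", "autoexec.bat"]:
    print(name, bool(not_bat.match(name)))
# news.rc True
# sample.batch True
# autoexec.bat False

# Sketch (not from the text): exclude a second extension by adding an
# alternative inside the same lookahead.
not_bat_or_exe = re.compile(r".*[.](?!bat$|exe$).*$")
print(bool(not_bat_or_exe.match("setup.exe")))  # False

# Limitation: with several dots, .* can backtrack to an earlier dot,
# so this name is still accepted.
print(bool(not_bat.match("a.b.bat")))  # True
```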
@@ -1101,7 +1101,7 @@ filenames that end in either \samp{bat} or \samp{exe}:
 \section{Modifying Strings}
 
 Up to this point, we've simply performed searches against a static
-string. Regular expressions are also commonly used to modify a string
+string. Regular expressions are also commonly used to modify strings
 in various ways, using the following \class{RegexObject} methods:
 
 \begin{tableii}{c|l}{code}{Method/Attribute}{Purpose}
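As a hedged illustration of modifying strings, here is a Python 3 sketch using three methods that compiled pattern objects (RegexObject in this text) provide: `split()`, `sub()` and `subn()`. The table in this hunk is cut off, so which methods it actually lists is not shown here. The sample strings are invented.

```python
import re

# split(): break a string apart wherever the pattern matches.
words = re.compile(r"\W+")
print(words.split("This is a test, short and sweet."))
# ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', '']

# sub(): replace every match; subn() also reports how many were replaced.
colours = re.compile(r"blue|white|red")
print(colours.sub("colour", "blue socks and red shoes"))
# colour socks and colour shoes
print(colours.subn("colour", "blue socks and red shoes"))
# ('colour socks and colour shoes', 2)
```

The empty string at the end of the `split()` result comes from the trailing period, which is itself a separator.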