Kirill Smelkov / cpython
Commit 0b8123d8, authored Feb 29, 2012 by Ezio Melotti
#10713: merge with 3.2.
Parents: 7b5649cd, 5a045b9f
Showing 2 changed files with 40 additions and 8 deletions:
  Doc/library/re.rst (+14, -8)
  Lib/test/test_re.py (+26, -0)
Doc/library/re.rst
@@ -330,16 +330,22 @@ the second character. For example, ``\$`` matches the character ``'$'``.
 Matches the empty string, but only at the beginning or end of a word.
 A word is defined as a sequence of Unicode alphanumeric or underscore
 characters, so the end of a word is indicated by whitespace or a
-non-alphanumeric, non-underscore Unicode character. Note that
-formally, ``\b`` is defined as the boundary between a ``\w`` and a
-``\W`` character (or vice versa). By default Unicode alphanumerics
-are the ones used, but this can be changed by using the :const:`ASCII`
-flag. Inside a character range, ``\b`` represents the backspace
-character, for compatibility with Python's string literals.
+non-alphanumeric, non-underscore Unicode character. Note that formally,
+``\b`` is defined as the boundary between a ``\w`` and a ``\W`` character
+(or vice versa), or between ``\w`` and the beginning/end of the string.
+This means that ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``,
+``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``.
+By default Unicode alphanumerics are the ones used, but this can be changed
+by using the :const:`ASCII` flag. Inside a character range, ``\b``
+represents the backspace character, for compatibility with Python's string
+literals.
 ``\B``
-Matches the empty string, but only when it is *not* at the beginning or end of a
-word. This is just the opposite of ``\b``, so word characters are
+Matches the empty string, but only when it is *not* at the beginning or end
+of a word. This means that ``r'py\B'`` matches ``'python'``, ``'py3'``,
+``'py2'``, but not ``'py'``, ``'py.'``, or ``'py!'``.
+``\B`` is just the opposite of ``\b``, so word characters are
 Unicode alphanumerics or the underscore, although this can be changed
 by using the :const:`ASCII` flag.
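The examples in the updated documentation text can be checked directly. The following is a minimal sketch and not part of the commit; the variable names and the `'éfoo'` input are assumptions chosen here for illustration.

```python
import re

# Inputs taken from the \b paragraph of the updated documentation.
samples_b = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]
matched_b = [s for s in samples_b if re.search(r"\bfoo\b", s)]

# Inputs taken from the \B paragraph.
samples_B = ["python", "py3", "py2", "py", "py.", "py!"]
matched_B = [s for s in samples_B if re.search(r"py\B", s)]

# Illustrative assumption: 'é' counts as a word character under the default
# Unicode rules, so there is no boundary between 'é' and 'f'. With re.ASCII
# it is a non-word character and the boundary appears.
unicode_hit = re.search(r"\bfoo\b", "éfoo")
ascii_hit = re.search(r"\bfoo\b", "éfoo", re.ASCII)

# Inside a character class, \b stands for the backspace character.
backspace_hit = re.search(r"[\b]", "a\bc")

print(matched_b)
print(matched_B)
```

Only the inputs listed as matching in the documentation should end up in `matched_b` and `matched_B`.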
Lib/test/test_re.py
@@ -355,6 +355,32 @@ class ReTests(unittest.TestCase):
         self.assertEqual(re.search(r"\d\D\w\W\s\S",
                                    "1aa! a", re.UNICODE).group(0), "1aa! a")

+    def test_string_boundaries(self):
+        # See http://bugs.python.org/issue10713
+        self.assertEqual(re.search(r"\b(abc)\b", "abc").group(1),
+                         "abc")
+        # There's a word boundary at the start of a string.
+        self.assertTrue(re.match(r"\b", "abc"))
+        # A non-empty string includes a non-boundary zero-length match.
+        self.assertTrue(re.search(r"\B", "abc"))
+        # There is no non-boundary match at the start of a string.
+        self.assertFalse(re.match(r"\B", "abc"))
+        # However, an empty string contains no word boundaries, and also no
+        # non-boundaries.
+        self.assertEqual(re.search(r"\B", ""), None)
+        # This one is questionable and different from the perlre behaviour,
+        # but describes current behavior.
+        self.assertEqual(re.search(r"\b", ""), None)
+        # A single word-character string has two boundaries, but no
+        # non-boundary gaps.
+        self.assertEqual(len(re.findall(r"\b", "a")), 2)
+        self.assertEqual(len(re.findall(r"\B", "a")), 0)
+        # If there are no words, there are no boundaries
+        self.assertEqual(len(re.findall(r"\b", " ")), 0)
+        self.assertEqual(len(re.findall(r"\b", "   ")), 0)
+        # Can match around the whitespace.
+        self.assertEqual(len(re.findall(r"\B", " ")), 2)
+
     def test_bigcharset(self):
         self.assertEqual(re.match("([\u2222\u2223])",
                                   "\u2222").group(1), "\u2222")
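The counts asserted in `test_string_boundaries` are easier to follow when the match offsets are listed. The sketch below is an illustration, not part of the commit; `boundary_positions` is a hypothetical helper and `"ab cd"` is an arbitrary input. It leaves out the empty-string cases, which the test itself flags as questionable.

```python
import re


def boundary_positions(pattern, text):
    """Return the start offset of every match of pattern in text.

    Intended for zero-length assertions such as \\b and \\B, where each
    match marks a position between characters rather than a substring.
    """
    return [m.start() for m in re.finditer(pattern, text)]


text = "ab cd"

# Offsets where a word character meets a non-word character or a string edge.
word_edges = boundary_positions(r"\b", text)

# Offsets between two characters of the same kind (here: inside each word).
inner_gaps = boundary_positions(r"\B", text)

print(word_edges)
print(inner_gaps)
```

For a string of length n there are n + 1 candidate offsets, and on a non-empty string each offset is matched by exactly one of `\b` or `\B`. That is why a single word character gives two boundaries and no non-boundaries, while a single space gives the reverse.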