Commit 722dc78d authored by Sergei Golubchik's avatar Sergei Golubchik

pcre-8.36

parents 32ec8625 553b437d
ChangeLog for PCRE
------------------
Version 8.36 26-September-2014
------------------------------
1. Got rid of some compiler warnings in the C++ modules that were shown up by
-Wmissing-field-initializers and -Wunused-parameter.
2. The tests for quantifiers being too big (greater than 65535) were being
applied after reading the number, and stupidly assuming that integer
overflow would give a negative number. The tests are now applied as the
numbers are read.
3. Tidy code in pcre_exec.c where two branches that used to be different are
now the same.
4. The JIT compiler did not generate match limit checks for certain
bracketed expressions with quantifiers. This may lead to exponential
backtracking, instead of returning with PCRE_ERROR_MATCHLIMIT. This
issue should be resolved now.
5. Fixed an issue, which occures when nested alternatives are optimized
with table jumps.
6. Inserted two casts and changed some ints to size_t in the light of some
reported 64-bit compiler warnings (Bugzilla 1477).
7. Fixed a bug concerned with zero-minimum possessive groups that could match
an empty string, which sometimes were behaving incorrectly in the
interpreter (though correctly in the JIT matcher). This pcretest input is
an example:
'\A(?:[^"]++|"(?:[^"]*+|"")*+")++'
NON QUOTED "QUOT""ED" AFTER "NOT MATCHED
the interpreter was reporting a match of 'NON QUOTED ' only, whereas the
JIT matcher and Perl both matched 'NON QUOTED "QUOT""ED" AFTER '. The test
for an empty string was breaking the inner loop and carrying on at a lower
level, when possessive repeated groups should always return to a higher
level as they have no backtrack points in them. The empty string test now
occurs at the outer level.
8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern
^\w+(?>\s*)(?<=\w) which caused it not to match "test test".
9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl
doesn't).
10. Change 8.34/15 introduced a bug that caused the amount of memory needed
to hold a pattern to be incorrectly computed (too small) when there were
named back references to duplicated names. This could cause "internal
error: code overflow" or "double free or corruption" or other memory
handling errors.
11. When named subpatterns had the same prefixes, back references could be
confused. For example, in this pattern:
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
the reference to 'Name' was incorrectly treated as a reference to a
duplicate name.
12. A pattern such as /^s?c/mi8 where the optional character has more than
one "other case" was incorrectly compiled such that it would only try to
match starting at "c".
13. When a pattern starting with \s was studied, VT was not included in the
list of possible starting characters; this should have been part of the
8.34/18 patch.
14. If a character class started [\Qx]... where x is any character, the class
was incorrectly terminated at the ].
15. If a pattern that started with a caseless match for a character with more
than one "other case" was studied, PCRE did not set up the starting code
unit bit map for the list of possible characters. Now it does. This is an
optimization improvement, not a bug fix.
16. The Unicode data tables have been updated to Unicode 7.0.0.
17. Fixed a number of memory leaks in pcregrep.
18. Avoid a compiler warning (from some compilers) for a function call with
a cast that removes "const" from an lvalue by using an intermediate
variable (to which the compiler does not object).
19. Incorrect code was compiled if a group that contained an internal recursive
back reference was optional (had quantifier with a minimum of zero). This
example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples
caused segmentation faults because of stack overflows at compile time.
20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a
group that is quantified with an indefinite repeat, caused a compile-time
loop which used up all the system stack and provoked a segmentation fault.
This was not the same bug as 19 above.
21. Add PCRECPP_EXP_DECL declaration to operator<< in pcre_stringpiece.h.
Patch by Mike Frysinger.
Version 8.35 04-April-2014
--------------------------
......@@ -27,9 +125,9 @@ Version 8.35 04-April-2014
6. Improve character range checks in JIT. Characters are read by an inprecise
function now, which returns with an unknown value if the character code is
above a certain treshold (e.g: 256). The only limitation is that the value
must be bigger than the treshold as well. This function is useful, when
the characters above the treshold are handled in the same way.
above a certain threshold (e.g: 256). The only limitation is that the value
must be bigger than the threshold as well. This function is useful when
the characters above the threshold are handled in the same way.
7. The macros whose names start with RAWUCHAR are placeholders for a future
mode in which only the bottom 21 bits of 32-bit data items are used. To
......
News about PCRE releases
------------------------
Release 8.36 26-September-2014
------------------------------
This is primarily a bug-fix release. However, in addition, the Unicode data
tables have been updated to Unicode 7.0.0.
Release 8.35 04-April-2014
--------------------------
......
......@@ -45,14 +45,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. The distribution also
includes a set of C++ wrapper functions (see the pcrecpp man page for details),
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
C++.
C++. Other C++ wrappers have been created from time to time. See, for example:
https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
style to the C API.
In addition, there is a set of C wrapper functions (again, just for the 8-bit
library) that are based on the POSIX regular expression API (see the pcreposix
man page). These end up in the library called libpcreposix. Note that this just
provides a POSIX calling interface to PCRE; the regular expressions themselves
still follow Perl syntax and semantics. The POSIX API is restricted, and does
not give full access to all of PCRE's facilities.
The distribution also contains a set of C wrapper functions (again, just for
the 8-bit library) that are based on the POSIX regular expression API (see the
pcreposix man page). These end up in the library called libpcreposix. Note that
this just provides a POSIX calling interface to PCRE; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE's facilities.
The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
......@@ -988,4 +990,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 January 2014
Last updated: 24 October 2014
......@@ -9,19 +9,19 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
m4_define(pcre_minor, [35])
m4_define(pcre_minor, [36])
m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2014-04-04])
m4_define(pcre_date, [2014-09-26])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:3:2])
m4_define(libpcre16_version, [2:3:2])
m4_define(libpcre32_version, [0:3:0])
m4_define(libpcreposix_version, [0:2:0])
m4_define(libpcrecpp_version, [0:0:0])
m4_define(libpcre_version, [3:4:2])
m4_define(libpcre16_version, [2:4:2])
m4_define(libpcre32_version, [0:4:0])
m4_define(libpcreposix_version, [0:3:0])
m4_define(libpcrecpp_version, [0:1:0])
AC_PREREQ(2.57)
AC_INIT(PCRE, pcre_major.pcre_minor[]pcre_prerelease, , pcre)
......
......@@ -45,14 +45,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. The distribution also
includes a set of C++ wrapper functions (see the pcrecpp man page for details),
courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
C++.
C++. Other C++ wrappers have been created from time to time. See, for example:
https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
style to the C API.
In addition, there is a set of C wrapper functions (again, just for the 8-bit
library) that are based on the POSIX regular expression API (see the pcreposix
man page). These end up in the library called libpcreposix. Note that this just
provides a POSIX calling interface to PCRE; the regular expressions themselves
still follow Perl syntax and semantics. The POSIX API is restricted, and does
not give full access to all of PCRE's facilities.
The distribution also contains a set of C wrapper functions (again, just for
the 8-bit library) that are based on the POSIX regular expression API (see the
pcreposix man page). These end up in the library called libpcreposix. Note that
this just provides a POSIX calling interface to PCRE; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE's facilities.
The header file for the POSIX-style functions is called pcreposix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
......@@ -988,4 +990,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 January 2014
Last updated: 24 October 2014
......@@ -39,8 +39,10 @@ arguments are as follows:
<i>where</i> Points to where to put the data
</pre>
The <i>where</i> argument must point to an integer variable, except for
PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must
point to an unsigned long integer. The available codes are:
PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and
PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer,
and for PCRE_CONFIG_JITTARGET, when it must point to a const char*.
The available codes are:
<pre>
PCRE_CONFIG_JIT Availability of just-in-time compiler
support (1=yes 0=no)
......
......@@ -57,6 +57,10 @@ The following information is available:
PCRE_INFO_JITSIZE Size of JIT compiled code
PCRE_INFO_LASTLITERAL Literal last data unit required
PCRE_INFO_MINLENGTH Lower bound length of matching strings
PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string,
0 otherwise
PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_RROR_UNSET
PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the longest lookbehind assertion
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
......@@ -72,6 +76,7 @@ The following information is available:
2 if the first character is at the start of the data
string or after a newline, and
0 otherwise
PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_REQUIREDCHAR Literal last data unit required
PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
......@@ -79,14 +84,18 @@ The following information is available:
The <i>where</i> argument must point to an integer variable, except for the
following <i>what</i> values:
<pre>
PCRE_INFO_DEFAULT_TABLES const unsigned char *
PCRE_INFO_FIRSTTABLE const unsigned char *
PCRE_INFO_DEFAULT_TABLES const uint8_t *
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_FIRSTTABLE const uint8_t *
PCRE_INFO_JITSIZE size_t
PCRE_INFO_MATCHLIMIT uint32_t
PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library)
PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
PCRE_INFO_OPTIONS unsigned long int
PCRE_INFO_SIZE size_t
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_STUDYSIZE size_t
PCRE_INFO_RECURSIONLIMIT uint32_t
PCRE_INFO_REQUIREDCHAR uint32_t
</pre>
The yield of the function is zero on success or:
......@@ -95,6 +104,7 @@ The yield of the function is zero on success or:
the argument <i>where</i> was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
PCRE_ERROR_UNSET the option was not set
</PRE>
</P>
<P>
......
......@@ -703,6 +703,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -712,6 +713,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -722,11 +724,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -746,40 +751,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -797,8 +818,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
</P>
<P>
......
......@@ -171,6 +171,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -180,6 +181,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -190,11 +192,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -214,40 +219,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -265,8 +286,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
......
......@@ -5326,21 +5326,25 @@ BACKSLASH
Those that are not part of an identified script are lumped together as
"Common". The current list of scripts is:
Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic,
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi.
Each character has exactly one Unicode general category property, spec-
......@@ -7777,21 +7781,25 @@ PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P
SCRIPT NAMES FOR \p AND \P
Arabic, Armenian, Avestan, Balinese, Bamum, Batak, Bengali, Bopomofo,
Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Chakma,
Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret,
Devanagari, Egyptian_Hieroglyphs, Ethiopic, Georgian, Glagolitic,
Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hira-
gana, Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lisu, Lycian,
Lydian, Malayalam, Mandaic, Meetei_Mayek, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Mongolian, Myanmar, New_Tai_Lue, Nko,
Ogham, Old_Italic, Old_Persian, Old_South_Arabian, Old_Turkic,
Ol_Chiki, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Samari-
tan, Saurashtra, Sharada, Shavian, Sinhala, Sora_Sompeng, Sundanese,
Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet,
Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai,
Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car-
ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei-
form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero-
glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha,
Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana,
Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip-
tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li,
Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin-
ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic,
Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive,
Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean,
New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician,
Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha-
vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac,
Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu,
Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi,
Yi.
......
.TH PCRE_CONFIG 3 "05 November 2013" "PCRE 8.34"
.TH PCRE_CONFIG 3 "20 April 2014" "PCRE 8.36"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
......@@ -24,8 +24,10 @@ arguments are as follows:
\fIwhere\fP Points to where to put the data
.sp
The \fIwhere\fP argument must point to an integer variable, except for
PCRE_CONFIG_MATCH_LIMIT and PCRE_CONFIG_MATCH_LIMIT_RECURSION, when it must
point to an unsigned long integer. The available codes are:
PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and
PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer,
and for PCRE_CONFIG_JITTARGET, when it must point to a const char*.
The available codes are:
.sp
PCRE_CONFIG_JIT Availability of just-in-time compiler
support (1=yes 0=no)
......
.TH PCRE_FULLINFO 3 "24 June 2012" "PCRE 8.30"
.TH PCRE_FULLINFO 3 "21 April 2014" "PCRE 8.36"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH SYNOPSIS
......@@ -43,6 +43,10 @@ The following information is available:
PCRE_INFO_JITSIZE Size of JIT compiled code
PCRE_INFO_LASTLITERAL Literal last data unit required
PCRE_INFO_MINLENGTH Lower bound length of matching strings
PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string,
0 otherwise
PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_RROR_UNSET
PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the longest lookbehind assertion
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
......@@ -58,6 +62,7 @@ The following information is available:
2 if the first character is at the start of the data
string or after a newline, and
0 otherwise
PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET
PCRE_INFO_REQUIREDCHAR Literal last data unit required
PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
......@@ -65,14 +70,18 @@ The following information is available:
The \fIwhere\fP argument must point to an integer variable, except for the
following \fIwhat\fP values:
.sp
PCRE_INFO_DEFAULT_TABLES const unsigned char *
PCRE_INFO_FIRSTTABLE const unsigned char *
PCRE_INFO_DEFAULT_TABLES const uint8_t *
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_FIRSTTABLE const uint8_t *
PCRE_INFO_JITSIZE size_t
PCRE_INFO_MATCHLIMIT uint32_t
PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library)
PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library)
PCRE_INFO_NAMETABLE const unsigned char * (8-bit library)
PCRE_INFO_OPTIONS unsigned long int
PCRE_INFO_SIZE size_t
PCRE_INFO_FIRSTCHARACTER uint32_t
PCRE_INFO_STUDYSIZE size_t
PCRE_INFO_RECURSIONLIMIT uint32_t
PCRE_INFO_REQUIREDCHAR uint32_t
.sp
The yield of the function is zero on success or:
......@@ -81,6 +90,7 @@ The yield of the function is zero on success or:
the argument \fIwhere\fP was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of \fIwhat\fP was invalid
PCRE_ERROR_UNSET the option was not set
.P
There is a complete description of the PCRE native API in the
.\" HREF
......
......@@ -708,6 +708,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -717,6 +718,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -727,11 +729,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -751,40 +756,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -802,8 +823,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
.P
Each character has exactly one Unicode general category property, specified by
......
......@@ -139,6 +139,7 @@ Armenian,
Avestan,
Balinese,
Bamum,
Bassa_Vah,
Batak,
Bengali,
Bopomofo,
......@@ -148,6 +149,7 @@ Buginese,
Buhid,
Canadian_Aboriginal,
Carian,
Caucasian_Albanian,
Chakma,
Cham,
Cherokee,
......@@ -158,11 +160,14 @@ Cypriot,
Cyrillic,
Deseret,
Devanagari,
Duployan,
Egyptian_Hieroglyphs,
Elbasan,
Ethiopic,
Georgian,
Glagolitic,
Gothic,
Grantha,
Greek,
Gujarati,
Gurmukhi,
......@@ -182,40 +187,56 @@ Katakana,
Kayah_Li,
Kharoshthi,
Khmer,
Khojki,
Khudawadi,
Lao,
Latin,
Lepcha,
Limbu,
Linear_A,
Linear_B,
Lisu,
Lycian,
Lydian,
Mahajani,
Malayalam,
Mandaic,
Manichaean,
Meetei_Mayek,
Mende_Kikakui,
Meroitic_Cursive,
Meroitic_Hieroglyphs,
Miao,
Modi,
Mongolian,
Mro,
Myanmar,
Nabataean,
New_Tai_Lue,
Nko,
Ogham,
Ol_Chiki,
Old_Italic,
Old_North_Arabian,
Old_Permic,
Old_Persian,
Old_South_Arabian,
Old_Turkic,
Ol_Chiki,
Oriya,
Osmanya,
Pahawh_Hmong,
Palmyrene,
Pau_Cin_Hau,
Phags_Pa,
Phoenician,
Psalter_Pahlavi,
Rejang,
Runic,
Samaritan,
Saurashtra,
Sharada,
Shavian,
Siddham,
Sinhala,
Sora_Sompeng,
Sundanese,
......@@ -233,8 +254,10 @@ Thaana,
Thai,
Tibetan,
Tifinagh,
Tirhuta,
Ugaritic,
Vai,
Warang_Citi,
Yi.
.
.
......
This diff is collapsed.
......@@ -3242,7 +3242,7 @@ md->callout_data = NULL;
if (extra_data != NULL)
{
unsigned int flags = extra_data->flags;
unsigned long int flags = extra_data->flags;
if ((flags & PCRE_EXTRA_STUDY_DATA) != 0)
study = (const pcre_study_data *)extra_data->study_data;
if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0) return PCRE_ERROR_DFA_UMLIMIT;
......
......@@ -1167,11 +1167,16 @@ for (;;)
if (rrc == MATCH_KETRPOS)
{
offset_top = md->end_offset_top;
eptr = md->end_match_ptr;
ecode = md->start_code + code_offset;
save_capture_last = md->capture_last;
matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K changed it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
eptr = md->end_match_ptr;
continue;
}
......@@ -1241,10 +1246,15 @@ for (;;)
if (rrc == MATCH_KETRPOS)
{
offset_top = md->end_offset_top;
eptr = md->end_match_ptr;
ecode = md->start_code + code_offset;
matched_once = TRUE;
mstart = md->start_match_ptr; /* In case \K reset it */
if (eptr == md->end_match_ptr) /* Matched an empty string */
{
do ecode += GET(ecode, 1); while (*ecode == OP_ALT);
break;
}
eptr = md->end_match_ptr;
continue;
}
......@@ -1979,6 +1989,19 @@ for (;;)
}
}
/* OP_KETRPOS is a possessive repeating ket. Remember the current position,
and return the MATCH_KETRPOS. This makes it possible to do the repeats one
at a time from the outer level, thus saving stack. This must precede the
empty string test - in this case that test is done at the outer level. */
if (*ecode == OP_KETRPOS)
{
md->start_match_ptr = mstart; /* In case \K reset it */
md->end_match_ptr = eptr;
md->end_offset_top = offset_top;
RRETURN(MATCH_KETRPOS);
}
/* For an ordinary non-repeating ket, just continue at this level. This
also happens for a repeating ket if no characters were matched in the
group. This is the forcible breaking of infinite loops as implemented in
......@@ -2001,18 +2024,6 @@ for (;;)
break;
}
/* OP_KETRPOS is a possessive repeating ket. Remember the current position,
and return the MATCH_KETRPOS. This makes it possible to do the repeats one
at a time from the outer level, thus saving stack. */
if (*ecode == OP_KETRPOS)
{
md->start_match_ptr = mstart; /* In case \K reset it */
md->end_match_ptr = eptr;
md->end_offset_top = offset_top;
RRETURN(MATCH_KETRPOS);
}
/* The normal repeating kets try the rest of the pattern or restart from
the preceding bracket, in the appropriate order. In the second case, we can
use tail recursion to avoid using another stack frame, unless we have an
......@@ -5681,54 +5692,25 @@ for (;;)
switch(ctype)
{
case OP_ANY:
if (max < INT_MAX)
for (i = min; i < max; i++)
{
for (i = min; i < max; i++)
if (eptr >= md->end_subject)
{
if (eptr >= md->end_subject)
{
SCHECK_PARTIAL();
break;
}
if (IS_NEWLINE(eptr)) break;
if (md->partial != 0 && /* Take care with CRLF partial */
eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 &&
UCHAR21(eptr) == NLBLOCK->nl[0])
{
md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
}
eptr++;
ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++);
SCHECK_PARTIAL();
break;
}
}
/* Handle unlimited UTF-8 repeat */
else
{
for (i = min; i < max; i++)
if (IS_NEWLINE(eptr)) break;
if (md->partial != 0 && /* Take care with CRLF partial */
eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 &&
UCHAR21(eptr) == NLBLOCK->nl[0])
{
if (eptr >= md->end_subject)
{
SCHECK_PARTIAL();
break;
}
if (IS_NEWLINE(eptr)) break;
if (md->partial != 0 && /* Take care with CRLF partial */
eptr + 1 >= md->end_subject &&
NLBLOCK->nltype == NLTYPE_FIXED &&
NLBLOCK->nllen == 2 &&
UCHAR21(eptr) == NLBLOCK->nl[0])
{
md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
}
eptr++;
ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++);
md->hitend = TRUE;
if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL);
}
eptr++;
ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++);
}
break;
......@@ -6519,7 +6501,7 @@ tables = re->tables;
if (extra_data != NULL)
{
register unsigned int flags = extra_data->flags;
unsigned long int flags = extra_data->flags;
if ((flags & PCRE_EXTRA_STUDY_DATA) != 0)
study = (const pcre_study_data *)extra_data->study_data;
if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0)
......
......@@ -2281,7 +2281,7 @@ enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9,
ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59,
ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69,
ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79,
ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERRCOUNT };
ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERRCOUNT };
/* JIT compiling modes. The function list is indexed by them. */
......
This source diff could not be displayed because it is too large. You can view the blob instead.
......@@ -149,6 +149,8 @@ static void TestBigComment() {
// small stack size
int main(int argc, char** argv) {
(void)argc;
(void)argv;
TestScanner();
TestBigComment();
......
......@@ -174,6 +174,7 @@ template<> struct __type_traits<pcrecpp::StringPiece> {
#endif
// allow StringPiece to be logged
std::ostream& operator<<(std::ostream& o, const pcrecpp::StringPiece& piece);
PCRECPP_EXP_DECL std::ostream& operator<<(std::ostream& o,
const pcrecpp::StringPiece& piece);
#endif /* _PCRE_STRINGPIECE_H */
......@@ -142,6 +142,8 @@ static void CheckComparisonOperators() {
}
int main(int argc, char** argv) {
(void)argc;
(void)argv;
CheckComparisonOperators();
CheckSTLComparator();
......
......@@ -863,7 +863,6 @@ do
case OP_NOTUPTOI:
case OP_NOT_HSPACE:
case OP_NOT_VSPACE:
case OP_PROP:
case OP_PRUNE:
case OP_PRUNE_ARG:
case OP_RECURSE:
......@@ -881,6 +880,31 @@ do
case OP_THEN_ARG:
return SSB_FAIL;
/* A "real" property test implies no starting bits, but the fake property
PT_CLIST identifies a list of characters. These lists are short, as they
are used for characters with more than one "other case", so there is no
point in recognizing them for OP_NOTPROP. */
case OP_PROP:
if (tcode[1] != PT_CLIST) return SSB_FAIL;
{
const pcre_uint32 *p = PRIV(ucd_caseless_sets) + tcode[2];
while ((c = *p++) < NOTACHAR)
{
#if defined SUPPORT_UTF && defined COMPILE_PCRE8
if (utf)
{
pcre_uchar buff[6];
(void)PRIV(ord2utf)(c, buff);
c = buff[0];
}
#endif
if (c > 0xff) SET_BIT(0xff); else SET_BIT(c);
}
}
try_next = FALSE;
break;
/* We can ignore word boundary tests. */
case OP_WORD_BOUNDARY:
......@@ -1106,24 +1130,17 @@ do
try_next = FALSE;
break;
/* The cbit_space table has vertical tab as whitespace; we have to
ensure it is set as not whitespace. Luckily, the code value is the same
(0x0b) in ASCII and EBCDIC, so we can just adjust the appropriate bit. */
/* The cbit_space table has vertical tab as whitespace; we no longer
have to play fancy tricks because Perl added VT to its whitespace at
release 5.18. PCRE added it at release 8.34. */
case OP_NOT_WHITESPACE:
set_nottype_bits(start_bits, cbit_space, table_limit, cd);
start_bits[1] |= 0x08;
try_next = FALSE;
break;
/* The cbit_space table has vertical tab as whitespace; we have to not
set it from the table. Luckily, the code value is the same (0x0b) in
ASCII and EBCDIC, so we can just adjust the appropriate bit. */
case OP_WHITESPACE:
c = start_bits[1]; /* Save in case it was already set */
set_type_bits(start_bits, cbit_space, table_limit, cd);
start_bits[1] = (start_bits[1] & ~0x08) | c;
try_next = FALSE;
break;
......
This diff is collapsed.
This diff is collapsed.
......@@ -511,7 +511,7 @@ int RE::TryMatch(const StringPiece& text,
return 0;
}
pcre_extra extra = { 0, 0, 0, 0, 0, 0 };
pcre_extra extra = { 0, 0, 0, 0, 0, 0, 0, 0 };
if (options_.match_limit() > 0) {
extra.flags |= PCRE_EXTRA_MATCH_LIMIT;
extra.match_limit = options_.match_limit();
......@@ -660,6 +660,8 @@ int RE::NumberOfCapturingGroups() const {
/***** Parsers for various types *****/
bool Arg::parse_null(const char* str, int n, void* dest) {
(void)str;
(void)n;
// We fail if somebody asked us to store into a non-NULL void* pointer
return (dest == NULL);
}
......
......@@ -455,7 +455,7 @@ Arguments:
s pattern string to add
after if not NULL points to item to insert after
Returns: new pattern block
Returns: new pattern block or NULL on error
*/
static patstr *
......@@ -471,6 +471,7 @@ if (strlen(s) > MAXPATLEN)
{
fprintf(stderr, "pcregrep: pattern is too long (limit is %d bytes)\n",
MAXPATLEN);
free(p);
return NULL;
}
p->next = NULL;
......@@ -2549,7 +2550,11 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL)
afterwards, as a precaution against any later code trying to use it. */
*patlastptr = add_pattern(buffer, *patlastptr);
if (*patlastptr == NULL) return FALSE;
if (*patlastptr == NULL)
{
if (f != stdin) fclose(f);
return FALSE;
}
if (*patptr == NULL) *patptr = *patlastptr;
/* This loop is needed because compiling a "pattern" when -F is set may add
......@@ -2561,7 +2566,10 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL)
{
if (!compile_pattern(*patlastptr, pcre_options, popts, TRUE, filename,
linenumber))
{
if (f != stdin) fclose(f);
return FALSE;
}
(*patlastptr)->string = NULL; /* Insurance */
if ((*patlastptr)->next == NULL) break;
*patlastptr = (*patlastptr)->next;
......@@ -2962,8 +2970,8 @@ if (locale == NULL)
locale_from = "LC_CTYPE";
}
/* If a locale has been provided, set it, and generate the tables the PCRE
needs. Otherwise, pcretables==NULL, which causes the use of default tables. */
/* If a locale is set, use it to generate the tables the PCRE needs. Otherwise,
pcretables==NULL, which causes the use of default tables. */
if (locale != NULL)
{
......@@ -2971,7 +2979,7 @@ if (locale != NULL)
{
fprintf(stderr, "pcregrep: Failed to set locale %s (obtained from %s)\n",
locale, locale_from);
return 2;
goto EXIT2;
}
pcretables = pcre_maketables();
}
......@@ -2986,7 +2994,7 @@ if (colour_option != NULL && strcmp(colour_option, "never") != 0)
{
fprintf(stderr, "pcregrep: Unknown colour setting \"%s\"\n",
colour_option);
return 2;
goto EXIT2;
}
if (do_colour)
{
......@@ -3026,7 +3034,7 @@ else if (strcmp(newline, "anycrlf") == 0 || strcmp(newline, "ANYCRLF") == 0)
else
{
fprintf(stderr, "pcregrep: Invalid newline specifier \"%s\"\n", newline);
return 2;
goto EXIT2;
}
/* Interpret the text values for -d and -D */
......@@ -3039,7 +3047,7 @@ if (dee_option != NULL)
else
{
fprintf(stderr, "pcregrep: Invalid value \"%s\" for -d\n", dee_option);
return 2;
goto EXIT2;
}
}
......@@ -3050,7 +3058,7 @@ if (DEE_option != NULL)
else
{
fprintf(stderr, "pcregrep: Invalid value \"%s\" for -D\n", DEE_option);
return 2;
goto EXIT2;
}
}
......@@ -3251,7 +3259,8 @@ EXIT:
if (jit_stack != NULL) pcre_jit_stack_free(jit_stack);
#endif
if (main_buffer != NULL) free(main_buffer);
free(main_buffer);
free((void *)pcretables);
free_pattern_chain(patterns);
free_pattern_chain(include_patterns);
......
......@@ -172,7 +172,8 @@ static const int eint[] = {
REG_BADPAT, /* invalid range in character class */
REG_BADPAT, /* group name must start with a non-digit */
/* 85 */
REG_BADPAT /* parentheses too deeply nested (stack check) */
REG_BADPAT, /* parentheses too deeply nested (stack check) */
REG_BADPAT /* missing digits in \x{} or \o{} */
};
/* Table of texts corresponding to POSIX error codes */
......
......@@ -111,7 +111,7 @@
bababbc
babababc
/^\ca\cA\c[\c{\c:/
/^\ca\cA\c[;\c:/
\x01\x01\e;z
/^[ab\]cde]/
......@@ -4937,6 +4937,12 @@ however, we need the complication for Perl. ---/
/((?(R1)a+|(?1)b))/
aaaabcde
/((?(R)a|(?1)))*/
aaa
/((?(R)a|(?1)))+/
aaa
/a(*:any
name)/K
......@@ -5666,4 +5672,52 @@ AbcdCBefgBhiBqz
/(a\Kb)*/+
ababc
/(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/
acb
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/^\w+(?>\s*)(?<=\w)/
test test
/(?P<same>a)(?P<same>b)/gJ
abbaba
/(?P<same>a)(?P<same>b)(?P=same)/gJ
abbaba
/(?P=same)?(?P<same>a)(?P<same>b)/gJ
abbaba
/(?:(?P=same)?(?:(?P<same>a)|(?P<same>b))(?P=same))+/gJ
bbbaaabaabb
/(?:(?P=same)?(?:(?P=same)(?P<same>a)(?P=same)|(?P=same)?(?P<same>b)(?P=same)){2}(?P=same)(?P<same>c)(?P=same)){2}(?P<same>z)?/gJ
bbbaaaccccaaabbbcc
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
acl
bdl
adl
bcl
/\sabc/
\x{0b}abc
/[\Qa]\E]+/
aa]]
/[\Q]a\E]+/
aa]]
/-- End of testinput1 --/
......@@ -132,4 +132,6 @@ is required for these tests. --/
/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B
/(((a\2)|(a*)\g<-1>))*a?/B
/-- End of testinput11 --/
......@@ -32,4 +32,10 @@
/[[:blank:]]/WBZ
/\x{212a}+/i8SI
KKkk\x{212a}
/s+/i8SI
SSss\x{17f}
/-- End of testinput16 --/
......@@ -19,4 +19,10 @@
/[[:blank:]]/WBZ
/\x{212a}+/i8SI
KKkk\x{212a}
/s+/i8SI
SSss\x{17f}
/-- End of testinput19 --/
......@@ -4035,6 +4035,8 @@ backtracking verbs. --/
/(?(R&6yh)abc)/
/(((a\2)|(a*)\g<-1>))*a?/BZ
/-- Test the ugly "start or end of word" compatibility syntax --/
/[[:<:]]red[[:>:]]/BZ
......@@ -4062,4 +4064,18 @@ backtracking verbs. --/
/(((((a)))))/Q
/^\w+(?>\s*)(?<=\w)/BZ
/\othing/
/\o{}/
/\o{whatever}/
/\xthing/
/\x{}/
/\x{whatever}/
/-- End of testinput2 --/
......@@ -421,8 +421,8 @@
/^[\p{Arabic}]/8
\x{06e9}
\x{060b}
\x{061c}
** Failers
\x{061c}
X\x{06e9}
/^[\P{Yi}]/8
......@@ -1493,4 +1493,7 @@
/[q-u]+/8iW
Ss\x{17f}
/^s?c/mi8
scat
/-- End of testinput6 --/
......@@ -835,4 +835,7 @@ of case for anything other than the ASCII letters. --/
/[Q-U]+/8iWBZ
/^s?c/mi8I
scat
/-- End of testinput7 --/
......@@ -4831,4 +4831,10 @@
/[ab]{2,}?/
aaaa
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
/-- End of testinput8 --/
......@@ -223,7 +223,7 @@ No match
babababc
No match
/^\ca\cA\c[\c{\c:/
/^\ca\cA\c[;\c:/
\x01\x01\e;z
0: \x01\x01\x1b;z
......@@ -8234,6 +8234,16 @@ MK: M
aaaabcde
0: aaaab
1: aaaab
/((?(R)a|(?1)))*/
aaa
0: aaa
1: a
/((?(R)a|(?1)))+/
aaa
0: aaa
1: a
/a(*:any
name)/K
......@@ -9313,4 +9323,92 @@ No match
0+ c
1: ab
/(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/
acb
No match
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
1: AFTER
2:
/^\w+(?>\s*)(?<=\w)/
test test
0: tes
/(?P<same>a)(?P<same>b)/gJ
abbaba
0: ab
1: a
2: b
0: ab
1: a
2: b
/(?P<same>a)(?P<same>b)(?P=same)/gJ
abbaba
0: aba
1: a
2: b
/(?P=same)?(?P<same>a)(?P<same>b)/gJ
abbaba
0: ab
1: a
2: b
0: ab
1: a
2: b
/(?:(?P=same)?(?:(?P<same>a)|(?P<same>b))(?P=same))+/gJ
bbbaaabaabb
0: bbbaaaba
1: a
2: b
0: bb
1: <unset>
2: b
/(?:(?P=same)?(?:(?P=same)(?P<same>a)(?P=same)|(?P=same)?(?P<same>b)(?P=same)){2}(?P=same)(?P<same>c)(?P=same)){2}(?P<same>z)?/gJ
bbbaaaccccaaabbbcc
No match
/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
acl
0: acl
1: a
bdl
0: bdl
1: <unset>
2: b
adl
0: dl
bcl
0: l
/\sabc/
\x{0b}abc
0: \x0babc
/[\Qa]\E]+/
aa]]
0: aa]]
/[\Q]a\E]+/
aa]]
0: aa]]
/-- End of testinput1 --/
......@@ -709,4 +709,28 @@ Memory allocation (code space): 14
62 End
------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 39 Bra
2 Brazero
3 32 SCBra 1
6 27 Once
8 12 CBra 2
11 7 CBra 3
14 a
16 \2
18 7 Ket
20 11 Alt
22 5 CBra 4
25 a*
27 5 Ket
29 22 Recurse
31 23 Ket
33 27 Ket
35 32 KetRmax
37 a?+
39 39 Ket
41 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -709,4 +709,28 @@ Memory allocation (code space): 28
62 End
------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 39 Bra
2 Brazero
3 32 SCBra 1
6 27 Once
8 12 CBra 2
11 7 CBra 3
14 a
16 \2
18 7 Ket
20 11 Alt
22 5 CBra 4
25 a*
27 5 Ket
29 22 Recurse
31 23 Ket
33 27 Ket
35 32 KetRmax
37 a?+
39 39 Ket
41 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -709,4 +709,28 @@ Memory allocation (code space): 10
76 End
------------------------------------------------------------------
/(((a\2)|(a*)\g<-1>))*a?/B
------------------------------------------------------------------
0 57 Bra
3 Brazero
4 48 SCBra 1
9 40 Once
12 18 CBra 2
17 10 CBra 3
22 a
24 \2
27 10 Ket
30 16 Alt
33 7 CBra 4
38 a*
40 7 Ket
43 33 Recurse
46 34 Ket
49 40 Ket
52 48 KetRmax
55 a?+
57 57 Ket
60 End
------------------------------------------------------------------
/-- End of testinput11 --/
......@@ -871,7 +871,7 @@ Options: utf
No first char
Need char = 'x'
Subject length lower bound = 5
Starting chars: \x09 \x0a \x0c \x0d \x20 \xc2
Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ
......@@ -883,15 +883,15 @@ Options: utf
No first char
Need char = ' '
Subject length lower bound = 3
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3
\xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2
\xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1
\xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0
\xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
\xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
\xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
\xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
\x{a2} \x{84}
0: \x{a2} \x{84}
A Z
......
......@@ -118,4 +118,24 @@ Starting chars: \x0a \x0b \x0c \x0d \x85
End
------------------------------------------------------------------
/\x{212a}+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: K k \xe2
KKkk\x{212a}
0: KKkk\x{212a}
/s+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: S s \xc5
SSss\x{17f}
0: SSss\x{17f}
/-- End of testinput16 --/
......@@ -752,7 +752,7 @@ Options: utf
No first char
Need char = 'x'
Subject length lower bound = 5
Starting chars: \x09 \x0a \x0c \x0d \x20 \x85 \xa0
Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ
......@@ -764,20 +764,20 @@ Options: utf
No first char
Need char = ' '
Subject length lower bound = 3
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83
\x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93
\x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3
\xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2
\xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1
\xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
\xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
\xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
\xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
\xfe \xff
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84
\x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94
\x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4
\xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3
\xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2
\xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
\xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
\xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
\xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
\xff
\x{a2} \x{84}
0: \x{a2} \x{84}
A Z
......
......@@ -749,7 +749,7 @@ Options: utf
No first char
Need char = 'x'
Subject length lower bound = 5
Starting chars: \x09 \x0a \x0c \x0d \x20 \x85 \xa0
Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0
AB\x{85}xxx\x{a0}XYZ
0: \x{85}xxx\x{a0}
AB\x{a0}xxx\x{85}XYZ
......@@ -761,20 +761,20 @@ Options: utf
No first char
Need char = ' '
Subject length lower bound = 3
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0b \x0e
\x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d
\x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e
f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83
\x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93
\x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3
\xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2
\xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1
\xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
\xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
\xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
\xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
\xfe \xff
Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
\x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
\x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84
\x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94
\x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4
\xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3
\xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2
\xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
\xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
\xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
\xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
\xff
\x{a2} \x{84}
0: \x{a2} \x{84}
A Z
......
......@@ -85,4 +85,24 @@ No starting char list
End
------------------------------------------------------------------
/\x{212a}+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: K k \xff
KKkk\x{212a}
0: KKkk\x{212a}
/s+/i8SI
Capturing subpattern count = 0
Options: caseless utf
No first char
No need char
Subject length lower bound = 1
Starting chars: S s \xff
SSss\x{17f}
0: SSss\x{17f}
/-- End of testinput19 --/
......@@ -5821,13 +5821,13 @@ No match
No match
/a{11111111111111111111}/I
Failed: number too big in {} quantifier at offset 22
Failed: number too big in {} quantifier at offset 8
/(){64294967295}/I
Failed: number too big in {} quantifier at offset 14
Failed: number too big in {} quantifier at offset 9
/(){2,4294967295}/I
Failed: number too big in {} quantifier at offset 15
Failed: number too big in {} quantifier at offset 11
"(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I
Capturing subpattern count = 1
......@@ -14093,6 +14093,30 @@ Failed: malformed number or name after (?( at offset 4
/(?(R&6yh)abc)/
Failed: group name must start with a non-digit at offset 5
/(((a\2)|(a*)\g<-1>))*a?/BZ
------------------------------------------------------------------
Bra
Brazero
SCBra 1
Once
CBra 2
CBra 3
a
\2
Ket
Alt
CBra 4
a*
Ket
Recurse
Ket
Ket
KetRmax
a?+
Ket
End
------------------------------------------------------------------
/-- Test the ugly "start or end of word" compatibility syntax --/
/[[:<:]]red[[:>:]]/BZ
......@@ -14149,4 +14173,37 @@ Failed: parentheses are too deeply nested (stack check) at offset 0
/(((((a)))))/Q
** Missing 0 or 1 after /Q
/^\w+(?>\s*)(?<=\w)/BZ
------------------------------------------------------------------
Bra
^
\w+
Once_NC
\s*+
Ket
AssertB
Reverse
\w
Ket
Ket
End
------------------------------------------------------------------
/\othing/
Failed: missing opening brace after \o at offset 1
/\o{}/
Failed: digits missing in \x{} or \o{} at offset 1
/\o{whatever}/
Failed: non-octal character in \o{} (closing brace missing?) at offset 3
/\xthing/
/\x{}/
Failed: digits missing in \x{} or \o{} at offset 3
/\x{whatever}/
Failed: non-hex character in \x{} (closing brace missing?) at offset 3
/-- End of testinput2 --/
......@@ -719,9 +719,9 @@ No match
0: \x{6e9}
\x{060b}
0: \x{60b}
\x{061c}
0: \x{61c}
** Failers
No match
\x{061c}
No match
X\x{06e9}
No match
......@@ -2457,4 +2457,8 @@ No match
Ss\x{17f}
0: Ss\x{17f}
/^s?c/mi8
scat
0: sc
/-- End of testinput6 --/
......@@ -2287,4 +2287,12 @@ No match
End
------------------------------------------------------------------
/^s?c/mi8I
Capturing subpattern count = 0
Options: caseless multiline utf
First char at start or follows newline
Need char = 'c' (caseless)
scat
0: sc
/-- End of testinput7 --/
......@@ -7777,4 +7777,12 @@ Matched, but offsets vector is too small to show all matches
1: aaa
2: aa
'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++'
NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED
0: NON QUOTED "QUOT""ED" AFTER
/-- End of testinput8 --/
......@@ -192,7 +192,31 @@ enum {
ucp_Miao,
ucp_Sharada,
ucp_Sora_Sompeng,
ucp_Takri
ucp_Takri,
/* New for Unicode 7.0.0: */
ucp_Bassa_Vah,
ucp_Caucasian_Albanian,
ucp_Duployan,
ucp_Elbasan,
ucp_Grantha,
ucp_Khojki,
ucp_Khudawadi,
ucp_Linear_A,
ucp_Mahajani,
ucp_Manichaean,
ucp_Mende_Kikakui,
ucp_Modi,
ucp_Mro,
ucp_Nabataean,
ucp_Old_North_Arabian,
ucp_Old_Permic,
ucp_Pahawh_Hmong,
ucp_Palmyrene,
ucp_Psalter_Pahlavi,
ucp_Pau_Cin_Hau,
ucp_Siddham,
ucp_Tirhuta,
ucp_Warang_Citi
};
#endif
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment