PCRECOMPAT(3)                                                    PCRECOMPAT(3)



NAME
       PCRE - Perl-compatible regular expressions

DIFFERENCES BETWEEN PCRE AND PERL

       This document describes the differences in the ways that
       PCRE and Perl handle regular  expressions.  The  differ-
       ences  described  here  are  mainly with respect to Perl
       5.8, though PCRE version 7.0 contains some features that
       are expected to be in the forthcoming Perl 5.10.

       1.  PCRE  has  only a subset of Perl's UTF-8 and Unicode
       support. Details of what it does have are given  in  the
       section on UTF-8 support in the main pcre page.

       2.  PCRE  does not allow repeat quantifiers on lookahead
       assertions. Perl permits them, but they do not mean what
       you  might  think. For example, (?!a){3} does not assert
       that the next three characters  are  not  "a".  It  just
       asserts  that the next character is not "a" three times.

       3. Capturing  subpatterns  that  occur  inside  negative
       lookahead  assertions  are counted, but their entries in
       the offsets vector are never set. Perl sets its  numeri-
       cal  variables  from  any such patterns that are matched
       before the assertion fails to match  something  (thereby
       succeeding),  but  only if the negative lookahead asser-
       tion contains just one branch.

       4. Though binary zero characters are  supported  in  the
       subject string, they are not allowed in a pattern string
       because it is passed as a normal C string, terminated by
       zero.  The escape sequence \0 can be used in the pattern
       to represent a binary zero.

       5. The following Perl  escape  sequences  are  not  sup-
       ported: \l, \u, \L, \U, and \N. In fact these are imple-
       mented by Perl's general  string-handling  and  are  not
       part of its pattern matching engine. If any of these are
       encountered by PCRE, an error is generated.

       6. The Perl escape sequences \p, \P,  and  \X  are  sup-
       ported  only  if  PCRE  is  built with Unicode character
       property support. The properties that can be tested with
       \p and \P are limited to the general category properties
       such as Lu and Nd, script names such as  Greek  or  Han,
       and the derived properties Any and L&.

       7. PCRE does support the \Q...\E escape for quoting sub-
       strings. Characters in between are treated as  literals.
       This is slightly different from Perl in that $ and @ are
       also handled as literals inside  the  quotes.  In  Perl,
       they  cause  variable  interpolation (but of course PCRE
       does not have variables). Note the following examples:

           Pattern            PCRE matches      Perl matches

           \Qabc$xyz\E        abc$xyz           abc followed by
       the
                                                  contents   of
       $xyz
           \Qabc\$xyz\E       abc\$xyz          abc\$xyz
           \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz

       The \Q...\E sequence is recognized both inside and  out-
       side character classes.

       8. Fairly obviously, PCRE does not support the (?{code})
       and (??{code}) constructions. However, there is  support
       for  recursive  patterns.  This is not available in Perl
       5.8, but will be in Perl 5.10. Also, the PCRE  "callout"
       feature  allows an external function to be called during
       pattern matching. See the pcrecallout documentation  for
       details.

       9.  Subpatterns  that are called recursively or as "sub-
       routines" are always treated as atomic groups  in  PCRE.
       This is like Python, but unlike Perl.

       10.  There  are some differences that are concerned with
       the settings of captured strings when part of a  pattern
       is  repeated.  For  example,  matching "aba" against the
       pattern /^(a(b)?)+$/ in Perl leaves  $2  unset,  but  in
       PCRE it is set to "b".

       11.  PCRE  provides  some extensions to the Perl regular
       expression facilities.  Perl 5.10 will include new  fea-
       tures  that  are  not in earlier versions, some of which
       (such as named parentheses) have been in PCRE  for  some
       time. This list is with respect to Perl 5.10:

       (a)  Although  lookbehind  assertions  must  match fixed
       length strings, each alternative branch of a  lookbehind
       assertion  can  match a different length of string. Perl
       requires them all to have the same length.

       (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE  is
       not  set,  the $ meta-character matches only at the very
       end of the string.

       (c) If PCRE_EXTRA is set, a backslash followed by a let-
       ter  with no special meaning is faulted. Otherwise, like
       Perl, the backslash is ignored. (Perl  can  be  made  to
       issue a warning.)

       (d) If PCRE_UNGREEDY is set, the greediness of the repe-
       tition quantifiers is inverted, that is, by default they
       are  not greedy, but if followed by a question mark they
       are.

       (e) PCRE_ANCHORED can be used at matching time to  force
       a  pattern  to be tried only at the first matching posi-
       tion in the subject string.

       (f) The  PCRE_NOTBOL,  PCRE_NOTEOL,  PCRE_NOTEMPTY,  and
       PCRE_NO_AUTO_CAPTURE  options  for  pcre_exec()  have no
       Perl equivalents.

       (g) The callout facility is PCRE-specific.

       (h) The partial matching facility is PCRE-specific.

       (i) Patterns compiled by PCRE can be saved  and  re-used
       at  a  later time, even on different hosts that have the
       other endianness.

       (j) The alternative matching function  (pcre_dfa_exec())
       matches in a different way and is not Perl-compatible.

Last updated: 28 November 2006
Copyright (c) 1997-2006 University of Cambridge.



                                                                 PCRECOMPAT(3)
