PCREAPI(3)                                                          PCREAPI(3)



NAME
       PCRE - Perl-compatible regular expressions

PCRE NATIVE API

       #include <pcre.h>

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       void pcre_free_substring(const char *stringptr);

       void pcre_free_substring_list(const char **stringptr);

       const unsigned char *pcre_maketables(void);

       int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
            int what, void *where);

       int pcre_info(const pcre *code, int *optptr,
            int *firstcharptr);

       int pcre_refcount(pcre *code, int adjust);

       int pcre_config(int what, void *where);

       char *pcre_version(void);

       void *(*pcre_malloc)(size_t);

       void (*pcre_free)(void *);

       void *(*pcre_stack_malloc)(size_t);

       void (*pcre_stack_free)(void *);

       int (*pcre_callout)(pcre_callout_block *);

PCRE API OVERVIEW

       PCRE has its own native API, which is described in  this
       document.  There  are  also  some wrapper functions that
       correspond to the POSIX regular  expression  API.  These
       are  described  in  the pcreposix documentation. Both of
       these APIs define a set of C function calls. A C++ wrap-
       per  is  distributed  with PCRE. It is documented in the
       pcrecpp page.

       The native API C function prototypes are defined in  the
       header  file  pcre.h,  and  on  Unix systems the library
       itself is called libpcre.  It can normally  be  accessed
       by  adding -lpcre to the command for linking an applica-
       tion that uses PCRE. The header file defines the  macros
       PCRE_MAJOR and PCRE_MINOR to contain the major and minor
       release numbers for the library.  Applications  can  use
       these to include support for different releases of PCRE.

       The    functions    pcre_compile(),     pcre_compile2(),
       pcre_study(), and pcre_exec() are used for compiling and
       matching regular expressions in a  Perl-compatible  man-
       ner. A sample program that demonstrates the simplest way
       of using them is provided in the file called  pcredemo.c
       in the source distribution. The pcresample documentation
       describes how to run it.

       A second matching function,  pcre_dfa_exec(),  which  is
       not  Perl-compatible, is also provided. This uses a dif-
       ferent algorithm for the matching. The alternative algo-
       rithm  finds  all  possible matches (at a given point in
       the subject), and scans the subject just once.  However,
       this  algorithm  does  not return captured substrings. A
       description of the two  matching  algorithms  and  their
       advantages  and disadvantages is given in the pcrematch-
       ing documentation.

       In addition to the main  compiling  and  matching  func-
       tions,  there  are  convenience functions for extracting
       captured  substrings  from  a  subject  string  that  is
       matched by pcre_exec(). They are:

         pcre_copy_substring()
         pcre_copy_named_substring()
         pcre_get_substring()
         pcre_get_named_substring()
         pcre_get_substring_list()
         pcre_get_stringnumber()
         pcre_get_stringtable_entries()

       pcre_free_substring() and pcre_free_substring_list() are
       also provided, to free the  memory  used  for  extracted
       strings.

       The function pcre_maketables() is used to build a set of
       character tables in the current locale  for  passing  to
       pcre_compile(), pcre_exec(), or pcre_dfa_exec(). This is
       an optional facility that  is  provided  for  specialist
       use.  Most  commonly,  no  special tables are passed, in
       which case internal tables that are generated when  PCRE
       is built are used.

       The  function pcre_fullinfo() is used to find out infor-
       mation about a compiled pattern; pcre_info() is an obso-
       lete  version  that  returns  only some of the available
       information, but is retained for  backwards  compatibil-
       ity.  The function pcre_version() returns a pointer to a
       string containing the version of PCRE and  its  date  of
       release.

       The function pcre_refcount() maintains a reference count
       in a data block containing a compiled pattern.  This  is
       provided  for  the  benefit  of object-oriented applica-
       tions.

       The global variables pcre_malloc and pcre_free initially
       contain  the  entry  points of the standard malloc() and
       free() functions, respectively. PCRE  calls  the  memory
       management  functions  via these variables, so a calling
       program can replace them if it wishes to  intercept  the
       calls. This should be done before calling any PCRE func-
       tions.

       The    global    variables     pcre_stack_malloc     and
       pcre_stack_free  are also indirections to memory manage-
       ment functions. These special functions  are  used  only
       when  PCRE  is  compiled to use the heap for remembering
       data, instead of recursive function calls, when  running
       the  pcre_exec()  function. See the pcrebuild documenta-
       tion for details of how to do this. It is a non-standard
       way  of building PCRE, for use in environments that have
       limited stacks. Because of the  greater  use  of  memory
       management,  it runs more slowly. Separate functions are
       provided so that special-purpose external  code  can  be
       used  for  this  case.  When  used,  these functions are
       always called in a  stack-like  manner  (last  obtained,
       first  freed),  and always for memory blocks of the same
       size. There is a discussion about PCRE's stack usage  in
       the pcrestack documentation.

       The  global  variable  pcre_callout  initially  contains
       NULL. It can be set by the caller to a  "callout"  func-
       tion, which PCRE will then call at specified points dur-
       ing a matching operation. Details are given in the pcre-
       callout documentation.

NEWLINES

       PCRE  supports four different conventions for indicating
       line breaks in strings: a single  CR  (carriage  return)
       character,  a  single  LF (linefeed) character, the two-
       character  sequence  CRLF,  or   any   Unicode   newline
       sequence.   The  Unicode newline sequences are the three
       just mentioned, plus the single characters VT  (vertical
       tab,  U+000B),  FF  (formfeed,  U+000C), NEL (next line,
       U+0085), LS (line separator, U+2028), and PS  (paragraph
       separator, U+2029).

       Each  of the first three conventions is used by at least
       one operating system as its standard  newline  sequence.
       When PCRE is built, a default can be specified.  If none
       is specified, LF is used, which is the Unix standard. When
       PCRE  is run, the default can be overridden, either when
       a pattern is compiled, or when it is matched.

       In the PCRE documentation the word "newline" is used  to
       mean  "the character or pair of characters that indicate
       a line break". The choice of newline convention  affects
       the   handling   of  the  dot,  circumflex,  and  dollar
       metacharacters, the handling of #-comments in  /x  mode,
       and, when CRLF is a recognized line ending sequence, the
       match position advancement for a  non-anchored  pattern.
       The  choice  of  newline  convention does not affect the
       interpretation of the \n or \r escape sequences.

MULTITHREADING

       The PCRE functions can be used in multi-threading appli-
       cations,  with  the  proviso  that the memory management
       functions  pointed   to   by   pcre_malloc,   pcre_free,
       pcre_stack_malloc,  and pcre_stack_free, and the callout
       function pointed to by pcre_callout, are shared  by  all
       threads.

       The compiled form of a regular expression is not altered
       during matching, so the same compiled pattern can safely
       be used by several threads at once.

SAVING PRECOMPILED PATTERNS FOR LATER USE

       The  compiled  form of a regular expression can be saved
       and re-used at a later time,  possibly  by  a  different
       program,  and even on a host other than the one on which
       it was compiled. Details are given in the pcreprecompile
       documentation.

CHECKING BUILD-TIME OPTIONS

       int pcre_config(int what, void *where);

       The  function pcre_config() makes it possible for a PCRE
       client to discover which  optional  features  have  been
       compiled into the PCRE library. The pcrebuild documenta-
       tion has more details about these optional features.

       The first argument  for  pcre_config()  is  an  integer,
       specifying  which  information  is  required; the second
       argument is a pointer  to  a  variable  into  which  the
       information  is  placed.  The  following  information is
       available:

         PCRE_CONFIG_UTF8

       The output is an integer that is set  to  one  if  UTF-8
       support is available; otherwise it is set to zero.

         PCRE_CONFIG_UNICODE_PROPERTIES

       The  output  is an integer that is set to one if support
       for Unicode character properties is available; otherwise
       it is set to zero.

         PCRE_CONFIG_NEWLINE

       The  output  is  an  integer  whose  value specifies the
       default character sequence that is recognized as meaning
       "newline".  The  four  values that are supported are: 10
       for LF, 13 for CR, 3338 for CRLF, and -1  for  ANY.  The
       default  should  normally  be  the standard sequence for
       your operating system.

         PCRE_CONFIG_LINK_SIZE

       The output is an integer that  contains  the  number  of
       bytes  used  for  internal  linkage  in compiled regular
       expressions. The value is 2,  3,  or  4.  Larger  values
       allow  larger regular expressions to be compiled, at the
       expense of slower matching. The default value  of  2  is
       sufficient  for all but the most massive patterns, since
       it allows the compiled pattern to be up to 64K in  size.

         PCRE_CONFIG_POSIX_MALLOC_THRESHOLD

       The  output  is  an  integer that contains the threshold
       above which the POSIX interface uses malloc() for output
       vectors. Further details are given in the pcreposix doc-
       umentation.

         PCRE_CONFIG_MATCH_LIMIT

       The output is an integer that gives  the  default  limit
       for  the number of internal matching function calls in a
       pcre_exec() execution. Further details  are  given  with
       pcre_exec() below.

         PCRE_CONFIG_MATCH_LIMIT_RECURSION

       The  output  is  an integer that gives the default limit
       for the depth of recursion  when  calling  the  internal
       matching  function  in  a pcre_exec() execution. Further
       details are given with pcre_exec() below.

         PCRE_CONFIG_STACKRECURSE

       The output is an integer that is set to one if  internal
       recursion  when  running  pcre_exec()  is implemented by
       recursive function calls that use the stack to  remember
       their  state.  This  is  the usual way that PCRE is com-
       piled. The output is zero if PCRE was  compiled  to  use
       blocks of data on the heap instead of recursive function
       calls.   In    this    case,    pcre_stack_malloc    and
       pcre_stack_free  are  called  to manage memory blocks on
       the heap, thus avoiding the use of the stack.

COMPILING A PATTERN

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       pcre *pcre_compile2(const char *pattern, int options,
            int *errorcodeptr,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       Either of  the  functions  pcre_compile()  or  pcre_com-
       pile2()  can  be  called  to  compile  a pattern into an
       internal form.  The  only  difference  between  the  two
       interfaces  is  that  pcre_compile2()  has an additional
       argument, errorcodeptr, via which a numerical error code
       can be returned.

       The  pattern  is a C string terminated by a binary zero,
       and is passed in the pattern argument. A  pointer  to  a
       single  block of memory that is obtained via pcre_malloc
       is returned. This contains the compiled code and related
       data.  The  pcre type is defined for the returned block;
       this is a typedef for a structure whose contents are not
       externally  defined.  It is up to the caller to free the
       memory (via pcre_free) when it is no longer required.

       Although the compiled code of a PCRE regex  is  relocat-
       able,  that  is,  it does not depend on memory location,
       the complete pcre data block is not  fully  relocatable,
       because  it may contain a copy of the tableptr argument,
       which is an address (see below).

       The options argument contains various bit settings  that
       affect  the compilation. It should be zero if no options
       are required. The available options are described below.
       Some  of  them, in particular, those that are compatible
       with Perl, can also be set and  unset  from  within  the
       pattern (see the detailed description in the pcrepattern
       documentation). For these options, the contents  of  the
       options argument specifies their initial settings at the
       start of compilation and  execution.  The  PCRE_ANCHORED
       and  PCRE_NEWLINE_xxx  options can be set at the time of
       matching as well as at compile time.

       If errptr is NULL, pcre_compile() returns  NULL  immedi-
       ately.   Otherwise,  if  compilation of a pattern fails,
       pcre_compile()  returns  NULL,  and  sets  the  variable
       pointed  to  by  errptr to point to a textual error mes-
       sage. This is a  static  string  that  is  part  of  the
       library.  You  must  not try to free it. The offset from
       the start of the pattern  to  the  character  where  the
       error  was  discovered is placed in the variable pointed
       to by erroffset, which must not be NULL. If  it  is,  an
       immediate error is given.

       If  pcre_compile2()  is  used instead of pcre_compile(),
       and the errorcodeptr argument is not  NULL,  a  non-zero
       error  code  number is returned via this argument in the
       event of an error. This is in addition  to  the  textual
       error  message.  Error  codes  and  messages  are listed
       below.

       If the final argument, tableptr, is NULL,  PCRE  uses  a
       default set of character tables that are built when PCRE
       is compiled, using  the  default  C  locale.  Otherwise,
       tableptr must be an address that is the result of a call
       to pcre_maketables(). This value is stored with the com-
       piled  pattern,  and  used  again by pcre_exec(), unless
       another table pointer is passed to it. For more  discus-
       sion, see the section on locale support below.

       This  code fragment shows a typical straightforward call
       to pcre_compile():

         pcre *re;
         const char *error;
         int erroffset;
         re = pcre_compile(
           "^A.*Z",          /* the pattern */
           0,                /* default options */
           &error,           /* for error message */
           &erroffset,       /* for error offset */
           NULL);            /* use default character tables */

       The  following  names for option bits are defined in the
       pcre.h header file:

         PCRE_ANCHORED

       If this  bit  is  set,  the  pattern  is  forced  to  be
       "anchored",  that is, it is constrained to match only at
       the first matching point in the  string  that  is  being
       searched (the "subject string"). This effect can also be
       achieved  by  appropriate  constructs  in  the   pattern
       itself, which is the only way to do it in Perl.

         PCRE_AUTO_CALLOUT

       If this bit is set, pcre_compile() automatically inserts
       callout items, all with number 255, before each  pattern
       item.  For  discussion  of the callout facility, see the
       pcrecallout documentation.

         PCRE_CASELESS

       If this bit is set, letters in the  pattern  match  both
       upper and lower case letters. It is equivalent to Perl's
       /i option, and it can be changed within a pattern  by  a
       (?i)  option  setting. In UTF-8 mode, PCRE always under-
       stands the concept of case for characters  whose  values
       are less than 128, so caseless matching is always possi-
       ble. For characters with higher values, the  concept  of
       case is supported if PCRE is compiled with Unicode prop-
       erty support, but not otherwise.  If  you  want  to  use
       caseless matching for characters 128 and above, you must
       ensure that PCRE is compiled with Unicode property  sup-
       port as well as with UTF-8 support.

         PCRE_DOLLAR_ENDONLY

       If  this  bit is set, a dollar metacharacter in the pat-
       tern matches only at the  end  of  the  subject  string.
       Without  this  option, a dollar also matches immediately
       before a newline at the  end  of  the  string  (but  not
       before  any  other  newlines).  The  PCRE_DOLLAR_ENDONLY
       option is ignored if PCRE_MULTILINE is set.  There is no
       equivalent  to this option in Perl, and no way to set it
       within a pattern.

         PCRE_DOTALL

       If this bit is set, a dot metacharacter in  the  pattern
       matches  all  characters,  including those that indicate
       newline. Without it, a dot does not match when the  cur-
       rent position is at a newline. This option is equivalent
       to Perl's /s option, and it can be changed within a pat-
       tern  by a (?s) option setting. A negative class such as
       [^a] always matches newline characters,  independent  of
       the setting of this option.

         PCRE_DUPNAMES

       If  this  bit  is  set, names used to identify capturing
       subpatterns need not be unique. This can be helpful  for
       certain  types of pattern when it is known that only one
       instance of the named subpattern can  ever  be  matched.
       There  are  more details of named subpatterns below; see
       also the pcrepattern documentation.

         PCRE_EXTENDED

       If this bit is set, whitespace data  characters  in  the
       pattern  are  totally  ignored  except  when  escaped or
       inside a character class. Whitespace  does  not  include
       the  VT  character  (code  11).  In addition, characters
       between an unescaped # outside a character class and the
       next  newline,  inclusive,  are  also  ignored.  This is
       equivalent to Perl's /x option, and it  can  be  changed
       within a pattern by a (?x) option setting.

       This option makes it possible to include comments inside
       complicated patterns.  Note, however, that this  applies
       only to data characters. Whitespace characters may never
       appear within special character sequences in a  pattern,
       for  example  within the sequence (?( which introduces a
       conditional subpattern.

         PCRE_EXTRA

       This option was invented in order to turn on  additional
       functionality  of  PCRE  that is incompatible with Perl,
       but it is currently of very little use.  When  set,  any
       backslash in a pattern that is followed by a letter that
       has no special meaning causes an error,  thus  reserving
       these  combinations for future expansion. By default, as
       in Perl, a backslash followed by a letter with  no  spe-
       cial  meaning  is  treated as a literal. (Perl can, how-
       ever, be persuaded to give a warning  for  this.)  There
       are  at  present  no  other  features controlled by this
       option. It can also be set  by  a  (?X)  option  setting
       within a pattern.

         PCRE_FIRSTLINE

       If this option is set, an unanchored pattern is required
       to match before or at the first newline in  the  subject
       string,  though  the  matched text may continue over the
       newline.

         PCRE_MULTILINE

       By default, PCRE treats the subject string as consisting
       of a single line of characters (even if it actually con-
       tains newlines). The "start of line"  metacharacter  (^)
       matches  only at the start of the string, while the "end
       of line" metacharacter ($) matches only at  the  end  of
       the  string,  or  before  a  terminating newline (unless
       PCRE_DOLLAR_ENDONLY is set). This is the same as Perl.

       When PCRE_MULTILINE is set, the  "start  of  line"  and
       "end  of line" constructs match immediately following or
       immediately before  internal  newlines  in  the  subject
       string,  respectively,  as well as at the very start and
       end. This is equivalent to Perl's /m option, and it  can
       be changed within a pattern by a (?m) option setting. If
       there are no newlines in a subject string, or no  occur-
       rences  of  ^  or $ in a pattern, setting PCRE_MULTILINE
       has no effect.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANY

       These options override the  default  newline  definition
       that  was  chosen when PCRE was built. Setting the first
       or the second specifies that a newline is indicated by a
       single  character  (CR  or  LF,  respectively).  Setting
       PCRE_NEWLINE_CRLF specifies that a newline is  indicated
       by  the  two-character  CRLF sequence. Setting PCRE_NEW-
       LINE_ANY specifies that  any  Unicode  newline  sequence
       should  be recognized. The Unicode newline sequences are
       the three just mentioned, plus the single characters  VT
       (vertical tab, U+000B), FF (formfeed, U+000C), NEL (next
       line, U+0085),  LS  (line  separator,  U+2028),  and  PS
       (paragraph  separator,  U+2029). The last two are recog-
       nized only in UTF-8 mode.

       The newline setting in the options word uses three  bits
       that  are  treated  as a number, giving eight possibili-
       ties. Currently only five are  used  (default  plus  the
       four values above). This means that if you set more than
       one newline option, the combination may or  may  not  be
       sensible.  For  example,  PCRE_NEWLINE_CR with PCRE_NEW-
       LINE_LF is equivalent to  PCRE_NEWLINE_CRLF,  but  other
       combinations yield unused numbers and cause an error.

       The  only time that a line break is specially recognized
       when compiling a pattern is if PCRE_EXTENDED is set, and
       an unescaped # outside a character class is encountered.
       This indicates a comment that lasts until after the next
       line  break sequence. In other circumstances, line break
       sequences are treated as literal data,  except  that  in
       PCRE_EXTENDED  mode,  both  CR  and  LF  are  treated as
       whitespace characters and are therefore ignored.

       The newline option that is set at compile  time  becomes
       the   default   that   is   used   for  pcre_exec()  and
       pcre_dfa_exec(), but it can be overridden.

         PCRE_NO_AUTO_CAPTURE

       If this option is set, it disables the use  of  numbered
       capturing parentheses in the pattern. Any opening paren-
       thesis that is not followed by ? behaves as if  it  were
       followed  by  ?: but named parentheses can still be used
       for capturing (and they acquire  numbers  in  the  usual
       way). There is no equivalent of this option in Perl.

         PCRE_UNGREEDY

       This  option inverts the "greediness" of the quantifiers
       so that they are  not  greedy  by  default,  but  become
       greedy  if  followed  by  "?". It is not compatible with
       Perl. It can also be set by a (?U) option setting within
       the pattern.

         PCRE_UTF8

       This  option  causes PCRE to regard both the pattern and
       the subject as strings of UTF-8  characters  instead  of
       single-byte  character strings. However, it is available
       only when PCRE is built to  include  UTF-8  support.  If
       not,  the  use of this option provokes an error. Details
       of how this option changes the  behaviour  of  PCRE  are
       given  in  the section on UTF-8 support in the main pcre
       page.

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set, the validity of the pattern as  a
       UTF-8  string  is  automatically  checked. If an invalid
       UTF-8 sequence of bytes is found, pcre_compile() returns
       an  error.  If  you  already  know  that your pattern is
       valid, and you want to skip this check  for  performance
       reasons, you can set the PCRE_NO_UTF8_CHECK option. When
       it is set, the effect of passing an invalid UTF-8 string
       as  a pattern is undefined. It may cause your program to
       crash.  Note that this option  can  also  be  passed  to
       pcre_exec()  and  pcre_dfa_exec(), to suppress the UTF-8
       validity checking of subject strings.

COMPILATION ERROR CODES

       The following table lists the error codes  that  may  be
       returned  by  pcre_compile2(), along with the error mes-
       sages that may be returned by both compiling  functions.
       As  PCRE has developed, some error codes have fallen out
       of use. To avoid confusion, they have not been  re-used.

          0  no error
          1  \ at end of pattern
          2  \c at end of pattern
          3  unrecognized character follows \
          4  numbers out of order in {} quantifier
          5  number too big in {} quantifier
          6  missing terminating ] for character class
          7  invalid escape sequence in character class
          8  range out of order in character class
          9  nothing to repeat
         10  [this code is not in use]
         11  internal error: unexpected repeat
         12  unrecognized character after (?
         13  POSIX named classes are supported only within a
             class
         14  missing )
         15  reference to non-existent subpattern
         16  erroffset passed as NULL
         17  unknown option bit(s) set
         18  missing ) after comment
         19  [this code is not in use]
         20  regular expression too large
         21  failed to get memory
         22  unmatched parentheses
         23  internal error: code overflow
         24  unrecognized character after (?<
         25  lookbehind assertion is not fixed length
         26  malformed number or name after (?(
         27  conditional group contains more than two branches
         28  assertion expected after (?(
         29  (?R or (?digits must be followed by )
         30  unknown POSIX class name
         31  POSIX collating elements are not supported
         32  this version of PCRE is not compiled with
             PCRE_UTF8 support
         33  [this code is not in use]
         34  character value in \x{...} sequence is too large
         35  invalid condition (?(0)
         36  \C not allowed in lookbehind assertion
         37  PCRE does not support \L, \l, \N, \U, or \u
         38  number after (?C is > 255
         39  closing ) for (?C expected
         40  recursive call could loop indefinitely
         41  unrecognized character after (?P
         42  syntax error in subpattern name (missing
             terminator)
         43  two named subpatterns have the same name
         44  invalid UTF-8 string
         45  support for \P, \p, and \X has not been compiled
         46  malformed \P or \p sequence
         47  unknown property name after \P or \p
         48  subpattern name is too long (maximum 32
             characters)
         49  too many named subpatterns (maximum 10,000)
         50  repeated subpattern is too long
         51  octal value is greater than \377 (not in UTF-8
             mode)
         52  internal error: overran compiling workspace
         53  internal error: previously-checked referenced
             subpattern not found
         54  DEFINE group contains more than one branch
         55  repeating a DEFINE group is not allowed
         56  inconsistent NEWLINE options

STUDYING A PATTERN

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       If a compiled pattern is going to be used several times,
       it is worth spending more time analyzing it in order  to
       speed  up  the  time  taken  for  matching. The function
       pcre_study() takes a pointer to a  compiled  pattern  as
       its  first  argument.  If  studying the pattern produces
       additional information that will help speed up matching,
       pcre_study() returns a pointer to a pcre_extra block, in
       which the study_data field points to the results of  the
       study.

       The  returned  value  from  pcre_study()  can  be passed
       directly to pcre_exec().  However,  a  pcre_extra  block
       also contains other fields that can be set by the caller
       before the block is passed; these are described below in
       the section on matching a pattern.

       If studying the pattern does not produce any additional
       information, pcre_study() returns NULL. In that
       circumstance, if the calling program wants to pass any
       of the other fields to pcre_exec(), it must set up its
       own pcre_extra block.

       The  second  argument  of  pcre_study()  contains option
       bits. At present, no options are defined, and this argu-
       ment should always be zero.

       The  third argument for pcre_study() is a pointer for an
       error message. If studying succeeds (even if no data  is
       returned),  the  variable  it  points to is set to NULL.
       Otherwise it is set to point to a textual error message.
       This is a static string that is part of the library. You
       must not try to free  it.  You  should  test  the  error
       pointer  for NULL after calling pcre_study(), to be sure
       that it has run successfully.

       This is a typical call to pcre_study():

         const char *error;
         pcre_extra *pe;
         pe = pcre_study(
           re,             /* result of pcre_compile() */
           0,              /* no options exist */
           &error);        /* set to NULL or an error message */

       At  present,  studying a pattern is useful only for non-
       anchored patterns that do not have a single fixed start-
       ing  character.  A  bitmap of possible starting bytes is
       created.

LOCALE SUPPORT

       PCRE handles caseless matching, and  determines  whether
       characters are letters, digits, or whatever, by reference
       to a set of tables, indexed  by  character  value.  When
       running  in  UTF-8 mode, this applies only to characters
       with codes less  than  128.  Higher-valued  codes  never
       match  escapes  such as \w or \d, but can be tested with
       \p if PCRE is built with Unicode character property sup-
       port. The use of locales with Unicode is discouraged.

       An  internal  set  of tables is created in the default C
       locale when PCRE is built. This is used when  the  final
       argument  of  pcre_compile()  is NULL, and is sufficient
       for many applications. An alternative set of tables can,
       however,  be supplied. These may be created in a differ-
       ent locale from the default. As more and  more  applica-
       tions  change to using Unicode, the need for this locale
       support is expected to die away.

       External tables are built by  calling  the  pcre_maketa-
       bles() function, which has no arguments, in the relevant
       locale. The result can then be passed to  pcre_compile()
       or  pcre_exec()  as  often as necessary. For example, to
       build and use tables that are appropriate for the French
       locale  (where  accented  characters with values greater
       than 128 are treated as  letters),  the  following  code
       could be used:

         setlocale(LC_CTYPE, "fr_FR");
         tables = pcre_maketables();
         re = pcre_compile(..., tables);

       When  pcre_maketables()  runs,  the  tables are built in
       memory that is  obtained  via  pcre_malloc.  It  is  the
       caller's  responsibility  to ensure that the memory con-
       taining the tables remains available for as long  as  it
       is needed.

       The  pointer  that  is passed to pcre_compile() is saved
       with the compiled pattern, and the same tables are  used
       via  this  pointer  by pcre_study() and normally also by
       pcre_exec(). Thus, by default, for any  single  pattern,
       compilation,  studying  and  matching  all happen in the
       same locale, but different patterns can be  compiled  in
       different locales.

       It is possible to pass a table pointer or NULL (indicat-
       ing the use of  the  internal  tables)  to  pcre_exec().
       Although  not  intended  for this purpose, this facility
       could be used to match a pattern in a  different  locale
       from  the  one  in  which it was compiled. Passing table
       pointers at run time is discussed below in  the  section
       on matching a pattern.

INFORMATION ABOUT A PATTERN

       int pcre_fullinfo(const pcre *code,
            const pcre_extra *extra, int what, void *where);

       The pcre_fullinfo() function returns information about a
       compiled  pattern.  It replaces the obsolete pcre_info()
       function, which is nevertheless retained  for  backwards
       compatibility (and is documented below).

       The  first  argument for pcre_fullinfo() is a pointer to
       the compiled pattern. The second argument is the  result
       of pcre_study(), or NULL if the pattern was not studied.
       The third argument specifies which piece of  information
       is  required,  and the fourth argument is a pointer to a
       variable to receive the data. The yield of the  function
       is  zero  for  success, or one of the following negative
       numbers:

         PCRE_ERROR_NULL       the argument code was NULL
                               the argument where was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found
         PCRE_ERROR_BADOPTION  the value of what was invalid

       The  "magic  number" is placed at the start of each com-
       piled pattern  as  a  simple  check  against  passing  an
       arbitrary  memory  pointer.  Here  is  a typical call of
       pcre_fullinfo(), to obtain the length  of  the  compiled
       pattern:

         int rc;
         size_t length;
         rc = pcre_fullinfo(
           re,               /* result of pcre_compile() */
           pe,               /* result of pcre_study(), or NULL */
           PCRE_INFO_SIZE,   /* what is required */
           &length);         /* where to put the data */

       The possible values for the third argument  are  defined
       in pcre.h, and are as follows:

         PCRE_INFO_BACKREFMAX

       Return  the  number of the highest back reference in the
       pattern. The fourth argument  should  point  to  an  int
       variable.  Zero  is returned if there are no back refer-
       ences.

         PCRE_INFO_CAPTURECOUNT

       Return the number of capturing subpatterns in  the  pat-
       tern.  The  fourth argument should point to an int vari-
       able.

         PCRE_INFO_DEFAULT_TABLES

       Return a  pointer  to  the  internal  default  character
       tables  within PCRE. The fourth argument should point to
       an unsigned char * variable. This  information  call  is
       provided  for internal use by the pcre_study() function.
       External callers can cause  PCRE  to  use  its  internal
       tables by passing a NULL table pointer.

         PCRE_INFO_FIRSTBYTE

       Return  information  about the first byte of any matched
       string, for a non-anchored pattern. The fourth  argument
       should point to an int variable. (This option used to be
       called PCRE_INFO_FIRSTCHAR; the old name is still recog-
       nized for backwards compatibility.)

       If there is a fixed first byte, for example, from a pat-
       tern such as (cat|cow|coyote), its  value  is  returned.
       Otherwise, if either

       (a)  the  pattern  was  compiled with the PCRE_MULTILINE
       option, and every branch starts with "^", or

       (b) every branch of the pattern  starts  with  ".*"  and
       PCRE_DOTALL  is  not  set  (if  it were set, the pattern
       would be anchored),

       -1 is returned, indicating that the pattern matches only
       at  the  start  of a subject string or after any newline
       within  the  string.  Otherwise  -2  is  returned.   For
       anchored patterns, -2 is returned.
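
       The three kinds of value described above can be told apart
       with a small helper. This is only a sketch:  the  function
       classify_first_byte() and the enum names are hypothetical
       and are not part of PCRE. The argument is assumed to be the
       int that pcre_fullinfo() stored for PCRE_INFO_FIRSTBYTE.

```c
/* Hypothetical names; none of these are defined by pcre.h. */
enum first_byte_kind {
    FIRST_BYTE_FIXED,       /* value >= 0: every match starts with this byte */
    FIRST_BYTE_LINE_START,  /* value == -1: start of subject or after newline */
    FIRST_BYTE_NONE         /* value == -2: nothing known, or anchored */
};

/* Classify the int written by PCRE_INFO_FIRSTBYTE. */
static enum first_byte_kind classify_first_byte(int value)
{
    if (value >= 0)
        return FIRST_BYTE_FIXED;
    if (value == -1)
        return FIRST_BYTE_LINE_START;
    return FIRST_BYTE_NONE;
}
```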

         PCRE_INFO_FIRSTTABLE

       If  the  pattern  was  studied, and this resulted in the
       construction of a 256-bit table indicating a  fixed  set
       of  bytes  for  the first byte in any matching string, a
       pointer to the table  is  returned.  Otherwise  NULL  is
       returned.   The  fourth  argument  should  point  to  an
       unsigned char * variable.

         PCRE_INFO_LASTLITERAL

       Return the value of the rightmost literal byte that must
       exist in any matched string, other than at its start, if
       such a byte  has  been  recorded.  The  fourth  argument
       should  point  to  an  int variable. If there is no such
       byte, -1 is returned. For anchored patterns, a last lit-
       eral  byte  is  recorded only if it follows something of
       variable  length.   For   example,   for   the   pattern
       /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/
       the returned value is -1.

         PCRE_INFO_NAMECOUNT
         PCRE_INFO_NAMEENTRYSIZE
         PCRE_INFO_NAMETABLE

       PCRE supports the use of named as well as numbered  cap-
       turing parentheses. The names are just an additional way
       of identifying the parentheses, which still acquire num-
       bers.    Several    convenience    functions   such   as
       pcre_get_named_substring() are provided  for  extracting
       captured  substrings  by  name.  It  is also possible to
       extract the data directly, by first converting the  name
       to  a  number in order to access the correct pointers in
       the output vector (described with pcre_exec() below). To
       do  the  conversion,  you need to use the name-to-number
       map, which is described by these three values.

       The map consists of  a  number  of  fixed-size  entries.
       PCRE_INFO_NAMECOUNT  gives  the  number  of entries, and
       PCRE_INFO_NAMEENTRYSIZE gives the size  of  each  entry;
       both  of  these  return  an  int  value.  The entry size
       depends   on   the   length   of   the   longest   name.
       PCRE_INFO_NAMETABLE returns a pointer to the first entry
       of the table (a pointer to char). The first two bytes of
       each  entry are the number of the capturing parenthesis,
       most significant byte first. The rest of  the  entry  is
       the  corresponding  name, zero terminated. The names are
       in alphabetical order. When PCRE_DUPNAMES is set, dupli-
       cate  names  are  in order of their parentheses numbers.
       For example,  consider  the  following  pattern  (assume
       PCRE_EXTENDED  is  set,  so white space - including new-
       lines - is ignored):

         (?<date> (?<year>(\d\d)?\d\d) -
         (?<month>\d\d) - (?<day>\d\d) )

       There are four named subpatterns, so the table has  four
       entries,  and  each  entry  in  the table is eight bytes
       long. The table is as follows, with  non-printing  bytes
       shown in hexadecimal, and undefined bytes shown as ??:

         00 01 d  a  t  e  00 ??
         00 05 d  a  y  00 ?? ??
         00 04 m  o  n  t  h  00
         00 02 y  e  a  r  00 ??

       When writing code to extract data from named subpatterns
       using the name-to-number map, remember that  the  length
       of  the  entries is likely to be different for each com-
       piled pattern.
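
       As an illustration of the table layout, the  sketch  below
       walks a name table by hand.  The  helper  name  lookup_-
       group_number() and its parameters are hypothetical; in real
       code the three inputs would come from PCRE_INFO_NAMETABLE,
       PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE, and  the
       library's own pcre_get_stringnumber() is normally the sim-
       pler choice. A linear scan is used for clarity.

```c
#include <string.h>

/* Hypothetical helper, not part of PCRE: find the group number for
   `name` in a name table of `name_count` entries, each `entry_size`
   bytes long. Returns the number, or -1 if the name is absent. */
static int lookup_group_number(const unsigned char *table, int name_count,
                               int entry_size, const char *name)
{
    int i;
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        /* Bytes 0-1: group number, most significant byte first.
           Bytes 2..: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

       Applied to the four-entry table shown above with an  entry
       size of 8, a lookup of "month" yields 4 and "day" yields 5.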

         PCRE_INFO_OPTIONS

       Return a copy of the options with which the pattern  was
       compiled.   The  fourth  argument  should  point  to  an
       unsigned long int variable. These option bits are  those
       specified in the call to pcre_compile(), modified by any
       top-level option settings within the pattern itself.

       A pattern is automatically anchored by PCRE  if  all  of
       its top-level alternatives begin with one of the follow-
       ing:

         ^     unless PCRE_MULTILINE is set
         \A    always
         \G    always
         .*    if PCRE_DOTALL is set and there are no back
                 references to the subpattern in which .*
                 appears

       For  such  patterns, the PCRE_ANCHORED bit is set in the
       options returned by pcre_fullinfo().

         PCRE_INFO_SIZE

       Return the size of the compiled pattern,  that  is,  the
       value  that  was passed as the argument to pcre_malloc()
       when PCRE was getting memory in which to place the  com-
       piled data. The fourth argument should point to a size_t
       variable.

         PCRE_INFO_STUDYSIZE

       Return the size of the data  block  pointed  to  by  the
       study_data  field  in a pcre_extra block. That is, it is
       the value that was passed to pcre_malloc() when PCRE was
       getting  memory  into which to place the data created by
       pcre_study(). The fourth  argument  should  point  to  a
       size_t variable.

OBSOLETE INFO FUNCTION

       int pcre_info(const pcre *code, int *optptr,
            int *firstcharptr);

       The pcre_info() function is  now  obsolete  because  its
       interface is too restrictive to return all the available
       data about a compiled pattern. New programs  should  use
       pcre_fullinfo() instead. The yield of pcre_info() is the
       number of capturing subpatterns, or one of the following
       negative numbers:

         PCRE_ERROR_NULL       the argument code was NULL
         PCRE_ERROR_BADMAGIC   the "magic number" was not found

       If the optptr argument  is  not  NULL,  a  copy  of  the
       options with which the pattern was compiled is placed in
       the integer it points to (see PCRE_INFO_OPTIONS  above).

       If  the  pattern  is  not  anchored and the firstcharptr
       argument is not NULL, it is used to pass  back  informa-
       tion  about  the  first  character of any matched string
       (see PCRE_INFO_FIRSTBYTE above).

REFERENCE COUNTS

       int pcre_refcount(pcre *code, int adjust);

       The pcre_refcount() function is used to maintain a  ref-
       erence  count in the data block that contains a compiled
       pattern. It is provided for the benefit of  applications
       that operate in an object-oriented manner, where differ-
       ent parts of the application may be using the same  com-
       piled  pattern, but you want to free the block when they
       are all done.

       When a pattern is compiled, the reference count field is
       initialized to zero.  It is changed only by calling this
       function, whose action is to add the adjust value (which
       may  be  positive  or  negative) to it. The yield of the
       function is the new value. However,  the  value  of  the
       count  is constrained to lie between 0 and 65535, inclu-
       sive. If the new value is outside these  limits,  it  is
       forced to the appropriate limit value.
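
       The clamping rule can be modelled in a few lines. This  is
       a model of the documented behaviour only, not the library's
       code; the name adjust_refcount_model() is hypothetical.

```c
/* Hypothetical model of the rule described above: add `adjust` to the
   current count and force the result into the range 0..65535. */
static int adjust_refcount_model(int current, int adjust)
{
    long next = (long)current + (long)adjust;
    if (next < 0)
        next = 0;
    if (next > 65535)
        next = 65535;
    return (int)next;
}
```

       An application would typically free the  compiled  pattern
       when a decrement brings the yield back to zero.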

       Except  when it is zero, the reference count is not cor-
       rectly preserved if a pattern is compiled  on  one  host
       and  then transferred to a host whose byte-order is dif-
       ferent. (This seems a highly unlikely scenario.)

MATCHING A PATTERN: THE TRADITIONAL FUNCTION

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);

       The  function  pcre_exec()  is called to match a subject
       string against a compiled pattern, which  is  passed  in
       the  code argument. If the pattern has been studied, the
       result of the study should be passed in the extra  argu-
       ment. This function is the main matching facility of the
       library, and it operates in a Perl-like manner. For spe-
       cialist  use there is also an alternative matching func-
       tion, which is described below in the section about  the
       pcre_dfa_exec() function.

       In  most  applications,  the pattern will have been com-
       piled (and optionally studied) in the same process  that
       calls  pcre_exec(). However, it is possible to save com-
       piled patterns and study data, and then use  them  later
       in  different  processes,  possibly  even  on  different
       hosts. For a discussion about this, see the  pcreprecom-
       pile documentation.

       Here is an example of a simple call to pcre_exec():

         int rc;
         int ovector[30];
         rc = pcre_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring info */
           30);            /* number of elements (NOT size in bytes) */

   Extra data for pcre_exec()

       If  the  extra  argument is not NULL, it must point to a
       pcre_extra data block. The pcre_study() function returns
       such  a block (when it doesn't return NULL), but you can
       also create one for yourself, and pass additional infor-
       mation  in it. The pcre_extra block contains the follow-
       ing fields (not necessarily in this order):

         unsigned long int flags;
         void *study_data;
         unsigned long int match_limit;
         unsigned long int match_limit_recursion;
         void *callout_data;
         const unsigned char *tables;

       The flags field is a bitmap that specifies which of  the
       other fields are set. The flag bits are:

         PCRE_EXTRA_STUDY_DATA
         PCRE_EXTRA_MATCH_LIMIT
         PCRE_EXTRA_MATCH_LIMIT_RECURSION
         PCRE_EXTRA_CALLOUT_DATA
         PCRE_EXTRA_TABLES

       Other  flag  bits  should be set to zero. The study_data
       field is set in the pcre_extra block that is returned by
       pcre_study(),  together  with  the appropriate flag bit.
       You should not set this yourself, but you may add to the
       block  by setting the other fields and their correspond-
       ing flag bits.

       The match_limit field provides  a  means  of  preventing
       PCRE  from using up a vast amount of resources when run-
       ning patterns that are not going  to  match,  but  which
       have  a  very  large  number  of  possibilities in their
       search trees. The classic example is the use  of  nested
       unlimited repeats.

       Internally, PCRE uses a function called match() which it
       calls repeatedly (sometimes recursively). The limit  set
       by  match_limit  is  imposed on the number of times this
       function is called during a match, which has the  effect
       of  limiting  the  amount  of backtracking that can take
       place. For patterns that are  not  anchored,  the  count
       restarts  from  zero  for  each  position in the subject
       string.

       The default value for the limit can be set when PCRE  is
       built;  if  no value is chosen then, the default is 10
       million, which handles all but the most  extreme  cases.
       You  can  override the default by supplying pcre_exec()
       with a pcre_extra block in which match_limit is set, and
       PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the
       limit     is     exceeded,     pcre_exec()      returns
       PCRE_ERROR_MATCHLIMIT.

       The   match_limit_recursion   field   is   similar    to
       match_limit, but instead of limiting the total number of
       times that match() is called, it  limits  the  depth  of
       recursion.  The recursion depth is a smaller number than
       the total number of calls,  because  not  all  calls  to
       match()  are recursive.  This limit is of use only if it
       is set smaller than match_limit.

       Limiting the recursion depth limits the amount of  stack
       that can be used, or, when PCRE has been compiled to use
       memory on the heap instead of the stack, the  amount  of
       heap memory that can be used.

       The  default  value for match_limit_recursion can be set
       when PCRE is built; if no value  is  chosen  then,  the
       default  is  the  same  value  as  the  default  for
       match_limit. You can override the default by  supplying
       pcre_exec()   with   a   pcre_extra   block   in  which
       match_limit_recursion          is          set,         and
       PCRE_EXTRA_MATCH_LIMIT_RECURSION is  set  in  the  flags
       field.  If  the  limit  is exceeded, pcre_exec() returns
       PCRE_ERROR_RECURSIONLIMIT.

       The callout_data field is used in conjunction with  the
       "callout" feature, which is described in the pcrecallout
       documentation.

       The tables field is used  to  pass  a  character  tables
       pointer to pcre_exec(); this overrides the value that is
       stored with the compiled pattern. A  non-NULL  value  is
       stored  with  the compiled pattern only if custom tables
       were supplied to pcre_compile() via its  tableptr  argu-
       ment.  If NULL is passed to pcre_exec() using this mech-
       anism, it forces PCRE's internal tables to be used. This
       facility  is  helpful  when  re-using patterns that have
       been saved after  compiling  with  an  external  set  of
       tables,  because  the external tables might be at a dif-
       ferent address  when  pcre_exec()  is  called.  See  the
       pcreprecompile  documentation for a discussion of saving
       compiled patterns for later use.

   Option bits for pcre_exec()

       The unused bits of the options argument for  pcre_exec()
       must  be  zero.  The  only  bits  that  may  be  set are
       PCRE_ANCHORED,      PCRE_NEWLINE_xxx,       PCRE_NOTBOL,
       PCRE_NOTEOL,   PCRE_NOTEMPTY,   PCRE_NO_UTF8_CHECK   and
       PCRE_PARTIAL.

         PCRE_ANCHORED

       The PCRE_ANCHORED option limits pcre_exec() to  matching
       at  the  first  matching position. If a pattern was com-
       piled with PCRE_ANCHORED, or turned out to  be  anchored
       by  virtue of its contents, it cannot be made unanchored
       at matching time.

         PCRE_NEWLINE_CR
         PCRE_NEWLINE_LF
         PCRE_NEWLINE_CRLF
         PCRE_NEWLINE_ANY

       These options override the newline definition  that  was
       chosen  or  defaulted when the pattern was compiled. For
       details, see the description  of  pcre_compile()  above.
       During   matching,   the   newline  choice  affects  the
       behaviour of the dot, circumflex, and dollar metacharac-
       ters.  It  may  also alter the way the match position is
       advanced after a match failure for  an  unanchored  pat-
       tern. When PCRE_NEWLINE_CRLF or PCRE_NEWLINE_ANY is set,
       and a match attempt fails when the current  position  is
       at  a  CRLF  sequence, the match position is advanced by
       two characters instead of one, in other words, to  after
       the CRLF.
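
       The advance rule in the last sentence can be  sketched  as
       follows.  This  models  the  described  behaviour for the
       non-UTF-8 case, where one character is one byte; it is not
       PCRE's internal code, and the function name and its  flag
       parameter are hypothetical.

```c
/* Hypothetical model: where the next match attempt starts after a
   failure at byte offset `pos`. `crlf_is_newline` is non-zero when
   PCRE_NEWLINE_CRLF or PCRE_NEWLINE_ANY is in force. */
static int next_start_after_failure(const char *subject, int length,
                                    int pos, int crlf_is_newline)
{
    if (crlf_is_newline && pos + 1 < length &&
        subject[pos] == '\r' && subject[pos + 1] == '\n')
        return pos + 2;   /* skip the whole CRLF pair */
    return pos + 1;       /* ordinary case: advance one character */
}
```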

         PCRE_NOTBOL

       This  option  specifies  that the first character of the
       subject string is not the beginning of a  line,  so  the
       circumflex  metacharacter  should  not  match before it.
       Setting this without PCRE_MULTILINE  (at  compile  time)
       causes  circumflex  never  to match. This option affects
       only the behaviour of the circumflex  metacharacter.  It
       does not affect \A.

         PCRE_NOTEOL

       This option specifies that the end of the subject string
       is not the end of a line, so  the  dollar  metacharacter
       should  not  match  it  nor (except in multiline mode) a
       newline immediately  before  it.  Setting  this  without
       PCRE_MULTILINE  (at compile time) causes dollar never to
       match. This option affects only  the  behaviour  of  the
       dollar metacharacter. It does not affect \Z or \z.

         PCRE_NOTEMPTY

       An empty string is not considered to be a valid match if
       this option is set. If there  are  alternatives  in  the
       pattern,  they  are tried. If all the alternatives match
       the empty string, the entire match fails.  For  example,
       if the pattern

         a?b?

       is applied to a string not beginning with "a" or "b", it
       matches the empty string at the start  of  the  subject.
       With PCRE_NOTEMPTY set, this match is not valid, so PCRE
       searches further into the string for occurrences of  "a"
       or "b".

       Perl  has  no direct equivalent of PCRE_NOTEMPTY, but it
       does make a special case of a pattern match of the empty
       string  within  its split() function, and when using the
       /g modifier. It is possible to emulate Perl's  behaviour
       after  matching  a null string by first trying the match
       again  at  the  same  offset  with   PCRE_NOTEMPTY   and
       PCRE_ANCHORED,  and  then if that fails by advancing the
       starting offset (see below) and trying an ordinary match
       again.  There  is  some code that demonstrates how to do
       this in the pcredemo.c sample program.
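
       The retry logic described above is sketched below. So that
       the sketch runs without the library, the matcher is passed
       in as a function pointer and a toy matcher for the pattern
       a* stands in for pcre_exec(). All names here (match_fn,
       match_a_star, count_all_matches, and the  strict  flag)
       are hypothetical. In real code the callback would wrap  a
       pcre_exec()   call   and   pass   PCRE_NOTEMPTY   and
       PCRE_ANCHORED when strict is non-zero.

```c
/* Stand-in for pcre_exec(): returns 1 and sets ovector[0..1] on a
   match at or after `start`, or -1 for no match. A non-zero `strict`
   means "anchored at `start`, empty match not allowed". */
typedef int (*match_fn)(const char *subject, int length, int start,
                        int strict, int *ovector);

/* Toy matcher for the pattern a* (always matches at `start`). */
static int match_a_star(const char *subject, int length, int start,
                        int strict, int *ovector)
{
    int end = start;
    while (end < length && subject[end] == 'a')
        end++;
    if (strict && end == start)
        return -1;                /* only an empty match was possible */
    ovector[0] = start;
    ovector[1] = end;
    return 1;
}

/* Count matches the way Perl's /g modifier would find them. */
static int count_all_matches(match_fn match, const char *subject, int length)
{
    int ovector[2];
    int count = 0, start = 0, strict = 0;

    for (;;) {
        int rc = match(subject, length, start, strict, ovector);
        if (rc < 0) {
            if (!strict)
                break;            /* ordinary attempt failed: finished */
            start++;              /* retry failed: advance one byte
                                     (one character in UTF-8 mode) */
            strict = 0;
            continue;
        }
        count++;
        start = ovector[1];
        if (ovector[0] == ovector[1]) {
            if (ovector[0] == length)
                break;            /* empty match at the very end */
            strict = 1;           /* retry here, non-empty and anchored */
        } else {
            strict = 0;
        }
    }
    return count;
}
```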

         PCRE_NO_UTF8_CHECK

       When PCRE_UTF8 is set at compile time, the  validity  of
       the  subject  as a UTF-8 string is automatically checked
       when pcre_exec() is subsequently called.  The  value  of
       startoffset  is also checked to ensure that it points to
       the start of a UTF-8  character.  If  an  invalid  UTF-8
       sequence  of  bytes  is  found,  pcre_exec() returns the
       error PCRE_ERROR_BADUTF8.  If  startoffset  contains  an
       invalid value, PCRE_ERROR_BADUTF8_OFFSET is returned.

       If  you already know that your subject is valid, and you
       want to skip these checks for performance  reasons,  you
       can  set  the  PCRE_NO_UTF8_CHECK  option  when  calling
       pcre_exec(). You might want to do this  for  the  second
       and  subsequent  calls  to pcre_exec() if you are making
       repeated calls to find all the matches in a single  sub-
       ject  string. However, you should be sure that the value
       of startoffset points to the start of a UTF-8 character.
       When PCRE_NO_UTF8_CHECK is set, the effect of passing an
       invalid UTF-8  string  as  a  subject,  or  a  value  of
       startoffset  that does not point to the start of a UTF-8
       character, is undefined. Your program may crash.

         PCRE_PARTIAL

       This option turns on the partial  matching  feature.  If
       the  subject  string  fails to match the pattern, but at
       some point during the matching process the  end  of  the
       subject  was  reached  (that  is,  the subject partially
       matches the pattern and the failure  to  match  occurred
       only  because there were not enough subject characters),
       pcre_exec()  returns   PCRE_ERROR_PARTIAL   instead   of
       PCRE_ERROR_NOMATCH. When PCRE_PARTIAL is used, there are
       restrictions on what may appear in  the  pattern.  These
       are discussed in the pcrepartial documentation.

   The string to be matched by pcre_exec()

       The subject string is passed to pcre_exec() as a pointer
       in subject, a length in length, and a starting byte off-
       set  in startoffset. In UTF-8 mode, the byte offset must
       point to the start of a UTF-8 character. Unlike the pat-
       tern  string, the subject may contain binary zero bytes.
       When the starting offset is zero, the search for a match
       starts  at  the beginning of the subject, and this is by
       far the most common case.

       A non-zero starting offset is useful when searching  for
       another match in the same subject by calling pcre_exec()
       again after a  previous  success.   Setting  startoffset
       differs  from  just  passing over a shortened string and
       setting PCRE_NOTBOL in the case of a pattern that begins
       with  any  kind of lookbehind. For example, consider the
       pattern

         \Biss\B

       which finds occurrences of "iss" in the middle of words.
       (\B  matches only if the current position in the subject
       is not a word boundary.)  When  applied  to  the  string
       "Mississippi" the first call  to  pcre_exec()  finds  the
       first occurrence. If pcre_exec() is  called  again  with
       just  the remainder of the subject, namely "issippi", it
       does not match, because \B is always false at the  start
       of  the  subject, which is deemed to be a word boundary.
       However, if pcre_exec()  is  passed  the  entire  string
       again,  but with startoffset set to 4, it finds the sec-
       ond occurrence of "iss"  because  it  is  able  to  look
       behind  the  starting  point to discover that it is pre-
       ceded by a letter.

       If a non-zero starting offset is passed when the pattern
       is anchored, one attempt to match at the given offset is
       made. This can only succeed  if  the  pattern  does  not
       require the match to be at the start of the subject.

   How pcre_exec() returns captured substrings

       In  general,  a pattern matches a certain portion of the
       subject, and in addition, further  substrings  from  the
       subject  may be picked out by parts of the pattern. Fol-
       lowing the usage  in  Jeffrey  Friedl's  book,  this  is
       called "capturing" in what follows, and the phrase "cap-
       turing subpattern" is used for a fragment of  a  pattern
       that  picks out a substring. PCRE supports several other
       kinds of parenthesized subpattern that do not cause sub-
       strings to be captured.

       Captured  substrings  are  returned  to the caller via a
       vector of integer offsets whose  address  is  passed  in
       ovector.  The number of elements in the vector is passed
       in ovecsize, which must be a non-negative number.  Note:
       this argument is NOT the size of ovector in bytes.

       The  first two-thirds of the vector is used to pass back
       captured substrings, each  substring  using  a  pair  of
       integers.  The  remaining third of the vector is used as
       workspace by pcre_exec() while matching  capturing  sub-
       patterns, and is not available for passing back informa-
       tion. The length passed in ovecsize should always  be  a
       multiple of three. If it is not, it is rounded down.

       When  a  match is successful, information about captured
       substrings is returned in pairs of integers, starting at
       the  beginning  of  ovector,  and  continuing up to two-
       thirds of its length at the most. The first element of a
       pair  is  set  to the offset of the first character in a
       substring, and the second is set to the  offset  of  the
       first  character after the end of a substring. The first
       pair, ovector[0] and ovector[1], identifies the portion of
       the  subject  string  matched by the entire pattern. The
       next pair is used for the  first  capturing  subpattern,
       and so on. The value returned by pcre_exec() is one more
       than the highest numbered pair that has  been  set.  For
       example,  if  two  substrings  have  been  captured, the
       returned value is 3. If there are no  capturing  subpat-
       terns,  the  return  value from a successful match is 1,
       indicating that just the first pair of offsets has  been
       set.

       If  a  capturing subpattern is matched repeatedly, it is
       the last portion of the string that it matched  that  is
       returned.

       If the vector is too small to hold all the captured sub-
       string offsets, it is used as far  as  possible  (up  to
       two-thirds  of  its  length), and the function returns a
       value of zero. In particular, if the  substring  offsets
       are  not  of  interest,  pcre_exec()  may be called with
       ovector passed as NULL and ovecsize as zero. However, if
       the  pattern contains back references and the ovector is
       not big enough to remember the related substrings,  PCRE
       has  to  get  additional memory for use during matching.
       Thus it is usually advisable to supply an ovector.
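
       A caller therefore has three cases to distinguish  in  the
       return value. The sketch below shows one way to  turn  it
       into a count of offset pairs that may be read; the  func-
       tion name pairs_available() is hypothetical.

```c
/* Hypothetical helper: how many offset pairs in ovector hold results,
   given pcre_exec()'s return value `rc` and the `ovecsize` passed. */
static int pairs_available(int rc, int ovecsize)
{
    if (rc > 0)
        return rc;            /* rc is one more than the highest pair set */
    if (rc == 0)
        return ovecsize / 3;  /* vector too small: filled as far as possible */
    return 0;                 /* negative: no match, or an error */
}
```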

       The pcre_fullinfo() function, called with PCRE_INFO_CAP-
       TURECOUNT, can be used to find out  how  many  capturing
       subpatterns  there  are  in  a  compiled  pattern  (the
       obsolete pcre_info() function also returns this number).
       The smallest size for ovector that  will  allow  for  n
       captured  substrings,  in addition to the offsets of the
       substring matched by the whole pattern, is (n+1)*3.
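
       As a worked instance of this formula, the sketch below com-
       putes the minimum vector size from a capture  count.  The
       function name min_ovector_size() is hypothetical.

```c
/* Hypothetical helper: smallest ovecsize that can return offsets for
   the whole match plus `capture_count` capturing subpatterns. Each
   pair needs two integers, plus one more of workspace: (n+1)*3. */
static int min_ovector_size(int capture_count)
{
    return (capture_count + 1) * 3;
}
```

       By this formula a vector of 30 integers, as used  in  the
       simple pcre_exec() example, is enough for  patterns  with
       up to nine capturing subpatterns.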

       It is possible for capturing subpattern  number  n+1  to
       match some part of the subject when subpattern n has not
       been used at all. For example, if the  string  "abc"  is
       matched  against the pattern (a|(z))(bc) the return from
       the function is 4, and subpatterns 1 and 3 are  matched,
       but 2 is not. When this happens, both values in the off-
       set pairs corresponding to unused subpatterns are set to
       -1.

       Offset  values  that correspond to unused subpatterns at
       the end of the expression are also set to -1. For  exam-
       ple,  if the string "abc" is matched against the pattern
       (abc)(x(yz)?)? subpatterns 2 and 3 are not matched.  The
       return  from the function is 2, because the highest used
       capturing subpattern number is 1. However, you can refer
       to  the  offsets for the second and third capturing sub-
       patterns if you  wish  (assuming  the  vector  is  large
       enough, of course).
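
       As a minimal sketch of how the offset pairs might be
       read after a successful call, the helpers below walk the
       pairs reported by pcre_exec() and treat a pair of -1
       values as an unset subpattern. The function names are
       hypothetical, and rc is assumed to be a positive return
       value from pcre_exec().

```c
#include <assert.h>
#include <stdio.h>

/* Length of capture n, or -1 if its offset pair is unset. */
int capture_length(const int *ovector, int n)
{
    assert(ovector != NULL && n >= 0);
    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    return (start < 0 || end < 0) ? -1 : end - start;
}

/* Print pairs 0 .. rc-1, where rc is the value returned by
   pcre_exec(). Returns the number of pairs that were set. */
int print_captures(const char *subject, const int *ovector, int rc)
{
    assert(subject != NULL);
    int set = 0;
    for (int i = 0; i < rc; i++) {
        int len = capture_length(ovector, i);
        if (len < 0) {
            printf("%d: <unset>\n", i);
        } else {
            printf("%d: %.*s\n", i, len, subject + ovector[2 * i]);
            set++;
        }
    }
    return set;
}
```

       For the (a|(z))(bc) example above, pair 2 is reported as
       unset while pairs 0, 1 and 3 are printed.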

       Some  convenience  functions are provided for extracting
       the captured substrings as separate strings.  These  are
       described below.

   Error return values from pcre_exec()

       If  pcre_exec() fails, it returns a negative number. The
       following are defined in the header file:

         PCRE_ERROR_NOMATCH        (-1)

       The subject string did not match the pattern.

         PCRE_ERROR_NULL           (-2)

       Either code or subject was passed as  NULL,  or  ovector
       was NULL and ovecsize was not zero.

         PCRE_ERROR_BADOPTION      (-3)

       An unrecognized bit was set in the options argument.

         PCRE_ERROR_BADMAGIC       (-4)

       PCRE  stores a 4-byte "magic number" at the start of the
       compiled code, to catch the case when  it  is  passed  a
       junk  pointer and to detect when a pattern that was com-
       piled in an environment of one endianness is run  in  an
       environment with the other endianness. This is the error
       that PCRE gives when the magic number is not present.

         PCRE_ERROR_UNKNOWN_OPCODE (-5)

       While running the pattern match,  an  unknown  item  was
       encountered in the compiled pattern. This error could be
       caused by a bug in PCRE or by overwriting  of  the  com-
       piled pattern.

         PCRE_ERROR_NOMEMORY       (-6)

       If  a  pattern contains back references, but the ovector
       that is passed to  pcre_exec()  is  not  big  enough  to
       remember the referenced substrings, PCRE gets a block of
       memory at the start of matching to use for this purpose.
       If  the  call  via  pcre_malloc()  fails,  this error is
       given. The memory is automatically freed at the  end  of
       matching.

         PCRE_ERROR_NOSUBSTRING    (-7)

       This   error   is  used  by  the  pcre_copy_substring(),
       pcre_get_substring(),   and    pcre_get_substring_list()
       functions   (see   below).   It  is  never  returned  by
       pcre_exec().

         PCRE_ERROR_MATCHLIMIT     (-8)

       The backtracking limit, as specified by the match_limit
       field in a pcre_extra structure (or defaulted), was
       reached. See the description above.

         PCRE_ERROR_CALLOUT        (-9)

       This error is never generated by pcre_exec() itself.  It
       is  provided  for  use by callout functions that want to
       yield a distinctive error code. See the pcrecallout doc-
       umentation for details.

         PCRE_ERROR_BADUTF8        (-10)

       A  string  that  contains an invalid UTF-8 byte sequence
       was passed as a subject.

         PCRE_ERROR_BADUTF8_OFFSET (-11)

       The UTF-8 byte sequence that was passed as a subject was
       valid, but the value of startoffset did not point to the
       beginning of a UTF-8 character.

         PCRE_ERROR_PARTIAL        (-12)

       The subject string did not match, but it did match  par-
       tially. See the pcrepartial documentation for details of
       partial matching.

         PCRE_ERROR_BADPARTIAL     (-13)

       The PCRE_PARTIAL option was used with a compiled pattern
       containing  items  that  are  not  supported for partial
       matching. See the pcrepartial documentation for  details
       of partial matching.

         PCRE_ERROR_INTERNAL       (-14)

       An  unexpected  internal  error has occurred. This error
       could be caused by a bug in PCRE or  by  overwriting  of
       the compiled pattern.

         PCRE_ERROR_BADCOUNT       (-15)

       This  error  is given if the value of the ovecsize argu-
       ment is negative.

         PCRE_ERROR_RECURSIONLIMIT (-21)

       The internal recursion limit, as specified by the
       match_limit_recursion field in a pcre_extra structure
       (or defaulted), was reached. See the description above.

         PCRE_ERROR_NULLWSLIMIT    (-22)

       When a group  that  can  match  an  empty  substring  is
       repeated  with  an  unbounded  upper  limit, the subject
       position at the start of the group must  be  remembered,
       so  that a test for an empty string can be made when the
       end of the group is reached. Some workspace is  required
       for this; if it runs out, this error is given.

         PCRE_ERROR_BADNEWLINE     (-23)

       An  invalid  combination of PCRE_NEWLINE_xxx options was
       given.

       Error numbers -16 to -20 are not used by pcre_exec().
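
       One way a caller might act on these values is to sort
       them into a few broad outcomes. The sketch below is an
       illustration only: the grouping is one possible reading
       of the list above, not something PCRE defines, and the
       enum and function names are hypothetical. Numeric values
       are used so that the fragment compiles without pcre.h;
       real code would use the PCRE_ERROR_xxx names.

```c
#include <assert.h>

enum exec_outcome {
    EXEC_MATCHED,       /* rc > 0 */
    EXEC_OVECTOR_FULL,  /* rc == 0: matched, but ovector too small */
    EXEC_NO_MATCH,      /* PCRE_ERROR_NOMATCH */
    EXEC_PARTIAL,       /* PCRE_ERROR_PARTIAL */
    EXEC_LIMIT,         /* memory, match or recursion limit reached */
    EXEC_BAD_CALL,      /* bad argument, option or subject string */
    EXEC_INTERNAL,      /* damaged compiled pattern or a PCRE bug */
    EXEC_OTHER          /* anything else, e.g. a callout's own code */
};

enum exec_outcome classify_exec_result(int rc)
{
    if (rc > 0) return EXEC_MATCHED;
    if (rc == 0) return EXEC_OVECTOR_FULL;
    assert(rc < 0);     /* only error codes reach the switch */
    switch (rc) {
    case -1:                        /* NOMATCH */
        return EXEC_NO_MATCH;
    case -12:                       /* PARTIAL */
        return EXEC_PARTIAL;
    case -6: case -8:               /* NOMEMORY, MATCHLIMIT */
    case -21: case -22:             /* RECURSIONLIMIT, NULLWSLIMIT */
        return EXEC_LIMIT;
    case -2: case -3:               /* NULL, BADOPTION */
    case -10: case -11:             /* BADUTF8, BADUTF8_OFFSET */
    case -13: case -15: case -23:   /* BADPARTIAL, BADCOUNT, BADNEWLINE */
        return EXEC_BAD_CALL;
    case -4: case -5: case -14:     /* BADMAGIC, UNKNOWN_OPCODE, INTERNAL */
        return EXEC_INTERNAL;
    default:
        return EXEC_OTHER;
    }
}
```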

EXTRACTING CAPTURED SUBSTRINGS BY NUMBER

       int pcre_copy_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber, char *buffer,
            int buffersize);

       int pcre_get_substring(const char *subject, int *ovector,
            int stringcount, int stringnumber,
            const char **stringptr);

       int pcre_get_substring_list(const char *subject,
            int *ovector, int stringcount, const char ***listptr);

       Captured  substrings  can  be accessed directly by using
       the offsets returned by pcre_exec() in ovector. For con-
       venience,     the    functions    pcre_copy_substring(),
       pcre_get_substring(), and pcre_get_substring_list()  are
       provided for extracting captured substrings as new, sep-
       arate, zero-terminated strings. These functions identify
       substrings  by  number. The next section describes func-
       tions for extracting named substrings.

       A substring that contains a  binary  zero  is  correctly
       extracted  and  has a further zero added on the end, but
       the result is not, of course, a C string.  However,  you
       can  process  such  a  string by referring to the length
       that   is   returned   by   pcre_copy_substring()    and
       pcre_get_substring().   Unfortunately,  the interface to
       pcre_get_substring_list() is not adequate  for  handling
       strings  containing binary zeros, because the end of the
       final string is not independently indicated.

       The first three arguments are the same for all three  of
       these  functions: subject is the subject string that has
       just been successfully matched, ovector is a pointer  to
       the  vector  of  integer  offsets  that  was  passed  to
       pcre_exec(), and stringcount is the number of substrings
       that were captured by the match, including the substring
       that matched the entire regular expression. This is  the
       value  returned  by  pcre_exec()  if  it is greater than
       zero. If pcre_exec() returned zero, indicating  that  it
       ran out of space in ovector, the value passed as string-
       count should be the number of  elements  in  the  vector
       divided by three.
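
       The rule for choosing stringcount can be written as a
       small helper. This is a sketch only; stringcount_for()
       is a hypothetical name, rc is the value returned by
       pcre_exec(), and ovecsize is the element count that was
       passed to it.

```c
#include <assert.h>

/* Value to pass as stringcount to the extraction functions.
   Returns -1 if pcre_exec() failed, in which case there is
   nothing to extract. */
int stringcount_for(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;              /* number of pairs that were set */
    if (rc == 0)
        return ovecsize / 3;    /* ovector was filled completely */
    return -1;                  /* negative rc: match failed */
}
```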

       The  functions  pcre_copy_substring()  and pcre_get_sub-
       string() extract a single  substring,  whose  number  is
       given as stringnumber. A value of zero extracts the sub-
       string that matched the entire pattern,  whereas  higher
       values    extract    the    captured   substrings.   For
       pcre_copy_substring(), the string is placed  in  buffer,
       whose   length   is   given  by  buffersize,  while  for
       pcre_get_substring() a new block of memory  is  obtained
       via pcre_malloc, and its address is returned via string-
       ptr. The yield of the function  is  the  length  of  the
       string,  not  including  the terminating zero, or one of
       these error codes:

         PCRE_ERROR_NOMEMORY       (-6)

       The buffer was too small for  pcre_copy_substring(),  or
       the  attempt  to  get  memory  failed  for pcre_get_sub-
       string().

         PCRE_ERROR_NOSUBSTRING    (-7)

       There is no substring whose number is stringnumber.

       The  pcre_get_substring_list()  function  extracts   all
       available  substrings  and  builds a list of pointers to
       them. All this is done in a single block of memory  that
       is  obtained  via pcre_malloc. The address of the memory
       block is returned via listptr, which is also  the  start
       of  the  list of string pointers. The end of the list is
       marked by a NULL pointer. The yield of the  function  is
       zero if all went well, or the error code

         PCRE_ERROR_NOMEMORY       (-6)

       if the attempt to get the memory block failed.

       When any of these functions encounters a substring that
       is unset, which can  happen  when  capturing  subpattern
       number n+1 matches some part of the subject, but subpat-
       tern n has not been used at all, they  return  an  empty
       string.  This  can be distinguished from a genuine zero-
       length substring by inspecting the appropriate offset in
       ovector, which is negative for unset substrings.
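
       To make the documented behaviour concrete, the sketch
       below mimics pcre_copy_substring() in plain C, including
       the empty string that results from an unset pair. It is
       not the library's implementation; the sketch_ names and
       macros are hypothetical, and the error values are those
       listed above.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Copy capture stringnumber into buffer and zero-terminate it.
   Returns the length copied, or one of the error values above. */
int sketch_copy_substring(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    int start = ovector[2 * stringnumber];
    int end = ovector[2 * stringnumber + 1];
    int length = end - start;   /* zero when both offsets are -1 */

    if (buffersize < length + 1)
        return SKETCH_ERROR_NOMEMORY;
    if (length > 0)
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}

/* Nonzero if the pair is unset rather than a genuine empty match. */
int sketch_is_unset(const int *ovector, int stringnumber)
{
    assert(ovector != NULL && stringnumber >= 0);
    return ovector[2 * stringnumber] < 0;
}
```

       A caller that needs to tell an unset substring from an
       empty one would check sketch_is_unset() in addition to
       the returned length.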

       The  two convenience functions pcre_free_substring() and
       pcre_free_substring_list() can be used to free the  mem-
       ory  returned by a previous call of pcre_get_substring()
       or  pcre_get_substring_list(),  respectively.  They   do
       nothing  more  than  call  the  function  pointed  to by
       pcre_free, which of course could be called directly from
       a  C  program.  However, PCRE is used in some situations
       where it is linked via a special  interface  to  another
       programming language that cannot use pcre_free directly;
       it is for these cases that the functions are provided.

EXTRACTING CAPTURED SUBSTRINGS BY NAME

       int pcre_get_stringnumber(const pcre *code,
            const char *name);

       int pcre_copy_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            char *buffer, int buffersize);

       int pcre_get_named_substring(const pcre *code,
            const char *subject, int *ovector,
            int stringcount, const char *stringname,
            const char **stringptr);

       To extract a substring by name, you first have to find the
       associated number. For example, for this pattern

         (a+)b(?<xxx>\d+)...

       the  number  of the subpattern called "xxx" is 2. If the
       name is known to be unique (PCRE_DUPNAMES was not  set),
       you  can  find  the  number  from  the  name  by calling
       pcre_get_stringnumber(). The first argument is the  com-
       piled  pattern, and the second is the name. The yield of
       the   function   is   the    subpattern    number,    or
       PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of
       that name.

       Given  the  number,  you  can  extract   the   substring
       directly,  or  use one of the functions described in the
       previous section. For convenience, there  are  also  two
       functions that do the whole job.

       Most of the arguments of pcre_copy_named_substring() and
       pcre_get_named_substring() are the same as those for the
       similarly  named  functions  that  extract by number. As
       these are described in the previous  section,  they  are
       not re-described here. There are just two differences:

       First,  instead  of a substring number, a substring name
       is given. Second, there is an extra argument,  given  at
       the  start,  which is a pointer to the compiled pattern.
       This is needed in order to gain access to  the  name-to-
       number translation table.

       These  functions call pcre_get_stringnumber(), and if it
       succeeds,  they  then  call   pcre_copy_substring()   or
       pcre_get_substring(), as appropriate.

DUPLICATE SUBPATTERN NAMES

       int pcre_get_stringtable_entries(const pcre *code,
            const char *name, char **first, char **last);

       When  a  pattern  is  compiled  with  the  PCRE_DUPNAMES
       option, names for subpatterns are  not  required  to  be
       unique. Normally, patterns with duplicate names are such
       that in any one match, only one of the named subpatterns
       participates.  An  example  is  shown in the pcrepattern
       documentation.    When    duplicates    are     present,
       pcre_copy_named_substring()    and   pcre_get_named_sub-
       string() return the first substring corresponding to the
       given name that is set. If none are set, an empty string
       is  returned.   The   pcre_get_stringnumber()   function
       returns  one of the numbers that are associated with the
       name, but it is not defined which it is.

       If you want to get full details  of  all  captured  sub-
       strings   for   a   given   name,   you   must  use  the
       pcre_get_stringtable_entries() function. The first argu-
       ment  is  the  compiled  pattern,  and the second is the
       name. The third and fourth  are  pointers  to  variables
       which  are  updated  by  the function. After it has run,
       they point to the first and last entries in the name-to-
       number  table  for  the  given name. The function itself
       returns the length of each entry,  or  PCRE_ERROR_NOSUB-
       STRING  (-7)  if there are none. The format of the table
       is described above in the section  entitled  Information
       about a pattern.  Given all the relevant entries for the
       name, you can extract each of their numbers,  and  hence
       the captured data, if any.

FINDING ALL POSSIBLE MATCHES

       The  traditional  matching function uses a similar algo-
       rithm to Perl, which  stops  when  it  finds  the  first
       match,  starting at a given point in the subject. If you
       want to find all possible matches, or the longest possi-
       ble match, consider using the alternative matching func-
       tion (see below) instead. If you cannot use the alterna-
       tive  function,  but  still  need  to  find all possible
       matches, you can kludge it up by making use of the call-
       out facility, which is described in the pcrecallout doc-
       umentation.

       To do this, insert a callout right at the end of the
       pattern. When your callout function is
       called, extract and save the current matched  substring.
       Then return 1, which forces pcre_exec() to backtrack and
       try other alternatives. Ultimately, when it runs out  of
       matches, pcre_exec() will yield PCRE_ERROR_NOMATCH.

MATCHING A PATTERN: THE ALTERNATIVE FUNCTION

       int pcre_dfa_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize,
            int *workspace, int wscount);

       The function pcre_dfa_exec() is called to match  a  sub-
       ject string against a compiled pattern, using a matching
       algorithm that scans the subject string just  once,  and
       does  not  backtrack. This has different characteristics
       to the normal algorithm,  and  is  not  compatible  with
       Perl. Some of the features of PCRE patterns are not sup-
       ported. Nevertheless, there are times when this kind  of
       matching  can  be  useful.  For  a discussion of the two
       matching algorithms, see the pcrematching documentation.

       The  arguments  for the pcre_dfa_exec() function are the
       same as for pcre_exec(), plus two  extras.  The  ovector
       argument  is  used  in  a  different  way,  and  this is
       described below. The other common arguments are used  in
       the same way as for pcre_exec(), so their description is
       not repeated here.

       The two additional arguments provide workspace  for  the
       function.  The  workspace vector should contain at least
       20 elements. It is used for keeping  track  of  multiple
       paths  through  the pattern tree. More workspace will be
       needed for patterns and subjects where there are  a  lot
       of potential matches.

       Here is an example of a simple call to pcre_dfa_exec():

         int rc;
         int ovector[10];
         int wspace[20];
         rc = pcre_dfa_exec(
           re,             /* result of pcre_compile() */
           NULL,           /* we didn't study the pattern */
           "some string",  /* the subject string */
           11,             /* the length of the subject string */
           0,              /* start at offset 0 in the subject */
           0,              /* default options */
           ovector,        /* vector of integers for substring information */
           10,             /* number of elements (NOT size in bytes) */
           wspace,         /* working space vector */
           20);            /* number of elements (NOT size in bytes) */

   Option bits for pcre_dfa_exec()

       The  unused   bits   of   the   options   argument   for
       pcre_dfa_exec()  must be zero. The only bits that may be
       set are  PCRE_ANCHORED,  PCRE_NEWLINE_xxx,  PCRE_NOTBOL,
       PCRE_NOTEOL,      PCRE_NOTEMPTY,     PCRE_NO_UTF8_CHECK,
       PCRE_PARTIAL, PCRE_DFA_SHORTEST,  and  PCRE_DFA_RESTART.
       All  but  the  last  three  of these are the same as for
       pcre_exec(), so their description is not repeated  here.

         PCRE_PARTIAL

       This  has  the  same  general  effect  as  it  does  for
       pcre_exec(), but the  details  are  slightly  different.
       When PCRE_PARTIAL is set for pcre_dfa_exec(), the return
       code    PCRE_ERROR_NOMATCH     is     converted     into
       PCRE_ERROR_PARTIAL if the end of the subject is reached,
       there have been no complete matches, but there is  still
       at  least  one  matching possibility. The portion of the
       string that provided the partial match  is  set  as  the
       first matching string.

         PCRE_DFA_SHORTEST

       Setting the PCRE_DFA_SHORTEST option causes the matching
       algorithm to stop as soon as it  has  found  one  match.
       Because of the way the alternative algorithm works, this
       is necessarily the shortest possible match at the  first
       possible matching point in the subject string.

         PCRE_DFA_RESTART

       When  pcre_dfa_exec()  is  called  with the PCRE_PARTIAL
       option, and returns a partial match, it is  possible  to
       call  it  again, with additional subject characters, and
       have   it   continue   with   the   same   match.    The
       PCRE_DFA_RESTART option requests this action; when it is
       set, the workspace and wscount arguments must reference
       the  same  vector as before because data about the match
       so far is left in them after a partial match.  There  is
       more discussion of this facility in the pcrepartial doc-
       umentation.

   Successful returns from pcre_dfa_exec()

       When pcre_dfa_exec() succeeds, it may have matched  more
       than  one  substring in the subject. Note, however, that
       all the matches from one run of the  function  start  at
       the  same  point in the subject. The shorter matches are
       all initial substrings of the longer matches. For  exam-
       ple, if the pattern

         <.*>

       is matched against the string

         This is <something> <something else> <something further> no more

       the three matched strings are

         <something>
         <something> <something else>
         <something> <something else> <something further>

       On success, the  yield  of  the  function  is  a  number
       greater  than  zero, which is the number of matched sub-
       strings. The substrings themselves are returned in ovec-
       tor.  Each  string  uses  two elements; the first is the
       offset to the start, and the second is the offset to the
       end.  In  fact, all the strings have the same start off-
       set. (Space could have been saved by  giving  this  only
       once,  but  it  was decided to retain some compatibility
       with the way pcre_exec() returns data, even  though  the
       meaning of the strings is different.)

       The  strings  are  returned  in reverse order of length;
       that is, the longest matching string is given first.  If
       there  were  too  many  matches to fit into ovector, the
       yield of the function is zero, and the vector is  filled
       with the longest matches.
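
       A minimal sketch of reading these results follows. It
       assumes rc is a positive return value from
       pcre_dfa_exec() and that ovector holds rc pairs, longest
       match first, as described above. The function names are
       hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Length of match i in the ovector; match 0 is the longest. */
int dfa_match_length(const int *ovector, int i)
{
    assert(ovector != NULL && i >= 0);
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Print all rc matches, longest first, one per line.
   Returns the length of the shortest match. */
int dfa_print_matches(const char *subject, const int *ovector, int rc)
{
    assert(subject != NULL && rc > 0);
    for (int i = 0; i < rc; i++)
        printf("%.*s\n", dfa_match_length(ovector, i),
               subject + ovector[2 * i]);
    return dfa_match_length(ovector, rc - 1);
}
```

       Because every match starts at the same offset, only the
       end offsets differ from one pair to the next.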

   Error returns from pcre_dfa_exec()

       The  pcre_dfa_exec()  function returns a negative number
       when it fails.  Many of the errors are the same  as  for
       pcre_exec(),  and  these are described above.  There are
       in addition the following errors that  are  specific  to
       pcre_dfa_exec():

         PCRE_ERROR_DFA_UITEM      (-16)

       This  return  is  given if pcre_dfa_exec() encounters an
       item in the  pattern  that  it  does  not  support,  for
       instance, the use of \C or a back reference.

         PCRE_ERROR_DFA_UCOND      (-17)

       This  return  is  given  if pcre_dfa_exec() encounters a
       condition item that uses a back reference for the condi-
       tion, or a test for recursion in a specific group. These
       are not supported.

         PCRE_ERROR_DFA_UMLIMIT    (-18)

       This return is given if pcre_dfa_exec() is  called  with
       an   extra   block   that  contains  a  setting  of  the
       match_limit  field.  This  is  not  supported   (it   is
       meaningless).

         PCRE_ERROR_DFA_WSSIZE     (-19)

       This  return  is  given  if  pcre_dfa_exec() runs out of
       space in the workspace vector.

         PCRE_ERROR_DFA_RECURSE    (-20)

       When a recursive subpattern is processed,  the  matching
       function calls itself recursively, using private vectors
       for ovector and workspace. This error is  given  if  the
       output  vector  is  not  large  enough.  This  should be
       extremely rare, as a vector of size 1000 is used.

SEE ALSO

       pcrebuild(3), pcrecallout(3), pcrecpp(3), pcrematching(3),
       pcrepartial(3), pcreposix(3), pcreprecompile(3),
       pcresample(3), pcrestack(3).

Last updated: 30 November 2006
Copyright (c) 1997-2006 University of Cambridge.



