PCREBUILD(3)                                                      PCREBUILD(3)



NAME
       PCRE - Perl-compatible regular expressions

PCRE BUILD-TIME OPTIONS

       This  document  describes  the optional features of PCRE
       that can be selected when the library is compiled.  They
       are all selected, or deselected, by providing options to
       the configure script that is run before  the  make  com-
       mand.  The complete list of options for configure (which
       includes the standard ones such as the selection of  the
       installation directory) can be obtained by running

         ./configure --help

       The  following  sections  describe certain options whose
       names begin with --enable or --disable.  These  settings
       specify  changes  to the defaults for the configure com-
       mand. Because of the way that configure works,  --enable
       and --disable always come in pairs, so the complementary
       option always exists as well, but as  it  specifies  the
       default, it is not described.

C++ SUPPORT

       By  default,  the configure script will search for a C++
       compiler and C++ header files.  If  it  finds  them,  it
       automatically  builds  the C++ wrapper library for PCRE.
       You can disable this by adding

         --disable-cpp

       to the configure command.

UTF-8 SUPPORT

       To build PCRE with support for UTF-8 character  strings,
       add

         --enable-utf8

       to  the configure command. Of itself, this does not make
       PCRE treat strings as UTF-8. As well as  compiling  PCRE
       with  this  option,  you  also  have  have  to  set  the
       PCRE_UTF8 option when you call the pcre_compile()  func-
       tion.

UNICODE CHARACTER PROPERTY SUPPORT

       UTF-8  support  allows  PCRE to process character values
       greater than 255 in the strings that it handles. On  its
       own,  however,  it  does  not provide any facilities for
       accessing the properties of such characters. If you want
       to  be  able  to use the pattern escapes \P, \p, and \X,
       which refer to Unicode character  properties,  you  must
       add

         --enable-unicode-properties

       to  the  configure  command. This implies UTF-8 support,
       even if you have not explicitly requested it.

       Including Unicode property support adds  around  90K  of
       tables  to  the PCRE library, approximately doubling its
       size. Only the general category properties  such  as  Lu
       and   Nd   are  supported.  Details  are  given  in  the
       pcrepattern documentation.

CODE VALUE OF NEWLINE

       By default, PCRE interprets character 10 (linefeed,  LF)
       as indicating the end of a line. This is the normal new-
       line character on Unix-like  systems.  You  can  compile
       PCRE  to use character 13 (carriage return, CR) instead,
       by adding

         --enable-newline-is-cr

       to the configure command. There is also a  --enable-new-
       line-is-lf  option,  which explicitly specifies linefeed
       as the newline character.

       Alternatively, you can specify that line endings are  to
       be  indicated by the two character sequence CRLF. If you
       want this, add

         --enable-newline-is-crlf

       to the configure command.  There  is  a  fourth  option,
       specified by

         --enable-newline-is-any

       which  causes  PCRE  to  recognize  any  Unicode newline
       sequence.

       Whatever line ending convention is selected when PCRE is
       built  can  be overridden when the library functions are
       called. At build time it  is  conventional  to  use  the
       standard for your operating system.

BUILDING SHARED AND STATIC LIBRARIES

       The  PCRE  building  process  uses libtool to build both
       shared and static Unix libraries  by  default.  You  can
       suppress one of these by adding one of

         --disable-shared
         --disable-static

       to the configure command, as required.

POSIX MALLOC USAGE

       When PCRE is called through the POSIX interface (see the
       pcreposix documentation), additional working storage  is
       required  for  holding  the  pointers  to capturing sub-
       strings, because PCRE requires three integers  per  sub-
       string,  whereas  the POSIX interface provides only two.
       If the number of expected substrings is small, the wrap-
       per  function  uses  space on the stack, because this is
       faster than using malloc() for each  call.  The  default
       threshold above which the stack is no longer used is 10;
       it can be changed by adding a setting such as

         --with-posix-malloc-threshold=20

       to the configure command.

HANDLING VERY LARGE PATTERNS

       Within a compiled pattern, offset  values  are  used  to
       point  from  one  part  to another (for example, from an
       opening parenthesis to an alternation metacharacter). By
       default,  two-byte  values  are  used for these offsets,
       leading to a maximum size  for  a  compiled  pattern  of
       around  64K.  This  is  sufficient to handle all but the
       most gigantic patterns.  Nevertheless,  some  people  do
       want  to process enormous patterns, so it is possible to
       compile PCRE to use three-byte or four-byte  offsets  by
       adding a setting such as

         --with-link-size=3

       to  the configure command. The value given must be 2, 3,
       or 4. Using longer offsets slows down the  operation  of
       PCRE  because  it has to load additional bytes when han-
       dling them.

       If you build PCRE with an increased link  size,  test  2
       (and  test  5 if you are using UTF-8) will fail. Part of
       the output of these tests is  a  representation  of  the
       compiled pattern, and this changes with the link size.

AVOIDING EXCESSIVE STACK USAGE

       When matching with the pcre_exec() function, PCRE imple-
       ments backtracking  by  making  recursive  calls  to  an
       internal  function called match(). In environments where
       the size of the stack  is  limited,  this  can  severely
       limit  PCRE's  operation. (The Unix environment does not
       usually suffer from this problem, but it  may  sometimes
       be  necessary to increase the maximum stack size.  There
       is a discussion  in  the  pcrestack  documentation.)  An
       alternative  approach to recursion that uses memory from
       the heap to remember data, instead  of  using  recursive
       function  calls,  has been implemented to work round the
       problem of limited stack size. If you want  to  build  a
       version of PCRE that works this way, add

         --disable-stack-for-recursion

       to  the configure command. With this configuration, PCRE
       will use the pcre_stack_malloc and pcre_stack_free vari-
       ables  to  call  memory  management  functions. Separate
       functions are provided because the usage  is  very  pre-
       dictable: the block sizes requested are always the same,
       and the blocks are always  freed  in  reverse  order.  A
       calling  program  might  be  able to implement optimized
       functions that perform better than the standard malloc()
       and  free()  functions. PCRE runs noticeably more slowly
       when built in this way. This  option  affects  only  the
       pcre_exec()  function;  it  is  not relevant for the the
       pcre_dfa_exec() function.

LIMITING PCRE RESOURCE USAGE

       Internally, PCRE has a function called match(), which it
       calls repeatedly (sometimes recursively) when matching a
       pattern with the pcre_exec()  function.  By  controlling
       the  maximum number of times this function may be called
       during a single  matching  operation,  a  limit  can  be
       placed  on  the  resources  used  by  a  single  call to
       pcre_exec(). The limit can be changed at  run  time,  as
       described  in  the pcreapi documentation. The default is
       10 million, but this can be changed by adding a  setting
       such as

         --with-match-limit=500000

       to  the configure command. This setting has no effect on
       the pcre_dfa_exec() matching function.

       In some environments it is desirable to limit the  depth
       of  recursive  calls  of  match() more strictly than the
       total number of calls, in order to restrict the  maximum
       amount  of stack (or heap, if --disable-stack-for-recur-
       sion is specified) that is used. A second limit controls
       this;  it  defaults to the value that is set for --with-
       match-limit, which imposes  no  additional  constraints.
       However,  you can set a lower limit by adding, for exam-
       ple,

         --with-match-limit-recursion=10000

       to the configure command. This value can also  be  over-
       ridden at run time.

USING EBCDIC CODE

       PCRE  assumes by default that it will run in an environ-
       ment where the character  code  is  ASCII  (or  Unicode,
       which  is  a  superset  of ASCII). PCRE can, however, be
       compiled to run in an EBCDIC environment by adding

         --enable-ebcdic

       to the configure command.

SEE ALSO

       pcreapi(3), pcre_config(3).

Last updated: 30 November 2006
Copyright (c) 1997-2006 University of Cambridge.



                                                                  PCREBUILD(3)
