PCRESTACK(3)                                                      PCRESTACK(3)



NAME
       PCRE - Perl-compatible regular expressions

PCRE DISCUSSION OF STACK USAGE

       When  you  call pcre_exec(), it makes use of an internal
       function called match(). This calls  itself  recursively
       at  branch  points  in the pattern, in order to remember
       the state of the match so that it can back up and try  a
       different  alternative if the first one fails. As match-
       ing proceeds deeper and deeper into the tree  of  possi-
       bilities, the recursion depth increases.

       Not  all  calls of match() increase the recursion depth;
       for an item such as a* it may be called several times at
       the same level, after matching different numbers of a's.
       Furthermore, in a number of cases where  the  result  of
       the  recursive  call would immediately be passed back as
       the result of the current call (a "tail recursion"), the
       function is just restarted instead.

       The  pcre_dfa_exec()  function  operates  in an entirely
       different way, and hardly uses  recursion  at  all.  The
       limit on its complexity is the amount of workspace it is
       given.  The  comments  that  follow  do  NOT  apply   to
       pcre_dfa_exec(); they are relevant only for pcre_exec().

       You can set limits on the number of times  that  match()
       is  called,  both in total and recursively. If the limit
       is exceeded, an error occurs. For details, see the  sec-
       tion  on extra data for pcre_exec() in the pcreapi docu-
       mentation.

       Each time that match() is actually  called  recursively,
       it uses memory from the process stack. For certain kinds
       of pattern and data, very large amounts of stack may  be
       needed,  despite  the  recognition  of "tail recursion".
       You can often reduce the amount of recursion, and there-
       fore  the amount of stack used, by modifying the pattern
       that is being matched. Consider, for example, this  pat-
       tern:

         ([^<]|<(?!inet))+

       It  matches  from wherever it starts until it encounters
       "<inet" or the end of the data, and is the kind of  pat-
       tern  that  might  be  used when processing an XML file.
       Each iteration of the outer parentheses  matches  either
       one  character that is not "<" or a "<" that is not fol-
       lowed by "inet". However, each  time  a  parenthesis  is
       processed,  a recursion occurs, so this formulation uses
       a stack frame for each matched  character.  For  a  long
       string,  a  lot  of stack is required. Consider now this
       rewritten  pattern,  which  matches  exactly  the   same
       strings:

         ([^<]++|<(?!inet))

       This  uses very much less stack, because runs of charac-
       ters that do not contain "<" are "swallowed" in one item
       inside  the  parentheses.  Recursion happens only when a
       "<" character that is not followed by "inet" is  encoun-
       tered (and we assume this is relatively rare). A posses-
       sive quantifier is used to stop  any  backtracking  into
       the  runs of non-"<" characters, but that is not related
       to stack usage.

       This example shows that one way of avoiding stack  prob-
       lems  when  matching  long  subject  strings is to write
       repeated parenthesized subpatterns to  match  more  than
       one character whenever possible.

       In  environments  where stack memory is constrained, you
       might want to compile PCRE to use heap memory instead of
       stack  for remembering back-up points. This makes it run
       a lot more slowly, however. Details of how  to  do  this
       are given in the pcrebuild documentation.

       In  Unix-like environments, there is not often a problem
       with the stack unless very long  strings  are  involved,
       though  the default limit on stack size varies from sys-
       tem to system. Values from 8Mb to 64Mb are  common.  You
       can find your default limit by running the command:

         ulimit -s

       Unfortunately,  the  effect  of  running out of stack is
       often SIGSEGV, though sometimes a  more  explicit  error
       message is given. You can normally increase the limit on
       stack size by code such as this:

         struct rlimit rlim;
         getrlimit(RLIMIT_STACK, &rlim);
         rlim.rlim_cur = 100*1024*1024;
         setrlimit(RLIMIT_STACK, &rlim);

       This reads the current  limits  (soft  and  hard)  using
       getrlimit(), then attempts to increase the soft limit to
       100Mb using setrlimit(). You must do this before calling
       pcre_exec().

       PCRE  has  an internal counter that can be used to limit
       the depth of recursion, and thus  cause  pcre_exec()  to
       give  an  error  code  before  it  runs out of stack. By
       default, the limit is very large, and unlikely  ever  to
       operate.  It  can  be changed when PCRE is built, and it
       can also be set when pcre_exec() is called. For  details
       of these interfaces, see the pcrebuild and pcreapi docu-
       mentation.

       As a very rough rule of  thumb,  you  should  reckon  on
       about  500  bytes  per  recursion.  Thus, if you want to
       limit your stack usage to 8Mb, you should set the  limit
       at  16000  recursions.  A 64Mb stack, on the other hand,
       can support around 128000 recursions. The pcretest  test
       program  has a command line option (-S) that can be used
       to increase the size of its stack.

Last updated: 14 September 2006
Copyright (c) 1997-2006 University of Cambridge.



                                                                  PCRESTACK(3)
