Sortix main manual
This manual documents Sortix main. You can instead view this document in the latest official manual.
PCREMATCHING(3) | Library Functions Manual | PCREMATCHING(3) |
NAME
PCRE - Perl-compatible regular expressionsPCRE MATCHING ALGORITHMS
This document describes the two different algorithms that are available in PCRE for matching a compiled regular expression against a given subject string. The "standard" algorithm is the one provided by the pcre_exec(), pcre16_exec() and pcre32_exec() functions. These work in the same as as Perl's matching function, and provide a Perl-compatible matching operation. The just-in-time (JIT) optimization that is described in the pcrejit documentation is compatible with these functions.^<.*>
<something> <something else> <something further>
REGULAR EXPRESSIONS AS TREES
The set of strings that are matched by a regular expression can be represented as a tree structure. An unlimited repetition in the pattern makes the tree of infinite size, but it is still a tree. Matching the pattern to a given subject string (from a given starting point) can be thought of as a search of the tree. There are two ways to search a tree: depth-first and breadth-first, and these correspond to the two matching algorithms provided by PCRE.THE STANDARD MATCHING ALGORITHM
In the terminology of Jeffrey Friedl's book "Mastering Regular Expressions", the standard algorithm is an "NFA algorithm". It conducts a depth-first search of the pattern tree. That is, it proceeds along a single path through the tree, checking that the subject matches what is required. When there is a mismatch, the algorithm tries any alternatives at the current point, and if they all fail, it backs up to the previous branch point in the tree, and tries the next alternative branch at that level. This often involves backing up (moving to the left) in the subject string as well. The order in which repetition branches are tried is controlled by the greedy or ungreedy nature of the quantifier.THE ALTERNATIVE MATCHING ALGORITHM
This algorithm conducts a breadth-first search of the tree. Starting from the first matching point in the subject, it scans the subject string from left to right, once, character by character, and as it does this, it remembers all the paths through the tree that represent valid matches. In Friedl's terminology, this is a kind of "DFA algorithm", though it is not implemented as a traditional finite state machine (it keeps multiple states active simultaneously).cat(er(pillar)?)?
^a++\w!
ADVANTAGES OF THE ALTERNATIVE ALGORITHM
Using the alternative matching algorithm provides the following advantages:DISADVANTAGES OF THE ALTERNATIVE ALGORITHM
The alternative algorithm suffers from a number of disadvantages:12 November 2013 | PCRE 8.34 |