Undergraduate Honors Thesis
Pairing Strings to Their Most Specific Regular Expression Match
Author: Don Miner
Advisor: Richard Chang
Abstract:
An efficient method is given that solves the problem of mapping a list of strings to their most specific
matches from a list of Perl compatible regular expressions (regexps). The assigning of context labels to a file
system, using a list of regular expressions as way to determine them, done by the setfiles program in an SELinux
system, is the inspiration for this problem and is used as the sample dataset on which tests are performed. Graphs
are generated to describe set relationships between pairs of regexps, specifically: subset, superset and disjoint.
These graphs are used to infer regular expression matches so that the actual match does not have to be done.
Also, they are used to determine specificity between regular expressions. Through experimental testing, this
algorithm is faster than the simplistic approach of matching each regular expression to each string, while also
correctly assigning the most specific match.
Paper
Python script implementation