Undergraduate Honors Thesis

Pairing Strings to Their Most Specific Regular Expression Match
Author: Don Miner
Advisor: Richard Chang
Abstract:
An efficient method is given that solves the problem of mapping a list of strings to their most specific matches from a list of Perl compatible regular expressions (regexps). The assigning of context labels to a file system, using a list of regular expressions as way to determine them, done by the setfiles program in an SELinux system, is the inspiration for this problem and is used as the sample dataset on which tests are performed. Graphs are generated to describe set relationships between pairs of regexps, specifically: subset, superset and disjoint. These graphs are used to infer regular expression matches so that the actual match does not have to be done. Also, they are used to determine specificity between regular expressions. Through experimental testing, this algorithm is faster than the simplistic approach of matching each regular expression to each string, while also correctly assigning the most specific match.
Paper
Python script implementation