.* and other operators, when used without a careful
[^/] character class, can match across several directory levels. For example,
/usr/.*/lib/.* and /usr/.*/bin/.* introduce several ambiguities. These
can be eliminated by careful construction of the FCs, but something
should prevent them in the first place.
The proposed new syntax would address all of these issues:
The new syntax also provides several useful features that developers would otherwise miss from REs:
The purpose of the file context globbing syntax is to restrict users to only what they need through the syntax itself, not through a compiler of some sort.
Metacharacters:
| Character | Meaning |
|---|---|
| \ | Escape character |
| ? | Match any character except / |
| [...] | Match any one of the characters inside of the [] |
| (...) | OR: match any one of the strings of characters separated by the pipe character |
| * | Match zero or more of any characters confined within a single directory level; only one permitted per directory level |
| ** | Match any number of characters over several directory levels; only one permitted per line |
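
The two wildcard limits in the table can be checked mechanically. The following is a minimal sketch, not part of the proposal: the function name `wildcard_limit_errors` is hypothetical, and it assumes that escaped characters and anything inside `[...]` do not count as wildcards, and that `**` does not count against the per-level `*` limit.

```python
def wildcard_limit_errors(glob):
    """Return a list of violations of the * and ** limits (empty if none)."""
    errors = []
    double_stars = 0      # number of ** seen in the whole glob
    level = 0             # current directory level (count of / seen so far)
    single_stars = 0      # number of * seen in the current directory level
    i = 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\":
            i += 2                      # skip the escaped character
        elif ch == "[":
            close = glob.find("]", i + 1)
            # assumption: everything inside [...] is literal
            i = len(glob) if close == -1 else close + 1
        elif glob.startswith("**", i):
            double_stars += 1
            i += 2
        elif ch == "*":
            single_stars += 1
            if single_stars == 2:
                errors.append("more than one * in directory level %d" % level)
            i += 1
        elif ch == "/":
            level += 1
            single_stars = 0
            i += 1
        else:
            i += 1
    if double_stars > 1:
        errors.append("more than one ** in the glob")
    return errors


print(wildcard_limit_errors("/usr/*/lib/**"))   # no violations
print(wildcard_limit_errors("/usr/*a*/lib"))    # two * in one level
```
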
Examples of syntax (note: these are not necessarily good examples):
The definition of specificity, given globs A and B:
To compare the specificity of two globs, a recursive approach is taken: the glob is broken into pieces and the specificity of each piece is compared. For example, each directory component inside a glob must be analyzed to determine the overall relationship; then each character class inside each directory component has to be compared to the corresponding character class in the other directory component. Superset occurs when all subcomponents are either supersets or equal. Disjoint occurs when any one subcomponent is disjoint. Invalid intersection occurs in several cases: when the globs are equal, when one directory component is a superset and another is a subset, etc.
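
The combination rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the proposal's implementation: the names `Rel`, `class_relation` and `combine` are hypothetical, character classes are modelled as plain Python sets of characters, and a partial overlap between two classes is treated as an invalid intersection. The sketch returns `EQUAL` when every piece is equal and leaves it to the caller to treat equality of whole globs as invalid.

```python
from enum import Enum


class Rel(Enum):
    EQUAL = "equal"
    SUPERSET = "superset"
    SUBSET = "subset"
    DISJOINT = "disjoint"
    INVALID = "invalid intersection"


def class_relation(a, b):
    """Relation of character class a to character class b (as sets of chars)."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return Rel.EQUAL
    if a > b:
        return Rel.SUPERSET
    if a < b:
        return Rel.SUBSET
    if a.isdisjoint(b):
        return Rel.DISJOINT
    return Rel.INVALID          # assumption: partial overlap is invalid


def combine(relations):
    """Combine the relations of the sub-pieces into one overall relation."""
    rels = set(relations)
    if Rel.DISJOINT in rels:    # any disjoint piece makes the whole disjoint
        return Rel.DISJOINT
    if Rel.INVALID in rels:
        return Rel.INVALID
    if Rel.SUPERSET in rels and Rel.SUBSET in rels:
        return Rel.INVALID      # superset in one piece, subset in another
    if Rel.SUPERSET in rels:
        return Rel.SUPERSET     # all pieces are supersets or equal
    if Rel.SUBSET in rels:
        return Rel.SUBSET
    return Rel.EQUAL


# Compare ba[rz] with ba[r], position by position.
pieces = [class_relation("b", "b"),
          class_relation("a", "a"),
          class_relation("rz", "r")]
print(combine(pieces))
```
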
* and ** increase the complexity of the comparison. However, * can be expanded using ?s so that the lengths match up, which makes the comparison easy. For example, when comparing the directory components /foo*ba[rz]/ and /f*/, /f*/ is expanded to /f??*???/, so that each character class can be compared to its counterpart in the other component. The same type of operation is done for **: ** may be expanded into as many /*/ as needed.
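
The padding step for * can be sketched as below. This is a simplified illustration: the helper names `tokenize` and `align` are hypothetical, escapes and `(...)` alternation are not handled, and the sketch only covers the case where both directory components contain exactly one `*`.

```python
import re

# One token is a [...] class, a *, a ?, or a single literal character.
TOKEN = re.compile(r"\[[^\]]*\]|\*|\?|[^/]")


def tokenize(component):
    """Split one directory component into a list of tokens."""
    return TOKEN.findall(component)


def align(a, b):
    """Pad both components with ? around their * so the tokens line up."""
    sides = []
    for comp in (a, b):
        tokens = tokenize(comp)
        if tokens.count("*") != 1:
            raise ValueError("sketch expects exactly one * in %r" % comp)
        k = tokens.index("*")
        sides.append((tokens[:k], tokens[k + 1:]))   # (prefix, suffix)

    pre_len = max(len(pre) for pre, _ in sides)
    suf_len = max(len(suf) for _, suf in sides)

    aligned = []
    for pre, suf in sides:
        aligned.append(pre + ["?"] * (pre_len - len(pre))     # pad before *
                       + ["*"]
                       + ["?"] * (suf_len - len(suf)) + suf)   # pad after *
    return aligned


short, long_ = align("f*", "foo*ba[rz]")
print("".join(short))   # f??*???
print("".join(long_))   # foo*ba[rz]
```

After alignment both token lists have the same length, so each position can be compared pairwise.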
There are two approaches. One is to sort the list from least specific to most specific, as is done now. This would be done like a normal sort; however, some pairs may be disjoint, in which case their relative order does not matter and no swapping or reordering of the list is done.
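
Because disjoint pairs have no defined order, specificity is only a partial order, and a standard comparison sort may not handle that reliably. One possible way to get the described behaviour is a simple stable topological ordering, sketched below. This is a swapped-in technique, not the method the proposal specifies. The function name `order_by_specificity` and the toy `matches` table (which stands in for a real glob comparison) are hypothetical.

```python
def order_by_specificity(globs, is_superset):
    """Order globs so every superset comes before its subsets.

    Unrelated (disjoint) globs keep their original relative order.
    """
    remaining = list(globs)
    ordered = []
    while remaining:
        for cand in remaining:
            # cand may be emitted once no remaining glob is a superset of it
            if not any(is_superset(other, cand)
                       for other in remaining if other is not cand):
                ordered.append(cand)
                remaining.remove(cand)
                break
        else:
            raise ValueError("specificity relation contains a cycle")
    return ordered


# Toy stand-in for glob comparison: the sample paths each glob matches.
matches = {
    "/usr/**": {"/usr/bin/ls", "/usr/lib/libc.so"},
    "/usr/bin/*": {"/usr/bin/ls"},
    "/etc/*": {"/etc/passwd"},
}


def toy_is_superset(a, b):
    return matches[a] > matches[b]     # strict superset of matched paths


result = order_by_specificity(["/usr/bin/*", "/etc/*", "/usr/**"],
                              toy_is_superset)
print(result)   # ['/etc/*', '/usr/**', '/usr/bin/*']
```
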
The other approach would be a data file containing the set information precomputed into set relationship graphs. This would greatly decrease run times for matchpathcon and setfiles because, instead of checking each line of the FC file, the search could be performed intelligently by inferring whether a path matches a glob or not. The inferences are, given glob A, glob B and path P:
With the inferences and the precompiled set relationships, this method looks more like a binary search (inference method) than a linear search (current method).
There are two approaches to implementing this. The first and fastest way is to write out the globs in linear order, like the FC file is now, and then have another program translate the globs into REs at build time. This should not be hard at all, since everything the glob syntax can express can also be expressed as an RE.
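
A build-time translator along these lines might look like the following minimal sketch. The function name `fcglob_to_regex` is hypothetical, and the sketch makes some assumptions the proposal does not spell out: `[...]` classes are passed through to the RE unchanged, `(...)` becomes a non-capturing group, and the per-level and per-line wildcard limits are not enforced here.

```python
import re


def fcglob_to_regex(glob):
    """Translate one FC glob into an equivalent regular expression string."""
    out = []
    i = 0
    n = len(glob)
    while i < n:
        c = glob[i]
        if c == "\\":                       # escape: next character is literal
            if i + 1 >= n:
                raise ValueError("dangling escape at end of glob")
            out.append(re.escape(glob[i + 1]))
            i += 2
        elif glob.startswith("**", i):      # ** may cross directory levels
            out.append(".*")
            i += 2
        elif c == "*":                      # * stays within one level
            out.append("[^/]*")
            i += 1
        elif c == "?":                      # ? is any character except /
            out.append("[^/]")
            i += 1
        elif c == "[":                      # assumption: class passed through
            end = glob.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated [ in glob")
            out.append(glob[i:end + 1])
            i = end + 1
        elif c == "(":                      # alternation group
            out.append("(?:")
            i += 1
        elif c in "|)":
            out.append(c)
            i += 1
        else:                               # ordinary literal character
            out.append(re.escape(c))
            i += 1
    return "".join(out)


pattern = fcglob_to_regex("/usr/*/lib/**")
print(bool(re.fullmatch(pattern, "/usr/local/lib/a/b.so")))   # True
print(bool(re.fullmatch(pattern, "/usr/a/b/lib/x")))          # False
```

The second path is rejected because `*` is translated to `[^/]*`, so it cannot span the two directory levels `a/b`.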
The other approach is to modify the way libselinux uses the file contexts. A built-in FCGlob parser would be needed. All special inferences would be made in this library.
It is suggested that the first method be used in the prototype stages, simply translating the FCGlobs into REs.