Introduction

The current file contexts (FCs) in SELinux have several problems:
  • The regular expressions used in the FCs are very hard to analyze. There is a reason why apol allows users to do so much analysis on policy but does not have a comprehensive file context tool. If the specificity of individual FCs were easily computable, the file context system would work much better. The inability to analyze the expressions is also why the sorting of FCs by specificity is currently an approximation.
  • All "bugs" in the FCs occur silently and without warning. There is no way to detect if FCs overlap and cause ambiguity.
  • The complexity of regular expression syntax is not necessary for the goal of FCs and allows policy writers to do too much.
    • It is very prone to error, since many people are not knowledgeable about regular expressions. For example, the regular expression "/etc/tresys.conf" also matches /etc/tresysZconf, because an unescaped . matches any character.
    • It allows people to be TOO clever. Regular expressions can be so clever that they obscure their actual meaning. This makes some REs unreadable to humans and makes analysis very hard.
  • .* and other operators, without careful use of the [^/] character class, grab several directory levels. For example, /usr/.*/lib/.* and /usr/.*/bin/.* introduce several ambiguities. This can be eliminated by careful construction of the FCs, but there should be something that prevents it.
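
The two regular expression pitfalls above can be reproduced in a few lines. This is only an illustration using Python's re module as a stand-in for the RE engine; fullmatch is used to mimic whole-path matching, and the sample path is made up for the example.

```python
import re

# An unescaped "." matches any character, so this pattern is looser than it looks.
loose = re.compile(r"/etc/tresys.conf")
print(bool(loose.fullmatch("/etc/tresys.conf")))   # intended match
print(bool(loose.fullmatch("/etc/tresysZconf")))   # unintended match

# ".*" crosses "/" boundaries, so these two patterns can both match one path.
lib = re.compile(r"/usr/.*/lib/.*")
bin_ = re.compile(r"/usr/.*/bin/.*")
path = "/usr/local/lib/bin/tool"                   # hypothetical path
print(bool(lib.fullmatch(path)), bool(bin_.fullmatch(path)))
```

Both patterns accept the sample path, and nothing in the RE syntax itself reports the overlap.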

The proposed new syntax would address all of these issues:

  • The syntax is easy for a computer to analyze, making it a simple task to determine which of two FCs is more specific.
  • A pair of ambiguous FCs can be detected and reported as an error or warning, since specificity can be analyzed.
  • The syntax is less prone to user error since it is very similar to shell globbing, a more commonly used and more widely known syntax than RE.
  • Asterisk is confined to one directory. Also, only one asterisk per directory level is permitted to avoid ambiguity.

The new syntax also provides several useful features developers would miss from RE:

  • Or syntax, such as d(o|aw)n matching don or dawn or lib(|64) matching lib or lib64.
  • Double asterisk (**), which provides the functionality of .*; however, only one is permitted per line to avoid ambiguity.
  • Character classes, such as [ab][cd] matching ac, ad, bc and bd. Also the basic character classes found in RE like :alpha: will be possible character classes.

Specification of File Context Globbing syntax

The purpose of the file context globbing syntax is to restrict users to only what they need, and to do so through the syntax itself rather than through a compiler of some sort.

Metacharacters:

Character   Meaning
\           Escape character
?           Match any character except /
[...]       Match any one of the characters inside the []
(...)       OR: match any one of the strings of characters separated by the pipe (|)
*           Match zero or more characters confined within a single directory level; only one permitted per directory level
**          Match any number of characters over several directory levels; only one permitted per line

Examples of syntax (note: these are not necessarily good examples):

  • /etc/*.conf - matches any .conf files in the /etc directory
  • /usr/\// - matches the file with name / in usr
  • /??*/foo - matches any file named foo in a first level directory with 2 or more characters
  • /usr/lib(|64)/*.so - matches all .so in /usr/lib or /usr/lib64
  • /usr/share/**/java/*.jar - matches any jar file in a java directory at any level in /usr/share
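
The two limits in the metacharacter table (one * per directory level, one ** per line) are simple enough to check mechanically. The following is a minimal sketch, not part of the proposal: the function names are made up for illustration, and it ignores asterisks that might appear inside [...] character classes.

```python
def split_levels(glob):
    """Split a glob on unescaped '/' into directory-level components."""
    levels, current, i = [], "", 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            current += glob[i:i + 2]      # keep the escape pair together
            i += 2
            continue
        if ch == "/":
            levels.append(current)
            current = ""
        else:
            current += ch
        i += 1
    levels.append(current)
    return levels


def count_stars(level):
    """Return (single, double) counts of unescaped '*' and '**' in one level."""
    single = double = 0
    i = 0
    while i < len(level):
        if level[i] == "\\":
            i += 2                        # skip the escaped character
        elif level.startswith("**", i):
            double += 1
            i += 2
        elif level[i] == "*":
            single += 1
            i += 1
        else:
            i += 1
    return single, double


def check_glob(glob):
    """Return a list of rule violations (empty if the glob obeys both limits)."""
    errors, total_double = [], 0
    for level in split_levels(glob):
        single, double = count_stars(level)
        total_double += double
        if single > 1:
            errors.append(f"more than one '*' in level {level!r}")
    if total_double > 1:
        errors.append("more than one '**' in the line")
    return errors


print(check_glob("/usr/share/**/java/*.jar"))
print(check_glob("/usr/*lib*/x"))
print(check_glob("/a/**/b/**"))
```

The first call reports no violations; the other two each report one.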

Calculation of specificity

The definition of specificity, given globs A and B:

  • Superset - If A matches all strings that B matches, and at least one more, A is less specific than B
  • Disjoint - If A and B never match the same string, they are unrelated and neither is more specific than the other
  • Invalid Intersection - A matches at least one string B does not match, B matches at least one string A does not match, and A and B match at least one string in common, OR A and B match exactly the same set of strings. Either case causes ambiguity because neither glob is more specific than the other, yet they share strings.
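
The three definitions can be stated directly in terms of sets. The sketch below only illustrates the definitions: it compares finite sets of sample paths, whereas a real implementation has to reason about the globs themselves because the matched sets are generally infinite. The function name and the "subset" label (the mirror image of superset) are assumptions for the example.

```python
def classify(matched_by_a, matched_by_b):
    """Relate glob A to glob B given the (sample) sets of strings each matches."""
    a, b = set(matched_by_a), set(matched_by_b)
    if a.isdisjoint(b):
        return "disjoint"               # unrelated, neither is more specific
    if a > b:
        return "superset"               # A is less specific than B
    if a < b:
        return "subset"                 # A is more specific than B
    return "invalid intersection"       # equal sets or partial overlap


# Hypothetical sample paths standing in for /etc/* and /etc/*.conf.
etc_all = {"/etc/a.conf", "/etc/b.conf", "/etc/passwd"}
etc_conf = {"/etc/a.conf", "/etc/b.conf"}
print(classify(etc_all, etc_conf))
```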

In order to compare the specificity of two globs, a recursive approach is taken, breaking each glob into pieces and comparing the specificity of each piece. For example, each directory component inside a glob must be analyzed to determine the overall relationship; then, each character class inside each directory component has to be compared to the corresponding character class in the other glob's directory component. Superset occurs when all subcomponents are either supersets or equal. Disjoint occurs when any one subcomponent is disjoint. Invalid intersection occurs in several cases: when all subcomponents are equal, when one directory component is a superset and another is a subset, and so on.
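
A minimal sketch of how the per-piece results might be combined into an overall relationship, following the rules in the paragraph above. The labels "equal" and "overlap" (for a piece that partially overlaps its counterpart) and the function name are assumptions made for this example, and every piece is assumed to match at least one string.

```python
def combine(relations):
    """Combine per-component relations into the relation of the whole glob."""
    rels = set(relations)
    if "disjoint" in rels:
        return "disjoint"               # one disjoint piece rules out any shared path
    if "overlap" in rels:
        return "invalid intersection"   # a partially overlapping piece
    if rels <= {"equal"}:
        return "invalid intersection"   # both globs match the same set
    if rels <= {"superset", "equal"}:
        return "superset"
    if rels <= {"subset", "equal"}:
        return "subset"
    return "invalid intersection"       # superset in one piece, subset in another


print(combine(["equal", "superset", "equal"]))
print(combine(["superset", "subset"]))
print(combine(["superset", "disjoint"]))
```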

* and ** increase the complexity of the comparison. However, * can be expanded using ?'s so that the lengths match up, which makes the comparison easy. For example, when comparing the directory components /foo*ba[rz]/ and /f*/, /f*/ gets expanded to /f??*???/, so that each character class can be compared to its counterpart in the other component. The same type of operation is done for **: ** may be expanded into as many /*/ levels as needed.
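
The * expansion described above can be sketched as follows. This is a simplified illustration under several assumptions: each component contains exactly one *, and escapes and (...) groups are not handled. The function names are hypothetical.

```python
def tokenize(component):
    """Split one directory component into elements: literal, '?', '[...]' or '*'."""
    tokens, i = [], 0
    while i < len(component):
        if component[i] == "[":
            end = component.index("]", i)
            tokens.append(component[i:end + 1])   # a whole class is one element
            i = end + 1
        else:
            tokens.append(component[i])
            i += 1
    return tokens


def pad_star(tokens, prefix_len, suffix_len):
    """Insert '?' around the single '*' until both sides reach the given lengths."""
    star = tokens.index("*")
    prefix, suffix = tokens[:star], tokens[star + 1:]
    return (prefix + ["?"] * (prefix_len - len(prefix)) + ["*"]
            + ["?"] * (suffix_len - len(suffix)) + suffix)


def align(a, b):
    """Pad both components so their elements line up position by position."""
    ta, tb = tokenize(a), tokenize(b)
    sa, sb = ta.index("*"), tb.index("*")
    prefix_len = max(sa, sb)
    suffix_len = max(len(ta) - sa - 1, len(tb) - sb - 1)
    return pad_star(ta, prefix_len, suffix_len), pad_star(tb, prefix_len, suffix_len)


left, right = align("foo*ba[rz]", "f*")
print("".join(left), "".join(right))
```

After alignment, both components have the same number of elements, so they can be compared element by element.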

New File Context File

There are two approaches. One is to sort the list from least specific to most specific, as is done now. This would work like a normal sort, except that some pairs are disjoint and their relative order does not matter; in that case, no swapping or reordering is done.
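
One way to obtain such an ordering is sketched below. Note that it swaps in a different technique from the pairwise sort described above: each glob is ranked by how many other globs are strict supersets of it, which always places a less specific glob before a more specific one when the relation is transitive. The is_superset callback and the sample data are assumptions for the example.

```python
def sort_least_to_most_specific(globs, is_superset):
    """Order globs so that every superset comes before its subsets."""
    def depth(glob):
        # Number of globs that are strictly less specific than this one.
        return sum(1 for other in globs if is_superset(other, glob))
    return sorted(globs, key=depth)       # stable: disjoint globs keep their order


# Hypothetical sample paths standing in for the strings each glob matches.
samples = {
    "/etc/*.conf": {"/etc/a.conf"},
    "/etc/*": {"/etc/a.conf", "/etc/passwd"},
    "/usr/**": {"/usr/bin/ls"},
    "/**": {"/etc/a.conf", "/etc/passwd", "/usr/bin/ls", "/var/x"},
}
order = sort_least_to_most_specific(
    list(samples), lambda a, b: samples[a] > samples[b])
print(order)
```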

The other approach would be a data file containing the set information, pre-computed into set relationship graphs. This would greatly reduce lookup times for matchpathcon and setfiles because, instead of checking each line of the FC file, the search can be performed intelligently by inferring whether or not a path matches a glob. The inferences are, given glob A, glob B and path P:

  • If P matches A and A is disjoint from B: P does not match B
  • If P matches A and A is a subset of B: P matches B (but it doesn't matter because we are trying to find the most specific glob anyway)
  • If P does not match A and B is a subset of A: P does not match B
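
A minimal sketch of a lookup that uses these inferences. It assumes the globs have already been validated, so that any two are either disjoint or nested and can be arranged as a forest in which children are subsets of their parent and siblings are disjoint. The Node layout, the hand-written regular expressions used as matchers, and the context labels are all placeholders for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    glob: str
    pattern: re.Pattern                  # stand-in matcher for the glob
    context: str
    children: list = field(default_factory=list)


def lookup(nodes, path):
    """Return the context of the most specific matching glob, or None."""
    for node in nodes:
        if node.pattern.fullmatch(path):
            # Siblings are disjoint from this node, so they cannot match:
            # only this node's subsets need to be searched.
            deeper = lookup(node.children, path)
            return deeper if deeper is not None else node.context
        # No match: every subset of this node is skipped without testing.
    return None


tree = [
    Node("/etc/*", re.compile(r"/etc/[^/]*"), "etc_t", [
        Node("/etc/*.conf", re.compile(r"/etc/[^/]*\.conf"), "etc_conf_t"),
    ]),
    Node("/usr/**", re.compile(r"/usr/.*"), "usr_t"),
]
print(lookup(tree, "/etc/ssh.conf"))
print(lookup(tree, "/usr/bin/ls"))
```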

With the inferences and the pre-compiled set relationships, this method looks more like a binary search (inference method) than a linear search (current method).

Actual Implementation

There are two approaches to implementing this. The first and fastest way is to write out the globs in linear order, as the FC file is now. Then, with another program at build time, translate the globs into REs. This should not be hard at all, since every construct in the glob syntax has a direct RE equivalent.

The other approach is to modify the way libselinux uses the file contexts. A built-in FCGlob parser would be needed. All special inferences would be made in this library.

It is suggested that the first method, simply translating the FCGlobs into REs, be used in the prototype stages.
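
A minimal sketch of what the build-time translation from FCGlobs to REs might look like. It is an illustration only: the function name is made up, Python's re module stands in for whatever RE engine consumes the output, and named classes such as :alpha: are not handled.

```python
import re


def glob_to_regex(glob):
    """Translate one FCGlob into an equivalent regular expression string."""
    out, i = [], 0
    while i < len(glob):
        ch = glob[i]
        if ch == "\\" and i + 1 < len(glob):
            out.append(re.escape(glob[i + 1]))   # escaped character is literal
            i += 2
        elif glob.startswith("**", i):
            out.append(".*")                     # may span directory levels
            i += 2
        elif ch == "*":
            out.append("[^/]*")                  # confined to one directory level
            i += 1
        elif ch == "?":
            out.append("[^/]")                   # any single character except /
            i += 1
        elif ch == "[":
            end = glob.index("]", i)
            out.append(glob[i:end + 1])          # simple character class, copied as-is
            i = end + 1
        elif ch in "(|)":
            out.append(ch)                       # OR groups map directly onto RE groups
            i += 1
        else:
            out.append(re.escape(ch))            # everything else is literal
            i += 1
    return "".join(out)


conf = re.compile(glob_to_regex("/etc/*.conf"))
print(bool(conf.fullmatch("/etc/tresys.conf")))
print(bool(conf.fullmatch("/etc/tresysZconf")))
print(bool(conf.fullmatch("/etc/sub/x.conf")))
```

Unlike the hand-written RE "/etc/tresys.conf", the translated pattern escapes the dot, and the single * cannot cross into a subdirectory.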