Compression Contest and Marctar's Axe
Marcus Hutter has posted a 50,000 Euro prize contest for compressing (a subset of) Wikipedia. While I applaud Dr. Hutter for his generous contribution to stimulating AI research, and I do believe that cognition often can be viewed as compression (as championed by Gerard Wolff), I do have one reservation with this idea:
The idea that a smaller theory is a better theory can be traced at least to Ockham's Razor. When I applied The Cruncher to the frames of a (very low resolution, but continuitous) movie, I found that the best compression was when the 1st frame was described, then every frame was described in terms of the previous frame (much the way that mpegs work). However, I don't think this is how people do it. I don't record my day by remembering when I woke up then every instant in terms of the previous instant. If I did, I'd have to spend a good deal of time "unpacking" my day to tell you what I had for supper (and presumably less time to tell you what I had for breakfast).
Perhaps a more extreme example would be Euclidean Geometry. This can be compressed down to the 5 postulates (plus some inference rules), but I don't think anyone rederives a commonly used lemma every time they use it.
Therefore, I propose an alternate to Ockham's Razor, which I'll call Marctar's Axe: The best model is that which has the smallest average query time (in terms of steps of computation to answer it). By "query" I mean things like "How many times have you seen this pattern?" or "What is the usual outcome of this pattern?".
My suspicion is that compression will usually fall out of Marctar's Axe, but there will be some cases where Marctar's Axe will tell you to "cache" results and thereby trade in a lot of computation time for a little bit of memory.

6 Comments:
Caching in rule derivation is also known as tabling in logic programming and memoization in procedural programming. We'll stick with "tabling" for the purposes of rule-derivation. In a contest like the Hutter Prize there is no penalty for tabling if you generate your tabled entries during the time limit of decompression. You submit your self-extracting archive with no tabled entries, its size is measured and then generate the tabled entries during decompression.
This post has been removed by a blog administrator.
The time limits for the Hutter Prize are pretty liberal (allowing a month-ish to extract), so it doesn't do much to encourage a program to use tabling. (In fact, tabling is slightly discouraged because your program would need to have the infrastructure for doing that (even if it doesn't have the tabled entries to begin with).)
I should make it clear that I think that the Hutter Prize is a good contest. I just wanted to point out that solving the compression problem does not solve the AI problem, and that a measure like Marctar's Axe would point research a little closer to the goal of AI (since computation time needs to be taken into account in any complete solution to the AI problem).
The benefit of Marctar's Axe is that it unifies time and memory into a single measure. Otherwise, we'd need to explicitly address how many clock cycles are worth how many memory bits.
I should also point out that even if the time to extract the entire data set were taken into account, this wouldn't do much to encourage caching either (because you'd need to touch every data item, which takes a lot of time). In particular, this wouldn't address the "movie of my day" issue.
Marctar's Axe would have a model minimize the access time for an arbitrary query starting with the "compressed" model.
Correction: The time-limit is 10 hours, not a month-ish. However, this still isn't a very tight restriction, and the contest prefers a smaller program that runs in 9 hours to a slightly-bigger program that runs in 2 minutes.
Suppose we have a "movie" made of, say, 38 "frames", numbered 0 through 37. First, define 0. Then, for every higher frame, take its number in binary and turn the rightmost "1" into a "0", then define the frame in terms of that frame. Then the last frame, 37 (binary 100101), is defined in terms of frame 36 (100100), which is defined in terms of frame 32 (100000), which is define in terms of frame 0.
The compression here isn't as good, since a frame is less likely to be related to a much earlier frame than the frame immediately before it. However, chances are a movie will contain many radical transitions in which a frame is completely unrelated to the one before anyway, so the apparent number of radical transitions is, I think, multiplied by something proportional to the log of the movie's length.
And the big benefit: the worst-case frame here is number 31, defined in terms of 30, defined in terms of 28, defined in terms of 24, defined in terms of 16, defined in terms of 0: five steps. In general, the worst-case time is proportional to the log of movie length. If each frame is defined in terms of the previous one, 37 requires 37 steps: the worst-case time is proportional to the movie length itself.
The last obvious method is to define each frame independently: every frame is treated as a radical transition. Here, decompression of one frame is constant time, but the number of radical transitions is proportional to the length of the movie rather than the log of its length times the actual number.
So suppose we have r radical transitions in a movie of length l. The independent method gives a decompression time proportional to 1 but a size proportional to l. The rightmost-1 method gives a decompression time proportional to log l and a size proportional to r*log l. The previous-frame method gives a decompression time proportional to l but a size proportional to r.
So it really depends on parameters and objectives, but unless size is critical or you have a very large number of radical transitions, I'd say the rightmost-1 method is best.
Post a Comment
<< Home