Implementation and Experimental Evaluation of Flexible Parsing For Dynamic Dictionary Based Compression

Matias Y, Rajpoot N, Sahinalp SC.
20 August 1998
Proc. of Algorithm Engineering 1998, 49-61
Presented at WAE’98 (2nd International Workshop on Algorithm Engineering), Saarbrucken, Germany, Aug 20-22.

Abstract

We report on the implementation and performance evaluation of greedy parsing with lookaheads for dy- namic dictionary compression. Specifically, we consider the greedy parsing with a single step lookahead which we call Flexible Parsing (FP) as an alternative to the commonly used greedy parsing (with no-lookaheads) scheme. Greedy parsing is the basis of most popular compression programs including unix compress and gzip, however it does not necessarily achieve optimality with regard to the dictionary construction scheme in use. Flexible parsing, however, is optimal, i.e., partitions any given input to the smallest number of phrases possible, for dictionary construction schemes which satisfy the prefix property throughout their execution. There is an on-line linear time and space implementation of the FP scheme via the trie-reverse-trie pair data structure [MS98]. In this paper, we introduce a more practical, randomized data structure to implement FP scheme whose expected theoretical performance matches the worst case performance of the trie-reverse- trie-pair. We then report on the compression ratios achieved by two FP based compression programs we implemented. We test our programs against compress and gzip on various types of data on some of which we obtain up to 35% improvement.