Sign In
New ItemNew Item

Full name

Igor Nitto 


Text partitioning algorithms in data compression 

Start Time





Text partitioning (or parsing for brevity) is the task of partitioning an input text into contiguous substrings. Parsing algorithms are a fundamental tool for text compression. In fact, we can identify at least two important classes of text compressors in which parsing of the input data is a key step.
   The well-known Lempel-Ziv compressors perform a parsing of the input text into a sequence of previously occurred
substrings called phrases. In the compressed text each phrase is encoded by a codeword, which is a variable-length
pointer to one of its previous copy in the text. In this setting the main problem is to find a parsing strategy
which minimizes the size in bits of the sequence of codewords and hence of the compressed text. This is referred as
the bit-optimal parsing problem.
Moreover, text partitioning is also a main ingredient in the compression algorithms following the so-called Permute-
Partition-Compress paradigm. This is a general paradigm for the design of textual compressor, based on the idea
of reorganizing the input data before compression. The reorganization process generally consists in permuting the input data to form a new string which is then partitioned into a sequence of contiguous substrings that are finally compressed individually by a given base compressor. In this
case the goal of the partitioning step is to compute a partition of the permuted data resulting in the best
possible compression.
Our results have shown that, although seemingly unrelated, both the above problem boils down to computing shortest path on a particular class of directed acyclic graphs, which enjoy several useful combinatorial properties. These allowed us to devise the first space and time efficient solutions for them, which will be the main subject of this talk.


text compression, algorithms on strings 


Paolo Ferragina 




Created at 8/16/2009 3:29 PM  by  
Last modified at 9/10/2009 9:00 PM  by Cristian Dittamo