A large fraction of the data we process every day consists of a
sequence of symbols over an alphabet, and hence is a text.
Although the scientific literature offers many solutions for the storage,
access and search of textual data, the current growth and availability
of massive amounts of text gathered and processed by applications
have changed the algorithmic requirements of these basic processing tools
and provide ample motivation for a great deal of new theoretical
research on algorithms and data structures.
In fact, the memory hierarchies of current PCs and workstations are
quite complex, consisting of multiple levels: L1 and L2 caches,
internal memory, one or more disks, other external storage devices (such
as CD-ROMs and DVDs), and the memories of multiple hosts over a network.
Although virtual memory permits the address space
to be larger than the internal memory, it is well known that not all
memory references cost the same.
In this scenario, data compression appears mandatory because it offers
a twofold advantage: fitting more data into the higher (faster)
memory levels reduces the transfer time from the slower levels and
may thus speed up the execution of algorithms. It goes without saying
that storing data in compressed format is beneficial whenever the
cost of accessing it outweighs its decompression time.
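This trade-off can be illustrated with a minimal sketch. The example below uses Python's standard zlib module as a stand-in compressor (it is not one of the compressed text indexes this seminar concerns): a repetitive text shrinks substantially, so far more of it fits in a fast memory level, and decompression recovers it exactly.

```python
import zlib

# A highly repetitive "text": it compresses well, so keeping it
# compressed lets much more of it reside in a fast memory level.
text = b"the quick brown fox jumps over the lazy dog " * 1000

compressed = zlib.compress(text, level=6)

ratio = len(compressed) / len(text)
print(f"original:   {len(text)} bytes")
print(f"compressed: {len(compressed)} bytes ({ratio:.1%} of original)")

# Decompression restores the text exactly; whenever fetching the
# extra bytes from a slow level costs more than this step,
# storing the compressed form wins.
assert zlib.decompress(compressed) == text
```

The sizes printed above are illustrative only; the actual ratio depends on the text and the compressor, which is precisely why specialized compressed data structures for text are worth designing.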
In this seminar we will briefly introduce our work on designing
solutions that provide high compression, efficient access and fast
search over textual data.