Full name

Giovanni Battaglia 


Discovery of unconventional patterns for sequence analysis: theory and algorithms 

The biology community is collecting a large amount of raw data, such as the genome sequences of organisms, microarray data, and interaction data (gene-protein interactions, protein-protein interactions, and so on). The volume of data is increasing rapidly, and the process of understanding the data is lagging behind the process of acquiring it. An inevitable first step towards making sense of the data is to study its regularities, focusing on patterns: non-random structures that appear surprisingly often in the input. Once a class of patterns of interest has been chosen, the pattern discovery task is as follows: given a text T and some constraints, either on the combinatorial structure of the patterns or on their occurrence lists, find the patterns in T that satisfy the constraints and report their occurrence lists. The goal of the thesis is to study new classes of patterns that can represent further properties of the repetitions, and to propose novel algorithms for extracting them. We call these patterns unconventional because of their unusual combinatorial structure. Our line of research intends to explore two different kinds of patterns: mask patterns, where each pattern represents a set of string patterns with wildcards, and permutation patterns, where each pattern is a multiset of characters, since the order of the contained symbols does not matter.
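
To make the notion of an occurrence list concrete, the following is a minimal Python sketch, not the algorithms proposed in the thesis. It assumes that an occurrence of a permutation pattern is a window of the text whose characters form the same multiset as the pattern, and it also shows how a single string pattern with wildcards can be matched (a mask pattern, which represents a set of such strings, is not modelled here). The function names and the "." wildcard symbol are hypothetical choices made for illustration.

```python
from collections import Counter


def permutation_occurrences(text, pattern):
    """Start positions of windows whose character multiset equals that of pattern.

    Assumption: an occurrence is a contiguous window of length len(pattern).
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    target = Counter(pattern)
    window = Counter(text[:m])
    hits = [0] if window == target else []
    for i in range(m, len(text)):
        # Slide the window one position to the right.
        window[text[i]] += 1
        leaving = text[i - m]
        window[leaving] -= 1
        if window[leaving] == 0:
            del window[leaving]  # keep the counter free of zero entries
        if window == target:
            hits.append(i - m + 1)
    return hits


def wildcard_occurrences(text, pattern, wildcard="."):
    """Start positions where pattern matches text; wildcard matches any character."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == wildcard or p == c for p, c in zip(pattern, text[i:i + m]))
    ]
```

For example, `permutation_occurrences("abcbac", "abc")` reports the windows "abc", "cba" and "bac", while `wildcard_occurrences("abcabd", "ab.")` reports the two positions where "ab" is followed by any character. This sketch only checks a given pattern against the text; discovering which patterns satisfy the constraints is the harder problem the thesis addresses.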


Pattern Discovery, Mask, Permutation Pattern. 


Roberto Grossi 




Created at 8/14/2009 10:04 AM  by  
Last modified at 8/31/2009 12:56 PM  by Cristian Dittamo