
Full name

Hoang Thanh Lam 


Optimization Issues in Large-Scale Web Search






Building a large-scale Web search service is very challenging and involves many complex tasks that require effective and efficient solutions. Each of these tasks must be addressed with algorithms that optimize query throughput and query answering time while guaranteeing high-quality results for users. Our work focuses on improving the efficiency of some aspects of the complex query answering process. Concretely, we deal with optimization problems whose solutions can reduce the cost of query processing. First, we propose a novel lossless inverted index compression technique that achieves a better compression ratio by exploiting the co-occurrence of terms in indexed documents. Second, we investigate the use of information extracted from query logs to optimize static index pruning techniques, which allows the size of the index to be reduced considerably with almost negligible effect on the precision of the query results retrieved. Finally, we investigate the efficiency of solutions to the all-pairs similarity search problem, which is common to several important Web information retrieval tasks such as near-duplicate document detection and query recommendation.
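
To make the all-pairs similarity search problem concrete, here is a minimal Python sketch. It is not the method presented in the talk: it assumes documents are plain sets of terms, uses Jaccard similarity as the measure, and applies none of the pruning optimizations an efficient solution would need. The function name `all_pairs_similarity` and the sample documents are hypothetical.

```python
from collections import defaultdict


def all_pairs_similarity(docs, threshold):
    """Return (id_a, id_b, similarity) for every pair of documents whose
    Jaccard similarity is at least `threshold`.

    `docs` maps a document id to a set of terms (an assumption of this
    sketch, not something stated in the abstract).
    """
    index = defaultdict(list)  # inverted index: term -> ids seen so far
    results = []
    for doc_id, terms in docs.items():
        # Count shared terms with earlier documents via the inverted index,
        # so pairs with no term in common are never compared.
        overlap = defaultdict(int)
        for term in terms:
            for other_id in index[term]:
                overlap[other_id] += 1
            index[term].append(doc_id)
        for other_id, shared in overlap.items():
            union = len(terms) + len(docs[other_id]) - shared
            similarity = shared / union
            if similarity >= threshold:
                results.append((other_id, doc_id, similarity))
    return results


# Hypothetical sample data, for illustration only.
sample_docs = {
    "a": {"web", "search", "index"},
    "b": {"web", "search", "engine"},
    "c": {"cooking", "recipes"},
}
print(all_pairs_similarity(sample_docs, 0.5))
```

The inverted index only generates candidate pairs that share at least one term, which is why the same data structure used for query processing also shows up in near-duplicate detection.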


Web Search, Inverted Index Compression, Similarity Search  


Raffaele Perego




Created at 8/14/2009 9:53 PM  by  
Last modified at 9/1/2009 5:41 PM  by Cristian Dittamo