
Full name

Hoang Thanh Lam 

Title

Optimization Issues in Large-Scale Web Search

Start Time

16:45 

Location

Gerace 

Abstract

Building a large-scale Web search service is very challenging and involves many complex tasks that require effective and efficient solutions. Each of these tasks must be addressed with algorithms that optimize query throughput and query response time while guaranteeing the high quality of the results returned to users. Our work focuses on improving the efficiency of some aspects of the complex query answering process. Concretely, we deal with optimization problems whose solutions can reduce the cost of query processing. First, we propose a novel lossless inverted index compression technique that achieves a better compression ratio by exploiting the co-occurrence of terms in the indexed documents. Second, we investigate the use of information extracted from query logs to optimize static index pruning techniques, which can substantially reduce the size of the index with an almost negligible effect on the precision of the query results retrieved. Finally, we investigate the efficiency of solutions to the all-pairs similarity search problem, which is common to several important Web information retrieval tasks such as near-duplicate document detection and query recommendation.
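The abstract does not describe the specific algorithms presented in the talk. As a generic point of reference only, the all-pairs similarity search problem can be stated as follows: given a collection of sparse term-weight vectors and a threshold t, report every pair of vectors whose cosine similarity is at least t. The minimal sketch below compares a brute-force baseline with a common inverted-index approach that only accumulates scores for pairs sharing at least one term. The function names and the dictionary-based vector representation are hypothetical, and neither function should be read as the speaker's method.

```python
import math
from collections import defaultdict
from itertools import combinations


def normalize(vec):
    """Scale a sparse vector {term: weight} to unit L2 norm (hypothetical helper)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def all_pairs_bruteforce(docs, threshold):
    """Baseline: compute the cosine similarity of every pair, O(n^2) dot products."""
    unit = {d: normalize(v) for d, v in docs.items()}
    result = set()
    for a, b in combinations(sorted(unit), 2):
        va, vb = unit[a], unit[b]
        sim = sum(w * vb.get(t, 0.0) for t, w in va.items())
        if sim >= threshold:
            result.add((a, b))
    return result


def all_pairs_indexed(docs, threshold):
    """Score only pairs that share a term, using an inverted index built incrementally.

    Each document is matched against the documents already indexed, then added
    to the index, so every pair (earlier, later) is scored exactly once.
    """
    index = defaultdict(list)  # term -> list of (doc_id, normalized weight)
    result = set()
    for d in sorted(docs):
        v = normalize(docs[d])
        scores = defaultdict(float)  # earlier doc_id -> partial dot product
        for t, w in v.items():
            for other, w2 in index[t]:
                scores[other] += w * w2
        for other, s in scores.items():
            if s >= threshold:
                result.add((other, d))
        for t, w in v.items():
            index[t].append((d, w))
    return result
```

Both functions return the same pairs; the indexed version simply skips pairs with no common term, which is where most of the savings come from on sparse text data. Published all-pairs algorithms add further pruning on top of this basic idea, which this sketch does not attempt.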

Keywords

Web Search, Inverted Index Compression, Similarity Search  

Supervisor(s)

Raffaele Perego

Notes

 

Session

Attachments
lam.pdf    
Created at 8/14/2009 9:53 PM  by  
Last modified at 9/1/2009 5:41 PM  by Cristian Dittamo