definition: Queries given to search engines or other retrieval systems are often not very specific and therefore match a large number of documents. In these cases the retrieval system needs a good estimate of how relevant each document is to the user's needs, so that "good" documents show up early in the result list. Many factors should enter into a good ranking method, including the positions of the query terms in the document, the linguistic context of the matches, link popularity, the classification of the documents, user models, etc. "Classical" methods compute a measure of "distance" between the query and the retrieved document, such as cosine similarity over TF/IDF-weighted term vectors. For hyperlinked documents, methods that make use of the hyperlink structure have proved very effective for relevance ranking. Google was the first large-scale search engine to make use of hyperlink structure for relevance ranking.
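The "classical" approach mentioned above can be illustrated with a minimal sketch, assuming whitespace tokenization and one common TF/IDF variant (raw term count times log(N/df) + 1). The function names `tf_idf_vectors`, `cosine` and `rank` are hypothetical and chosen only for this illustration. Real systems use many refinements that are not shown here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumed IDF variant: log(N / df) + 1, so terms in every document keep weight 1.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms that occur in no document carry no weight and are dropped.
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Sort by descending score; ties are broken by document index.
    return sorted(scores, key=lambda s: (-s[0], s[1]))


docs = [
    "web search engine ranking",
    "cooking pasta at home",
    "link analysis for web search",
]
for score, index in rank("web search", docs):
    print(f"{score:.3f}  {docs[index]}")
```

In this toy collection the two documents containing both query terms score above the unrelated one, and the shorter of the two ranks first because cosine similarity normalizes by vector length. Link-based methods would add a query-independent score derived from the hyperlink graph, which this sketch does not attempt.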
- Jon M. Kleinberg
- Gerard Salton
- Sergey Brin
- Monika R. Henzinger
related system(s) / resource(s):
- Wikipedia: Relevance in Information Retrieval
Authoritative sources in a hyperlinked environment.
Jon M. Kleinberg.
Journal of the ACM. 46 (5). 1999. 604--632.
The anatomy of a large-scale hypertextual Web search engine.
Sergey Brin and Lawrence Page.
Computer Networks and ISDN Systems. 30 (1--7). 1998. 107--117.
Mining the Web.
Morgan Kaufmann Publishers. San Francisco. 2003.
Link Analysis in Web Information Retrieval.
Monika R. Henzinger.
IEEE Data Engineering Bulletin. 23 (3). 2000. 3--8.
Term-weighting approaches in automatic text retrieval.
Gerard Salton and Christopher Buckley.
Information Processing and Management. 24 (5). 1988. 513--523.