Information Retrieval Evaluation
Information Retrieval (IR) systems aim to find as many documents as possible that are relevant to a query, while returning as few irrelevant documents as possible. IR systems are evaluated using a test collection consisting of a set of documents, a set of queries, and a set of relevance judgements for document/query pairs. Text collections typically comprise several gigabytes of data; some terabyte-sized collections are available.

The fundamental measures in information retrieval are precision and recall. Precision is the proportion of relevant documents among all documents retrieved by an IR system. Recall is the proportion of relevant documents retrieved by a system among all relevant documents in the test collection.

Since 1992, the National Institute of Standards and Technology (NIST) has organised the Text REtrieval Conference (TREC), which distributes evaluation data to participating research groups and analyses the results they submit. The TREC evaluations comprise various tracks, including spoken document retrieval, interactive retrieval, web retrieval, and question answering. Since 2000, the Cross-Language Evaluation Forum (CLEF) has organised evaluations of cross-language retrieval systems. Large-scale evaluation campaigns such as TREC and CLEF are regarded as having a great influence on progress in the field of IR.
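The precision and recall definitions above can be sketched in code. The following is a minimal illustration (the function name and document IDs are hypothetical, not from the text): given the set of documents a system retrieved for one query and the set judged relevant, precision divides the number of relevant retrieved documents by the number retrieved, and recall divides it by the total number of relevant documents in the collection.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: document IDs returned by the IR system
    relevant:  document IDs judged relevant for the query
    (Illustrative helper, not part of any standard library.)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents the system actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Example: the system returns 4 documents, of which 3 are relevant;
# the collection contains 5 relevant documents for this query.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d1", "d2", "d3", "d7", "d9"])
print(p, r)  # 0.75 0.6
```

Note that the two measures trade off against each other: returning every document in the collection yields perfect recall but poor precision, which is why evaluations such as TREC report both.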