TREC9 SDR Track — LT World



Spoken Document Retrieval for TREC-9 at Cambridge University

by S. E. Johnson, P. Jourlin, K. Spärck Jones, P. C. Woodland


This paper presents work done at Cambridge University for the TREC-9 Spoken Document Retrieval (SDR) track. The CUHTK transcriptions from TREC-8, which have a Word Error Rate (WER) of 20.5%, were used in conjunction with stopping, Porter stemming, Okapi-style weighting and query expansion using a contemporaneous corpus of newswire. A windowing/recombination strategy was applied for the case where story boundaries were unknown (SU), giving final Average Precision figures of 38.8% and 43.0% for the TREC-9 short and terse queries respectively. The corresponding results for the runs with story boundaries known (SK) were 49.5% and 51.9%. Document expansion was used in the SK runs and was also shown to be beneficial for SU under certain circumstances. Non-lexical information was generated as well; although it was not used within the evaluation, it should prove useful for enriching the transcriptions in real-world applications. Finally, cross-recogniser experiments again showed that there is little performance degradation as WER increases, so SDR now needs new challenges such as integration with video data.
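The abstract names stopping and Okapi-style weighting without giving formulas. As a minimal sketch, assuming the standard Okapi BM25 formula with the common textbook defaults k1 = 1.2 and b = 0.75 (not necessarily the settings used at Cambridge), a hypothetical `bm25_scores` function could look like this. The stop list is a toy assumption, Porter stemming is left out, and the IDF uses the non-negative log(1 + ...) variant rather than any specific weight from the paper.

```python
import math
from collections import Counter

# Hypothetical toy stop list (assumption, for illustration only).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lower-case, split on whitespace and drop stop words (stopping).

    Porter stemming, which the paper also applies, is omitted in this sketch.
    """
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25.

    k1 and b are textbook defaults, assumed rather than taken from the paper.
    """
    doc_terms = [Counter(tokenize(d)) for d in docs]
    doc_lens = [sum(tf.values()) for tf in doc_terms]
    avg_len = sum(doc_lens) / len(docs)
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scores = []
    for tf, dl in zip(doc_terms, doc_lens):
        score = 0.0
        for term in query_terms:
            df = sum(1 for t in doc_terms if term in t)  # document frequency
            if df == 0 or term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # Length normalisation: longer-than-average documents are damped.
            norm = k1 * (1 - b + b * dl / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

Calling `bm25_scores("cat", ["the cat sat on the mat", "dogs and cats", "the cat and the cat"])` ranks the third document highest, since it is shorter and mentions "cat" twice. The second document scores zero because, without stemming, "cats" does not match "cat", which is the kind of mismatch Porter stemming is meant to remove.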
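For the story-boundaries-unknown (SU) case, the abstract only says that a windowing/recombination strategy was used. The sketch below is one plausible reading of that idea, not the paper's exact method: a show's time-stamped words are cut into overlapping fixed-length windows, each of which can be scored as a pseudo-document, and recombination then greedily keeps the highest-scoring windows while dropping lower-scoring windows from the same show whose midpoints lie too close to one already kept. The names `Window`, `make_windows` and `recombine`, and the 30 s length, 15 s step and 30 s gap, are hypothetical illustration values.

```python
from dataclasses import dataclass


@dataclass
class Window:
    show: str      # broadcast/show identifier
    start: float   # window start time in seconds
    end: float     # window end time in seconds
    words: list    # transcribed words that fall inside the window


def make_windows(show, timed_words, length=30.0, step=15.0):
    """Cut a show's (time, word) stream into overlapping fixed-length windows.

    length and step are hypothetical defaults, not the paper's settings.
    """
    if not timed_words:
        return []
    last = timed_words[-1][0]
    windows = []
    start = 0.0
    while start <= last:
        end = start + length
        words = [w for t, w in timed_words if start <= t < end]
        if words:  # skip windows that contain no speech
            windows.append(Window(show, start, end, words))
        start += step
    return windows


def recombine(scored, min_gap=30.0):
    """Greedy recombination over (score, Window) pairs.

    Windows are visited best-first; a window is dropped if a kept window from
    the same show has a midpoint closer than min_gap seconds (hypothetical rule).
    """
    kept = []
    for score, win in sorted(scored, key=lambda sw: sw[0], reverse=True):
        mid = (win.start + win.end) / 2
        clash = any(
            k.show == win.show and abs((k.start + k.end) / 2 - mid) < min_gap
            for _, k in kept
        )
        if not clash:
            kept.append((score, win))
    return kept
```

With words every 10 s from 0 to 80 s, `make_windows` yields six windows starting at 0, 15, 30, 45, 60 and 75 s. If the windows starting at 15 s and 30 s both score highly for a query, only the better one survives recombination, so the ranked list does not return near-duplicate stretches of the same story.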
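The results above are reported as Average Precision. Assuming the usual non-interpolated definition used in TREC scoring, for one query this is the precision at the rank of each relevant document, summed and divided by the total number of relevant documents, so relevant documents that are never retrieved contribute zero. A hypothetical helper for a single query might be written as follows. Figures such as 38.8% would presumably be the mean of this value over the query set.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated average precision for one query.

    ranked_ids: retrieved document ids, best first.
    relevant_ids: ids judged relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    # Divide by all relevant documents, so missed ones count as zero.
    return total / len(relevant)
```

For example, `average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"})` returns (1 + 2/3) / 3, roughly 0.556, because d5 is never retrieved.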