Spoken Language Corpora

definition: Spoken language corpora are collections of recorded speech, generally accompanied by transcriptions of the speech and of noises, and by annotations at different linguistic levels. Speech corpora can contain read speech, spontaneous speech, or dialogues, and may be recorded under different conditions with regard to microphone, environment (e.g., laboratory, office, background noise), and transmission channel (e.g., telephone, broadcast). They are used for a range of purposes, including training and evaluation of speech recognisers, phonetic and phonological research, dialect research, dialogue research, and speech synthesis.
See also the corresponding HLT Survey chapter: http://www.lt-world.org/hlt_survey/ltw-chapter12-3.pdf
related publication(s):

Handbook of Standards and Resources for Spoken Language Systems.
Dafydd Gibbon, Roger Moore, and Richard Winski.
Walter de Gruyter. Berlin, Germany. 1997.

