The CHRISTINE Corpus
syntactic trees, syntactic dependencies, POS
- University of Sussex
- POS-tagged Text Corpus
The CHRISTINE Corpus is a structurally annotated sample of spoken English. The sample is based on extracts from the “demographically-sampled” speech section of the British National Corpus. It therefore forms a suitable resource for studying grammatical and other structural features in the spontaneous, informal usage of a cross-section of speakers drawn from all social classes and regions of the United Kingdom in the 1990s. The CHRISTINE Corpus conforms to relevant recommendations of the EAGLES (Expert Advisory Group on Language Engineering Standards) Spoken Language Working Group (Gibbon et al. 1997), as well as to preferences expressed by an international group of more than thirty experts consulted via the Internet at the beginning of the project which created it.
The CHRISTINE project was sponsored from 1996 to 1999 by the Economic and Social Research Council (UK), under award no. R000 23 6443, as a successor to the project which produced the SUSANNE analytic scheme and Corpus. The main aim of both the SUSANNE and CHRISTINE projects has been to develop detailed, comprehensive, and explicit standards for annotating the structural properties of samples of the English language as used in real life. Such standards can be developed only by applying an annotation scheme to language samples and refining it in response to problematic cases, so the work yields, as a valuable by-product, corpora, or “treebanks”, annotated in accordance with the scheme.