English Parser Evaluation Corpus
syntactic dependencies, POS
- POS-tagged Text Corpus
A parser evaluation corpus of English based on a grammatical relation annotation scheme is now available. It consists of 500 sentences (around 10,000 words) randomly extracted from the SUSANNE corpus.
There are four files: the (tokenised) raw text, the lemmatised and numbered sentences, the grammatical relation annotation, and software that can be used to automatically evaluate parser output. An up-to-date specification of the annotation scheme is also available online. (Please note that this specification refers to the latest version of the annotated corpus and supersedes the one in the publications listed below.) The corpus is free for research purposes; for any proposed commercial use, please contact John Carroll.
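As a rough illustration of what evaluating parser output against grammatical relation annotation can involve, the sketch below compares a parser's relations for one sentence with gold-standard relations and reports precision, recall and F1. It is not the evaluation software distributed with the corpus. The tuple representation, the relation names and the function name `score_relations` are hypothetical, chosen only for this example, and do not reflect the corpus file format.

```python
from collections import Counter

def score_relations(gold, predicted):
    """Return (precision, recall, f1) for two lists of relation tuples.

    Each relation is assumed to be a hashable tuple such as
    (relation_type, head, dependent). This layout is an assumption
    made for the example, not the format used in the corpus files.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a relation counts as matched at most as
    # many times as it occurs in both the gold and the predicted list.
    matched = sum((gold_counts & pred_counts).values())
    total_pred = sum(pred_counts.values())
    total_gold = sum(gold_counts.values())
    precision = matched / total_pred if total_pred else 0.0
    recall = matched / total_gold if total_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical relations for the sentence "the dog barks loudly".
gold = [("subj", "bark", "dog"), ("det", "dog", "the"), ("mod", "bark", "loudly")]
predicted = [("subj", "bark", "dog"), ("det", "dog", "the"), ("obj", "bark", "loudly")]

precision, recall, f1 = score_relations(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of the three predicted relations match the gold standard, so precision and recall are both two thirds. A real evaluation would aggregate the counts over all 500 sentences rather than scoring each sentence in isolation.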
Descriptions of the grammatical relation annotation scheme are published in:
Carroll, J., G. Minnen and E. Briscoe (in press) 'Parser evaluation using a grammatical relation annotation scheme'. In A. Abeillé (ed.), Treebanks: Building and Using Syntactically Annotated Corpora, Dordrecht: Kluwer.
Carroll, J., G. Minnen and E. Briscoe (1999) 'Corpus annotation for parser evaluation'. In Proceedings of the EACL-99 Post-Conference Workshop on Linguistically Interpreted Corpora, Bergen, Norway. 35-41. Also in Proceedings of the ATALA Workshop on Corpus Annotés pour la Syntaxe - Treebanks, Paris, France. 13-20.
Carroll, J., E. Briscoe and A. Sanfilippo (1998) 'Parser evaluation: a survey and a new proposal'. In Proceedings of the 1st International Conference on Language Resources and Evaluation, Granada, Spain. 447-454.
- GNU GPL