The LinGO Redwoods Treebank
- HPSG / GPSG
syntactic trees, syntactic dependencies, POS
- POS-tagged Text Corpus
The LinGO Redwoods treebank is a collection of hand-annotated corpora analysed with the LinGO English Resource Grammar (ERG). For each utterance in a corpus, the treebank records (in principle) all analyses hypothesized by the grammar, together with an annotator's decision as to which reading is preferred in context. The key innovation of the Redwoods approach to treebanking is that all linguistic data captured in the treebank are anchored to the HPSG framework and to a generally available broad-coverage grammar of English, the LinGO ERG. Unlike other treebanks, Redwoods does not need to define a new form of grammatical representation specific to the treebank, and consequently less effort is needed to disseminate and establish such a representation. Instead, the treebank records complete syntacto-semantic analyses as defined by the LinGO ERG; tools are provided to extract many different types of linguistic information at varying granularity.
Other relevant aspects of the Redwoods treebank include the integration of alternative, though dispreferred, analyses for each utterance and the dynamic nature of the annotations: as the underlying grammar evolves and its analyses improve, there is provision for a (nearly) fully automated update of the treebank against a version of the original corpus analysed with the revised grammar. As a methodological result, part of the Redwoods data is now regularly maintained within the grammar regression cycle for each new release of the ERG.
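The update idea described above can be illustrated with a minimal sketch. It is not the actual Redwoods tooling: the `TreebankItem` record, the string encoding of analyses, and the exact-match rule in `update_item` are all assumptions made for illustration. The sketch only shows why such an update can be "nearly" automated: an annotator's choice carries over when the revised grammar still produces the same analysis, and the item is flagged for manual review otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreebankItem:
    """Hypothetical record: one utterance, all analyses, one preferred reading."""
    utterance: str
    analyses: List[str]              # every reading the grammar licenses (toy string encoding)
    preferred: Optional[str] = None  # the annotator's choice, if one has been made


def update_item(old: TreebankItem, new_analyses: List[str]) -> TreebankItem:
    """Re-anchor an annotated item to the analyses of a revised grammar.

    Assumption: a decision carries over only if the identical analysis
    is still produced; otherwise the preference is cleared.
    """
    keep = old.preferred if old.preferred in new_analyses else None
    return TreebankItem(old.utterance, list(new_analyses), keep)


def needs_review(item: TreebankItem) -> bool:
    """An item without a preferred reading must go back to an annotator."""
    return item.preferred is None


if __name__ == "__main__":
    old = TreebankItem("Kim sleeps", ["analysis-1", "analysis-2"], "analysis-1")
    carried = update_item(old, ["analysis-1", "analysis-3"])
    lost = update_item(old, ["analysis-3"])
    print(needs_review(carried), needs_review(lost))
```

In this toy version the first updated item keeps its annotation, while the second loses it because the revised grammar no longer yields the previously preferred analysis.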