The SUSANNE Corpus

  • English

  • Monolingual

  • Syntactic dependencies, POS

  • Geoffrey Sampson

  • POS-tagged Text Corpus
  • Treebank

The SUSANNE Corpus was created, with the sponsorship of the Economic and Social Research Council (UK), as part of the process of developing a comprehensive, language-engineering-oriented taxonomy and annotation scheme for the logical and surface grammar of English. The SUSANNE scheme attempts to provide a method of representing all aspects of English grammar that are sufficiently definite to be susceptible of formal annotation. Its categories, and the boundaries between them, are specified in enough detail that, ideally, two analysts independently annotating the same text and referring to the same scheme would produce the same structural analysis.


The SUSANNE scheme may be likened to a "Linnaean taxonomy" of the grammatical domain. Its aim, comparable to that of Linnaeus's eighteenth-century taxonomy for the domain of botany, is not to identify categories that are theoretically optimal or that necessarily reflect the psychological organization of speakers' linguistic competence. It is simply to offer a scheme of categories, and ways of applying them, that makes it practical for language-engineering researchers to record everything that occurs in real-life usage systematically and unambiguously, and for researchers at different sites to exchange empirical grammatical data without misunderstandings over local uses of analytic terminology.


Release 5