LT World


The TIGER Treebank


http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/doc/html/TigerXML.html

approx. 900,000 tokens (50,000 sentences)

  • German

  • Monolingual

  • Syntax
  • Morphosyntax

morphosyntactically

syntactic dependencies, POS, syntactic trees

  • The TIGER project
  • University of Stuttgart
  • Thorsten Brants

  • Treebank

The TIGER Treebank (Version 2.1) consists of approx. 900,000 tokens (50,000 sentences) of German newspaper text taken from the Frankfurter Rundschau. The corpus was semi-automatically POS-tagged and annotated with syntactic structure. It also contains morphological and lemma information for terminal nodes. For details, see the annotation page.

The TIGER Treebank is delivered in two treebank formats:

  • Negra export format (text format)
  • TIGER-XML format (XML-based format)

Both versions of the corpus can be processed by the treebank query tool TIGERSearch, which was also developed within the TIGER project.
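As a rough illustration of the XML-based format, the following Python sketch reads terminal and nonterminal nodes from a TIGER-XML document. It assumes the element and attribute names described in the TIGER-XML documentation linked above (`s`, `graph`, `terminals`/`t`, `nonterminals`/`nt`, `edge`, with `word`, `lemma`, `pos`, `morph`, `cat`, `label` and `idref` attributes). The embedded sentence is a hand-written example, not an excerpt from the corpus, and the function name `read_sentences` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative fragment in TIGER-XML style (NOT corpus data).
SAMPLE = """<corpus id="demo">
  <body>
    <s id="s1">
      <graph root="s1_500">
        <terminals>
          <t id="s1_1" word="Der" lemma="der" pos="ART" morph="Nom.Sg.Masc"/>
          <t id="s1_2" word="Hund" lemma="Hund" pos="NN" morph="Nom.Sg.Masc"/>
          <t id="s1_3" word="bellt" lemma="bellen" pos="VVFIN" morph="3.Sg.Pres.Ind"/>
        </terminals>
        <nonterminals>
          <nt id="s1_501" cat="NP">
            <edge label="NK" idref="s1_1"/>
            <edge label="NK" idref="s1_2"/>
          </nt>
          <nt id="s1_500" cat="S">
            <edge label="SB" idref="s1_501"/>
            <edge label="HD" idref="s1_3"/>
          </nt>
        </nonterminals>
      </graph>
    </s>
  </body>
</corpus>"""


def read_sentences(xml_text):
    """Return one dict per <s> element with its tokens and phrase nodes."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Terminal nodes carry the word form, lemma and POS tag.
        tokens = [(t.get("word"), t.get("lemma"), t.get("pos"))
                  for t in s.iter("t")]
        # Nonterminal nodes carry a category and labelled edges to children.
        phrases = {
            nt.get("id"): (
                nt.get("cat"),
                [(e.get("label"), e.get("idref")) for e in nt.findall("edge")],
            )
            for nt in s.iter("nt")
        }
        sentences.append({
            "id": s.get("id"),
            "root": s.find("graph").get("root"),
            "tokens": tokens,
            "phrases": phrases,
        })
    return sentences


if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        print(sent["id"], [word for word, _, _ in sent["tokens"]])
```

For a file the size of the full treebank, an incremental parser such as `xml.etree.ElementTree.iterparse` would probably be a better fit than loading the whole document at once.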

 

In addition to the TIGER Treebank proper, several resources derived from it are available: the TiGer Dependency Bank, a dependency-based gold standard for (hand-crafted) German parsers that covers TIGER Corpus sentences 8,001 through 10,000; the TIGER 700 RMRS Bank; the TIGER data sets for the CoNLL-X shared task; and dependency triple representations for (almost) the entire treebank, which, like the TiGer DB structures, are intended for evaluation purposes.
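For the CoNLL-X data sets, a minimal reading sketch might look as follows. It assumes the standard CoNLL-X layout of ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences. The sample rows are hand-written, and the (head, relation, dependent) tuples produced by the hypothetical `to_triples` helper are only an illustration; they are not the file format of the dependency triple representations mentioned above.

```python
CONLLX_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                 "feats", "head", "deprel", "phead", "pdeprel"]

# Hand-written illustrative rows (NOT corpus data).
ROWS = [
    ["1", "Der", "der", "ART", "ART", "_", "2", "NK", "_", "_"],
    ["2", "Hund", "Hund", "N", "NN", "_", "3", "SB", "_", "_"],
    ["3", "bellt", "bellen", "V", "VVFIN", "_", "0", "ROOT", "_", "_"],
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS) + "\n\n"


def read_conllx(text):
    """Split CoNLL-X text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(dict(zip(CONLLX_FIELDS, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences


def to_triples(sentence):
    """Return (head form, relation, dependent form) for every token."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    # HEAD "0" has no matching token and is rendered as "ROOT".
    return [(forms.get(tok["head"], "ROOT"), tok["deprel"], tok["form"])
            for tok in sentence]


if __name__ == "__main__":
    for sentence in read_conllx(SAMPLE):
        for triple in to_triples(sentence):
            print(triple)
```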


http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/

  • GNU GPL

  • University of Stuttgart
  • Thorsten Brants

2.1