The Prague Dependency Treebank (PDT)

7,110 manually annotated text documents, containing a total of 115,844 sentences and 1,957,247 tokens.

  • Czech

  • Monolingual

  • Syntax
  • Semantics
  • Morphology


morphosyntactically, semantically

syntactic dependencies, POS

  • Charles University in Prague

The Prague Dependency Treebank 2.0 (PDT 2.0) contains a large number of Czech texts with complex, interlinked annotation at three levels: morphological (2 million words), syntactic (1.5 million words) and semantic (0.8 million words). In addition, certain properties of sentence information structure and coreference relations are annotated at the semantic level. PDT 2.0 is based on the long-standing Praguian linguistic tradition, adapted to the current needs of computational linguistics research. The corpus itself uses the latest annotation technology. Software tools for corpus search, annotation and language analysis are included. Extensive documentation (in English) is provided as well.
