LT World


Sinica Treebank

  • Chinese

  • Monolingual

  • Syntax
  • Grammar Writing


syntactic dependencies, POS

  • Treebank
  • POS-tagged Text Corpus

The goal of Sinica Treebank is to provide a syntactically annotated, structure-tagged corpus for Chinese natural language processing. By extracting grammatical information from the treebank, we can improve parser performance and learn more about Chinese syntax. Sinica Treebank was built by CKIP in 1997 with texts taken from the Sinica Corpus.


The texts are parsed automatically with Information-based Case Grammar (ICG) and then checked manually. The present version, Sinica Treebank v3.0, includes 61,087 trees (361,834 words). Of these, 1,000 trees are publicly available for researchers to download. In addition, a search interface on the website serves users who are interested in Chinese syntax and semantic relations.

The structural frame of Sinica Treebank is based on the Head-Driven Principle: a sentence or phrase is composed of a core Head and its arguments or adjuncts. The Head determines the phrasal category and its relations to the other constituents. For example, the Head of a sentence (S) or verb phrase (VP) is a verb (V). See Chen et al. (1999), "The Construction of Sinica Treebank", for details of supplementary principles, symbol illustrations, semantic roles, and phrasal structures.
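The Head-Driven Principle can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` class, the role labels `agent` and `goal`, and the English example sentence are hypothetical and do not reproduce the actual Sinica Treebank file format or tag set. It only shows the idea that each phrase has one child marked as Head, and that following Head links yields the word that determines the phrase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A constituent or a word in a head-driven tree (illustrative only)."""
    category: str                 # phrasal category or POS, e.g. "S", "NP", "V"
    role: str = "Head"            # relation to the parent: "Head" or an argument/adjunct label
    word: Optional[str] = None    # surface form for leaves, None for phrases
    children: List["Node"] = field(default_factory=list)

    def head_child(self) -> Optional["Node"]:
        """Return the child marked as Head, or None if there is none."""
        for child in self.children:
            if child.role == "Head":
                return child
        return None

    def lexical_head(self) -> "Node":
        """Follow Head links down to the word that heads this constituent."""
        node = self
        while node.children:
            head = node.head_child()
            if head is None:
                raise ValueError(f"{node.category} has no Head child")
            node = head
        return node


# Hypothetical sentence "he eats rice": a verb heads S, with two NP arguments.
# The role names "agent" and "goal" are placeholders, not the treebank's labels.
sentence = Node("S", role="root", children=[
    Node("NP", role="agent", children=[Node("N", word="he")]),
    Node("V", word="eats"),
    Node("NP", role="goal", children=[Node("N", word="rice")]),
])

head = sentence.lexical_head()
dependents = [c.lexical_head().word for c in sentence.children if c.role != "Head"]
print(head.category, head.word)
print(dependents)
```

In this sketch the head of S is the verb, matching the example in the text, and the non-Head children are its arguments. A real loader would have to follow the notation described in Chen et al. (1999).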