LT World

Kyoto University Text Corpus

  • Monolingual


  • Kyoto University

  • POS-tagged Text Corpus

A corpus project at Kyoto University aimed to create the corpus semi-automatically, to provide grammatically parsed sentences, and to improve the automatic parsers at the same time.

The corpus contained 20,000 sentences in July 1998. Sentences were parsed automatically, with JUMAN for morphology and KNP for syntax. Every sentence was then checked and, where necessary, corrected by humans, and the errors found were used to improve the parsing algorithms. The corpus grew at a rate of about 40 sentences per hour per person.
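To give a sense of scale, the figures above can be turned into a rough estimate of the manual checking effort. This is only a back-of-the-envelope sketch: it assumes the rate of about 40 sentences per hour per person held constant across all 20,000 sentences, which the source does not state, and the function name `person_hours` is hypothetical.

```python
def person_hours(sentences: int, rate_per_person_hour: float) -> float:
    """Person-hours needed to check `sentences` at a constant rate.

    Assumption (not from the source): the rate is constant over the whole corpus.
    """
    if rate_per_person_hour <= 0:
        raise ValueError("rate must be positive")
    return sentences / rate_per_person_hour


# Figures quoted in the text: 20,000 sentences, about 40 sentences per hour per person.
hours = person_hours(20_000, 40)
print(hours)  # 500.0
```

Under that assumption, the July 1998 state of the corpus corresponds to roughly 500 person-hours of checking.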

The corpus can be downloaded from the Kyoto University web site, but the CD-ROMs of the newspaper from which the sentences were taken must also be purchased.


Download Kyoto Text Corpus (7990765 bytes). You need to purchase the Mainichi Newspaper '95 CD-ROM.

  • rar

  • Creative Commons