
Kyoto University Text Corpus


  • Monolingual

  • Morphosyntactically annotated

  • Kyoto University

  • POS-tagged Text Corpus

The corpus was developed in a project at Kyoto University. Its goals were to build the corpus semi-automatically, to provide grammatically parsed sentences, and to improve the automatic parsers at the same time.

In July 1998, the corpus contained 20,000 sentences. Sentences were parsed automatically, with JUMAN for morphology and KNP for syntax. Every sentence was then checked and, where necessary, corrected by humans, and the errors found were used to improve the parsing algorithms. The corpus grew at a rate of about 40 sentences per person per hour.
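As a rough illustration of what these figures imply for annotation effort, the short sketch below divides the corpus size by the reported checking rate. It assumes the rate of 40 sentences per person per hour held constant over the whole corpus, which the description does not state; the function name `person_hours` is only illustrative.

```python
# Figures quoted in the corpus description.
SENTENCES_JULY_1998 = 20_000
SENTENCES_PER_PERSON_HOUR = 40  # reported manual checking rate


def person_hours(sentences: int, rate: float = SENTENCES_PER_PERSON_HOUR) -> float:
    """Estimate person-hours needed to check `sentences` sentences.

    Assumption: the checking rate is constant, which is a simplification.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return sentences / rate


if __name__ == "__main__":
    # 20,000 sentences at 40 per person-hour
    print(person_hours(SENTENCES_JULY_1998))  # 500.0
```

Under that assumption, checking the July 1998 corpus corresponds to roughly 500 person-hours of manual work.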

The corpus can be downloaded from the Kyoto University website, but using it requires buying the CD-ROMs of the newspaper from which the sentences were taken.

 

Download Kyoto Text Corpus (7990765 bytes). You need to purchase the Mainichi Newspaper '95 CD-ROM.


http://www-nagao.kuee.kyoto-u.ac.jp/nl-resource/corpus-e.html
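Because the listing gives an exact archive size, a downloaded file can be sanity-checked against it. The following is a minimal sketch, not part of the corpus distribution; the helper name `size_matches` is hypothetical, and a size match does not guarantee the file contents are intact.

```python
import os

# Archive size as listed on this page.
EXPECTED_SIZE_BYTES = 7_990_765


def size_matches(path: str, expected: int = EXPECTED_SIZE_BYTES) -> bool:
    """Return True if the file at `path` has exactly `expected` bytes."""
    return os.path.getsize(path) == expected
```

A call such as `size_matches("corpus.rar")` would return `False` for a truncated download; the file name here is only a placeholder.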

  • rar

7990765 bytes

  • Creative Commons

4.0