European Corpus Initiative Multilingual Corpus I (ECI/MCI)
syntactic dependencies, POS
- European Corpus Initiative (ECI)
- POS-tagged Text Corpus
The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus (ECI/MCI), to be made available in digital form for scientific research at as low a cost as possible. The corpus has been available on CD-ROM since 1994 and is distributed by ELSNET.
Below is a sample of the contents of the CD-ROM. A complete listing of the contents is also available. For further details, read the READ-ME file on the CD-ROM.
- German newspaper texts from the Frankfurter Rundschau, July 1992 to March 1993. Provided by Universität Gesamthochschule, Paderborn, Germany. Approximately 34 million words.
- French newspaper texts from Le Monde, consisting of material from September 1989, October 1989, and January 1990. Provided by LIMSI CNRS, France. Approximately 4.1 million words.
- Extracts from the Leiden Corpus of Dutch, consisting of newspapers, transcribed speech, etc. Provided by Instituut voor Nederlandse Lexicologie, Leiden, Holland. Approximately 5.5 million words.
- International Labor Organisation (ILO) "Official Bulletin, B Series", Vols LXVII (1984) to LXXII (1989). Parallel texts in English, French and Spanish. Provided by the International Labor Organisation. Approximately 5 million words.
- European Network of Excellence in Human Language Technologies (ELSNET)