LT World


European Corpus Initiative Multilingual Corpus I (ECI/MCI)

  • English
  • French
  • Dutch
  • Spanish
  • German


syntactic dependencies, POS

  • European Corpus Initiative (ECI)

  • POS-tagged Text Corpus

  • Written

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus (ECI/MCI), to be made available in digital form for scientific research at as low a cost as possible. The corpus has been available on CD-ROM since 1994 and is distributed by ELSNET.



Below is a sample of the contents of the CD-ROM. A complete listing of the contents is also available. See also the READ-ME file on the CD-ROM.

  • German newspaper texts from the Frankfurter Rundschau, July 1992 to March 1993. Provided by Universität Gesamthochschule, Paderborn, Germany. Approximately 34 million words.
  • French newspaper texts from Le Monde, consisting of material from September 1989, October 1989, and January 1990. Provided by LIMSI CNRS, France. Approximately 4.1 million words.
  • Extracts from the Leiden Corpus of Dutch, consisting of newspapers, transcribed speech, etc. Provided by Instituut voor Nederlandse Lexicologie, Leiden, Holland. Approximately 5.5 million words.
  • International Labor Organisation (ILO) "Official Bulletin, B Series", Vols. LXVII (1984) to LXXII (1989). Parallel texts in English, French, and Spanish. Provided by the International Labor Organisation. Approximately 5 million words.

  • European Network of Excellence in Human Language Technologies (ELSNET)