LT World

Written Language Corpora

Corpus Linguistics: Investigating Language Structure and Use.
D. Biber, S. Conrad and R. Reppen.
CUP. Cambridge, 1998.

Corpus Linguistics.
T. McEnery and A. Wilson.
EUP. Edinburgh, 2001.


  • Open Language Archives Community (OLAC)
  • Centre for Corpus Research (CCR)
  • Evaluations and Language Resources Distribution Agency (ELDA)
  • Oxford Text Archive (OTA)
  • Trans-European Language Resources Infrastructure (TELRI)
  • University Centre for Computer Corpus Research on Language (UCREL)
  • Tuscan Word Centre
  • The Centre for English Corpus Linguistics (CECL)
  • Linguistic Data Consortium (LDC)
  • Electronic Text Center
  • European Corpus Initiative (ECI)
  • European Language Resources Association (ELRA)
  • International Computer Archive of Modern and Medieval English (ICAME)
  • TELRI Research Archive of Computational Tools and Resources (TRACTOR)
  • Institute for the German Language (IDS)

  • Oliver Mason
  • Michael Oakes
  • Mike Scott
  • David Lee
  • Geoffrey Sampson
  • John Sinclair
  • Anke Lüdeling
  • Wolfgang Teubert
  • Michael Stubbs
  • Andrew Wilson
  • Tony McEnery
  • Adam Kilgarriff
  • Christopher Manning

  • Bank of English (COBUILD Corpus)
  • Global English Monitor Corpus
  • London-Lund Corpus (LLC)
  • Lancaster-Oslo Bergen (LOB) Corpus
  • SARA
  • Xkwic/CQP (IMS Corpus Workbench)
  • American National Corpus (ANC)
  • SUSANNE Corpus
  • Brown Corpus
  • Freiburg-LOB (FLOB) Corpus
  • Freiburg-Brown (FROWN) Corpus
  • WordSmith Tools
  • MonoConc
  • European Corpus Initiative Multilingual (ECI/MCI 1) Corpus

Any collection of more than one text can be called a corpus (corpus is Latin for "body", hence a corpus is any body of text). In modern linguistics, however, the term "corpus" means a machine-readable collection of texts that is representative of the language use under investigation.

Written Corpora