Written Language Corpora
http://www.lt-world.org/hlt_survey/ltw-chapter12-2.pdf
- Open Language Archives Community (OLAC)
- Centre for Corpus Research (CCR)
- Evaluations and Language Resources Distribution Agency (ELDA)
- Oxford Text Archive (OTA)
- Trans-European Language Resources Infrastructure (TELRI)
- University Centre for Computer Corpus Research on Language (UCREL)
- Tuscan Word Centre
- The Centre for English Corpus Linguistics (CECL)
- Linguistic Data Consortium (LDC)
- Electronic Text Center
- European Corpus Initiative (ECI)
- European Language Resources Association (ELRA)
- International Computer Archive of Modern and Medieval English (ICAME)
- TELRI Research Archive of Computational Tools and Resources (TRACTOR)
- Institute for the German Language (IDS)
- Oliver Mason
- Michael Oakes
- Mike Scott
- David Lee
- Geoffrey Sampson
- John Sinclair
- Anke Lüdeling
- Wolfgang Teubert
- Michael Stubbs
- Andrew Wilson
- Tony McEnery
- Adam Kilgarriff
- Christopher Manning
- The European Network of Excellence in Human Language Technologies (ELSNET)
- Multext-East
- Preparatory Action for Linguistic Resources Organisation for Language Engineering (PAROLE)
- Multilingual Text Tools and Corpora (Multext)
- Bank of English (COBUILD Corpus)
- Global English Monitor Corpus
- London-Lund Corpus (LLC)
- Lancaster-Oslo-Bergen (LOB) Corpus
- SARA
- Xkwic/CQP (IMS Corpus Workbench)
- American National Corpus (ANC)
- SUSANNE Corpus
- Brown Corpus
- Freiburg-LOB (FLOB) Corpus
- COSMAS
- Freiburg-Brown (FROWN) Corpus
- WordSmith Tools
- MonoConc
- European Corpus Initiative Multilingual (ECI/MCI 1) Corpus
Any collection of more than one text can be called a corpus (corpus is Latin for "body", so a corpus is any body of text). In modern linguistics, however, the term "corpus" means a machine-readable collection of texts that is representative of the language use under investigation.
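To make "machine-readable" concrete, here is a minimal sketch, assuming a corpus is held as a plain Python dictionary that maps hypothetical text identifiers (`text1`, `text2`) to their contents. The sample sentences are invented placeholders and are not drawn from any real corpus. The sketch tokenizes each text with a simple regular expression and counts word forms across the whole collection. That kind of frequency count is only possible once the texts are stored in machine-readable form. The helpers `tokenize` and `word_frequencies` are illustrative names, not part of any corpus tool listed above. Whether a real corpus is *representative* of the language use under study is a matter of sampling design and cannot be checked by code like this.

```python
import re
from collections import Counter

# A toy corpus: more than one text, stored in machine-readable form.
# The identifiers and sentences are hypothetical placeholders.
corpus = {
    "text1": "The corpus is a body of text.",
    "text2": "A corpus can be searched by machine.",
}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(corpus):
    """Count how often each word form occurs across all texts in the corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(tokenize(text))
    return counts

freqs = word_frequencies(corpus)
print(freqs.most_common(3))
```

Real corpora are usually far larger and often carry markup or annotation (for example, part-of-speech tags), so a production tokenizer would need to handle that rather than rely on a bare regular expression.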
Written Corpora