LT World

USENET corpus


Contains anonymized postings collected from 47,860 USENET newsgroups, covering a great variety of Internet discussion boards, during the last calendar year.

Over 30 billion words

  • English

untagged (raw text)

http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html

  • Web

Over 34 GB compressed (delivered as weekly bundles of about 150 MB each)

  • Creative Commons
