The Brown University Standard Corpus of Present-Day American English (Corpus BROWN)
1,014,312 words sampled from 15 text categories
human annotation, native speakers of American English
syntactic dependencies, POS
- Brown University
- Department of Cognitive & Linguistic Sciences, Brown University
The Corpus consists of 500 samples, distributed across 15 genres in rough proportion to the amount published in 1961 in each of those genres. All works sampled were published in 1961; as far as could be determined, they were first published then and were written by native speakers of American English. Each sample began at a random sentence boundary in the article or other unit chosen, and continued up to the first sentence boundary after 2,000 words. In a very few cases, miscounts led to samples of just under 2,000 words.
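The sampling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the compilers actually used: the function name `draw_sample`, the representation of a text as a list of tokenized sentences, and the handling of texts that run out before 2,000 words are all hypothetical.

```python
import random


def draw_sample(sentences, target_words=2000, start=None, rng=None):
    """Illustrative sketch of the sampling rule (not the original procedure).

    `sentences` is assumed to be a list of sentences, each a list of word
    tokens. The sample starts at a sentence boundary (random unless `start`
    is given) and ends at the first sentence boundary at or after
    `target_words` words.
    """
    if start is None:
        rng = rng or random.Random()
        start = rng.randrange(len(sentences))

    sample, count = [], 0
    for sentence in sentences[start:]:
        sample.append(sentence)
        count += len(sentence)
        if count >= target_words:
            break  # first sentence boundary after the word target
    # Assumption: if the text ends first, return the shorter sample as-is.
    return start, sample
```

Because every sample runs on to the end of a sentence, most samples come out slightly longer than 2,000 words, which is consistent with a total just above 500 × 2,000 = 1,000,000 words.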
- GNU GPL