The British component of the International Corpus of English (ICE-GB) — LT World




morphosyntactically

syntactic trees, syntactic dependencies, POS


  • Speech Corpus
  • Treebank

ICE-GB is the British component of the International Corpus of English (ICE).

WHAT IS SPECIAL ABOUT ICE-GB?


ICE-GB is fully grammatically analysed. Like all the ICE corpora, ICE-GB consists of a million words of spoken and written English and adheres to the common corpus design: 300 spoken and 200 written texts make up the million words. Every text is grammatically annotated, permitting complex and detailed searches across the whole corpus. ICE-GB contains 83,394 parse trees, of which 59,640 are in the spoken part of the corpus. This is the largest collection of parsed spoken material anywhere, with the exception of DCPSE, which contains only spoken material.
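The figures quoted above imply a few further numbers that are not stated directly, such as the number of trees in the written part and the rough size of each text. The short Python sketch below derives them; the constant and function names are illustrative, and the per-text and per-tree averages assume the nominal size of exactly one million words.

```python
# Figures quoted in the corpus description.
TOTAL_TREES = 83_394
SPOKEN_TREES = 59_640
SPOKEN_TEXTS = 300
WRITTEN_TEXTS = 200
# Assumption: the nominal "million words" is taken as exactly 1,000,000.
NOMINAL_WORDS = 1_000_000


def corpus_summary() -> dict:
    """Derive figures implied by the description (names are illustrative)."""
    total_texts = SPOKEN_TEXTS + WRITTEN_TEXTS
    return {
        # Trees not in the spoken part must be in the written part.
        "written_trees": TOTAL_TREES - SPOKEN_TREES,
        # Fraction of all parse trees that come from spoken texts.
        "spoken_tree_share": SPOKEN_TREES / TOTAL_TREES,
        "total_texts": total_texts,
        # Rough averages only, based on the nominal word count.
        "approx_words_per_text": NOMINAL_WORDS / total_texts,
        "approx_words_per_tree": NOMINAL_WORDS / TOTAL_TREES,
    }


if __name__ == "__main__":
    for name, value in corpus_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the written part holds 23,754 trees, a little over 70% of all trees come from spoken texts, and each of the 500 texts averages about 2,000 words.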

Figure: ICECUP 3.1 displaying a single tree from the spoken part of the corpus.

ICE-GB has been fully checked. Linguists checked it at several stages in its completion, using both a traditional ‘post-checking’ strategy and cross-sectional error-based searches. We do not believe that the analysis in the corpus is perfect, but it is not systematically imperfect, unlike even the best parser output.

ICE-GB comes complete with ICECUP. ICECUP lets you perform a variety of queries, including Fuzzy Tree Fragments, which are built from the parse analysis in the corpus and used to search it.
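To give a feel for the idea of searching with a partial tree, here is a minimal Python sketch of matching an underspecified tree pattern against a parse tree. It is a toy illustration only: it is not ICECUP's query format or algorithm, the `Node`, `Pattern`, `matches` and `find` names are hypothetical, and the node labels (`CL`, `NP`, `SU`, `OD` and so on) are used purely as example values. The sketch assumes each node carries a function and a category, that an unspecified field acts as a wildcard, and that child patterns must match children in left-to-right order with gaps allowed.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    """A parse-tree node with a grammatical function and a category."""
    function: str
    category: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """A partial tree: None means 'any value' for that field."""
    function: Optional[str] = None
    category: Optional[str] = None
    children: List["Pattern"] = field(default_factory=list)


def matches(pattern: Pattern, node: Node) -> bool:
    """True if the node satisfies the pattern, including its child patterns."""
    if pattern.function is not None and pattern.function != node.function:
        return False
    if pattern.category is not None and pattern.category != node.category:
        return False
    return _match_children(pattern.children, node.children)


def _match_children(patterns: List[Pattern], nodes: List[Node]) -> bool:
    """Match child patterns to distinct children in order, allowing gaps."""
    if not patterns:
        return True
    first, rest = patterns[0], patterns[1:]
    for i, child in enumerate(nodes):
        if matches(first, child) and _match_children(rest, nodes[i + 1:]):
            return True
    return False


def find(pattern: Pattern, root: Node) -> Iterator[Node]:
    """Yield every node in the tree (top-down) that the pattern matches."""
    if matches(pattern, root):
        yield root
    for child in root.children:
        yield from find(pattern, child)


# Example tree: a clause with a subject NP, a verb phrase and an object NP.
tree = Node("PU", "CL", [Node("SU", "NP"), Node("VB", "VP"), Node("OD", "NP")])

# Pattern: any clause with a subject NP followed, somewhere later, by a
# direct object of any category.
pattern = Pattern(category="CL", children=[
    Pattern(function="SU", category="NP"),
    Pattern(function="OD"),
])

hits = list(find(pattern, tree))
```

Leaving fields unspecified is what makes the pattern "fuzzy" in this sketch: the same fragment matches clauses regardless of what sits between the subject and the object.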


  • English

  • Monolingual

  • Syntax
  • Phonology
  • Phonetics

Release 2

  • GNU GPL

free