LT World


The Penn Treebank (PTB)


POS, skeletal syntactic bracketing, argument-predicate structure

morphosyntactically

syntactic trees, PTB Bracketing

PTB and then corrected by human annotators

  • University of Pennsylvania
  • Department of Information Engineering and Computer Science

The Penn Treebank Project annotates naturally occurring text for linguistic structure. Most notably, it produces skeletal parses showing rough syntactic and semantic information, forming a bank of linguistic trees. It also annotates text with part-of-speech tags and, for the Switchboard corpus of telephone conversations, adds dysfluency annotation.
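To give a feel for what a skeletal parse with part-of-speech tags looks like, here is a minimal Python sketch that reads one tree in PTB-style bracketing and lists its (word, tag) pairs. The example sentence and the function names `parse_bracketed` and `tagged_words` are invented for illustration and are not taken from the corpus or from any Treebank tool. The sketch assumes words and labels contain no parentheses (as far as we know, literal parentheses are escaped as -LRB- and -RRB- in the corpus) and it accepts an optional unlabeled outer bracket around the tree.

```python
# Illustrative only: the sentence below is made up, not a corpus excerpt.
EXAMPLE = "(S (NP (DT The) (NN project)) (VP (VBZ annotates) (NP (NN text))) (. .))"


def parse_bracketed(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        # An unlabeled wrapper bracket gets the empty string as its label.
        label = ""
        if pos < len(tokens) and tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])  # a word under its tag
                pos += 1
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        pos += 1  # consume the closing bracket
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


def tagged_words(tree):
    """Collect (word, tag) pairs in left-to-right order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, str):
            pairs.append((child, label))
        else:
            pairs.extend(tagged_words(child))
    return pairs


if __name__ == "__main__":
    for word, tag in tagged_words(parse_bracketed(EXAMPLE)):
        print(word, tag)
```

The nested-tuple representation is chosen only to keep the sketch short; a real application would more likely use an existing treebank reader.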

 

The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation. These 2,499 stories have been distributed in both the Treebank-2 (LDC95T7) and Treebank-3 (LDC99T42) releases of PTB. Treebank-2 includes the raw text for each story. Three "map" files are available in a compressed file via ftp; they relate the 2,499 PTB filenames to the corresponding WSJ DOCNO strings in TIPSTER.

 

Other Web links:

The Penn Treebank Project: http://www.cis.upenn.edu/~treebank/

Penn Treebank Online: http://www.ldc.upenn.edu/ldc/online/treebank/


http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC99T42

  • English

  • Monolingual

  • Syntax
  • Morphosyntax
  • Morphology


 1 Million texts


Member fee: $0 for 1999 members; Non-member fee: US $3,150.00; Reduced-license fee: US $1,575.00; Extra-copy fee: US $150.00

  • Audio CD