LT World


BTB-POS Corpus I


2460 Bulgarian sentences marked up with part-of-speech information

  • Bulgarian

  • Monolingual

  • Linguistic Modelling Laboratory, Bulgarian Academy of Sciences
  • Institute of Information Technologies, Bulgarian Academy of Sciences

The corpus is in an XML format that does not follow the TEI or CES standards; a DTD is included. It is available in three different encodings of the Cyrillic letters: ISO 8879:1986 (entities), MS Windows, and Unicode.
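To illustrate what the three encodings mean in practice, here is a minimal Python sketch that decodes the same Bulgarian word from each representation. It rests on assumptions not stated on this page: that "MS Windows" refers to the Windows-1251 code page, that "Unicode" refers to UTF-8, and that the ISO 8879:1986 variant uses the standard Cyrillic entity names (such as `&dcy;`). The function name `decode_sample` and the sample word are purely illustrative and are not taken from the corpus.

```python
import html


def decode_sample(raw: bytes, variant: str) -> str:
    """Decode bytes from one of the three (assumed) corpus variants."""
    if variant == "entities":
        # Assumption: ASCII text with ISO 8879:1986 Cyrillic entities.
        # html.unescape knows these names because HTML5 inherited them.
        # Apply it to text content only, not to raw XML markup, since it
        # would also expand &lt; and &amp;.
        return html.unescape(raw.decode("ascii"))
    if variant == "windows":
        # Assumption: "MS Windows" means the Windows-1251 code page.
        return raw.decode("cp1251")
    if variant == "unicode":
        # Assumption: "Unicode" means UTF-8.
        return raw.decode("utf-8")
    raise ValueError(f"unknown variant: {variant}")


word = "дума"  # illustrative sample word, not taken from the corpus
samples = {
    "entities": b"&dcy;&ucy;&mcy;&acy;",
    "windows": word.encode("cp1251"),
    "unicode": word.encode("utf-8"),
}
decoded = {name: decode_sample(raw, name) for name, raw in samples.items()}
print(decoded)
```

If the assumptions hold, all three variants should decode to the same string, so a tool that reads one version of the corpus can be pointed at another by changing only the decoding step.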

 

An extract from this corpus of Bulgarian sentences marked up with part-of-speech information can be downloaded here:

BTB-POS Corpus I (324 011 bytes) - ISO 8879:1986 encoding of the Cyrillic letters (entities).

BTB-POS Corpus I (306 966 bytes) - MS Windows encoding of the Cyrillic letters.

BTB-POS Corpus I (246 964 bytes) - Unicode encoding of the Cyrillic letters.


http://tractor.bham.ac.uk/tractor/resources/SOF2/BTB-POS/

  • inline XML

  • Web