LT World


Chinese SMS Corpus---Pinyin Information (King-NLP-003)


All SMS sentences were manually proofread


  • Written

This corpus contains 1,200,000 SMS sentences collected from the real-life messages of native Chinese speakers. All sentences were proofread manually, duplicate sentences were filtered out at the plain-text level, and every sentence is annotated with Pinyin, including tone information.
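The duplicate filtering mentioned above can be illustrated with a minimal sketch. The corpus's actual procedure and file format are not documented here, so the function name `dedupe_sentences` is hypothetical, and comparing sentences with whitespace stripped is an assumption about what filtering on the text alone might mean.

```python
def dedupe_sentences(sentences):
    """Keep the first occurrence of each sentence, comparing text only.

    Assumption: two sentences count as duplicates when their characters
    match after all whitespace is removed. The corpus's real criterion
    is not documented.
    """
    seen = set()
    unique = []
    for sentence in sentences:
        # Build a comparison key that ignores whitespace differences.
        key = "".join(sentence.split())
        if key and key not in seen:
            seen.add(key)
            unique.append(sentence)
    return unique


if __name__ == "__main__":
    # Made-up sample messages, not taken from the corpus.
    sample = ["你好吗", "你 好吗", "明天见", "你好吗"]
    print(dedupe_sentences(sample))
```

Comparing on the text alone means that a sentence appearing twice would be dropped even if other layers, such as the Pinyin annotation, were stored alongside it.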


http://www.speechocean.com/en-Text-Corpora/680.html

  • Chinese

  • Monolingual

  • Information Retrieval System
  • Information Extraction Applications
  • Mining Applications

Information retrieval and extraction, text mining, etc.