provided by the German Research Center for Artificial Intelligence

Automatic Language Identification

abbreviation(s): LID
definition: Automatic Language Identification (LID) is the process of identifying the natural language of a sample of speech or written text produced by an unknown speaker or author. LID has several important applications, for example as a front-end to a call router in a telephone-based application or to a multilingual speech recognition system.
See also the corresponding HLT Survey chapter: http://www.lt-world.org/hlt_survey/ltw-chapter8-7.pdf
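example: For written text, one simple way to make the definition concrete is to compare character n-gram frequency profiles by rank, loosely in the spirit of the n-gram-based text categorization work listed under the publications. The sketch below is only an illustration under stated assumptions: the function names (ngram_profile, out_of_place, identify), the trigram size, the profile length, and the two toy training sentences are all invented here, and a usable identifier would need far more training text per language. It does not cover spoken-language identification.

```python
from collections import Counter


def ngram_profile(text, n=3, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    # Normalise case and whitespace, and pad so word boundaries form n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(sample_profile, language_profile):
    """Sum of rank differences; n-grams absent from the language profile
    receive a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(language_profile)}
    max_penalty = len(language_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(sample_profile)
    )


def identify(sample, language_profiles):
    """Return the language whose profile is closest to the sample's profile."""
    sample_profile = ngram_profile(sample)
    return min(
        language_profiles,
        key=lambda lang: out_of_place(sample_profile, language_profiles[lang]),
    )


# Toy training text, invented for illustration only (an assumption, not real data).
TRAINING = {
    "english": "the quick brown fox jumps over the lazy dog and the cat "
               "sleeps in the house while the rain falls on the street",
    "german": "der schnelle braune fuchs springt über den faulen hund und die "
              "katze schläft im haus während der regen auf die straße fällt",
}
PROFILES = {lang: ngram_profile(text) for lang, text in TRAINING.items()}

print(identify("the dog is in the house", PROFILES))   # english
print(identify("der hund schläft im haus", PROFILES))  # german
```

With profiles this small the decision rests on only a few dozen trigrams, so short or mixed-language samples can easily be misclassified; the rank-based distance is used here mainly because it needs no probability smoothing.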
related project(s):
  • Multilingual Content for Flexible Format Internet Premium Services (MEMPHIS)
  • NEgotiating through SPOken Language in e-commerce (NESPOLE!)
related organisation(s):
  • Department of Computer Science and Automation
  • MIT - Spoken Language Systems
  • Spoken Language Translation Research Laboratories (SLT)
  • Institute of Applied Informatics and Formal Description Methods (AIFB)
  • Interactive System Labs (ISL)
  • Watson Research Centre
  • Information Systems Technology Group
related person(s):
  • Kay Berkling
  • Timothy James Hazen
  • Yeshwant Muthusamy
  • Marc A. Zissman
related system(s) / resource(s):
  • Rosette Language Identifier
  • PetaMem Language Identification
  • Languid
  • Mguesser
  • TextCat
  • CA : Language Identifier
  • Polyglot 3000
related publication(s):

Language Trees and Zipping.
D. Benedetto, E. Caglioti & V. Loreto.
Physical Review Letters, Vol. 88, No. 4. 2002.

Applying Monte Carlo Techniques to Language Identification.
A. Poutsma.
Proceedings of Computational Linguistics in the Netherlands. 2001.

Variable N-grams and Extensions for Conversational Speech Language Modeling.
M.H. Siu & M. Ostendorf.
IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, pp. 63-75. 2000.

Confidence Measure Based Language Identification.
F. Metze, T. Kamp, T. Schaaf, T. Schultz & H. Soltau.
IEEE International Conference on Acoustics, Speech, and Signal Processing. 2000.

Multilingual Speech Recognition.
A. Waibel, H. Soltau, T. Schultz, T. Schaaf & F. Metze.
Verbmobil: Foundations of Speech-to-Speech Translation. 2000.

Multilinguale Spracherkennung. Kombination akustischer Modelle zur Portierung auf neue Sprachen.
T. Schultz.
Dissertation. 2000.

Automatic Language Identification.
M. Zissman & K. Berkling.
Proceedings of the ESCA-NATO Workshop on Multi-Lingual Interoperability in Speech Technology. 1999.

Automatic Language Identification.
M. Zissman.
Wiley Encyclopedia of Electrical and Electronics Engineering. 1999.

An efficient phonotactic-acoustic system for language identification.
J. Navrátil & W. Zühlke.
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing. 1998.

Segment-based automatic language identification.
T.J. Hazen & V.W. Zue.
Journal of the Acoustical Society of America, Vol. 101, No. 4, pp. 2323-2331. 1997.

Comparing two language identification schemes.
G. Grefenstette.
Proceedings of the 3rd International Conference on the Statistical Analysis of Textual Data. 1995.

N-gram-based Text Categorization.
W. Cavnar & J. Trenkle.
Symposium on Document Analysis and Information Retrieval. 1994.

Statistical Identification of Language.
T. Dunning.
Technical report CRL MCCS-94-273, Computing Research Lab, New Mexico State University, 1994.

On the Use of Data-Driven Clustering Technique for Identification of Poly- and Mono-phonemes for four European Languages.
O. Andersen, P. Dalsgaard & W. Barry.
IEEE International Conference on Acoustics, Speech, and Signal Processing. 1994.

A Comparison of Approaches to Automatic Language Identification Using Telephone Speech.
Y.K. Muthusamy, K. Berkling, T. Arai, R. Cole & E. Barnard.
Proceedings of the 3rd European Conference on Speech Communication and Technology. 1993.