LT World

Acoustic Modelling in Speech Recognition

  • Multimodal Information Group
  • ICSI Speech Group
  • Speech at CMU

  • Henning Reetz
  • Steven Greenberg
  • Alex Waibel
  • Jacques Koreman
  • Tanja Schultz

  • The CMU Sphinx Group Open Source Speech Recognition Engines (CMU Sphinx)

Acoustic modelling is the modelling of the basic recognition units in the microphone signal. These units are often phones, especially in large-vocabulary systems, while small-vocabulary systems sometimes use larger units such as words. The acoustic signal is not used directly but is represented by spectral parameters derived from it. Commonly used parameters are mel-frequency cepstral coefficients (MFCCs) and RASTA-PLP coefficients (noise-robust perceptual linear prediction parameters); many other parameter types, including ones based on auditory processing or phonetic features, are also in use. In most state-of-the-art systems the models are hidden Markov models (HMMs), although dynamic time warping and neural networks are also used for acoustic modelling, the latter also in combination with HMMs. In a limited number of systems the acoustic modelling is knowledge-based rather than stochastic.
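
To make the HMM approach concrete, here is a minimal sketch in Python of how a sequence of feature frames can be scored against a left-to-right HMM using the forward algorithm. Everything in it is an illustrative assumption rather than the method of any system listed above: each frame is reduced to a single number instead of a full MFCC vector, each state emits a one-dimensional Gaussian, all states share one self-loop probability, the exit transition from the last state is ignored, and the names forward_log_likelihood, unit_a and unit_b are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a one-dimensional Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(frames, means, variances, self_loop=0.6):
    """Log-likelihood of a frame sequence under a left-to-right HMM.

    A state either repeats (probability self_loop) or advances to the
    next state (probability 1 - self_loop). Paths must start in the
    first state and end in the last one. Toy assumption: scalar frames.
    """
    n_states = len(means)
    log_stay = math.log(self_loop)
    log_next = math.log(1.0 - self_loop)

    # alpha[j]: log-probability of the frames seen so far, ending in state j
    alpha = [-math.inf] * n_states
    alpha[0] = log_gaussian(frames[0], means[0], variances[0])

    for x in frames[1:]:
        new_alpha = []
        for j in range(n_states):
            stay = alpha[j] + log_stay
            move = alpha[j - 1] + log_next if j > 0 else -math.inf
            emit = log_gaussian(x, means[j], variances[j])
            new_alpha.append(logsumexp([stay, move]) + emit)
        alpha = new_alpha

    return alpha[-1]


# Two hypothetical three-state unit models: (state means, state variances).
models = {
    "unit_a": ([0.0, 1.0, 2.0], [0.25, 0.25, 0.25]),
    "unit_b": ([2.0, 1.0, 0.0], [0.25, 0.25, 0.25]),
}

# Made-up scalar "features" that rise over time.
frames = [0.1, -0.1, 0.9, 1.1, 2.0, 1.9]

scores = {name: forward_log_likelihood(frames, m, v)
          for name, (m, v) in models.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Recognition in this toy setting amounts to choosing the unit whose model assigns the frames the highest likelihood; here the rising frame values fit the rising state means of unit_a better than the falling means of unit_b. A real recogniser would use multidimensional feature vectors, trained parameters, and a search over sequences of units rather than isolated scoring.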