Acoustic Modelling in Speech Recognition
Modelling of basic recognition units in the microphone signal. These units are often phones (esp. if a large vocabulary is used), while systems with a small vocabulary sometimes use larger units like words. The acoustic signal is not used directly, but represented by spectral parameters derived from it. Spectral parameters that are often used are mel-frequency cepstral coefficients (MFCC's) or RASTA PLP coefficients (noise-robust linear predictive coding parameters), although many other parameter types, including parameters based on auditory processing or phonetic features, are also used sometimes. The models in most state-of-the-art systems are obtained through hidden Markov modelling (HMM), although dynamic time warping and neural nets are also used for acoustic modelling (the latter also in combination with HMM). A limited number of systems exist in which the acoustic modelling is not stochastic, but knowledge-based.