# Statistical Modeling and Classification

definition: In most applications of human language technology (HLT), some tasks cannot be solved by purely deductive (rule-based) approaches; they need quantitative mechanisms to pick the most plausible of a larger set of potential outcomes, or to rank a set of possibilities. Often, the required preferences can be extracted from training examples by suitable statistical techniques. Statistical language modeling for speech recognition, and text retrieval and categorization, were among the earliest applications. Today, the field also includes speech understanding, information extraction, and word sense disambiguation. Recent work in many subfields of HLT focuses on integrating statistical (implicit) and rule-based (explicit) knowledge.
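As a minimal sketch of what "picking the most plausible outcome from training examples" can look like, the following Python example estimates a bigram language model with add-one smoothing from a few toy sentences and uses it to rank competing word sequences, as a speech recognizer might rank hypotheses. The training sentences, the candidate hypotheses, and the function names (`train_bigram_model`, `log_prob`, `rank`) are illustrative assumptions, not taken from any particular system.

```python
import math
from collections import Counter

def train_bigram_model(sentences):
    """Count bigram contexts and bigrams over tokenized training sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    # Vocabulary of words that can be predicted (includes the end marker).
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return contexts, bigrams, vocab

def log_prob(tokens, model):
    """Log-probability of a token sequence under the add-one smoothed model."""
    contexts, bigrams, vocab = model
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))
    return total

def rank(candidates, model):
    """Order candidate sequences from most to least plausible."""
    return sorted(candidates, key=lambda c: log_prob(c, model), reverse=True)

# Toy training data (assumed for illustration only).
training = [
    "we recognize speech".split(),
    "they recognize speech well".split(),
    "we model speech".split(),
]
model = train_bigram_model(training)

# Two hypothetical competing hypotheses of equal length.
candidates = [
    ["we", "wreck", "speech"],
    ["we", "recognize", "speech"],
]
ranked = rank(candidates, model)
for hypothesis in ranked:
    print(" ".join(hypothesis), round(log_prob(hypothesis, model), 3))
```

The preferences here are implicit: nothing states that "recognize" is a better continuation than "wreck"; the ranking follows from counts in the training examples. A rule-based component could be combined with such scores, for example by filtering out candidates that violate explicit constraints before ranking.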

See also the corresponding HLT Survey chapter: http://www.lt-world.org/hlt_survey/ltw-chapter11-2.pdf

related person(s):

- Eugene Charniak
- Steve J. Young
- Hermann Ney
- Robert Schapire

related publication(s):

*Foundations of Statistical Natural Language Processing.*

Christopher D. Manning and Hinrich Schütze.

MIT Press, Cambridge, MA, 1999.

*The Nature of Statistical Learning Theory.*

V. Vapnik.

Springer, NY. 199