Text Compression

T.C. Bell and J.G. Cleary and I.H. Witten.
Prentice-Hall. 1989.

Managing Gigabytes: Compressing and Indexing Documents and Images.
I.H. Witten and A. Moffat and T.C. Bell.
Van Nostrand Reinhold. 1994.

Arithmetic Coding for Data Compression.
I.H. Witten and R.M. Neal and J.G. Cleary.
Communications of the ACM. 1987. 520-540.

Methods for text compression identify and exploit redundancy in text documents to obtain a more condensed representation of the information, from which the original data can be recovered exactly (lossless compression). In theory, compression and prediction are closely related: an optimal code spends about -log2 p bits on a symbol that the model predicts with probability p. The better a statistical language model estimates the probability of a word given some context, the fewer bits the text as a whole requires.
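The link between prediction and compression can be made concrete by summing -log2 p over a token sequence, which gives the ideal code length a coder such as arithmetic coding would approach. The sketch below is illustrative and rests on several assumptions: it uses hypothetical helper names (`order0_bits`, `order1_bits`), it estimates probabilities from the text itself (a static model whose transmission cost is ignored), and it compares a context-free word model with one that conditions on the previous word.

```python
import math
from collections import Counter
from typing import Sequence


def order0_bits(tokens: Sequence[str]) -> float:
    """Ideal code length (bits) under a static, context-free model.

    Each token t costs -log2 p(t), where p(t) is its relative frequency
    in `tokens` itself. The cost of transmitting the model is ignored.
    """
    counts = Counter(tokens)
    n = len(tokens)
    return sum((-math.log2(counts[t] / n) for t in tokens), 0.0)


def order1_bits(tokens: Sequence[str]) -> float:
    """Ideal code length (bits) when each token is predicted from the previous one.

    p(t | prev) = count(prev, t) / count(prev used as a context).
    The first token has no context, so it is coded with its order-0 probability.
    """
    if not tokens:
        return 0.0
    unigram = Counter(tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    bits = -math.log2(unigram[tokens[0]] / len(tokens))
    for prev, t in zip(tokens, tokens[1:]):
        bits += -math.log2(pairs[(prev, t)] / contexts[prev])
    return bits


# Usage: a repetitive text is far more predictable once context is used.
words = ("the cat sat on the mat " * 10).split()
print(round(order0_bits(words), 1))  # 135.1
print(round(order1_bits(words), 1))  # 21.6
```

With context, only the choice between "cat" and "mat" after "the" remains uncertain, so the order-1 model needs roughly 1 bit per occurrence of "the" instead of about 2.3 bits per word on average. Adaptive models, which update their counts as they code so that the decoder can rebuild them, would be the more realistic choice for an actual compressor.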