Preprocessing text to make it more compressible

from blog John D. Cook, | ↗ original
Repetitive text compresses efficiently. Text like the lyrics to Jingle Bells ought to compress more efficiently than ordinary prose, assuming the compression algorithm can exploit the repetition. The idea of the Burrows-Wheeler transform is to permute text in before compressing it. The hope is that the permutation will make the repetition in the...