xNBA/contrib/compressor/algorithm.doc

The compressor reduces its input to about 60% of the original size on
average, which is on par with "gzip". It seems that you cannot do much
better for compressing compiled binaries. Taking the size of the
decompression code into account, this means that the break-even point
for using compressed images is reached once the uncompressed size
approaches 1.5kB. We can stuff more than 12kB into an 8kB EPROM and
more than 25kB into a 16kB EPROM. As there is only 32kB of RAM for
both the uncompressed image and its BSS area, 32kB EPROMs will hardly
ever be required.

The compression algorithm uses a 4kB ring buffer for buffering the
uncompressed data. Before compression starts, the ring buffer is
filled with spaces (ASCII character 0x20). The algorithm tries to
find repeated input sequences of a maximum length of 60 bytes. All
256 different input bytes plus the 58 (60 minus a threshold of 2)
possible repeat lengths form a set of 314 symbols. These symbols are
adaptively Huffman encoded. The algorithm starts out with a Huffman
tree that assigns nearly equal code lengths to each of the 314
symbols (slightly favoring the repeat symbols over symbols for regular
input characters), and the tree is updated whenever the frequency of
any symbol changes, i.e. after every coded symbol. Frequency counts
are kept in 16bit words until the total count reaches 2^15. Then, all
frequency counts are halved (rounding up). For unrepeated characters
(symbols 0..255) the Huffman code is written to the output stream. For
repeated characters the Huffman code, which denotes the length of the
repeated character sequence, is written out, and then the index of the
match in the ring buffer is computed. From this index, the algorithm
computes the offset relative to the current index into the ring
buffer. For typical input data, one would expect short- to
medium-range offsets to be more frequent than extremely short or
long-range offsets. Therefore the 12bit (for a 4kB buffer) offset
value is statically Huffman encoded, using a precomputed Huffman tree
that favors those offset values that are deemed to be more frequent.
The Huffman encoded offset is written to the output data stream,
directly following the code that determines the length of the repeated
characters.

This algorithm, as implemented in the C example code, looks very good,
and its operating parameters are already well optimized. This also
explains why it achieves compression ratios comparable with "gzip".
Depending on the input data, it sometimes does considerably better than
"gzip -9", but this does not appear to be typical. The algorithm does
have some flaws: the limited buffer sizes, the adaptive Huffman tree,
which takes a very long time to adapt if the distribution of input
characters changes suddenly, and the static Huffman tree for encoding
offsets into the buffer. The slow adaptation of the Huffman tree is
partially counteracted by artificially limiting the frequency counts
to 16bit precision, but this does not come into play until 2^15 codes
have been output, i.e. not before roughly 32kB of input has been
processed. So it has no impact on our use for "etherboot", because the
boot PROM does not support uncompressed data of more than 32kB (cf.
doc/spec.doc).

Nonetheless, these problems do not seem to affect the compression of
compiled programs very much. Mixing object code with English text
would not work too well, though, and the algorithm should be reset in
between. We might gain a small improvement by compressing text and
data segments separately, but I have not experimented with this option
yet.
Initial revision 2005-05-17 16:44:57 +00:00