name: CodePressor author: Eli-Jean R. Leyssens and Tony Haines size: 1180 bytes needs: RISC OS descr.: Pretty good ARM code compressor ----------------------------------------------------------------------------- Syntax: CodePressr Note that CodePressr can only handle UNTYPED files, with a LOAD address of &8100 OR HIGHER. When Tony Haines sent me his beta of his Flu entry he told me he had implemented a new type of compression algorithm targeted specifically at ARM code. I had a look at it and discovered that it did indeed perform very well. I played around with it for a bit and then asked him whether he'd mind if I turned the algorithm into a 2K entry for CC#2. He said he'd be enjoyed if I did, so I set out to create the program which we now call CodePressor, as suggested by Tony. Just so you don't think I just turned it into a 2K program: I did add some improvements, increasing the algorithms performance ;) So, how does it work? Well, the basic algorithm is to for each word look back in a range of 256 words and find the word which has the most identical nibbles. Identical to the nibbles in the current word of course. You can then store a byte, indicating which word had the most identical nibbles, called distance, and another byte of which each bit indicates whether the matching nibble is identical or not. After that you only store the non-matching nibbles. So, if the best matching word has 6 identical nibbles, then that could be stored as 8 + 8 + ( 8 - 2) * 4 bits = 16 + 8 = 24, instead of 32. If not enough nibbles match, then you store one byte indicating an uncompressed word followed by the 8 nibbles making up that word. As you might have figured out by now the above only generates real compression for words whose best matching word has at least 5 identical nibbles. Still, this results in a good overall compression as a lot of "best" matches have 4, 5, 6, 7 or even 8 identical nibbles. However, on closer inspection of all the data that was being generated by this algorithm I discovered a few things: - a lot of best matching words have a distance of 16 (words) or less - in lots of cases the identical nibbles included the 4 top nibbles So, instead of always using 8 bits to store the distance I now encode both the distance and nibbles-mask as follows: %0000 0000 - %0000 1111 -> %0 0000 - %0 1111 %xxxx 0000 - %xxxx 1111 -> %1 xxxx 0000 - %1 xxxx 1111 Where xxxx <> 0. So, if only the bottom 4 bits are set then a 0 followed by the lower 4 bits are stored, otherwise a 1 bit, followed by all 8 bits. This gave the compression ratio an extra boost :) Furthermore, the algorithm really only works well on code. So, if there are one or more data blocks in the program that you wish to compress then it's very likely that most words in those blocks can not be compressed by the algorithm and thus must be stored as "uncompressed words". However, as I already said, and uncompressed word is stored by first storing a byte indicating an uncompressed word and then the uncompressed word. If there are lots of uncompressable words then this marker per word would seriously increase the resulting file. So, I added code to detect runs, or sequences, of uncompressable words. It then stored a marker, followed by a count. After that all the uncompressed without any further markers follow. Well, that's it basically. If you want to find out the exact workings then you're free to squander through the source although I doubt anybody will find it at all readable ;0 Files included in this archive are: CodePressr ; The program itself ReadMe ; This file More.!SetPaths ; Sets the CodePressorWork$Dir More.!CPress ; CodePresses the CodePress program, erhm... ;) More.CPSRC ; Source code More.Build ; Build instructions (Sources->Final) Thanks to Tony Haines for his brilliant algorithm, letting me improve it and turn it into a 2K entry for CC#2 and being very supportive during the CC#2 competition in general. If you hadn't guess by now: it's late, I've had a couple of beers and definately not too much sleep during the last couple of days. I also desperately need to release this so others can still use it for their CC#2 entries so I'm in a bit of a hurry as well. All in all not the best situation for writing a clear and understandable ReadMe ;) If you want to contact me or want to know more about Topix or Topix productions: Pervect : pervect@topix.student.utwente.nl TopixWEB: http://www.dse.nl/~topix Tony Haines can be reached at: a.s.haines@bham.ac.uk