Buck Rogers - Countdown to Doomsday/ROM map

From Data Crystal
Jump to navigation Jump to search

Chip tiny.png The following article is a ROM map for Buck Rogers - Countdown to Doomsday.

Compression

A compression algorithm is used for most data (gfx, tilemaps, text). It's a variable length encoding, where a code can refer to a regular unencrypted data, or a reference to previous decrypted data.

Description on an example

Data stream reading

At the beginning, a new element is 8 or 9 bits. If the value on 8 bits is in a certain range, it's considered as a single regular character. If it's outside, it's an entry in a dictionary coded on 9 bits, so one more bit is read and placed before the 8 bits already read.

The range for regular characters shifts at each new character read. It starts with 0x02 - 0xff.

Let's take the piece of text at 0x433bf:

0x433bf: 59       4F       55       20       4A       4F       49       4E       45       44       20       09      A7       90 
         01011001|01001111|01010101|00100000|01001010|01001111|01001001|01001110|01000101|01000100|00100000|000010011|01001111|0010000
  • 1st 8bits are 0b01011001=0x59 ; range is 0x02 - 0xff so it's a regular character: 0x59
  • 2nd 8bits are 0b01001111=0x4f ; range is 0x03 - 0xff so it's a regular character: 0x4f
  • 3rd 8bits are 0b01010101=0x55 ; range is 0x04 - 0xff so it's a regular character: 0x55
  • 4th 8bits are 0b00100000=0x20 ; range is 0x05 - 0xff so it's a regular character: 0x20
  • 5th 8bits are 0b01001010=0x4a ; range is 0x06 - 0xff so it's a regular character: 0x4a
  • 6th 8bits are 0b01001111=0x4f ; range is 0x07 - 0xff so it's a regular character: 0x4f
  • 7th 8bits are 0b01001001=0x49 ; range is 0x08 - 0xff so it's a regular character: 0x49
  • 8th 8bits are 0b01001110=0x4e ; range is 0x09 - 0xff so it's a regular character: 0x4e
  • 9th 8bits are 0b01000101=0x45 ; range is 0x0a - 0xff so it's a regular character: 0x45
  • 10th 8bits are 0b01000100=0x44 ; range is 0x0b - 0xff so it's a regular character: 0x44
  • 11th 8bits are 0b00100000=0x20 ; range is 0x0c - 0xff so it's a regular character: 0x20
  • 12th 8bits are 0b00001001=0x09 ; range is 0x0d - 0xff, we're outside: it's a special code, and one more bit is read and placed before the 8 bits. Next bit is 1 (first bit of 0xa7) so the special code is 0b100001001=0x109
  • 13th 8 bits are 0b01001111=0x4f (last 7 bits of 0xa7 and first bit of 0x90) ; range is 0x0e-0xff so it's a regular character: 0x4f

Eventually, the special code range will be 0xff-0xff.

At this moment the data will be read by 9 bits pieces (or 10 bits for special codes), and the range will be reset to -1 - 0x1ff. The value -1 being unsigned, it's coded as 0xffff and consequently, the value at this moment is automatically special.

For this example, it happens at 0x434d6, but let's start a little earlier:

0x434d3: 56       CA       78       98       CF      94
         0101|011011001|010011110|0010011000|110011111|0010100
  • 253th 8bits are 0b01101100=0xcf ; range is 0xfe - 0xff, we're outside: it's a special code, and one more bit is read and placed before the 8 bits. Next bit is 1 so the special code is 0b101101100=0x1cf
  • 254th 8bits are 0b01001111=0x4f ; range is 0xff - 0xff, we're outside: it's a special code, and one more bit is read and placed before the 8 bits. Next bit is 0 so the special code is 0b001001111=0x4f
  • entries are now 9 bits and range is 0xffff-0x1ff, so the next entry is necessarily special. Next 9 bits are 0b001001100 and the 10th bit 0 is placed before, so the entry is 0b0001001100=0x4c
  • range is now 0x0000-0x1ff, so the next entry is normal. Next 9 bits are 0b110011111=0x19f

When the range will be 0x1ff-0x1ff, entries will be read by 10 bits (or 11 for special codes), and range will be reset to 0-0x3ff, and so on...

Decoding data

Code 00 to ff are regular data.

Code 100 and 101 are special code (for example 101 is the end of a text chunk)

From 102, code refers to previous decoded pair of decoded data :

  • code 102 are decoded data at pos 0 and 1 (so here 59 and 4F)
  • code 103 are decoded data at pos 1 and 2 (so here 4F and 55)
  • code 104 are decoded data at pos 1 and 2 (so here 55 and 20)
  • and so on

It may happen that previously decoded data is itself >= 102. In this case, the code is recursively decoded in the following way:

  • first code is completely decoded
  • only first totally decoded byte of second code is decoded

For example, if a code 200 refers to [102, 104], then 102 will be totally decoded as (59, 4F), but only first byte of 104 will be concatenated, resulting to (59, 4F, 55).

If later code 300 refers to [200, 200], then the first 200 will be totally decoded to (59, 4F, 55), but only first byte (59) of the second 200 will be concatenated, resulting to (59, 4F, 55, 59).

Let's look at the already decoded data at 0x433bf:

position     : 00 01 02 03 04 05 06 07 08 09 0A 0B    0C 0D 0E 0F    10 11 12 13 14 15    
decoded data : 59 4F 55 20 4A 4F 49 4E 45 44 20 109   4F 20 54 10E   46 49 47 48 54 10F   
output stream: 59 4F 55 20 4A 4F 49 4E 45 44 20 4E 45 4F 20 54 4F 20 46 49 47 48 54 20 54 
ascii        :  Y  O  U     J  O  I  N  E  D     N  E  O     T  O     F  I  G  H  T     T

Decoder in Python

# Buffer is a custom class, used for both source data and decompressed data
# 2 needed methods:
#    * read_bits(n): read n bits from the current position in the buffer 
#      (and updates current position)
#    * write(list): write a list of bytes to the current position in the buffer
#      (and updates current position)


# resolve recursively dictionary entry
def get_code(c, dictionary):
    # stop condition:
    if c < 0x100:
        return [c]

    elif c in dictionary:
        # entry in dictionary
        # recursively decode both arguments
        first, second = dictionary[c]
        # keep only first byte of second
        return get_code(first, dictionary) + [get_code(second, dictionary)[0]]
    else:
        raise Exception()

def decode(source):
    range_min = 2
    range_max = 0x1ff

    current_dictionary_entry = 0x102

    dictionary = {}
    # current entry in the dictionary. 
    current_entry = []

    decoded = Buffer()

    
    size = 8
    while True:

        code = source.read_bits(size)
        
        if code <= range_min:
            # code not in range: we read another bit and place it before
            # already read bits
            msb = source.read_bits(1)
            code |= (msb << size)
    
        if current_dictionary_entry == code:
            # special case: if code refers to the current location,
            # the last read value is repeated
            current_entry.append(current_entry[-1])
        else:
            current_entry.append(code)
    
        # if current_entry contains less than 2 values, that's because
        # we're still in the 2 first bytes of the encoded stream, so
        # no need to store in the dictionary
        if len(current_entry) >= 2:
            # store new entry in dictionary
            dictionary[current_dictionary_entry] = current_entry.copy()

            # remove first value (pos n - 2) in the current entry
            del current_entry[0]

            # increments range_min but emulating a 16 bits undisgned value
            range_min = (range_min + 1) & 0xffff   
            current_dictionary_entry += 1
       
            if current_dictionary_entry == range_max:
                # code are 1 bit longer now
                size += 1
                range_max = 2**(size + 1) - 1
                # special value to force next value to (size + 1) bits 
                range_min = 0xffff
                        
        if code in [0x100, 0x101]:
            break

        # write a sequence of bytes to Buffer    
        decoded.write(get_code(code, dictionary))

    return decoded