EarthBound/ASM/Decompression Routine

From Data Crystal

This is a sub-page of EarthBound/ASM.

This subroutine reads a block of compressed data from ROM and writes the decompressed result to WRAM.

Subroutine Inputs

  • 0E-11 = Compressed data start (long pointer)
  • 12-15 = Decompressed output WRAM location (long pointer)

Note: The routine takes no parameter for the size of the compressed input or for the space available at the output location. It reads compressed data until it encounters the end header, so the caller must know the decompressed size in advance and ensure that the output location has enough room.

Compression Format

Data is compressed using a hybrid compression scheme common to many HAL Laboratory games of the NES/SNES/GB era.

The compressed data is stored as a sequence of blocks, each beginning with a 1- or 2-byte header that takes one of the following three forms:

  • 11111111
    End header. Marks the end of compressed data.
  • 111cccLLLLLLLLLL
    2-byte extended header, indicated by the first 3 bits all being 1. Contains a 3-bit control code, followed by a 10-bit length value.
  • cccLLLLL
    1-byte header. Contains a 3-bit control code, followed by a 5-bit length value.
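
The three header forms above can be told apart by inspecting the first byte. The following is a minimal Python sketch of that step, not the game's own code. The name `parse_header` and its return shape are hypothetical, and it assumes the 10-bit length of an extended header is stored with its top 2 bits in the first byte and its low 8 bits in the second byte, as the bit layout above suggests.

```python
END_MARKER = 0xFF


def parse_header(data, pos):
    """Parse one block header starting at data[pos].

    Returns (control_code, length, next_pos), or None for the end header.
    `length` is the raw field value; the control codes use length + 1.
    """
    first = data[pos]
    if first == END_MARKER:
        # 11111111: end of compressed data (checked before the extended form).
        return None
    if (first & 0xE0) == 0xE0:
        # 111cccLL LLLLLLLL: extended 2-byte header.
        control_code = (first >> 2) & 0x07
        length = ((first & 0x03) << 8) | data[pos + 1]
        return control_code, length, pos + 2
    # cccLLLLL: 1-byte header.
    return first >> 5, first & 0x1F, pos + 1
```

For example, the byte `0x23` (binary `00100011`) parses as control code 1 with a length field of 3, and the pair `0xE7 0xFF` parses as control code 1 with a length field of 1023.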

The data block is then decompressed according to the control code:

  • 000
    Raw copy. Length + 1 bytes are copied from the input to the output.
  • 001
    8-bit run-length encoding. The next input byte is written to the output length + 1 times.
  • 010
    16-bit run-length encoding. The next 2 input bytes are written to the output, as a pair, length + 1 times.
  • 011
    Incrementing 8-bit run-length encoding. The next input byte is written to the output length + 1 times, with its value incremented by 1 after each write.
  • 100
    Repeat of previously-decoded data. The next two input bytes form a big-endian offset from the start of the decompressed data. Length + 1 bytes are copied from that offset.
  • 101
    Repeat of previously-decoded data with per-byte bit reversal. The next two input bytes form a big-endian offset from the start of the decompressed data. Length + 1 bytes are copied from that offset, but each byte has the order of its bits reversed (0b10000000 -> 0b00000001, 0b01100010 -> 0b01000110, etc).
  • 110
    Backwards repeat of previously-decoded data. The next two input bytes form a big-endian offset from the start of the decompressed data. Length + 1 bytes are copied, starting at that offset and stepping backwards through the decompressed data (offset, offset - 1, offset - 2, and so on).
  • 111
    Not possible using a 1-byte header because the control code would be indistinguishable from the flag to indicate an extended 2-byte header. If indicated in an extended 2-byte header, this control code decompresses the same as code 100. It is possible that this is not intended to be a valid control code and the resulting behavior comes only from an unintentional fallthrough to the code 100 decompression logic.
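
The control codes above can be combined into a small reference decoder. The following Python sketch follows the description on this page and is not a transcription of the game's routine. The names `decompress` and `reverse_bits` are hypothetical. It assumes that the incrementing run wraps at 8 bits, that back-references are copied one byte at a time (so a copy may read bytes it has just written), and that code 111 behaves like code 100 as noted above. The bounds check on back-references is an addition for safety.

```python
def reverse_bits(value):
    """Reverse the bit order of one byte, e.g. 0b10000000 -> 0b00000001."""
    result = 0
    for _ in range(8):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def decompress(data, pos=0):
    """Decode blocks from `data` (a bytes object) starting at `pos`
    until the end header; returns the decompressed bytes."""
    out = bytearray()
    while True:
        first = data[pos]
        if first == 0xFF:                      # end header
            return bytes(out)
        if (first & 0xE0) == 0xE0:             # extended 2-byte header
            code = (first >> 2) & 0x07
            count = (((first & 0x03) << 8) | data[pos + 1]) + 1
            pos += 2
        else:                                  # 1-byte header
            code = first >> 5
            count = (first & 0x1F) + 1
            pos += 1

        if code == 0:                          # raw copy
            out += data[pos:pos + count]
            pos += count
        elif code == 1:                        # 8-bit RLE
            out += bytes([data[pos]]) * count
            pos += 1
        elif code == 2:                        # 16-bit RLE, pair written count times
            out += bytes(data[pos:pos + 2]) * count
            pos += 2
        elif code == 3:                        # incrementing RLE (assumed 8-bit wrap)
            out += bytes((data[pos] + i) & 0xFF for i in range(count))
            pos += 1
        else:                                  # 4, 5, 6, 7: back-references
            offset = (data[pos] << 8) | data[pos + 1]   # big-endian
            pos += 2
            for i in range(count):
                src = offset - i if code == 6 else offset + i
                if not 0 <= src < len(out):
                    raise ValueError("back-reference outside decoded data")
                byte = out[src]
                out.append(reverse_bits(byte) if code == 5 else byte)
```

For example, `decompress(bytes([0x02, 0x41, 0x42, 0x43, 0xC1, 0x00, 0x02, 0xFF]))` decodes a 3-byte raw copy followed by a 2-byte backwards repeat from offset 2, giving `b"ABCCB"`. Because the routine itself takes no size parameter, running a decoder like this offline and taking `len()` of the result is one way for a caller to learn how much output space a given block of compressed data needs.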

Libraries/Tools

  • exhal/inhal by Devin Acker (Revenant). Decompression and compression tools, generally able to compress data to a size at least as small as that found in the EarthBound ROM. As of 2023, used internally by CoilSnake.
  • KompreSS by Sukasa. One of the oldest libraries (a .NET DLL); it mostly supports decompression only.
  • kirbyLzRle by Griever. Decompression and compression tools, written in Haskell.