2021/08/15

It's the little things ...

Ecstasy compiles to an Intermediate Representation (IR), which is a binary format intended to carry a rich set of information that can be used by a compiler back-end to generate the actual native code that will run on a machine's CPU. In the language and compiler field, Intermediate Representation is also know as: Intermediate Language (IL), byte code, bit code, p-code, and several other names.

Ecstasy IR was designed to be extremely tight (small), but it was also designed to support extremely large projects. To that end, all of the operators and addressing use 64 bit values; for example, the body of function can be 2^63-1 bytes long, and so the address (in either a relative or absolute form) specified by any jump operation must be able to encode 64-bit address values.

These two requirements are in natural conflict. The first says "keep it small", while the latter says "make it large". And this problem applies to many things besides jump addresses: register numbers, parameter counts, references into the constant pool, and so on. We recognized that solving this conflict in a uniform and efficient manner, and doing so up front, would be critical in keeping the design simple and consistent.

Using a byte-by-byte format, such as UTF-8, Portable Object Format (POF) encoded integers, or LEB-128 was deemed to be too computationally inefficient, in that the loop condition for the variable length encoding is re-evaluated within a tight loop (once per byte), completely trashing the CPU's branch predictor, and almost certainly killing instruction pipelining and speculative execution. Since these integers are used quite literally everywhere, a better encoding scheme was required.

Since a variable length encoding would be used, the resulting format was carefully designed to make the exact length of the entire value apparent from examining only the first byte, allowing all of the branch prediction penalties and possibilities for pipeline stalls to be lumped together at the front edge of the decoding process. Additionally, most CPUs have the ability to read even unaligned power-of-two-length integers from memory into a register and sign extend (either as part of the read, or with an inexpensive register-based op), so the design of the format allows for this possible optimization.

The XVM Integer Packing (XIP, pronounced "zip") format employs four encoding modes:

  • Tiny: For a value in the range -64..63 (7 bits), the value can be encoded in one byte. The least significant 7 bits of the value are shifted left by 1 bit, and the 0x1 bit is set to 1. When reading a packed integer, if bit 0x1 of the first byte is 1, then it's Tiny.
  • Small: For a value in the range -4096..4095 (13 bits), the value can be encoded in two bytes. The first byte contains the value 0x2 (010) in the least significant 3 bits, and bits 8-12 of the integer in bits 3-7; the second byte contains bits 0-7 of the integer.
  • Medium: For a value in the range -1048576..1048575 (21 bits), the value can be encoded in three bytes. The first byte contains the value 0x6 (110) in the least significant 3 bits, and bits 16-20 of the integer in bits 3-7; the second byte contains bits 8-15 of the integer; the third byte contains bits 0-7 of the integer.
  • Large: For a value in the range -(2^511)..2^511-1 (up to 512 bits), a value with s significant bits can be encoded in no less than 1+max(1,(s+7)/8) bytes; let b be the selected encoding length, in bytes. The first byte contains the value 0x0 in the least significant 2 bits (00), and the least 6 significant bits of (b-2) in bits 2-7. The following (b-1) bytes contain the least significant (b-1)*8 bits of the integer. 

The Ecstasy implementation for writing XIP'd integers is found in the DataOutput class, and the implementation for reading XIP'd integers is found in the DataInput class.

The Java implementations for reading and writing XIP'd integers can be found in the PackedInteger class.

In summary: Having a consistent and efficient mechanism to encode arbitrary 64-bit integers in a byte stream is a fundamental boost for an IR designer, because they no longer worry about the ugly trade-off between "will this support big enough structures in the real world?" and "will this waste space?"