EcstasyLang: 2021

2021/10/05

The Ecstasy discussion forum

Fairly recently, we've switched over to primarily using the GitHub discussion forums for public discussions of Ecstasy work. We'll be de-emphasizing our Slack channel.

The forums are a much easier place to manage and read discussion threads than moderated comments on a blog, so we'll be relying on them more and more for conversations around articles that are posted here, as well as for group design work, long range planning, etc.

If you have ideas for new blog posts, articles, designs, and so on, then please post a message on the forums. Thanks!

2021/09/05

How to use the Ecstasy Debugger

Until recently, debugging Ecstasy code was an exercise in frustration, because there was no debugger. One could step through the interpreter's operations in a Java debugger, which was not for the faint of heart. The alternative, not surprisingly, was to sprinkle one's code with print statements, such as:

@Inject Console console;
console.println($"The value of i={i}");

Need to see another variable? Change the code, recompile, and run again. Repeat until everything works.

This is, of course, how all programmers used to debug code, back between the invention of the printing press and the invention of the interactive debugger.

So it should come as no surprise that Ecstasy now has an interactive debugger, albeit one that is currently prototyped in text mode.

To enable the debugger, add a single line to your test:

assert:debug;

Or to enable it on a particular condition:

assert:debug i != 3;

That will pop you into the interactive debugger mode:

Call stack frames:              | Variables and watches:
--------------------------------|---------------------------------
>0  TestSimple.run():10         | 0  this = (TestSimple.test.org)
 1  Service TestSimple.test.org | 1  i = (Int) 3

Now you can type '?' for help, which displays all of the debugger commands, including breakpoint management, code stepping (in/out/thru), watches, and so on.

And yes, it is text mode, so it looks like some legit UNIX tool from 1995. Need a GUI? We'll need to finish project X-wing to build that! 🤣

2021/08/15

It's the little things ...

Ecstasy compiles to an Intermediate Representation (IR), which is a binary format intended to carry a rich set of information that can be used by a compiler back-end to generate the actual native code that will run on a machine's CPU. In the language and compiler field, Intermediate Representation is also known as: Intermediate Language (IL), byte code, bit code, p-code, and several other names.

Ecstasy IR was designed to be extremely tight (small), but it was also designed to support extremely large projects. To that end, all of the operators and addressing use 64-bit values; for example, the body of a function can be 2^63-1 bytes long, and so the address (in either a relative or absolute form) specified by any jump operation must be able to encode 64-bit address values.

These two requirements are in natural conflict. The first says "keep it small", while the second says "make it large". And this problem applies to many things besides jump addresses: register numbers, parameter counts, references into the constant pool, and so on. We recognized that solving this conflict in a uniform and efficient manner, and doing so up front, would be critical in keeping the design simple and consistent.

Using a byte-by-byte format, such as UTF-8, Portable Object Format (POF) encoded integers, or LEB-128, was deemed to be too computationally inefficient, in that the loop condition for the variable length encoding is re-evaluated within a tight loop (once per byte), completely trashing the CPU's branch predictor, and almost certainly killing instruction pipelining and speculative execution. Since these integers are used quite literally everywhere, a better encoding scheme was required.
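To make the per-byte cost concrete, here is a minimal sketch of a signed LEB128 decoder. It is written in Python purely for illustration; the function name is hypothetical and this is not code from the Ecstasy project. The point to notice is that the continuation bit has to be tested after every single byte, so the decoder cannot know how long the value is until it has reached the last byte.

```python
def sleb128_decode(data, offset=0):
    """Decode one signed LEB128 integer; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        shift += 7
        if byte & 0x80 == 0:               # continuation bit, checked once per byte
            break
    if byte & 0x40:                        # sign bit of the final byte
        result -= 1 << shift
    return result, offset


value, next_offset = sleb128_decode(bytes([0xC0, 0xBB, 0x78]))
# value == -123456, next_offset == 3
```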

Since a variable length encoding would be used, the resulting format was carefully designed to make the exact length of the entire value apparent from examining only the first byte, allowing all of the branch prediction penalties and possibilities for pipeline stalls to be lumped together at the front edge of the decoding process. Additionally, most CPUs have the ability to read even unaligned power-of-two-length integers from memory into a register and sign extend (either as part of the read, or with an inexpensive register-based op), so the design of the format allows for this possible optimization.

The XVM Integer Packing (XIP, pronounced "zip") format employs four encoding modes:

Tiny: For a value in the range -64..63 (7 bits), the value can be encoded in one byte. The least significant 7 bits of the value are shifted left by 1 bit, and the 0x1 bit is set to 1. When reading a packed integer, if bit 0x1 of the first byte is 1, then it's Tiny.
Small: For a value in the range -4096..4095 (13 bits), the value can be encoded in two bytes. The first byte contains the value 0x2 (010) in the least significant 3 bits, and bits 8-12 of the integer in bits 3-7; the second byte contains bits 0-7 of the integer.
Medium: For a value in the range -1048576..1048575 (21 bits), the value can be encoded in three bytes. The first byte contains the value 0x6 (110) in the least significant 3 bits, and bits 16-20 of the integer in bits 3-7; the second byte contains bits 8-15 of the integer; the third byte contains bits 0-7 of the integer.
Large: For a value in the range -(2^511)..2^511-1 (up to 512 bits), a value with s significant bits can be encoded in no fewer than 1+max(1,(s+7)/8) bytes; let b be the selected encoding length, in bytes. The first byte contains the value 0x0 in the least significant 2 bits (00), and the least significant 6 bits of (b-2) in bits 2-7. The following (b-1) bytes contain the least significant (b-1)*8 bits of the integer.
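The four modes above can be sketched in a few lines of Python. This is only an illustration of the description in this post, not the DataOutput, DataInput, or PackedInteger code; the function names are hypothetical, and the sketch assumes that the trailing bytes of a Large value are stored most significant byte first, matching the byte order given for Small and Medium.

```python
def xip_encode(n):
    """Pack an integer using the Tiny/Small/Medium/Large modes described above."""
    if -64 <= n <= 63:                                   # Tiny: 1 byte, bit 0 set
        return bytes([((n & 0x7F) << 1) | 0x1])
    if -4096 <= n <= 4095:                               # Small: low 3 bits are 010
        return bytes([(((n >> 8) & 0x1F) << 3) | 0x2, n & 0xFF])
    if -1048576 <= n <= 1048575:                         # Medium: low 3 bits are 110
        return bytes([(((n >> 16) & 0x1F) << 3) | 0x6,
                      (n >> 8) & 0xFF, n & 0xFF])
    # Large: low 2 bits are 00, bits 2-7 hold (b-2), then (b-1) payload bytes
    s = (n if n >= 0 else ~n).bit_length() + 1           # significant bits incl. sign
    trailing = max(1, (s + 7) // 8)                      # this is (b-1)
    if trailing > 64:
        raise ValueError("value does not fit in 512 bits")
    return bytes([(trailing - 1) << 2]) + n.to_bytes(trailing, "big", signed=True)


def xip_decode(data, offset=0):
    """Unpack one integer; return (value, total_length_in_bytes)."""
    b0 = data[offset]
    signed_b0 = b0 - 256 if b0 >= 128 else b0            # sign-extend the first byte
    if b0 & 0x1:                                         # Tiny
        return signed_b0 >> 1, 1
    if b0 & 0x2:
        if b0 & 0x4:                                     # Medium
            return ((signed_b0 >> 3) << 16) | (data[offset + 1] << 8) | data[offset + 2], 3
        return ((signed_b0 >> 3) << 8) | data[offset + 1], 2   # Small
    trailing = (b0 >> 2) + 1                             # Large
    payload = data[offset + 1:offset + 1 + trailing]
    return int.from_bytes(payload, "big", signed=True), 1 + trailing


tiny = xip_encode(3).hex()       # '07'
small = xip_encode(1000).hex()   # '1ae8'
```

In every mode the decoder learns the total length from the first byte alone, which is the property the format was designed around.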

The Ecstasy implementation for writing XIP'd integers is found in the DataOutput class, and the implementation for reading XIP'd integers is found in the DataInput class.

The Java implementations for reading and writing XIP'd integers can be found in the PackedInteger class.

In summary: Having a consistent and efficient mechanism to encode arbitrary 64-bit integers in a byte stream is a fundamental boost for an IR designer, because they no longer have to worry about the ugly trade-off between "will this support big enough structures in the real world?" and "will this waste space?"

2021/06/12

What is a type?

Another Coffee Compiler Club call, another concept to explain.

What is a type?

It seems like such a simple question, until you try to answer it. What is a type?

When we designed Ecstasy, we were intent on boiling down types to some pure form, some simple atomic model of constructing everything in our programming universe. Types are, in some way, the periodic table of program elements. The protons, neutrons, and electrons of program existence. If you can't explain the basic building blocks of your universe, how can you explain how anything works?

Let's start with the conclusion: An object's type is the sum of its behavior.

Not having any real background in type theory, I have no idea whether this is an obvious truism or a nutty novel notion. I'm going to assume the latter, only because I've never used a language with a type definition like this, and also because it provides an excellent opportunity to explain the concept.

Most type systems that I've known and used are built around some combination of identity, state, and behavior. Java object types, for example, are based entirely on identity: The identity of a class is its type; the identity includes the name of the class, and its ancestors, in terms of super classes and implemented interfaces. Java types do carry detailed information about state (fields) and behavior (methods), but those details don't define the type; only the identity defines the type. The question "Is some object reference o of type T?" is never answered by what fields the object contains, nor by what methods it has, but rather by its identity, and solely by its identity.

Of course, this was quite a leap forward from the answer in C (and by extension, C++); in C, the question "Is some object reference o of type T?" always has the answer "Yes". Got a pointer? It turns out that your pointer points to whatever type you tell the compiler it points to. Is it a Cat? Is it a Dog? Is it a Car? Is it a House? The answer is always "Yes!" One must admit that C is quite an agreeable language when it comes to types. (With this in mind, it's also easy to understand how C code is responsible for so many security flaws.)

But let's drop all of these notions on the floor, and start over. Let's start with a made-up syntax for defining a type:

type
    {
    // things that define what the type is
    }

Looks kind of like a C structure. And we often think about types in exactly this way: They have a name (identity), and they have structure (fields).

type Person
    {
    String firstName;
    String lastName;
    }

But by our definition, this is completely wrong! We claimed that a type is only the sum of its behavior, and this example has only identity and state instead. So let's fix this, temporarily, and in the ugliest manner possible:

type // if we could name it, we'd call it "Person"
    {
    String getFirstName();
    void setFirstName(String);
    String getLastName();
    void setLastName(String);
    }

Interesting. Ugly, but interesting. We've turned state into behavior. And we turned the identity into a comment. But let's try an experiment: We'll allow a type to have an optional identity (including type name, type parameters, and ancestor types), just to make it easier to describe the types, and we'll create a few types that we can re-use to avoid some of that awful boilerplate:

type Ref<T>
    {
    T get();
    }

type Var<T> : Ref<T>
    {
    void set(T);
    }

type Person
    {
    Var<String> firstName();
    Var<String> lastName();
    }

So there is no state, per se, and the identity exists solely as a convenience for us, the reader, but we're almost back to where we started. In fact, if we introduce a short-hand notation for a zero-parameter method that returns a Var<T>, we are back where we started:

type Person
    {
    String firstName;  // this just means "Var<String> firstName()"
    String lastName;   // this just means "Var<String> lastName()"
    }

We're close, but not done, because we have referenced another type, "String". And what is a "String"? It's just another type, that has to be defined in the exact same way as "Person". To define a String, it helps to have an Array. To define an Array, which has a size, it helps to have an Int64. To define an Int64, it helps to have another Array of Bit. To define a Bit, it helps to have an IntLiteral to represent a 0 or 1. And to define an IntLiteral, it helps to have a String.

In other words, the type system forms a closed loop. All types are defined from other types, and types are defined solely by their behavior.

And behavior? Behavior is simply defined as a set of named methods, each taking zero or more typed parameters, and returning zero or more typed results.

So what does this mean?

To oversimplify the conclusion, it means that a mathematician can use set theory to implement a type calculus for such a type system. Really, that's it. You know, like curing cancer, or finding the holy grail.
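As a toy illustration of the set-theory idea (and nothing more), a type can be modeled as a set of method signatures, with "is a" becoming a subset test. The sketch below is Python, not Ecstasy; the names Method, type_of, and is_a are hypothetical, type parameters are replaced by a fixed String, and variance is ignored entirely.

```python
from typing import NamedTuple


class Method(NamedTuple):
    name: str
    params: tuple    # names of the parameter types
    returns: tuple   # names of the return types


def type_of(*methods):
    """A type is nothing but the set of its methods: no name, no fields."""
    return frozenset(methods)


def is_a(candidate, required):
    """candidate 'is a' required if it offers at least the same behavior."""
    return required <= candidate


# Ref<String> and Var<String>, with the type parameter fixed for simplicity
REF = type_of(Method("get", (), ("String",)))
VAR = REF | type_of(Method("set", ("String",), ()))

# A type built independently, with no shared identity, is still the same type
ANONYMOUS = type_of(Method("set", ("String",), ()),
                    Method("get", (), ("String",)))

var_is_ref = is_a(VAR, REF)      # True: Var offers everything Ref does
ref_is_var = is_a(REF, VAR)      # False: Ref has no set()
same_type = ANONYMOUS == VAR     # True: equal behavior, equal type
common = VAR & REF               # intersection is the shared behavior (Ref)
```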

In Ecstasy, all types are defined like this. Even Type is a Type.

Riddle me this

It should be pretty obvious now why we refer to this as the "Turtles Type System", since it's turtles the whole way down. One of the interesting riddles we encountered early on looked something like this:

type A
    {
    B foo();
    }
    
type B
    {
    A foo();
    }

Question: What is the difference between an A and a B?

Rules

One of the interesting things with such a type system is how easy it is to construct recursive rules from it. For example, we say that a method m consumes type T if any of the following holds true:

m has a parameter type declared as T;
m has a parameter type that produces T;
m has a return type that consumes T.

Similarly, we say that a method m produces type T if any of the following holds true:

m has a return type declared as T;
m has a return type that produces T;
m has a parameter type that consumes T.

These rules form the basis for checking the legality of things like method variance, such as co-variance and contra-variance, which in turn allows the type system to intelligently enforce type safety.
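The two rule lists translate almost directly into a pair of mutually recursive functions. The Python sketch below is only an illustration: the type table and its names (Supplier, Sink, Factory, Registry) are made up, and it assumes that a type produces or consumes T when any one of its methods does, which the rules above do not state explicitly. A set of already visited types keeps the recursion from looping on types that refer to each other.

```python
# Each type maps to a list of methods: (name, parameter types, return types)
TYPES = {
    "String":   [],
    "Supplier": [("get", (), ("String",))],
    "Sink":     [("put", ("String",), ())],
    "Factory":  [("make", (), ("Supplier",))],
    "Registry": [("register", ("Sink",), ())],
    "A":        [("foo", (), ("B",))],
    "B":        [("foo", (), ("A",))],
}


def method_consumes(method, target, types, seen=None):
    seen = set() if seen is None else seen
    _, params, returns = method
    return (any(p == target or type_produces(p, target, types, seen) for p in params)
            or any(type_consumes(r, target, types, seen) for r in returns))


def method_produces(method, target, types, seen=None):
    seen = set() if seen is None else seen
    _, params, returns = method
    return (any(r == target or type_produces(r, target, types, seen) for r in returns)
            or any(type_consumes(p, target, types, seen) for p in params))


def type_consumes(name, target, types, seen):
    if ("consumes", name) in seen:       # already being examined: stop recursing
        return False
    seen.add(("consumes", name))
    return any(method_consumes(m, target, types, seen) for m in types.get(name, ()))


def type_produces(name, target, types, seen):
    if ("produces", name) in seen:
        return False
    seen.add(("produces", name))
    return any(method_produces(m, target, types, seen) for m in types.get(name, ()))


make = TYPES["Factory"][0]
register = TYPES["Registry"][0]
make_produces = method_produces(make, "String", TYPES)          # True: return type produces String
register_produces = method_produces(register, "String", TYPES)  # True: parameter type consumes String
```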

2021/05/21

What is a Property?

On a recent Cliff Click Coffee Compiler Club call, this question came up: What exactly is an Ecstasy property? It turns out that a property is a very obvious and simple thing, yet explaining it is not so simple.

Developers have different expectations when they hear the word "property", including:

It's just a named field in a structure.
It's something that has a getter and a setter.

These are logical expectations, because in languages like C++ and Java, "object properties" are just fields in structures, and in Java, the getter and setter methods are a well-known way to expose private fields as public virtual methods.

But unfortunately, starting with this train of thought takes us in the wrong direction, so let's forget all of this historical context, and back up to the beginning: What exactly is an Ecstasy property?

First, it is important to appreciate where an Ecstasy property exists:

A property can be declared inside any class, including module and package classes;
A property can be declared inside a property;
A property can be declared inside a method.

That a property can exist inside a class is not unusual, but it is a bit unusual that a property can exist inside another property, or even inside of a method.

In Ecstasy, everything is an object, so it follows that a property is an object. Objects have types. So what is the type of a property? A property, like a local variable, is an instance of Ref, a reference. If the property is mutable, then (also like a local variable), it is an instance of Var, which extends Ref.

Since a property is an object, and objects are instances of a class, then what is the class of a property? The class of a property is unknowable within Ecstasy. That does not mean that the property does not have a class; it simply means that the class is not visible from within the running code. Let's take a simple example:

  class Person
      {
      Int age;
      }

When we have an instance of Person, we can ask that object's reference for its actual class, and if it was created within the current container, it will return the Person class. But it's also possible that an object reference comes from outside of the container, in which case asking for the actual class will not return the actual class, but will instead return just the interface type through which the object can be viewed; this is the basis for container security, and is a fundamental building block of Ecstasy's strong security model.

When the Ecstasy runtime starts up, and an Ecstasy application is loaded and starts running, it is running in the outermost Ecstasy container, called "container 0", which is the container within which the application's module was loaded, and within which all other containers and objects are created, so one would think that the application's properties would also be created within that "container 0" ... but that would be incorrect. In order for the initial application "container 0" to be created, there had to already be a Container class, and since that class comes from the core Ecstasy module, that means that the core Ecstasy module was already loaded in some container before "container 0" was created. And since it's "turtles the whole way down", it should be obvious that "container 0" is itself actually sitting on top of an infinite stack of turtles, which for purposes of keeping this short, we will simply refer to as "container -1".

"Wait ... what?!?" I can almost hear the WTFs being hurled at computer screens everywhere. But here's the simple truth: Anything outside of the container that the application is loaded within is simply unknowable. So if the application is started in something that we call "container 0", and a container always exists within a container, then we know that there must be some "container -1", if only because otherwise there couldn't be a "container 0". And just to keep this short and as-simple-as-possible, the runtime itself is that unknowable outer container, and the runtime itself is the container that loaded the Ecstasy module, and the runtime itself is the thing that knows how to "new" a class, and to automatically "new" whatever class is automatically used for each property as well. And in reality, that doesn't actually happen -- each property couldn't actually be a new object, right?

As with many things in Ecstasy, the answer is purposefully unknowable. If you ask for a property's reference, you do get back a usable object -- one that you can reflect on, pass around, store in a property somewhere, or whatever it is that you do with objects -- so obviously the property object "exists", by some definition; but like all turtles, it may not have existed before you looked at it, and it may not exist when you're not looking at it.

But here's where things get seriously cool: Since a property does have a class, we can augment that class! Of course, we don't use the extends keyword like when we sub-class (because we don't know what class to extend) ... but we can write a mixin for the property, because we do know the type to mix into! In fact, lots of functionality in Ecstasy is built by writing mixins that can be mixed into properties and local variables, such as futures, lazily calculated values, and watched values.

Furthermore, we can augment a property where we define it, as if it were a class. Here's a silly example:

  module Test
      {
      @Inject Console console;
      Log log = new ecstasy.io.ConsoleLog(console);

      void run()
          {
          log.add("Simple property example!");

          val o = new TestClass();
          for (Int i : 0..5)
              {
              val n = o.x;
              }

          o.&x.foo(); // &x gets the property, instead of de-referencing it
          }

      class TestClass
          {
          Int x
              {
              @Override Int get()
                  {
                  ++count;
                  return super();
                  }

              void foo()
                  {
                  log.add($"Someone accessed this property {count} times!");
                  }

              private Int count;
              }
          }
      }

And when we run it:

++++++ Loading module: Test +++++++

Simple property example!
Someone accessed this property 6 times!

Process finished with exit code 0

So we can augment our property with code where we define the property, we can mix in predefined functionality into a property (again, it's not magic, because you could have written those mixins yourself!), and we can even modify a property's behavior on a sub-class (assuming that the property wasn't private), because the subclass' property's class implicitly extends the super-class' property's class.

Okay, that was a lot of information, but it conveys an important point: A property isn't just some field in a structure. It's a real object, with a real class, and it behaves like a real object, with a real class.

But what about the property's value? Where is it stored? The Ecstasy type system determines which properties require a field for their storage, and automatically includes those fields in the underlying structure that is defined for each class, and thus exists for each object. In other words, all of a property's state is stored in whatever class the property "rolls up" into. Here are two simple examples:

  class Example
      {
      Int x; // this has a field on Example:struct

      Int y.get()
          {
          return 7; // this does not have a field
          }
      }

The type system has rules that determine when a field is required. The compiler uses these rules. The runtime uses these same rules. If a field is required, then the field will exist. If the field is not required, then the field will not exist.

So how is that field accessed? Well normally, we don't even think about that. If there's an object o with a property x, we just dereference the property o.x, and we never think about the field. But conceptually, the field is accessed by the last method in the call chain for the get() method on the property, so if you don't override the get() method, then accessing the property goes straight to the field.
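As a loose analogy only (this is Python, not Ecstasy, and the class names are hypothetical), the call chain can be pictured as a base class whose get() reads the backing field, plus an override that does its own work and then calls up the chain. The field itself lives in the owning object's structure, modeled here as a plain dict.

```python
class Var:
    """Stand-in for a property object whose base get()/set() touch the field."""

    def __init__(self, struct, field_name):
        self._struct = struct            # the owning object's structure (a dict here)
        self._field_name = field_name

    def get(self):
        # End of the call chain: read the backing field directly
        return self._struct[self._field_name]

    def set(self, value):
        self._struct[self._field_name] = value


class CountingVar(Var):
    """Overrides get(), then delegates to the next get() in the chain."""

    def __init__(self, struct, field_name):
        super().__init__(struct, field_name)
        self.count = 0

    def get(self):
        self.count += 1
        return super().get()


struct = {"x": 0}                 # the field for x lives in the structure
x = CountingVar(struct, "x")
x.set(42)
values = [x.get() for _ in range(6)]
# x.count == 6, and struct["x"] == 42
```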

Alternatively, sometimes it is necessary to work with an object's structure directly. A serialization library, for example, may need to access the field values to store them off, and subsequently build a new structure using that stored-off data to re-instantiate the corresponding object. Instead of trying to paste an example here, it makes more sense just to point to the JSON serialization implementation that does exactly this. While it might seem like a common thing (directly accessing a field), it turns out that the only places in the entire Ecstasy code base where fields are being accessed directly are (i) serialization implementations and (ii) tests of the compiler and runtime itself.

So, back to the initial question: What is an Ecstasy property?

A property has a name.
A property represents state with a value, of a type.
A property is contained within a class, a property, or a method.
A property is a container of classes, properties, and methods.
A property is itself a class.
A property can be customized, mixed into, inherited, and overridden.
Properties are virtual.