2021/06/12

What is a type?

Another Coffee Compiler Club call, another concept to explain.

What is a type?

It seems like such a simple question, until you try to answer it. What is a type?

When we designed Ecstasy, we were intent on boiling down types to some pure form, some simple atomic model of constructing everything in our programming universe. Types are, in some way, the periodic table of program elements. The protons, neutrons, and electrons of program existence. If you can't explain the basic building blocks of your universe, how can you explain how anything works?

Let's start with the conclusion: An object's type is the sum of its behavior.

Not having any real background in type theory, I have no idea whether this is an obvious truism or a nutty novel notion. I'm going to assume the latter, only because I've never used a language with a type definition like this, and also because it provides an excellent opportunity to explain the concept.

Most type systems that I've known and used are built around some combination of identity, state, and behavior. Java object types, for example, are based entirely on identity: The identity of a class is its type; the identity includes the name of the class, and its ancestors, in terms of super classes and implemented interfaces. Java types do carry detailed information about state (fields) and behavior (methods), but those details don't define the type; only the identity defines the type. The question "Is some object reference o of type T?" is never answered by what fields the object contains, nor by what methods it has, but rather by its identity, and solely by its identity.

Of course, this was quite a leap forward from the answer in C (and by extension, C++); in C, the question "Is some object reference o of type T?" always has the answer "Yes". Got a pointer? It turns out that your pointer points to whatever type you tell the compiler it points to. Is it a Cat? Is it a Dog? Is it a Car? Is it a House? The answer is always "Yes!" One must admit that C is quite an agreeable language when it comes to types. (With this in mind, it's also easy to understand how C code is responsible for so many security flaws.)

But let's drop all of these notions on the floor, and start over. Let's start with a made-up syntax for defining a type:

type
{
// things that define what the type is
}

Looks kind of like a C structure. And we often think about types in exactly this way: They have a name (identity), and they have structure (fields).

type Person
{
String firstName;
String lastName;
}

But by our definition, this is completely wrong! We claimed that a type is only the sum of its behavior, and this example has only identity and state instead. So let's fix this, temporarily, and in the ugliest manner possible:

type // if we could name it, we'd call it "Person"
{
String getFirstName();
void setFirstName(String);
String getLastName();
void setLastName(String);
}

Interesting. Ugly, but interesting. We've turned state into behavior. And we turned the identity into a comment. But let's try an experiment: We'll allow a type to have an optional identity (including type name, type parameters, and ancestor types), just to make it easier to describe the types, and we'll create a few types that we can re-use to avoid some of that awful boilerplate:

type Ref<T>
{
T get();
}

type Var<T> : Ref<T>
{
void set(T);
}

type Person
{
Var<String> firstName();
Var<String> lastName();

So there is no state, per se, and the identity exists solely as a convenience for us, the reader, but we're almost back to where we started. In fact, if we introduce a short-hand notation for a zero-parameter method that returns a Var<T>, we are back where we started:

type Person
{
String firstName; // this just means "Var<String> firstName()"
String lastName; // this just means "Var<String> lastName()"
}

We're close, but not done, because we have referenced another type, "String". And what is a "String"? It's just another type, that has to be defined in the exact same way as "Person". To define a String, it helps to have an Array. To define an Array, which has a size, it helps to have an Int64. To define an Int64, it helps to have another Array of Bit. To define a Bit, it helps to have an IntLiteral to represent a 0 or 1. And to define an IntLiteral, it helps to have a String.

In other words, the type system forms a closed loop. All types are defined from other types, and types are defined solely by their behavior.

And behavior? Behavior is simply defined as a set of named methods, each taking zero or more typed parameters, and returning zero or more typed results.

So what does this mean?

To oversimplify the conclusion, it means that a mathematician can use set theory to implement a type calculus for such a type system. Really, that's it. You know, like curing cancer, or finding the holy grail.

In Ecstasy, all types are defined like this. Even Type is a Type.

Riddle me this

It should be pretty obvious now why we refer to this as the "Turtles Type System", since it's turtles the whole way down. One of the interesting riddles we encountered early on looked something like this:

type A
{
B foo();
}

type B
{
A foo();
}

Question: What is the difference between an A and a B?

Rules

One of the interesting things with such a type system is how easy it is to construct recursive rules from it. For example, we say that a method m consumes type T if any of the following holds true:

  1. m has a parameter type declared as T;
  2. m has a parameter type that produces T;
  3. m has a return type that consumes T.

Similarly, we say that a method m produces type T if any of the following holds true:

  1. m has a return type declared as T;
  2. m has a return type that produces T;
  3. m has a parameter type that consumes T.

These rules form the basis for checking the legality of things like method variance, such as co-variance and contra-variance, which in turn allows the type system to intelligently enforce type safety.