2021/06/12

What is a type?

Another Coffee Compiler Club call, another concept to explain.

What is a type?

It seems like such a simple question, until you try to answer it. What is a type?

When we designed Ecstasy, we were intent on boiling down types to some pure form, some simple atomic model of constructing everything in our programming universe. Types are, in some way, the periodic table of program elements. The protons, neutrons, and electrons of program existence. If you can't explain the basic building blocks of your universe, how can you explain how anything works?

Let's start with the conclusion: An object's type is the sum of its behavior.

Not having any real background in type theory, I have no idea whether this is an obvious truism or a nutty novel notion. I'm going to assume the latter, only because I've never used a language with a type definition like this, and also because it provides an excellent opportunity to explain the concept.

Most type systems that I've known and used are built around some combination of identity, state, and behavior. Java object types, for example, are based entirely on identity: The identity of a class is its type; the identity includes the name of the class, and its ancestors, in terms of super classes and implemented interfaces. Java types do carry detailed information about state (fields) and behavior (methods), but those details don't define the type; only the identity defines the type. The question "Is some object reference o of type T?" is never answered by what fields the object contains, nor by what methods it has, but rather by its identity, and solely by its identity.

Of course, this was quite a leap forward from the answer in C (and by extension, C++); in C, the question "Is some object reference o of type T?" always has the answer "Yes". Got a pointer? It turns out that your pointer points to whatever type you tell the compiler it points to. Is it a Cat? Is it a Dog? Is it a Car? Is it a House? The answer is always "Yes!" One must admit that C is quite an agreeable language when it comes to types. (With this in mind, it's also easy to understand how C code is responsible for so many security flaws.)

But let's drop all of these notions on the floor, and start over. Let's start with a made-up syntax for defining a type:

type
{
// things that define what the type is
}

Looks kind of like a C structure. And we often think about types in exactly this way: They have a name (identity), and they have structure (fields).

type Person
{
String firstName;
String lastName;
}

But by our definition, this is completely wrong! We claimed that a type is only the sum of its behavior, and this example has only identity and state instead. So let's fix this, temporarily, and in the ugliest manner possible:

type // if we could name it, we'd call it "Person"
{
String getFirstName();
void setFirstName(String);
String getLastName();
void setLastName(String);
}

Interesting. Ugly, but interesting. We've turned state into behavior. And we turned the identity into a comment. But let's try an experiment: We'll allow a type to have an optional identity (including type name, type parameters, and ancestor types), just to make it easier to describe the types, and we'll create a few types that we can re-use to avoid some of that awful boilerplate:

type Ref<T>
{
T get();
}

type Var<T> : Ref<T>
{
void set(T);
}

type Person
{
Var<String> firstName();
Var<String> lastName();

So there is no state, per se, and the identity exists solely as a convenience for us, the reader, but we're almost back to where we started. In fact, if we introduce a short-hand notation for a zero-parameter method that returns a Var<T>, we are back where we started:

type Person
{
String firstName; // this just means "Var<String> firstName()"
String lastName; // this just means "Var<String> lastName()"
}

We're close, but not done, because we have referenced another type, "String". And what is a "String"? It's just another type, that has to be defined in the exact same way as "Person". To define a String, it helps to have an Array. To define an Array, which has a size, it helps to have an Int64. To define an Int64, it helps to have another Array of Bit. To define a Bit, it helps to have an IntLiteral to represent a 0 or 1. And to define an IntLiteral, it helps to have a String.

In other words, the type system forms a closed loop. All types are defined from other types, and types are defined solely by their behavior.

And behavior? Behavior is simply defined as a set of named methods, each taking zero or more typed parameters, and returning zero or more typed results.

So what does this mean?

To oversimplify the conclusion, it means that a mathematician can use set theory to implement a type calculus for such a type system. Really, that's it. You know, like curing cancer, or finding the holy grail.

In Ecstasy, all types are defined like this. Even Type is a Type.

Riddle me this

It should be pretty obvious now why we refer to this as the "Turtles Type System", since it's turtles the whole way down. One of the interesting riddles we encountered early on looked something like this:

type A
{
B foo();
}

type B
{
A foo();
}

Question: What is the difference between an A and a B?

Rules

One of the interesting things with such a type system is how easy it is to construct recursive rules from it. For example, we say that a method m consumes type T if any of the following holds true:

  1. m has a parameter type declared as T;
  2. m has a parameter type that produces T;
  3. m has a return type that consumes T.

Similarly, we say that a method m produces type T if any of the following holds true:

  1. m has a return type declared as T;
  2. m has a return type that produces T;
  3. m has a parameter type that consumes T.

These rules form the basis for checking the legality of things like method variance, such as co-variance and contra-variance, which in turn allows the type system to intelligently enforce type safety.





3 comments:

  1. I think you took a wrong turn at the start:

    "An object's type is the sum of its behavior"

    Probably something worth saying about starting with conclusions, but...

    A type has nothing to do with behavior.

    Consider the duck. Whether it can walk or quack has nothing to do with its duckness. A duck is still a duck, even if it loses it's ability to speak and waddle.

    But those behaviors are actually attributes. Parts of what makes it s duck.

    Now let's look at your Person. Your example with getters and setters does not have any behavior. It just has Java (yes, still Java) idioms for accessing its attributes. Its data attributes.

    Let me ask you a question -- what type is your firstName attribute
    I won't leave you in suspense -- it's a String.

    What is a String? It's an ordered collection of characters. It isn't the thing that reverses itself or capitalizes itself, divides itself into smaller Strings, or tells you it's length. Those are things that may be done to a String, not core attributes of the type.

    A String can be divided, or reversed, etc. Just like a duck can waddle or fly. But they are not what defines it. They are what can happen to it. And other things than Ducks and Strings can fly and be divided in half.

    The point is that the type is actually the name -- a type doesn't exist until it has a name. The ability to assign something to a type requires the type to be named -- even if it it's behavior (or attributes) are not yet fully defined.

    And that gets to my final point, and first quibble.

    A type is not the sum of either the behavior or attributes of a thing. It is a subset. Just like a duck is a bird and , and a String is a list of characters and a non-numeric type.

    So my definition is more like this:

    A type is a set of attributes that an object can be classified in.

    And yes, I'm afraid that means multiple inheritance.

    If we stop thinking of behavior as something an object does and start thinking it as something that may be done to (or with) an object, then that might help.

    ReplyDelete
  2. Hi Aaron -

    (This is the third time that I've written a response to your comment, but the first two got sent to dev/null by Firefox in order to protect me from Google's strange reliance on cross-site scripting hacks and x-frames. Hopefully, I've finally slain that dragon by adding some Firefox security exceptions for this blog. Please trust me when I say that my first two, now forever-lost, comments were far better than this one.)

    > "I think you took a wrong turn at the start"

    Reading the totality of your comment, you're not actually very far away from where we ended. Let me see if I can clarify a bit ...

    > "A type has nothing to do with behavior."

    When we use the abstract term "behavior", we are referring to its call level interface (CLI), i.e. the callable signatures of the methods of the type.

    > "If we stop thinking of behavior as something an object does and start thinking it as something that may be done to (or with) an object, then that might help."

    Indeed. That is how we think of it.

    > "The point is that the type is actually the name -- a type doesn't exist until it has a name."

    The Ecstasy type system is, at its foundation, a pure type system, and its pure types are composed entirely, and only, of the behaviors of those types.

    A nominal type system is layered over those pure types, which means that every nominal type can also be used (if desired) as a pure type, but the nominal aspect allows a great deal of potential optimization and a "strong" typing capability.

    The Ecstasy type algebra/calculus also allows for (among other things) type addition, subtraction, "and", "or", and negation/"and-not".

    ReplyDelete
  3. "An object's type is the sum of its behavior"

    That sounds a lot like protocols to me.

    https://en.wikipedia.org/wiki/Protocol_(object-oriented_programming)

    Popular in the Clojure world.

    ReplyDelete

All comments are subject to the Ecstasy code of conduct. To reduce spam, comments on old posts are queued for review before being published.