EcstasyLang: 2019

2019/10/13

More turtles

The Ecstasy type system is called a Turtles Type System, because "it's turtles, the whole way down". This is, in many ways, a revolutionary approach to type systems. Object type systems have traditionally had primitive types (like a "periodic table of elements" for the language) from which all other types are built, but in Ecstasy things are a bit different. For example, an Ecstasy integer is composed of an array of bits, each of which is composed of an integer literal (0 or 1), which is in turn composed of a string, which is composed of an array of characters, each of which is composed of an integer. So we're right back where we started, with an integer -- and you can just recurse infinitely, because the types are all turtles.

Making a type system like this actually work is a challenge, so it didn't appear all at once. Recently, support for recursive type definitions was added, in order to support the JSON parsing project. Consider the following Ecstasy code:

/**
 * JSON primitive types are all JSON values except for arrays and objects.
 */
typedef (Nullable | Boolean | IntLiteral | FPLiteral | String) Primitive;

/**
 * JSON types include primitive types, array types, and map types.
 */
typedef (Primitive | Map<String, Doc> | Array<Doc>) Doc;

Here, in two lines of code (which could even be simplified to a single line, if we didn't want to split out primitive JSON values), we see a complete Ecstasy mapping of the JSON specification. That second typedef, though, is a doozy, because it refers to itself. If you stop and read it carefully it makes a lot of sense: A JSON document is either a primitive value, a map of string keys to JSON values (each of which could be an entire recursive document structure), or an array of JSON values (each of which could be an entire recursive document structure).
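For comparison, the same recursive shape can be written as a Python type alias. This is only an analogy. A static checker such as a recent version of mypy understands the alias, but Python does not enforce it at run time. The sketch therefore also includes a hypothetical is_doc function that checks a value against the same recursive definition. In it, None, bool, int, float, and str stand in for Nullable, Boolean, IntLiteral, FPLiteral, and String.

```python
from typing import Dict, List, Union

# Rough Python analogs of the two Ecstasy typedefs (checked statically by tools like mypy).
Primitive = Union[None, bool, int, float, str]
Doc = Union[Primitive, Dict[str, "Doc"], List["Doc"]]


def is_doc(value: object) -> bool:
    """Return True if value matches the recursive Doc definition."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return True                                    # a primitive JSON value
    if isinstance(value, list):
        return all(is_doc(item) for item in value)     # an array of docs
    if isinstance(value, dict):
        # a map of string keys to docs
        return all(isinstance(k, str) and is_doc(v) for k, v in value.items())
    return False
```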

To keep it simple, consider the following example:

typedef (Int | List<Manifold>) Manifold;

Manifold m1 = 9;
Manifold m2 = [m1];
Manifold m3 = [m2];

console.println(m1);
console.println(m2);
console.println(m3);

When executed, this code will print:

9
[9]
[[9]]

But the amazing thing isn't that it works at all, but rather that it works with full type safety.
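Code that consumes a value of a recursive union type narrows it one level at a time. The Python sketch below is a loose analogy, because the type checking comes from an external tool such as mypy rather than from the runtime. It uses a hypothetical render function: once the int case is ruled out, the checker treats the value as a list of Manifolds, and the function recurses. The sketch reproduces the output shown above.

```python
from typing import List, Union

# Python analog of: typedef (Int | List<Manifold>) Manifold;
Manifold = Union[int, List["Manifold"]]


def render(m: Manifold) -> str:
    """Render a Manifold the way the Ecstasy example prints it."""
    if isinstance(m, int):
        return str(m)                                           # the Int branch
    return "[" + ", ".join(render(item) for item in m) + "]"    # the List<Manifold> branch


m1: Manifold = 9
m2: Manifold = [m1]
m3: Manifold = [m2]

for m in (m1, m2, m3):
    print(render(m))
```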

2019/08/01

Hello World!

In retrospect, the most obvious missing feature of Ecstasy is the prototypical "Hello World!" example.

The earliest adopters / experimenters / hackers who have been playing with Ecstasy for some time now were somehow able to divine the magic incantations necessary to get code compiling and running (sometimes with help from our team), but it's time to make this process much easier.

This won't be a single update; rather, it is a process of moving the project from a small team that knows all of the undocumented nooks and crannies out into the public sphere. The initial experience with Ecstasy should not be as soul-sucking and psychologically scarring as a Google job interview. For a new user, getting started should be straightforward, and not an "obstacle course" or a matter of "running the gauntlet".

To that end, we introduce step one, the Hello World:

module HelloWorld
    {
    void run()
        {
        @Inject Console console;
        console.println("Hello World!");
        }
    }

Here's a short explanation of the code, which is found in ./xdk/src/main/resources/xdk/examples/HelloWorld.x:

A module is the unit of compilation, loading, linking, and execution, so we need to write one of those. Don't worry -- as you can see, it's easy.
The xec command (which we'll cover below) looks for a method on the module called "run" that takes no parameters. (The module is a class, so "void run()" on the module is just a normal method.)
Ecstasy code is purposefully incapable of doing any I/O; for security reasons, there is nothing in the language (or in the compiled form of the language) that has any access to any hardware or OS resource. As a result, the code must depend on its container to provide something that implements the Console interface; this is called injection. The behavior of the console that is injected by the TestConnector is to print to stdout.
The declaration "@Inject Console console;" declares a read-only variable called console, and when it is dereferenced, it will always have a value that is a Console. (It is a contract; if the container could not -- or chose not to -- provide a Console, then the creation of the container itself would have failed.)
Hopefully, the line that prints out "Hello World!" is self-explanatory.
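In Ecstasy, injection is built into the language and its containers. The Python sketch below only models the idea with plain constructor injection, and every name in it is hypothetical: Console as a Protocol, StdoutConsole, RecordingConsole, and create_container. The module-like class never touches stdout itself. Whichever Console the container supplies decides where the text goes, and a missing Console makes container creation fail.

```python
from typing import List, Optional, Protocol


class Console(Protocol):
    """The capability the module needs but cannot create for itself."""
    def println(self, text: str) -> None: ...


class StdoutConsole:
    """One possible injected Console: prints each line to stdout."""
    def println(self, text: str) -> None:
        print(text)


class RecordingConsole:
    """Another possible injected Console: records lines, e.g. for tests."""
    def __init__(self) -> None:
        self.lines: List[str] = []

    def println(self, text: str) -> None:
        self.lines.append(text)


class HelloWorld:
    """Stand-in for the module: it declares a dependency instead of doing I/O."""
    def __init__(self, console: Console) -> None:
        self.console = console  # supplied ("injected") by the container

    def run(self) -> None:
        self.console.println("Hello World!")


def create_container(console: Optional[Console]) -> HelloWorld:
    """Fail container creation if the required Console cannot be provided."""
    if console is None:
        raise RuntimeError("container could not provide a Console")
    return HelloWorld(console)


create_container(StdoutConsole()).run()  # prints: Hello World!
```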

Here are the steps to getting this running:

The git utility is used for downloading project code. Open a terminal window (aka command window aka shell) and type "git" to verify that you have it installed and working. If you don't have it, download and install git; if you develop on a Mac, git is already included in the Command Line Tools for Xcode.
Java is used to run the current Ecstasy toolchain, and version 11 (or later) of the JDK is required. Open a terminal window, and type "java -version" to verify that you have the necessary version of Java installed. If necessary, you can download the free JDK 11 from the Amazon Corretto project, for example.
We strongly encourage you to download IntelliJ IDEA, if you don't already use it. (Or update it to the latest version, if you already use it.) Since Ecstasy is an open source project on GitHub, you can use the "Community" edition of IDEA. (We do think that it is an IDE worth paying for, so don't be afraid to splurge on the "Ultimate" edition!)
Determine a location to create a local repository for the XVM project. The rest of these instructions will assume Unix style paths and an installation location of ~/Development/xvm, but if you're on Windows, just create an XVM project directory somewhere, e.g. Development\xvm under your user directory.
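From the terminal, in that (empty) XVM directory, execute "git clone https://github.com/xtclang/xvm.git ." (the trailing dot clones into the current directory instead of creating an xvm sub-directory). This will take a few seconds (maybe minutes) to completely clone the project into your XVM directory.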
Next, use the Gradle wrapper to build a local copy of the Ecstasy development kit (XDK) with the command "./gradlew build" (or "gradlew.bat build" on Windows). This will take a minute or so to completely build the XDK.
The XDK is built under the ./xdk project directory of your XVM directory, specifically in the ./xdk/build/xdk sub-directory. You can copy the built XDK to a location of your choosing, but for these instructions, we will leave it in the location in which it was built.
To configure the toolchain for your OS, execute the appropriate script in the bin directory of the XDK; for example, on macOS, execute ". ./xdk/build/xdk/bin/cfg_macos.sh". (Notice the dot and space at the beginning of the command; this is called a "source command" in Bash. Unfortunately, this does not work with the zsh shell that is now the default on macOS, so you have to run ./xdk/build/xdk/bin/cfg_macos.sh without the preceding source command, and then manually update the PATH to add the XDK's bin directory, e.g. ~/Development/xvm/xdk/build/xdk/bin.)
Now you can use the xtc, xec, and xam commands from the terminal. On some operating systems, if these executable files are not signed and/or notarized, you may get an error or a warning the first time that you run them. For example, macOS includes a feature called Gatekeeper that may need to be configured to allow these programs to be executed.
Each time that you open a new terminal window, you will need to execute the OS-specific configuration script to update the PATH variable; alternatively, you can configure your OS to automatically update the PATH for you, but the complexity of that topic is immense, and beyond the scope of this document. (On macOS and Linux, one normally would create a .profile file in one's home directory and add one line that says e.g. export PATH=$PATH:~/Development/xvm/xdk/build/xdk/bin, but there are pages of conversation to read through on StackOverflow for when this simple approach fails to work for your configuration.)
To compile the HelloWorld example, use the xtc command: "xtc ./xdk/build/xdk/examples/HelloWorld.x".
The compiler places the compiled .xtc file into the current directory (which in this case is probably not where you want it, but for the sake of this example, we'll ignore this detail). To execute the program, run "xec HelloWorld.xtc".

And if all went well, you should see:

Hello World!

2019/07/14

Composition

In an OO language, one of the first questions to ask is how classes and types are composed. Sometimes, when looking at a new language, it's easy to get side-tracked by clever syntax, or syntactic "features", but while these are ultimately important, there is nothing more important in a language than being able to describe the shape of what one is building.

Ecstasy provides three basic shapes from which classes are composed:

Classes, which (just like in Java and C#) are useful for defining instantiable combinations of state and behavior.
Interfaces, which (just like in Java and C#) are useful for defining contracts, and may allow default behavior to be defined.
Mixins, which are used to define cross-cutting functionality.

One example of each of these from the core library is the Range class, the Sequential interface, and the Interval mixin. Consider this simple example:

for (Int i : 10..20)
    {
    // do something
    }

The expression "10..20" is a Range; it defines a "from value" and a "to value". The only requirement of a range is that its element type must be Orderable, which is the funky interface that allows two objects to be compared for purposes of ordering.

The ability of a type to be ordered is a necessary but insufficient capability for iteration, which is what the for loop requires, and if you examine the Range class closely, you will notice that it does not implement Iterable. What it does, instead, is this:

const Range<Element extends Orderable>
        incorporates conditional Interval<Element extends Sequential>

Translated into English, that reads: "A range is a constant that contains elements, which must be of an orderable type. Additionally, for ranges whose elements are of a sequential type, the range will automatically incorporate the capabilities of an interval."

Think of the Sequential interface as the type that is necessary to support the "++" and "--" operators (pre-/post- increment/decrement). When a range of a sequential type is constructed, the composition of the range incorporates the Interval mixin, which in turn, being iterable, provides an iterator that can be used by the for loop.

A range of a non-sequential type cannot be iterated over, and an attempt to do so is detected by the compiler:

for (String s : "hello".."world")   // compiler error
    {
    // do something
    }
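The conditional composition can be imitated, loosely, in Python. In this sketch the names Range, Interval, is_sequential, successor, and make_range are hypothetical, and "sequential" is assumed to mean an int or any object with a next_value() method. The important difference is timing. Ecstasy decides the composition from the type and reports the error at compile time. This sketch picks a class when the range is created, and it fails only when iteration is attempted.

```python
class Range:
    """A 'from value' and a 'to value' of any orderable type (no iteration)."""
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __contains__(self, value) -> bool:
        return self.first <= value <= self.last   # only ordering is needed here


class Interval(Range):
    """The extra capability a Range gains when its elements are sequential."""
    def __iter__(self):
        value = self.first
        while value <= self.last:                 # inclusive, like 10..20
            yield value
            value = successor(value)


def is_sequential(value) -> bool:
    # Hypothetical stand-in for Ecstasy's Sequential: ints, or objects with next_value()
    return isinstance(value, int) or callable(getattr(value, "next_value", None))


def successor(value):
    return value + 1 if isinstance(value, int) else value.next_value()


def make_range(first, last) -> Range:
    # Like "incorporates conditional Interval<Element extends Sequential>":
    # the element type decides whether the iterable composition is used.
    return Interval(first, last) if is_sequential(first) else Range(first, last)
```

For example, list(make_range(10, 13)) yields [10, 11, 12, 13], while make_range("hello", "world") still supports membership tests such as "jelly" in words but cannot be iterated.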

In the "const Range" declaration shown above, the keyword used to declare the class was "const". To declare a class (in the abstract sense of the term), Ecstasy provides eight keywords:

module is used to declare a unit of compilation, or a unit of deployment. Java has a related concept, also called a module, and C# uses the term assembly. A module is a singleton const class; see Modules Overview.
package is used to declare a namespace within a module, which is kind of like creating a directory within a file system. Like module, a package is also a singleton const class.
class is used to declare any class that is not specialized as either a const or a service. Classes may be made immutable at run-time, but may not be singletons. For example, see ListMap.
const is used to declare a class that is immutable by the time that it finishes construction. Furthermore, it automatically provides implementations of a number of common interfaces, including Orderable, Hashable, and Stringable. Consts can be singletons, and are always immutable. For example, see Int64, aka Int.
enum is used to declare an enumeration of values. The enumeration itself is an abstract const, and each enum value is a singleton const. For example, see Boolean.
service is used to declare a potentially asynchronous object, conceptually similar to a Java or C# thread, but in many ways, much closer to an Erlang process. Services may be singletons, and may not be immutable. There aren't any good examples of service in the core library, but the services.x test highlights the asynchronous and continuation-based behaviors of the service, using both an explicit Future-style programming model, and the implicit async/await style.
interface defines just the surface area (the API) of a class, and may include default implementations of that API.
mixin declares a cross-cutting composition that can be incorporated into another composition.
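The const keyword has a loose Python counterpart in a frozen dataclass with order=True. Such a class is immutable once constructed, and equality, ordering, hashing, and a string form are generated for it, roughly what const provides through Orderable, Hashable, and Stringable. This is only an analogy: the Python immutability is shallow, and Ecstasy's const semantics (deep immutability, singletons) go further. The Point class is a hypothetical example.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, order=True)
class Point:
    """Immutable after construction; __eq__, __lt__ etc., __hash__, __repr__ are generated."""
    x: int
    y: int


p = Point(1, 2)
print(p)                              # Point(x=1, y=2)  (generated string form)
print(p < Point(1, 3))                # True  (generated ordering compares fields in order)
print(hash(p) == hash(Point(1, 2)))   # True  (generated hash agrees with equality)

try:
    p.x = 5                           # mutation after construction is rejected
except FrozenInstanceError:
    print("immutable")
```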

Each of these, and the forms of composition available to each, will be covered in more detail in subsequent articles. In the meantime, if you're curious about the raw syntax, see bnf.x, and if you're curious about how the parsing of the syntax works, see parseTypeCompositionComponent() in the Parser. The AST node for type compositions is TypeCompositionStatement.

2019/07/13

A pane in the glass

One of the best visual metaphors for object systems is a pane of glass. Imagine having a pane of glass and a red dry-erase marker; use that marker to draw and fill in some small circles on the glass. Now, take another pane of glass, and using a blue dry-erase marker this time, do the same thing, and then set that second pane on top of the first one. When you look through the panes from above, you see the small circles from both panes, almost as if they were on one pane.

Each of these circles represents a virtual behavior, such as a virtual method. Imagine that the pane of glass is magically subdivided as a grid, and those circles were magically located within the cells of that grid. Now, as you look through those two panes of glass together, the red circles that you see are methods implemented on the base class, and the blue circles are methods implemented on the derived class, because we put the red on the bottom (the base), and the blue overlaid it. And perhaps you might see some purple circles, representing methods that exist on the base class and are overridden on the derived class.

We can repeat the experiment with another pane of glass, and a yellow marker, but at this point, it's getting very difficult to hold and juggle all of this glass, so we need a special holder for these panes of glass. Since this experiment is in our mind's eye, we can instantly build whatever we need to hold these (and many more) panes of glass. We need something almost like the adjustable shelves in an oven -- a sort of "glass shelf system" that allows us to slide any pane-of-glass into, and pull any pane-of-glass out of this holder. We construct it to be free standing, so that we can look through it from above, and we build it with a light source beneath it that helps to provide illumination through our panes of glass. What we have now is our collection of panes of glass, any of which might have those colored circles laid out in a grid, and we can now appreciate our beautiful coloring job when we look down through all of that glass, from above.

It is tedious to build such a thing in our head, but it serves a most excellent purpose, for now we are all looking at the same thing together, and sharing words that have meaning because we are looking at the same thing.

For example, when we use the term method identity, we are referring to a pair (x,y) of coordinates that identify a location (a cell) in the grid on the glass. (In the real world, we know that a method has other means of identifying itself, such as a name and perhaps some information about its parameters, but in our mind's eye, it's far simpler to draw circles on a grid on glass with dry erase markers.)

And when we say that there is no such method, we mean that we look from above through the grid and there is no color in a particular cell: not on the top pane of glass, and not on any pane of glass under it.

And when we talk about a virtual method invocation, we mean that for a method identity (a cell location in the grid) in which we can see a color circle from above, we slide out the top pane of glass and see if the cell in question has a circle on this top pane of glass, and if it does, that circle represents the behavior (the code) of that method to execute. If on the other hand, it does not have a circle in that cell, then we slide the glass back into its shelf, we pull out the next piece of glass, we examine it, and we keep repeating this process until we find the first piece of glass that has a circle in that cell.

And when we talk about a super method invocation, what we mean is that during the execution of that code, if the code refers to its super, it is referring to the next circle that we would find if we were to go back and continue pulling out those glass shelves one by one and examining them, as we were when we performed the original virtual method invocation. If in doing so, we get to the last piece of glass and we have not found that circle, we would say that there is no super. If on the other hand we find that circle on a subsequent piece of glass, then that circle represents the super -- the code of the super method to execute.

And when we talk about a method chain, we are referring to that first circle that we found for the virtual method invocation, and also its super, and also its super, until we get to the end and there are no more supers. That sequence of circles is the method chain.
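This vocabulary maps directly onto a small data structure. In the following Python sketch (all names hypothetical), each pane is a dictionary from a grid cell to a function. The panes are listed from top (most derived) to bottom (base). Each function receives a sup callable that continues down the stack, which is how the super invocation is modeled.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]                          # a method identity: an (x, y) cell in the grid
Method = Callable[[Callable[[], str]], str]     # a circle: code that may call its super
Pane = Dict[Cell, Method]                       # all of the circles drawn on one pane of glass


def method_chain(panes: List[Pane], cell: Cell) -> List[Method]:
    """Every circle found in this cell, from the top pane down: the method chain."""
    return [pane[cell] for pane in panes if cell in pane]


def invoke(panes: List[Pane], cell: Cell, depth: int = 0) -> str:
    """Virtual invocation; depth counts how many supers deep we are."""
    chain = method_chain(panes, cell)
    if not chain:
        raise LookupError(f"no such method at {cell}")   # no color on any pane
    if depth >= len(chain):
        raise LookupError("no super")                     # ran out of glass
    return chain[depth](lambda: invoke(panes, cell, depth + 1))


# The red pane (base) is at the bottom; the blue pane (derived) is on top.
red: Pane = {(0, 0): lambda sup: "red", (1, 0): lambda sup: "red-only"}
blue: Pane = {(0, 0): lambda sup: "blue over " + sup(), (2, 0): lambda sup: "blue-only"}
panes = [blue, red]

print(invoke(panes, (0, 0)))  # blue over red  (the "purple" cell: override plus super)
print(invoke(panes, (1, 0)))  # red-only
```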

That's quite a vocabulary that we have developed, but it is invaluable in designing and discussing how an object system works. More importantly, it's fundamental to understanding the next concept, because the concept we're about to describe doesn't yet exist outside of the Ecstasy language. In The Quest for Equality, we introduced a notion of equality that is unlike any that can be found today in other languages. It is neither virtual (since it is based on a function, not a virtual method), nor is it static (since it is based on run-time type information), nor is it dynamic (since the type information reflects the declared compile-time type information). So what is it, then?

Equality is an example of a funky interface. It is like an interface type (a C++ pure virtual class, or a Java/C# interface) in many ways, except that we do not stand over the panes of glass and look for a method in the manner that we described above. No, a funky interface is different, because it knows which pane of glass it refers to.

First, in terms of implementing a funky interface, what it means is that the pane of glass on which the implementation occurs will (and must) contain a colored-in circle for each of the functions (not methods) of the funky interface. (By now, you must be realizing how the funky interface got its name.) So when we slide out the pane of glass for an Orderable implementation, we will see both the equals and the compare function circles colored in.

Second, in terms of using a funky interface, the compile-time type of the values being compared for order (via the Orderable interface) identifies the pane of glass on which we start looking for the compare function. But it is not necessary that every pane of glass have that function for the type to be orderable. Instead, we begin at that pane of glass and check whether it implements the funky interface; if it does not, we proceed to the next lower pane of glass, until we find one that does.

Remember when we said that an implementation of a funky interface must contain a colored-in circle for every one of the functions from the funky interface? That is because a funky interface represents a tight coupling of related functions. The Orderable example above is a good one, because both the equals and the compare function use some concept, some definition of equality, and thus if one changes, we must expect that the other changes as well.

But there are a few other obvious examples, such as Hashable: If you change the definition of equals, then you must also make sure that the hash code calculation for two equal objects produces the same result for each, and vice versa. Thus Hashable is a funky interface with those two functions, equals and hashCode.

And since equals shows up in both Hashable and Orderable, if one is to implement those two funky interfaces, then it is clear that filling in any one of those circles on a pane of glass requires filling in them all. And this allows the compiler to detect when a higher pane of glass attempts to fill in just some subset of those circles, which would naturally lead to errors in the running program. In other words, derived types must continue to respect the contracts from the funky interfaces of the base types.

Because to do otherwise would be a pane.
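The funky-interface rules described above can also be modeled as data. In the following Python sketch, every name is hypothetical: FUNKY, Pane, and find_implementation. Each pane lists the funky interfaces of its type and the functions it fills in, and the panes are listed from top (most derived) to bottom (base). Construction rejects a pane that fills in only some of the circles of a funky interface. Lookup starts at the pane of the compile-time type rather than at the top of the stack, which is exactly where it differs from a virtual method invocation.

```python
from typing import Callable, Dict, List

# Funky interfaces as groups of tightly coupled functions (names taken from the post).
FUNKY: Dict[str, List[str]] = {
    "Hashable": ["equals", "hashCode"],
    "Orderable": ["equals", "compare"],
}


class Pane:
    """One class in the hierarchy: its name, its funky interfaces, its functions."""

    def __init__(self, name: str, interfaces: List[str],
                 functions: Dict[str, Callable]) -> None:
        for iface in interfaces:
            present = [f for f in FUNKY[iface] if f in functions]
            if present and len(present) < len(FUNKY[iface]):
                # Filling in only some circles would break the coupled contract.
                raise TypeError(f"{name} implements only {present} of {iface}")
        self.name = name
        self.functions = functions

    def implements(self, iface: str) -> bool:
        return all(f in self.functions for f in FUNKY[iface])


def find_implementation(panes: List[Pane], compile_time_type: str, iface: str) -> Pane:
    """Start at the pane of the compile-time type and move down toward the base."""
    start = [p.name for p in panes].index(compile_time_type)
    for pane in panes[start:]:
        if pane.implements(iface):
            return pane
    raise TypeError(f"{compile_time_type} does not implement {iface}")


both = ["Hashable", "Orderable"]
full = {"equals": lambda a, b: a == b, "hashCode": hash,
        "compare": lambda a, b: (a > b) - (a < b)}
panes = [Pane("Derived", both, full), Pane("Middle", both, {}), Pane("Base", both, full)]

print(find_implementation(panes, "Middle", "Orderable").name)   # Base (Derived is above the start)
print(find_implementation(panes, "Derived", "Orderable").name)  # Derived
```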

2019/07/11

If it Quacks

Ecstasy supports both type tests and type assertions. In most languages, a type assertion is called a cast, which may result in a compiler error or (in the case of languages with run-time type information) a run-time exception. A type assertion in Ecstasy is a run-time, type-safe operation, and importantly, can be expressed as an explicitly-left-associative operation:

String foo(Object o)
    {
    return o.as(String); // could throw TypeMismatch
    }

Languages with run-time type information usually provide an additional, non-asserting means to test for a particular type at run-time, such as the Java instanceof binary operator, or the C# is binary operator. A type test in Ecstasy can be performed as an explicitly-left-associative operation:

String foo(Object o)
    {
    return o.is(String) ? o : "hello, world!";
    }

Normally, an object cannot be of a certain type, unless it is explicitly declared to be of that type. For example, even if the imaginary class FakeString has all of the same properties and methods as the String class, instances of FakeString cannot be cast to a String.

Some languages do support such a thing, however. It's called duck typing, because "if it walks like a duck, and quacks like a duck, then it's a duck". An early prototype of Ecstasy had this feature for all types, and even provided a composition keyword, impersonates, to automate the composition of ducks and duck-like creatures. However, the capability did not mesh well with the design of the class-based portion of the type system, and was ultimately rejected for its unanswerable questions and potential incompatibilities.

Because duck typing is so useful, especially when working across the boundaries of loosely-coupled modules, one aspect of duck typing was explicitly retained: The ability to duck-type an interface. In many ways, an Ecstasy interface is simply a named type, i.e. a type plus a name. (This is not strictly true, but for this conversation, it will suffice.) And thus, in Ecstasy, one can make a Gosling Duck:

interface Duck
    {
    void waddle();
    void quack();
    }

class Gosling
    {
    void waddle() {}
    void quack() {}
    }

Duck foo(Gosling james)
    {
    return james;
    }

(No ducks were harmed in the making of this blog entry.)
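Python's typing.Protocol offers a comparable compromise. Ordinary classes are matched nominally, but a protocol is matched structurally, so a class satisfies it just by having the right methods. In this analogy Duck is a protocol and Gosling never mentions it. A static checker accepts returning a Gosling as a Duck, and the @runtime_checkable decorator lets isinstance perform a shallow check at run time that only tests whether the methods are present. FakeString here is just a class with a different shape.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Duck(Protocol):
    """An interface that any class with these methods satisfies structurally."""
    def waddle(self) -> None: ...
    def quack(self) -> None: ...


class Gosling:
    """Never mentions Duck, but has the same shape."""
    def waddle(self) -> None:
        pass

    def quack(self) -> None:
        pass


class FakeString:
    """A class with a different shape entirely."""
    def upper(self) -> str:
        return "FAKE"


def foo(james: Gosling) -> Duck:
    return james  # accepted by a static checker: Gosling matches Duck's shape


print(isinstance(foo(Gosling()), Duck))   # True
print(isinstance(FakeString(), Duck))     # False
```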

2019/07/10

Literally awesome!

Ecstasy supports a rich set of literals, too many to cover in a single post, so consider this the first installment. It's important to lay out, up front, why a language supports literals, and what design goals guide them.

First, when building a language, literals are often terminal constructs, such that other things in the language can be composed of them.

Second, a literal allows an efficient, human-readable encoding of information. For example, for most of us, it is far easier to read the number 42 than something like:

new Byte(false, false, true, false, true, false, true, false)

Ecstasy's design goals for literals are fairly straightforward:

Common constant types supported by the core runtime library should have a literal form. Examples include: Bits, nibbles, bytes, binary strings, integers, characters, character strings, dates, times, date/times, time durations, etc.
Common complex types supported by the core runtime library should have a literal form. Examples include: Tuples, arrays, lists, sets, and maps (aka dictionaries).
Literal formats should emphasize readability, and the formats should be fairly obvious to a programmer.
It should be easy to work with literal formats using only a text editor.
Literals should make common programming tasks simpler, where possible.

Integers
A "whole number", or an integer, starts with an optional sign, followed by an optional radix indicator (such as "0b" for binary, "0o" for octal, or "0x" for hex), followed by the digits of the appropriate radix, with optional underscores between digits to separate digits as desired. The BNF is in the language specification, but the simple explanation above should suffice. Here are some examples:

0
-1
42
0xFF
0b10_1010_1010_1010_1010
12345678901234567890123456789012345678901234567890

So, what is the type of each of the above? A 32-bit "int"? A 64-bit "int"? No. Each of the above is an IntLiteral, a const class. Just think of IntLiteral as an object that has a good idea how to look on the screen, and simultaneously knows what values of various numeric types it can represent. The benefits are fairly obvious, in terms of support for arbitrary integer sizes (without weird type casting or literal suffixes like "L"), and support for other numeric types whose range may be far beyond the range of any arbitrary fixed-length integer type.
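The idea of a literal that keeps its text and "knows what values of various numeric types it can represent" can be sketched in Python. Python's base-0 int() parsing happens to accept an optional sign, the 0b/0o/0x prefixes, and underscores between digits, which are close to the rules described above, although Ecstasy's exact rules are in its BNF. The IntLiteral class, its fits() method, and the table of types are hypothetical illustrations, not the Ecstasy core library's API.

```python
from typing import Dict, List, Tuple

# Ranges of some fixed-width integer types, for illustration only.
FIXED_WIDTH: Dict[str, Tuple[int, int]] = {
    "Int8": (-2**7, 2**7 - 1),
    "Int16": (-2**15, 2**15 - 1),
    "Int32": (-2**31, 2**31 - 1),
    "Int64": (-2**63, 2**63 - 1),
    "UInt8": (0, 2**8 - 1),
    "UInt16": (0, 2**16 - 1),
    "UInt32": (0, 2**32 - 1),
    "UInt64": (0, 2**64 - 1),
}


class IntLiteral:
    """Keeps the source text and knows which types can represent its value."""

    def __init__(self, text: str) -> None:
        self.text = text
        # Base 0: optional sign, 0b/0o/0x prefixes, underscores between digits.
        self.value = int(text, 0)

    def __str__(self) -> str:
        return self.text  # "knows how to look on the screen"

    def fits(self) -> List[str]:
        """Names of the fixed-width types whose range includes this value."""
        return [name for name, (lo, hi) in FIXED_WIDTH.items() if lo <= self.value <= hi]
```

A value too large for every fixed-width type, such as the 50-digit example above, simply fits none of them, and the literal remains usable for arbitrary-precision types.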

Characters
A character is a single-quoted Unicode code point, with predictable support for escapes using the backslash. To encode a Unicode character in the range up to U+FFFF, the format \u1234 can be used; beyond that range, the format \U12345678 can be used. Here are some examples:

'a'
' '
'\''
'\t'

This literal type is implemented by Char, a const class.

Strings
A (character) string is a double-quote enclosed sequence of characters, supporting the same escapes as are supported for character literals. Here are some examples:

""
"Hello, world!"
"This is an example of \"quotes\" inside \"quotes\""
"Multiple\nlines\nof\ntext."

Multi-line strings are freeform, which means that character escapes are not processed; Unicode escapes, on the other hand, are supported, because they are handled by the earlier "lexer" stage of the compilation. Multi-line strings use a hard left border, defined by the "pipe" ("|") character; the first line of a multi-line string begins with a back-tick ("`") followed by a pipe. Here is an example:

String s = `|This is a test of
            |a "multiline" string
            |containing | and \ and ` and ' and " etc.
            ; // <--- look at this

Like an end-of-line comment, the multi-line string takes everything from the pipe to the end of the line, as-is, which is why the semicolon in the example above has to be placed on the following line.
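The hard left border is simple enough to sketch. The hypothetical Python function below takes the source lines of a literal, starting at the opening back-tick, and keeps everything after the first pipe on each line as-is, so no character escapes are processed. It stops at the first line that has no pipe border. Unicode escapes, which the post notes are handled earlier by the lexer, are out of scope here.

```python
from typing import List


def multiline_string(source_lines: List[str]) -> str:
    """Apply the hard left border: keep everything after the first '|' on each line."""
    out = []
    for i, line in enumerate(source_lines):
        stripped = line.lstrip()            # indentation before the border is ignored
        if i == 0:
            if not stripped.startswith("`|"):
                raise ValueError("a multi-line string starts with `|")
            out.append(stripped[2:])
        elif stripped.startswith("|"):
            out.append(stripped[1:])        # taken as-is, including | \ ` ' "
        else:
            break                           # the first line without a border ends the literal
    return "\n".join(out)
```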

A template allows a string to be formed dynamically from any valid expression. The format of the template string is the same as a normal string, except prefixed by the dollar sign ("$"); expressions inside the string are prefixed by dollar-sign + open-curly ("${") and suffixed by close-curly ("}"). Here are a few examples:

$"Hello, ${name}!"
$"2 + 2 = ${2 + 2}."
$"Finished in ${timer.elapsed.milliseconds}ms."
$"Finished in ${{timer.stop(); return timer.elapsed;}}"

Templates are handy, and making up good examples is challenging, but we already use templates all over the place. The last example is quite interesting, in that it shows a statement expression (syntactically, a lambda body) inside of the template expression.

Templates can also be used with multi-line strings, which is denoted by using a dollar sign instead of the opening back-tick:

String s = $|# TOML doc
            |[name]
            |first = "${person.firstname}"
            |last = "${person.lastname}"
            ;

Finally, if the string you need to glue into your code is too big and ugly to put into the source file, then don't. Just stick it in its own file in the same directory; for example, in a file named "ugly.txt":

String s = $./ugly.txt;

Yeah. That was easy.

The literal type for all of these forms of string is implemented by String, a const class.

Arrays
An array literal is a square-bracket enclosed list of values.

Here are some examples:

[]
['a', 'b', 'c']
[1, 2, 3]

This literal type is implemented by the Array class, which is variably mutable: Array literals are either Persistent (if they contain any values that are not compile-time constants) or Constant (if they contain only compile-time constants).

Summary
This was just a brief introduction to literals in Ecstasy. Each of these literal forms has many more rules than we covered here, but those rules are there to allow for more expression (readability) in the source code, and not to restrict it. The forms for these literals are designed to make it super easy to write and very pleasant to read.

The rules do make the lexer and the parser more complex, but we look at it this way: The compiler only has to get written twice (one prototype to bootstrap the language, and then the real one written in natural Ecstasy code), so no matter how much work it is to make the language easier to use, we get to amortize that cost across many, many users over many, many years.

Literally.

2019/07/09

Null is no Exception

Since Tony Hoare introduced the null reference (in ALGOL W) in 1965, we have lived with an entire family of languages (including C, C++, Java, and C#) that made it possible for a pointer to contain a purposefully illegal address, for the purpose of representing the lack of a value. That purposefully illegal address is called null. (Or NULL, NIL, nil, etc.)

The reasoning was simple: Without using any additional memory, a pointer could be made to serve two purposes: first, to indicate whether or not a value exists, and second, to hold that value if and only if it exists.

As type systems advanced, such that pointers became type-safe references instead of arbitrary integers, it became necessary to represent null as a typed value. To make it possible to assign the value null to any reference, these type systems made null a sub-type of all other types; otherwise, null would not be assignment compatible with any type other than the null type itself.

The use of null in all of these languages had an unfortunate side effect: Because null could be assigned to any type, it logically followed that each and every value might be null. That means that every single access to a value requires a null check, and a failed check generates an exception in languages like Java (NullPointerException) and C# (NullReferenceException). In C, such code just segfaults (aka "Access Violation" in Windows) and core-dumps. Yay!

To avoid segfaults and exceptions, it became necessary to sprinkle code with lots of these:

if (s != null)
    {
    ...
    }

(Yay!)

There is an elegant solution to this ugliness, which is to make null into its own normal type, and not some magical "subclass of all classes" class, or "subtype of all types" type. In other words, simply by making the null value into an object reference of some normal class, it prevents that reference from being assigned willy-nilly to references of any other random type.

The complete code for the Nullable type (found in Ecstasy's module.x) is:

enum Nullable { Null }

That one line of code declares an enumeration class, called Nullable, with one value, called Null.

(Advanced: From an inheritance point of view, Null extends Nullable implements Object. From a composition point-of-view, as an enum, the class for Nullable incorporates the Enumeration mixin while the Nullable class itself is an abstract enum, and Null is an enum value. An enum value is a singleton const, which automatically implements both the Enum and Const interfaces. See source files: module.x, Const.x, Enum.x, Class.x, and Enumeration.x.)

This approach introduces some new requirements for the language's type system. First, a type system must be able to represent composite types, such as intersection types, union types, and difference types. Ecstasy represents union types with the "or" ("|") operator, because the code "(A|B)" reads "either type A or type B". That means only the members common to both types (the intersection of their capabilities) can be assumed. (Apologies to any mathematicians reading this, but the "U" on our keyboard was stuck in the right-side-up position.)

Thus, to declare a type that can hold a value of either Nullable or String, and assign it a predictable value, one could write:

Nullable | String s = "Hello, world!";

This would quickly get old, so a short-hand notation for the "Nullable|" portion is the type-postfix "?"; here is the rewritten form of the above declaration, using the short-hand notation:

String? s = "Hello, world!";

Since the variable "s" is either a String value or a Nullable value, one cannot ask it for its size:

Int len = s.size;    // compiler error!

The reason that "s" does not have a size is that its type is "Nullable or String", and the Nullable type does not have a size property. Knowing this, the compiler can reject the request for the size property; this is an example of compile-time type safety. (Run-time type safety is exhibited by throwing a NullPointerException, etc.; Ecstasy has no equivalent to this exception, because such an exception cannot occur! "And there was much rejoicing.")

Compile-time type safety allows the compiler to know when a value might be Null. By checking whether the type is a String, the compiler subsequently knows that the value cannot be Null, and specifically that the value is a String, after which it is safe to obtain the String size:

if (s.is(String))
    {
    console.println($"String s is ${s.size} characters long.");
    }

Similarly, if the code explicitly compares the value to Null, then the compiler can know when the value is or is not Null. The above code could be modified slightly to first test that the value is not Null, so that the compiler subsequently knows (by process of elimination) that the value is a String, after which it is safe to obtain the String size:

if (s != Null)
    {
    console.println($"String s is ${s.size} characters long.");
    }
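The two narrowing styles above have a rough Python counterpart, sketched here. It uses a hand-rolled `Nullable` enum in place of Ecstasy's and a `typing.Union` in place of the `String?` type. The function names are hypothetical. At run time, Python performs the checks that the Ecstasy compiler would otherwise reason about statically.

```python
from enum import Enum
from typing import Union


class Nullable(Enum):
    """Stand-in for Ecstasy's `enum Nullable { Null }`."""
    Null = "Null"


Null = Nullable.Null

# Rough analog of the Ecstasy type `String?`, i.e. `Nullable | String`.
OptString = Union[Nullable, str]


def describe_by_type(s: OptString) -> str:
    # Analog of `s.is(String)`; static checkers such as mypy narrow `s` to str here.
    if isinstance(s, str):
        return f"String s is {len(s)} characters long."
    return "s is Null"


def describe_by_elimination(s: OptString) -> str:
    # Analog of `s != Null`: with Null ruled out, str is the only alternative left.
    if s is not Null:
        return f"String s is {len(s)} characters long."
    return "s is Null"


print(describe_by_type("Hello, world!"))   # String s is 13 characters long.
print(describe_by_elimination(Null))       # s is Null
```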

The postfix "?" operator is a short-circuiting operator that performs the same not-Null test, so the above code could be written instead as:

console.println($"String s is ${s?.size} characters long.");

The short-circuiting "?" operator can be grounded using the else (":") operator. In the following example, if "a" is Null, or if "a.b" is Null, or if "a.b.c" is Null, then the result is the predictable value of "Hello, world!", otherwise the result is the value of "a.b.c":

String s = a?.b?.c? : "Hello, world!";
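The short-circuit-then-ground behavior can be sketched in Python as a small helper. The helper name `chain_or_else` and its attribute-name arguments are hypothetical, and `Nullable` is a hand-rolled stand-in for Ecstasy's enum. The point is only the control flow: the first Null encountered anywhere along the chain abandons the rest of it, and the grounding value is used instead.

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any


class Nullable(Enum):
    """Stand-in for Ecstasy's `enum Nullable { Null }`."""
    Null = "Null"


Null = Nullable.Null


def chain_or_else(root: Any, *attrs: str, default: Any) -> Any:
    """Rough analog of `root?.attr1?.attr2? : default`."""
    value = root
    if value is Null:              # root? short-circuits
        return default
    for name in attrs:
        value = getattr(value, name)
        if value is Null:          # each subsequent ? short-circuits
            return default         # the else (":") grounds the short circuit
    return value


a = SimpleNamespace(b=SimpleNamespace(c="a.b.c"))
print(chain_or_else(a, "b", "c", default="Hello, world!"))   # a.b.c
print(chain_or_else(SimpleNamespace(b=Null), "b", "c", default="Hello, world!"))   # Hello, world!
```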

Ecstasy combines the postfix "?" and the else (":") operator into the elvis ("?:") operator:

String s = a ?: "Hello, world!";

The above code has the same effect as:

String s = a? : "Hello, world!";

As with most binary operators, it is possible to combine the operator with the assignment operator, such that:

x = x ?: y;

... can be rewritten using the elvis assignment operator as:

x ?:= y;

Notice the similarity between the postfix "?" operator, the else (":") operator, the elvis operator, and the ternary operator; each of the following four lines of code has the same result:

x = x != Null ? x : y;
x = x? : y;
x = x ?: y;
x ?:= y;
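The elvis operator's rule is simple: keep the left side unless it is Null, otherwise use the right side. It can be sketched in Python with a hypothetical `elvis` helper and a hand-rolled `Nullable` stand-in. The sketch also highlights one design choice: the test is specifically for Null. That differs from Python's `x or y`, where any falsy value such as 0 or the empty string falls through to the right side.

```python
from enum import Enum


class Nullable(Enum):
    """Stand-in for Ecstasy's `enum Nullable { Null }`."""
    Null = "Null"


Null = Nullable.Null


def elvis(x, y):
    """Analog of `x ?: y`: use x unless it is Null, otherwise use y."""
    return x if x is not Null else y


x, y = Null, 42
r1 = x if x is not Null else y   # ternary form:  x != Null ? x : y
r2 = elvis(x, y)                 # elvis form:    x ?: y
x = elvis(x, y)                  # nearest Python equivalent of  x ?:= y
print(r1, r2, x)                 # 42 42 42

# Only Null is replaced; 0 is a perfectly good value.
print(elvis(0, 42))              # 0
print(0 or 42)                   # 42 (Python's `or` treats 0 as "missing")
```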

That is a lot to wrap one's head around, but there is a simple logic behind it.

Finally, there is a special Null-aware assignment operator ("?=") that splits a nullable type, such as "String?", into a tuple of a Boolean and the (conditional) non-nullable portion of the type (e.g. "String"). This can be used wherever a condition can be used, such as in an "if" or "while" statement. For example, imagine some method or function that can return a nullable string value:

String? foo();

Other than the operator, this example should seem quite familiar by now:

if (String s ?= foo())
    {
    console.println($"String s is ${s.size} characters long.");
    }

In the above example, if the function returns Null, then the result is the tuple (False), which is consumed by the if, causing the "else" branch of the if statement to be executed. Conversely, if the function returns a String value, then the result is the tuple (True, string-value), of which the (True) is consumed by the if, causing the string-value to be consumed by the assignment, and causing the "then" branch of the if statement to be executed.
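The tuple-splitting behavior of "?=" can be modeled in Python with a hypothetical helper. It returns `(False,)` for Null and `(True, value)` otherwise, matching the two tuple shapes described above. This sketches the semantics only: in Ecstasy the compiler does the unpacking, and `split_nullable`, `report`, and the `Nullable` stand-in are all made up for illustration.

```python
from enum import Enum


class Nullable(Enum):
    """Stand-in for Ecstasy's `enum Nullable { Null }`."""
    Null = "Null"


Null = Nullable.Null


def split_nullable(value):
    """Analog of `?=`: (False,) if value is Null, else (True, value)."""
    if value is Null:
        return (False,)
    return (True, value)


def report(result) -> str:
    """Analog of: if (String s ?= result) { ... } else { ... }"""
    cond = split_nullable(result)
    if cond[0]:          # the Boolean is consumed by the "if"
        s = cond[1]      # the value is consumed by the assignment
        return f"String s is {len(s)} characters long."
    return "no string"   # the "else" branch


print(report("Hello"))   # String s is 5 characters long.
print(report(Null))      # no string
```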

Thus, it should be obvious that these two statements will have the same result:

s ?= foo();
s = foo()?;

As just another normal value, and thus without any mind-bendingly-crazy type system rules to accommodate some magical null value, Null is simultaneously less troublesome and more useful.

The null is dead. Long live the Null.

2019/06/20

The Quest for Equality

It's shocking how difficult equality is to get right in software. What does equality even mean?

Equality used to be so simple in the days of assembly language, when "bitwise equality" was all that existed, and the only things that could be compared were fixed-size CPU registers. Things got more and more complex, until we ended up with object-oriented languages, in which each class may have its own idea of what equality means.

On top of that, the most popular languages today each have multiple forms of equality. Python offers "is" versus "==". Java and C# offer "==" and "o1.equals(o2)" -- and the latter can easily differ from "o2.equals(o1)", because nothing forces an implementation to be symmetric or transitive. JavaScript has both "Object.is()" and "==", and just keeps adding equals signs when new meanings are desired; so far, they're up to "===", but anyone who denies the likelihood of a future "=====" operator is just kidding themselves.

Obviously, equality is not a simple problem. Most of the existing solutions are broken, because they cannot guarantee symmetric and transitive behavior. Many of these problems are the direct result of having multiple type systems within a single language, which helps to explain why many of these languages provide two forms of equality: one for the primitive "value" type system, and one for the object (reference-based) type system.

It's obvious that in an object type system, a class needs to be able to define equality for that class, replacing any superclass definition of the same, which means that the definition of equality is virtual, as in "virtual method invocation". It's also obvious that such a definition cannot be a single dispatch method, because then "o1.equals(o2)" may provide a completely different answer from "o2.equals(o1)". Some languages attempt to address this type of conundrum with multiple dispatch support, but then again, one can also swat a fly with an atomic bomb.
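The symmetry problem with a single-dispatch equals is easy to reproduce. The following Python sketch mimics a Java-style `equals` method. It deliberately uses a plain method rather than Python's `__eq__`, because Python gives a subclass's reflected comparison priority, which would muddy the example. The `Point` and `ColorPoint` classes are illustrative and not from any library. Because only the receiver's class decides what equality means, swapping the operands changes the answer.

```python
class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def equals(self, other: object) -> bool:
        # Single dispatch: Point's notion of equality ignores any subclass fields.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class ColorPoint(Point):
    def __init__(self, x: int, y: int, color: str) -> None:
        super().__init__(x, y)
        self.color = color

    def equals(self, other: object) -> bool:
        # The subclass widens the definition to include color, and therefore
        # insists that the other object also be a ColorPoint.
        return (isinstance(other, ColorPoint)
                and super().equals(other)
                and self.color == other.color)


p = Point(1, 2)
cp = ColorPoint(1, 2, "red")
print(p.equals(cp))   # True:  Point only looks at x and y
print(cp.equals(p))   # False: ColorPoint requires a ColorPoint
```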

Generic types further add complexity to the notion of equality, because each type parameter may contribute its own potentially conflicting notion of equality to the equation.

So what's the answer?

Intent. Intent matters. The developer's intent when they write a line of code is important, but somehow that intent manages to be erased by a language compiler. (It's no accident that the term "type erasure" is used to describe a generic type system that is implemented by forgetting what those types actually were.)

Ecstasy captures the developer's intent by capturing the compile-time type of the references in question, retaining that information in the compiled code, and using that information for hard problems like equality.

Consider the following example:

Collection<String> c1 = foo();
Collection<String> c2 = bar();
if (c1 == c2)
    {
    // ...
    }

It doesn't matter what the actual runtime class is of the objects that c1 and c2 refer to, because the developer clearly stated that they are Collections of String objects. The class of each collection is unknown here at compile time; instead, the Collection interface is used, and it defines equality in a specific way. In theory, those collections could contain objects of various sub-classes of String, but when those contents are compared, they will be compared as Strings, because that was the intent of the developer.

To accomplish this, equality is implemented non-virtually; here is the equals function on Collection:

/**
 * Two collections are equal iff they contain the same values.
 */
static <CompileType extends Collection>
        Boolean equals(CompileType collection1, CompileType collection2)
    {
    // they must be of the same arity
    if (collection1.size != collection2.size)
        {
        return False;
        }

    if (collection1.sortedBy() || collection2.sortedBy())
        {
        // if either is sorted, then both must be of the same order;
        // the collections were of the same arity, so the second iterator
        // shouldn't run out before the first
        Iterator<CompileType.Element> iter1 = collection1.iterator();
        Iterator<CompileType.Element> iter2 = collection2.iterator();
        for (val value1 : iter1)
            {
            assert val value2 := iter2.next();
            if (value1 != value2)
                {
                return False;
                }
            }

        // the collections were of the same arity, so the first iterator
        // shouldn't run out before the second
        assert !iter2.next();
        return True;
        }
    else
        {
        return collection1.containsAll(collection2);
        }
    }

What makes this work is that the compile-time type is passed to the equals function. Not the name of the type. Not the "type value". The type. The actual, for-real, usable-as-a-type-name-in-your-code type. A formal type. (Yes, types are objects. But they're also types.)

So how did this function get called? Well, because the compiler determined, and retained, and used, the compile-time type. Both the "==" operator and the "!=" operator ultimately cause a call to this function to occur.

And what if, instead, the developer had specified these variables as being Lists?

List<String> c1 = foo();
List<String> c2 = bar();
if (c1 == c2)
    {
    // ...
    }

Again, it doesn't matter what the actual runtime types of those objects are, because the developer clearly stated that they must be Lists of String, and the "==" operator results in a call to the equals function on List:

/**
 * Two lists are equal iff they are of the same size, and
 * they contain the same values, in the same order.
 */
static <CompileType extends List> Boolean equals(CompileType a1, CompileType a2)
    {
    Int c = a1.size;
    if (c != a2.size)
        {
        return False;
        }

    for (Int i = 0; i < c; ++i)
        {
        if (a1[i] != a2[i])
            {
            return False;
            }
        }

    return True;
    }

Even more obvious is that the equals functions themselves simply use the "==" and "!=" operators on the values within the Collections or Lists. They can do so because the compile-time type of those contents has not been erased: it was encoded into the compiled code, and was used in the call to the Collection or List equals function. And this works for arbitrarily deep nesting of generic types, because turtles.
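Python has no compile-time types to capture, but the idea can be approximated. The declared type is passed explicitly, and it, rather than the runtime class of either operand, selects the equality function. In this sketch the `equals` dispatcher and its table are hypothetical. `collections.abc.Collection` and `collections.abc.Sequence` stand in for Ecstasy's Collection and List. Collection-style equality is simplified to a multiset comparison, which assumes hashable elements. Only the top-level dispatch is shown, not the element-level propagation through nested generic types.

```python
from collections import Counter
from collections.abc import Collection, Sequence
from typing import Any, Callable, Dict


def collection_equals(c1: Collection, c2: Collection) -> bool:
    """Collection-style equality: same size, same values, order ignored."""
    if len(c1) != len(c2):
        return False
    # Simplification: compare contents as a multiset (requires hashable values).
    return Counter(c1) == Counter(c2)


def list_equals(a1: Sequence, a2: Sequence) -> bool:
    """List-style equality: same size, same values, same order."""
    if len(a1) != len(a2):
        return False
    return all(v1 == v2 for v1, v2 in zip(a1, a2))


# Hypothetical table: the *declared* type picks the equality function.
EQUALS_BY_DECLARED_TYPE: Dict[type, Callable[[Any, Any], bool]] = {
    Collection: collection_equals,
    Sequence: list_equals,
}


def equals(declared_type: type, o1: Any, o2: Any) -> bool:
    return EQUALS_BY_DECLARED_TYPE[declared_type](o1, o2)


a = ["x", "y"]
b = ("y", "x")   # a different runtime class, which does not matter here

print(equals(Collection, a, b))          # True:  same values, order ignored
print(equals(Sequence, a, b))            # False: same values, different order
print(equals(Sequence, a, ("x", "y")))   # True
```

Because both operands are compared by the same function chosen from the declared type, the result does not depend on which operand happens to come first.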

And it's type safe.
And it's symmetric.
And it's transitive.
And it's readable.
And it's obvious.
And it's correct.

One more thing. What if you really want to know if two objects are actually "the same object"? In Ecstasy, as described previously, the reference (Ref) to an object contains the type and the identity of the object that is referred to. Sameness refers to that identity, and the equals function on Ref tests for the equality of that identity:

List<String> c1 = foo();
List<String> c2 = bar();
if (&c1 == &c2)
    {
    // ...
    }

... which is exactly what the Object.equals function does itself:

static <CompileType extends Object>
        Boolean equals(CompileType o1, CompileType o2)
    {
    return &o1 == &o2;
    }
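The distinction between equality and sameness maps onto Python's `==` and `is` operators. In this sketch, the hypothetical `same_object` helper plays the role of `&o1 == &o2` by comparing identity, while `==` compares contents.

```python
def same_object(o1: object, o2: object) -> bool:
    """Analog of `&o1 == &o2`: True only if both names refer to one object."""
    return o1 is o2


c1 = ["a", "b"]
c2 = ["a", "b"]   # equal contents, but a separate object
c3 = c1           # a second reference to the same object

print(c1 == c2, same_object(c1, c2))   # True False
print(c1 == c3, same_object(c1, c3))   # True True
```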