2019/07/14

Composition

In an OO language, one of the first questions to ask is how classes and types are composed. Sometimes, when looking at a new language, it's easy to get side-tracked by clever syntax, or syntactic "features", but while these are ultimately important, there is nothing more important in a language than being able to describe the shape of what one is building.


Ecstasy provides three basic shapes from which classes are composed:
  • Classes, which (just like in Java and C#) are useful for defining instantiable combinations of state and behavior.
  • Interfaces, which (just like in Java and C#) are useful for defining contracts, and may allow default behavior to be defined.
  • Mixins, which are used to define cross-cutting functionality.
One example of each of these from the core library is the Range class, the Sequential interface, and the Interval mixin. Consider this simple example:

for (Int i : 10..20)
    {
    // do something
    }
The expression "10..20" is an Range; it defines a "from value" and a "to value". The only requirement of a range is that its type must be Orderable, which is the funky interface that allows two objects to be compared for purposes of ordering.

The ability of a type to be ordered is a necessary but insufficient capability for iteration, which is what the for loop requires, and if you examine the Range class closely, you will notice that it does not implement Iterable. What it does, instead, is this:
const Range<Element extends Orderable>
        incorporates conditional Interval<Element extends Sequential>
Translated into English, that reads: "A range is a constant that contains elements, which must be of an orderable type. Additionally, for ranges whose elements are of a sequential type, the range will automatically incorporate the capabilities of an interval."

Think of the Sequential interface as the type that is necessary to support the "++" and "--" operators (pre-/post- increment/decrement). When a range of a sequential type is constructed, the composition of the range incorporates the Interval mixin, which in turn, being iterable, provides an iterator that can be used by the for loop.

A range of a non-sequential type cannot be iterated over, and an attempt to do so is detected by the compiler:
for (String s : "hello".."world")   // compiler error
    {
    // do something
    }
In the "const Interval" declaration shown above, the keyword used to declare the class was "const". To declare a class (in the abstract sense of the term), Ecstasy provides eight keywords:
  • module is used to declare a unit of compilation, or a unit of deployment. Java has a related concept, also called a module, and C# uses the term assembly. A module is a singleton const class; see Modules Overview.
  • package is used to declare a namespace within a module, which is kind of like creating a directory within a file system. Like module, a package is also a singleton const class.
  • class is used to declare any class that is not specialized as either a const or a service. Classes may be made immutable at run-time, but may not be singletons. For example, see ListMap.
  • const is used to declare a class that is immutable by the time that it finishes construction. Furthermore, it automatically provides implementations of a number of common interfaces, including both Orderable, Hashable, and Stringable. Consts can be singletons, and are always immutable. For example, see Int64, aka Int.
  • enum is used to declare an enumeration of values. The enumeration itself is an abstract const, and each enum value is a singleton const. For example, see Boolean.
  • service is used to declare a potentially asynchronous object, conceptually similar to a Java or C# thread, but in many ways, much closer to an Erlang process. Services may be singletons, and may not be immutable. There aren't any good examples of service in the core library, but the services.x test highlights the asynchronous and continuation-based behaviors of the service, using both an explicit Future-style programming model, and the implicit async/await style.
  • interface defines just the surface area (the API) of a class, and may include default implementations of that API.
  • mixin declares a cross-cutting composition that can be incorporated into another composition.
Each of these, and the forms of composition available to each, will be covered in more detail in subsequent articles. In the meantime, if you're curious about the raw syntax, see bnf.x, and if you're curious about how the parsing of the syntax works, see parseTypeCompositionComponent() in the Parser. The AST node for type compositions is TypeCompositionStatement.

2019/07/13

A pane in the glass

One of the best visual metaphors for object systems is a pane of glass. Imagine having a pane of glass and a red dry-erase marker; use that marker to draw and fill in some small circles on the glass. Now, take another pane of glass, and using a blue dry-erase marker this time, do the same thing, and then set that second pane on top of the first one. When you look through the panes from above, you see these small circles circles from both panes, almost as if they were on one pane.

One of these circles represents a virtual behavior. A virtual method, for example. Imagine that the pane of glass is magically subdivided as a grid, and those circles were magically located within the cells of that grid. Now, as you look through those two panes of glass together, those red circles that you see are methods implemented on the base class, and the blue circles are methods implemented on the derived class, because we put the red on the bottom (the base), and the blue overlaid it. And perhaps you might see some purple circles, representing methods that exist on the base class and are overridden on the derived class.

We can repeat the experiment with another pane of glass, and a yellow marker, but at this point, it's getting very difficult to hold and juggle all of this glass, so we need a special holder for these panes of glass. Since this experiment is in our mind's eye, we can instantly build whatever we need to hold these (and many more) panes of glass. We need something almost like the adjustable shelves in an oven -- a sort of "glass shelf system" that allows us to slide any pane-of-glass into, and pull any pane-of-glass out of this holder. We construct it to be free standing, so that we can look through it from above, and we build it with a light source beneath it that helps to provide illumination through our panes of glass. What we have now is our collection of panes of glass, any of which might have those colored circles laid out in a grid, and we can now appreciate our beautiful coloring job when we look down through all of that glass, from above.

It is tedious to build such a thing in our head, but it serves a most excellent purpose, for now we are all looking at the same thing together, and sharing words that have meaning because we are looking at the same thing.

For example, when we use the term method identity, we are referring to a pair (x,y) of coordinates that identify a location (a cell) in the grid on the glass. (In the real world, we know that a method has other means of identifying itself, such as a name and perhaps some information about its parameters, but in our mind's eye, it's far simpler to draw circles on a grid on glass with dry erase markers.)

And when we say that there is no such method, we mean that we look from above through the grid and there is no color in a particular cell -- not on the top pane of glass, but also not on any pane of glass under it.

And when we talk about a virtual method invocation, we mean that for a method identity (a cell location in the grid) in which we can see a color circle from above, we slide out the top pane of glass and see if the cell in question has a circle on this top pane of glass, and if it does, that circle represents the behavior (the code) of that method to execute. If on the other hand, it does not have a circle in that cell, then we slide the glass back into its shelf, we pull out the next piece of glass, we examine it, and we keep repeating this process until we find the first piece of glass that has a circle in that cell.

And when we talk about a super method invocation, what we mean is that during the execution of that code, if the code refers to its super, it is referring to the next circle that we would find if we were to go back and continue pulling out those glass shelves one by one and examining them, as we were when we performed the original virtual method invocation. If in doing so, we get to the last piece of glass and we have not found that circle, we would say that there is no super. If on the other hand we find that circle on a subsequent piece of glass, then that circle represents the super -- the code of the super method to execute.

And when we talk about a method chain, we are referring to that first circle that we found for the virtual method invocation, and also its super, and also its super, until we get to the end and there are no more supers. That sequence of circles is the method chain.

That's quite a vocabulary that we have developed, but it is invaluable in designing and discussing how an object system works. More importantly, it's fundamental to understanding the next concept, because the concept we're about to describe doesn't yet exist outside of the Ecstasy language. In The Quest for Equality, we introduced a notion of equality that is unlike any that can be found today in other languages. It is neither virtual (since it based on a function, not a virtual method), nor is it static (since it is based on run-time type information), nor is it dynamic (since the type information reflects the declared compile-time type information). So what is it, then?

Equality is an example of a funky interface. It is like a interface type (a C++ pure virtual class, or a Java/C# interface) in many ways, except that we do not stand over the panes of glass and look for a method in the manner that we described above. No, a funky interface is different, because it knows which pane of glass it refers to.

First, in terms of implementing a funky interface, what it means is that the pane of glass on which the implementation occurs will (and must) contain a colored-in circle for each of the functions (not methods) of the funky interface. (By now, you must be realizing how the funky interface got its name.) So when we slide out the pane of glass for a an Orderable implementation, we will see both the equals and the compare function circles colored in.

Second, in terms of using a funky interface, that compile-time type being compared-for-order (the Orderable interface) represents the slot of the pane of glass that we will find that compare function on. But it is not necessary that every pane of glass have that function for the type to be orderable. Instead, we begin at that pane of glass, and check if it implements that funky interface, and if it does not, then we proceed to the next lower pane of glass, until we find the one that does.

Remember when we said that an implementation of a funky interface must contain a colored-in circle for every one of the functions from the funky interface? That is because a funky interface represents a tight coupling of related functions. The Orderable example above is a good one, because both the equals and the compare function use some concept, some definition of equality, and thus if one changes, we must expect that the other changes as well.

But there are a few other obvious examples, such as Hashable: If you change the definition of equals, then you must also make sure that the hash code calculation for two equal objects produces the same result for each, and vice versa. Thus Hashable is a funky interface with those two functions, equals and hashCode.

And since equals shows up in both Hashable and Orderable, if one is to implement those two funky interfaces, then it is clear that filling in any one of those circles on a pane of glass requires filling in them all. And this allows the compiler to detect when a higher pane of glass attempts to fill in just some subset of those circles, which would naturally lead to errors in the running program. In other words, derived types must continue to respect the contracts from the funky interfaces of the base types.

Because to do otherwise would be a pane.

2019/07/11

If it Quacks

Ecstasy supports both type tests and type assertions. In most languages, a type assertion is called a cast, which may result in a compiler error or (in the case of languages with run-time type information) a run-time exception. A type assertion in Ecstasy is a run-time, type-safe operation, and importantly, can be expressed as an explicitly-left-associative operation:
String foo(Object o)
    {
    return o.as(String); // could throw TypeMismatch
    }
Languages with run-time type information usually provide an additional, non-asserting means to test for a particular type at run-time, such as the Java instanceof binary operator, or the C# is binary operator. A type test in Ecstasy can be performed as an explicitly-left-associative operation:
String foo(Object o)
    {
    return o.is(String) ? o : "hello, world!";
    }
Normally, an object cannot be of a certain type, unless it is explicitly declared to be of that type. For example, even if the imaginary class FakeString has all of the same properties and methods as the String class, instances of FakeString cannot be cast to a String.

Some languages do support such a thing, however. It's called duck typing, because "if it walks like a duck, and quacks like a duck, then it's a duck". An early prototype of Ecstasy had this feature for all types, and even provided a composition keyword, impersonates, to automate the composition of ducks and duck-like creatures. However, the capability did not mesh well with the design of the class-based portion of the type system, and was ultimately rejected for its unanswerable questions and potential incompatibilities.

Because duck typing is so useful, especially when working across the boundaries of loosely-coupled modules, one aspect of duck typing was explicitly retained: The ability to duck-type an interface. In many ways, an Ecstasy interface is simply a named type, i.e. a type plus a name. (This is not strictly true, but for this conversation, it will suffice.)  And thus, in Ecstasy, one can make a Gosling Duck:
interface Duck
    {
    void waddle();
    void quack();
    }

class Gosling
    {
    void waddle() {}
    void quack() {}
    }

Duck foo(Gosling james)
    {
    return james;
    }
(No ducks were harmed in the making of this blog entry.)

2019/07/10

Literally awesome!

Ecstasy supports a rich set of literals -- too much to cover in a single post, so consider this the first installment. It's important to lay out, up front, why a language supports literals, and what its goals are for the design in doing so.

First, when building a language, literals are often terminal constructs, such that other things in the language can be composed of them.

Second, a literal allows an efficient, human-readable encoding of information. For example, for most of us, it is far easier to read the number 42 than something like: 

new Byte(false, false, true, false, true, false, true, false)

Ecstasy's design goals for literals are fairly straight-forward:
  1. Common constant types supported by the core runtime library should have a literal form. Examples include: Bits, nibbles, bytes, binary strings, integers, characters, character strings, dates, times, date/times, time durations, etc.
  2. Common complex types supported by the core runtime library should have a literal form. Examples include: Tuples, arrays, lists, sets, and maps (aka directories).
  3. Literal formats should emphasize readability, and the formats should be fairly obvious to a programmer.
  4. It should be easy to work with literal formats using only a text editor.
  5. Literals should make common programming tasks simpler, where possible.
Integers
A "whole number", or an integer, starts with an optional sign, followed by an optional radix indicator (such as "0b" for binary, "0o" for octal, or "0x" for hex), followed by the digits of the appropriate radix, with optional underscores between digits to separate digits as desired. The BNF is in the language specification, but the simple explanation above should suffice. Here are some examples:
0
-1
42
0xFF
0b10_1010_1010_1010_1010
12345678901234567890123456789012345678901234567890
So, what is the type of each of the above? A 32-bit "int"? A 64-bit "int"? No. Each of the above is an IntLiteral, a const class. Just think of IntLiteral as an object that has a good idea how to look on the screen, and simultaneously knows what values of various numeric types it can represent. The benefits are fairly obvious, in terms of support for arbitrary integer sizes (without weird type casting or literal suffixes like "L"), and support for other numeric types whose range may be far beyond the range of any arbitrary fixed-length integer type.

Characters
A character is a single-quoted Unicode code point, with predictable support for escapes using the backslash. If necessary to encode Unicode characters in the range up to U+FFFF, the format \u1234 can be used; beyond that range, the format \U12345678 can be used. Here are some examples:
'a'
' '
'\''
'\t'
This literal type is implemented by Char, a const class.

Strings
A (character) string is a double-quote enclosed sequence of characters, supporting the same escapes as are supported for character literals. Here are some examples:
""
"Hello, world!"
"This is an example of \"quotes\" inside \"quotes\""
"Multiple\nlines\nof\ntext."
Multi-line strings are freeform, which means that character escapes are not processed; Unicode escapes, on the other hand, are supported, because they are handled by the earlier "lexer" stage of the compilation. Multi-line strings use a hard left border, defined by the "pipe" ("|") character; the first line of a multi-line string begins with a back-tick ("`") followed by a pipe. Here is an example:
String s = `|This is a test of
            |a "multiline" string
            |containing | and \ and ` and ' and " etc.
            ;  // <--- look at this
Like an end-of-line comment, the multi-line string takes everything from the pipe to the end of the line, as-is, which is why the semicolon in the example above has to be placed on the following line.

A template allows a string to be formed dynamically from any valid expression. The format of the template string is the same as a normal string, except prefixed by the dollar sign ("$"); expressions inside the string are prefixed by dollar-sign + open-curly ("${") and suffixed by close-curly ("}"). Here are a few examples:
$"Hello, ${name}!"
$"2 + 2 = ${2 + 2}."
$"Finished in ${timer.elapsed.milliseconds}ms."
$"Finished in ${{timer.stop(); return timer.elapsed;}}"
Templates are handy, and making up good examples is challenging, but we already use templates all over the place. The last example is quite interesting, in that it shows a statement expression (syntactically, a lambda body) inside of the template expression.

Templates can also be used with multi-line strings, which is denoted by using a dollar sign instead of the opening back-tick:
String s = $|# TOML doc
            |[name]
            |first = "{person.firstname}"
            |last = "{person.lastname}"
            ;
Finally, if the string you need to glue into your code is too big and ugly to put into the source file, then don't. Just stick it in its own file in the same directory; for example, in a file named "ugly.txt":
String s = $./ugly.txt;
Yeah. That was easy.

The literal type for all of these forms of string is implemented by String, a const class.

Arrays
An array literal is a square-bracket enclosed list of values.

Here are some examples:
[]
['a', 'b', 'c']
[1, 2, 3]
This literal type is implemented by the Array class, which is variably mutable: Array literals are either Persistent (if they contain any values that are not compile-time constants) or Constant (if they contain only compile-time constants).

Summary
This was just a brief introduction to literals in Ecstasy. Each of these literal forms has many more rules than we covered here, but those rules are there to allow for more expression (readability) in the source code, and not to restrict it. The forms for these literals are designed to make it super easy to write and very pleasant to read.

The rules do make the lexer and the parser more complex, but we look at it this way: The compiler only has to get written twice (one prototype to bootstrap the language, and then the real one written in natural Ecstasy code), so no matter how much work it is to make the language easier to use, we get to amortize that cost across many, many users over many, many years.

Literally.

2019/07/09

Null is no Exception

Since Tony Hoare formalized the concept of "null" in 1965, we have lived with an entire family of languages (including C, C++, Java, and C#) that made it possible for a pointer to contain a purposefully illegal address, for the purpose of representing the lack of a value. That purposefully illegal address is called null. (Or NULL, NIL, nil, etc.)

The reasoning was simple: Without using any additional memory, a pointer could be made to serve two purposes: First, to indicate whether or not a value exists, and secondly, what that value is if and only if it exists.

As type systems advanced, such that pointers became type-safe references instead of arbitrary integers, it became necessary to represent null as a typed value. To make it possible to assign the value null to any reference, these type systems made null a sub-type of all other types; otherwise, null would not be assignment compatible with any type other than the null type itself.


The use of null in all of these languages had an unfortunate side effect: Because null could be assigned to any type, it logically followed that each and every value might be null. That means that every single access to a value requires a null check, which in turn generates an exception in languages like Java (NullPointerException) and C# (NullReferenceException). In C, such code just segfaults (aka "Access Violation" in Windows) and core-dumps. Yay!

To avoid segfaults and exceptions, it became necessary to sprinkle code with lots of these:
if (s != null)
    {
    ...
    }
(Yay!)

There is an elegant solution to this ugliness, which is to make null into its own normal type, and not some magical "subclass of all classes" class, or "subtype of all types" type. In other words, simply by making the null value into an object reference of some normal class, it prevents that reference from being assigned willy-nilly to references of any other random type.

The complete code for the Nullable type (found in Ecstasy's module.x) is:
enum Nullable { Null }
That one line of code declares an enumeration class, called Nullable, with one value, called Null.
(Advanced: From an inheritance point of view, Null extends Nullable implements Object. From a composition point-of-view, as an enum, the class for Nullable incorporates the Enumeration mixin while the Nullable class itself is an abstract enum, and Null is an enum value. An enum value is a singleton const, which automatically implements both the Enum and Const interfaces. See source files: module.x, Const.x, Enum.x, Class.x, and Enumeration.x.)
This approach introduces some new requirements for the language's type system. First, a type system must be able to represent composite types, such as intersection types, union types, and difference types. Ecstasy represents intersection types with the "or" ("|") operator, because the code "(A|B)" reads "either type A or type B", which means that only the intersection of those two types can be assumed. (Apologies to any mathematicians reading this, but the "U" on our keyboard was stuck in the right-side-up position.)

Thus, to declare a type that can hold a value of either Nullable or String, and assign it a predictable value, one could write:
Nullable | String s = "Hello, world!";
This would quickly get old, so a short-hand notation for the "Nullable|" portion is the type-postfix "?"; here is the rewritten form of the above declaration, using the short-hand notation:
String? s = "Hello, world!";
Since the variable "s" is either a String value or a Nullable value, one can not ask it for its size:
Int len = s.size;    // compiler error!
The reason that "s" does not have a size is that its type is “Nullable or String”, and the Nullable type does not have a size property. This allows the compiler to know that the size property cannot be requested; this is an example of compile-time type safety. (Run-time type safety is exhibited by throwing a NullPointerException, etc.; Ecstasy has no equivalent to this exception, because such an exception cannot occur! "And there was much rejoicing.")

Compile-type type safety allows the compiler to know when a value might be Null. By checking if the type is a String, the compiler subsequently knows that the value cannot be Null, and specifically that the value is a String, after which it is safe to obtain the String size:
if (s.is(String))
    {
    console.println($"String s is ${s.size} characters long.");
    }
Similarly, if the code explicitly compares to the Null value, then compiler can know when the value is or is not Null. The above code could be modified slightly by first testing if the value is not Null, so that the compiler subsequently knows (by process of elimination) that the value is a Stringafter which it is safe to obtain the String size:
if (s != Null)
    {
    console.println($"String s is ${s.size} characters long.");
    } 
The postfix "?" operator is a short-circuiting operator that performs the same not-Null test, so the above code could be written instead as:
console.println($"String s is ${s?.size} characters long.");
The short-circuiting "?" operator can be grounded using the else (":") operator. In the following example, if "a" is Null, or if "a.b" is Null, or if "a.b.c" is Null, then the result is the predictable value of "Hello, world!", otherwise the result is the value of "a.b.c":
String s = a?.b?.c? : "Hello, world!";
Ecstasy combines the postfix "?" and the else (":") operator into the elvis ("?:") operator:
String s = a ?: "Hello, world!";
The above code has the same effect as:
String s = a? : "Hello, world!";
As with most binary operators, it is possible to combine the operator with the assigment operator, such that:
x = x ?: y;
... can be rewritten using the elvis assignment operator as:
x ?:= y;
Notice the similarity in the postfix "?" operator,  the else (":") operator, and the elvis operator, with the ternary operator; each of the four following lines of code has the same result:
x = x!=Null ? x : y;    
x = x? : y;    
x = x ?: y;
x ?:= y;
That is a lot to wrap one's head around, but there is a simple logic behind it.


Finally, there is a special Null-aware assignment operator that splits a nullable type, such as "String?", into a tuple of Boolean and the (conditional) non-nullable portion of the type (e.g. "String"). This can be used wherever a condition can be used, such as in an "if" or "while" statement. For example, imagine some method or function that can return a nullable string value:
String? foo();
Other than the operator, this example should seem quite familiar by now:
if (String s ?= foo())
    {
    console.println($"String s is ${s.size} characters long.");
    } 
In the above example, if the function returns Null, then the result is the tuple (False), which is consumed by the if, causing the "else" branch of the if statement to be executed. Conversely, if the function returns a String value, then the result is the tuple (True, string-value), of which the (True) is consumed by the if, causing the string-value to be consumed by the assignment, and causing the "then" branch of the if statement to be executed.

Thus, it should be obvious that these two statements will have the same result:
s ?= foo();
s = foo()?;
As just another normal value, and thus without any mind-bendingly-crazy type system rules to accomodate some magical null value, Null is simultaneously less troublesome and more useful.

The null is dead. Long live the Null.