EcstasyLang

Language Guide

2023-01-25T15:00:00.002-05:00

The Ecstasy Language Guide is slowly taking shape. You may recognize some of the topics and material from this blog. Comments, suggestions, requests, and other feedback are welcome.

Documentation for Ecstasy is located on the GitHub xtclang wiki.

Links to XDK downloads are located on the xtclang.org website.

Presenting Ecstasy at the NY Java SIG tonight, 10 November 2022

2022-11-10T09:43:00.002-05:00

Cameron Purdy will be talking about the future of computing using the Ecstasy programming language.

Ecstasy at the NY Java SIG (meetup link) (EventBrite link)

The Ecstasy discussion forum

2021-10-05T14:20:00.001-04:00

Fairly recently, we've switched over to primarily using the GitHub discussion forums for public discussions of Ecstasy work. We'll be de-emphasizing our Slack channel.

The forums are a much easier place to manage and read discussion threads than moderated comments on a blog, so we'll be relying on them more and more for conversations around articles that are posted here, as well as for group design work, long-range planning, etc.

If you have ideas for new blog posts, articles, design ideas, and so on, then please post a message on the forums. Thanks!

How to use the Ecstasy Debugger

2021-09-05T18:00:00.001-04:00

Until recently, debugging Ecstasy code was an exercise in frustration, because there was no debugger. One could step through the interpreter's operations in a Java debugger, which was not for the faint of heart. The alternative, not surprisingly, was to sprinkle one's code with print statements, such as:

@Inject Console console;
console.println($"The value of i={i}");

Need to see another variable? Change the code, recompile, and run again. Repeat until everything works.

This is, of course, how all programmers used to debug code, back between the invention of the printing press and the invention of the interactive debugger.

So it should come as no surprise that Ecstasy now has an interactive debugger, albeit one that is currently prototyped in text mode.

To enable the debugger, add a single line to your test:

assert:debug;

Or to enable it on a particular condition:

assert:debug i != 3;

That will pop you into the interactive debugger mode:

Call stack frames:              | Variables and watches:
--------------------------------|---------------------------------
>0  TestSimple.run():10         | 0  this = (TestSimple.test.org)
 1  Service TestSimple.test.org | 1  i = (Int) 3

Now you can type '?' for help, which displays all of the debugger commands, including breakpoint management, code stepping (in/out/thru), watches, and so on.

And yes, it is text mode, so it looks like some legit UNIX tool from 1995. Need a GUI? We'll need to finish project X-wing to build that! 🤣

It's the little things ...

2021-08-15T17:00:00.001-04:00

Ecstasy compiles to an Intermediate Representation (IR), which is a binary format intended to carry a rich set of information that can be used by a compiler back-end to generate the actual native code that will run on a machine's CPU. In the language and compiler field, Intermediate Representation is also known as: Intermediate Language (IL), byte code, bit code, p-code, and several other names.

Ecstasy IR was designed to be extremely tight (small), but it was also designed to support extremely large projects. To that end, all of the operators and addressing use 64-bit values; for example, the body of a function can be 2^63-1 bytes long, and so the address (in either a relative or absolute form) specified by any jump operation must be able to encode 64-bit address values.

These two requirements are in natural conflict. The first says "keep it small", while the latter says "make it large". And this problem applies to many things besides jump addresses: register numbers, parameter counts, references into the constant pool, and so on. We recognized that solving this conflict in a uniform and efficient manner, and doing so up front, would be critical in keeping the design simple and consistent.

Using a byte-by-byte format, such as UTF-8, Portable Object Format (POF) encoded integers, or LEB-128, was deemed to be too computationally inefficient, in that the loop condition for the variable length encoding is re-evaluated within a tight loop (once per byte), completely trashing the CPU's branch predictor, and almost certainly killing instruction pipelining and speculative execution. Since these integers are used quite literally everywhere, a better encoding scheme was required.
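To make that cost concrete, here is a minimal sketch (in Python, purely for illustration) of decoding an unsigned LEB128 value. The function name is hypothetical. The thing to notice is that the continuation test runs once per byte, and the total length is unknown until the final byte has been examined.

```python
def decode_uleb128(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer; return (value, bytes_consumed)."""
    value = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # the low 7 bits carry payload
        if (byte & 0x80) == 0:            # high bit clear: this was the last byte
            return value, pos - offset
        shift += 7                        # otherwise loop again, once per byte
```

A decoder for a length-prefixed format can instead branch once on the first byte and then read a fixed number of bytes.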

Since a variable length encoding would be used, the resulting format was carefully designed to make the exact length of the entire value apparent from examining only the first byte, allowing all of the branch prediction penalties and possibilities for pipeline stalls to be lumped together at the front edge of the decoding process. Additionally, most CPUs have the ability to read even unaligned power-of-two-length integers from memory into a register and sign extend (either as part of the read, or with an inexpensive register-based op), so the design of the format allows for this possible optimization.

The XVM Integer Packing (XIP, pronounced "zip") format employs four encoding modes:

Tiny: For a value in the range -64..63 (7 bits), the value can be encoded in one byte. The least significant 7 bits of the value are shifted left by 1 bit, and the 0x1 bit is set to 1. When reading a packed integer, if bit 0x1 of the first byte is 1, then it's Tiny.
Small: For a value in the range -4096..4095 (13 bits), the value can be encoded in two bytes. The first byte contains the value 0x2 (010) in the least significant 3 bits, and bits 8-12 of the integer in bits 3-7; the second byte contains bits 0-7 of the integer.
Medium: For a value in the range -1048576..1048575 (21 bits), the value can be encoded in three bytes. The first byte contains the value 0x6 (110) in the least significant 3 bits, and bits 16-20 of the integer in bits 3-7; the second byte contains bits 8-15 of the integer; the third byte contains bits 0-7 of the integer.
Large: For a value in the range -(2^511)..2^511-1 (up to 512 bits), a value with s significant bits can be encoded in no less than 1+max(1,(s+7)/8) bytes; let b be the selected encoding length, in bytes. The first byte contains the value 0x0 in the least significant 2 bits (00), and the least significant 6 bits of (b-2) in bits 2-7. The following (b-1) bytes contain the least significant (b-1)*8 bits of the integer.
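The four modes above can be sketched in a few lines of Python. This is an illustration written from the description in this post, not the DataOutput/DataInput or PackedInteger code; the function names are hypothetical, and the byte order of the Large payload (most significant byte first, matching Small and Medium) is an assumption.

```python
def xip_encode(value: int) -> bytes:
    """Pack a signed integer using the Tiny/Small/Medium/Large layouts."""
    if -64 <= value <= 63:                      # Tiny: 7 bits, low bit set
        return bytes([((value & 0x7F) << 1) | 0x1])
    if -4096 <= value <= 4095:                  # Small: 010 marker + 13 bits
        return bytes([(((value >> 8) & 0x1F) << 3) | 0x2, value & 0xFF])
    if -1048576 <= value <= 1048575:            # Medium: 110 marker + 21 bits
        return bytes([(((value >> 16) & 0x1F) << 3) | 0x6,
                      (value >> 8) & 0xFF, value & 0xFF])
    # Large: n payload bytes (n = b-1), with (b-2) stored in bits 2-7
    magnitude = ~value if value < 0 else value
    n = max(1, (magnitude.bit_length() + 8) // 8)
    if n > 64:
        raise OverflowError("value does not fit in 512 bits")
    # Assumption: payload is big-endian, like the Small and Medium forms
    return bytes([(n - 1) << 2]) + value.to_bytes(n, "big", signed=True)


def xip_decode(data: bytes) -> tuple[int, int]:
    """Unpack one integer; return (value, bytes_consumed)."""
    first = data[0]
    if first & 0x1:                             # Tiny
        raw = first >> 1
        return (raw - 128 if raw & 0x40 else raw), 1
    if first & 0x2:                             # Small (010) or Medium (110)
        extra = 2 if first & 0x4 else 1
        value = first >> 3
        if value & 0x10:                        # sign-extend the top 5 bits
            value -= 32
        for byte in data[1:1 + extra]:
            value = (value << 8) | byte
        return value, 1 + extra
    n = (first >> 2) + 1                        # Large: payload length in bytes
    return int.from_bytes(data[1:1 + n], "big", signed=True), 1 + n
```

In both directions the length is settled by looking at the first byte only, which is the property the format was designed around.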

The Ecstasy implementation for writing XIP'd integers is found in the DataOutput class, and the implementation for reading XIP'd integers is found in the DataInput class.

The Java implementations for reading and writing XIP'd integers can be found in the PackedInteger class.

In summary: Having a consistent and efficient mechanism to encode arbitrary 64-bit integers in a byte stream is a fundamental boost for an IR designer, because they no longer have to worry about the ugly trade-off between "will this support big enough structures in the real world?" and "will this waste space?"

What is a type?

2021-06-12T02:33:00.002-04:00

Another Coffee Compiler Club call, another concept to explain.

What is a type?

It seems like such a simple question, until you try to answer it. What is a type?

When we designed Ecstasy, we were intent on boiling down types to some pure form, some simple atomic model of constructing everything in our programming universe. Types are, in some way, the periodic table of program elements. The protons, neutrons, and electrons of program existence. If you can't explain the basic building blocks of your universe, how can you explain how anything works?

Let's start with the conclusion: An object's type is the sum of its behavior.

Not having any real background in type theory, I have no idea whether this is an obvious truism or a nutty novel notion. I'm going to assume the latter, only because I've never used a language with a type definition like this, and also because it provides an excellent opportunity to explain the concept.

Most type systems that I've known and used are built around some combination of identity, state, and behavior. Java object types, for example, are based entirely on identity: The identity of a class is its type; the identity includes the name of the class, and its ancestors, in terms of super classes and implemented interfaces. Java types do carry detailed information about state (fields) and behavior (methods), but those details don't define the type; only the identity defines the type. The question "Is some object reference o of type T?" is never answered by what fields the object contains, nor by what methods it has, but rather by its identity, and solely by its identity.

Of course, this was quite a leap forward from the answer in C (and by extension, C++); in C, the question "Is some object reference o of type T?" always has the answer "Yes". Got a pointer? It turns out that your pointer points to whatever type you tell the compiler it points to. Is it a Cat? Is it a Dog? Is it a Car? Is it a House? The answer is always "Yes!" One must admit that C is quite an agreeable language when it comes to types. (With this in mind, it's also easy to understand how C code is responsible for so many security flaws.)

But let's drop all of these notions on the floor, and start over. Let's start with a made-up syntax for defining a type:

type
    {
    // things that define what the type is
    }

Looks kind of like a C structure. And we often think about types in exactly this way: They have a name (identity), and they have structure (fields).

type Person
    {
    String firstName;
    String lastName;
    }

But by our definition, this is completely wrong! We claimed that a type is only the sum of its behavior, and this example has only identity and state instead. So let's fix this, temporarily, and in the ugliest manner possible:

type // if we could name it, we'd call it "Person"
    {
    String getFirstName();
    void setFirstName(String);
    String getLastName();
    void setLastName(String);
    }

Interesting. Ugly, but interesting. We've turned state into behavior. And we turned the identity into a comment. But let's try an experiment: We'll allow a type to have an optional identity (including type name, type parameters, and ancestor types), just to make it easier to describe the types, and we'll create a few types that we can re-use to avoid some of that awful boilerplate:

type Ref<T>
    {
    T get();
    }

type Var<T> : Ref<T>
    {
    void set(T);
    }

type Person
    {
    Var<String> firstName();
    Var<String> lastName();
    }

So there is no state, per se, and the identity exists solely as a convenience for us, the reader, but we're almost back to where we started. In fact, if we introduce a short-hand notation for a zero-parameter method that returns a Var<T>, we are back where we started:

type Person
    {
    String firstName;  // this just means "Var<String> firstName()"
    String lastName;   // this just means "Var<String> lastName()"
    }

We're close, but not done, because we have referenced another type, "String". And what is a "String"? It's just another type, that has to be defined in the exact same way as "Person". To define a String, it helps to have an Array. To define an Array, which has a size, it helps to have an Int64. To define an Int64, it helps to have another Array of Bit. To define a Bit, it helps to have an IntLiteral to represent a 0 or 1. And to define an IntLiteral, it helps to have a String.

In other words, the type system forms a closed loop. All types are defined from other types, and types are defined solely by their behavior.

And behavior? Behavior is simply defined as a set of named methods, each taking zero or more typed parameters, and returning zero or more typed results.

So what does this mean?

To oversimplify the conclusion, it means that a mathematician can use set theory to implement a type calculus for such a type system. Really, that's it. You know, like curing cancer, or finding the holy grail.

In Ecstasy, all types are defined like this. Even Type is a Type.
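To give a feel for the set theory angle, here is a deliberately tiny Python sketch in which a type is literally a set of method signatures. Everything in it (Method, REF, VAR, is_a) is hypothetical, and it is not the Ecstasy type calculus: it matches signatures by exact equality and ignores type parameters and variance.

```python
from typing import NamedTuple


class Method(NamedTuple):
    name: str
    params: tuple    # names of the parameter types
    returns: tuple   # names of the return types


# A type is just a set of methods; the Python variable name is a convenience.
REF = frozenset({Method("get", (), ("T",))})
VAR = REF | {Method("set", ("T",), ())}


def is_a(candidate: frozenset, required: frozenset) -> bool:
    """True if candidate offers at least the behavior that required demands."""
    return required <= candidate
```

With that, "is a Var also a Ref?" becomes the subset test `is_a(VAR, REF)`, and "what does Var add?" becomes the set difference `VAR - REF`.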

Riddle me this

It should be pretty obvious now why we refer to this as the "Turtles Type System", since it's turtles the whole way down. One of the interesting riddles we encountered early on looked something like this:

type A
    {
    B foo();
    }
    
type B
    {
    A foo();
    }

Question: What is the difference between an A and a B?

Rules

One of the interesting things with such a type system is how easy it is to construct recursive rules from it. For example, we say that a method m consumes type T if any of the following holds true:

m has a parameter type declared as T;
m has a parameter type that produces T;
m has a return type that consumes T.

Similarly, we say that a method m produces type T if any of the following holds true:

m has a return type declared as T;
m has a return type that produces T;
m has a parameter type that consumes T.

These rules form the basis for checking the legality of things like method variance, such as co-variance and contra-variance, which in turn allows the type system to intelligently enforce type safety.
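The two rule lists translate almost line for line into mutually recursive functions. The Python sketch below is only an illustration: the TYPES registry and all names are hypothetical, and it assumes (the rules above do not spell this out) that a type consumes or produces T when any one of its methods does.

```python
from typing import NamedTuple


class Method(NamedTuple):
    name: str
    params: tuple    # names of the parameter types
    returns: tuple   # names of the return types


# Hypothetical registry: type name -> the methods that define that type.
TYPES = {
    "Supplier": (Method("get", (), ("String",)),),
    "Consumer": (Method("accept", ("String",), ()),),
    "String": (),    # treated as opaque in this sketch
}


def method_consumes(m, t, seen=frozenset()):
    return (t in m.params                                             # declared as T
            or any(type_produces(p, t, seen) for p in m.params)       # param produces T
            or any(type_consumes(r, t, seen) for r in m.returns))     # return consumes T


def method_produces(m, t, seen=frozenset()):
    return (t in m.returns                                            # declared as T
            or any(type_produces(r, t, seen) for r in m.returns)      # return produces T
            or any(type_consumes(p, t, seen) for p in m.params))      # param consumes T


def type_consumes(name, t, seen):
    key = ("consumes", name)
    if key in seen:          # already being examined on this path: stop the cycle
        return False
    return any(method_consumes(m, t, seen | {key}) for m in TYPES.get(name, ()))


def type_produces(name, t, seen):
    key = ("produces", name)
    if key in seen:
        return False
    return any(method_produces(m, t, seen | {key}) for m in TYPES.get(name, ()))
```

The `seen` set is there because types can refer to each other in a loop, so a naive recursion would never terminate.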

What is a Property?

2021-05-21T13:00:00.001-04:00

On a recent Cliff Click Coffee Compiler Club call, this question came up: What exactly is an Ecstasy property? It turns out that a property is a very obvious and simple thing, yet explaining it is not so simple.

Developers have different expectations when they hear the word "property", including:

It's just a named field in a structure.
It's something that has a getter and a setter.

These are logical expectations, because in languages like C++ and Java, "object properties" are just fields in structures, and in Java, the getter and setter methods are a well-known way to expose private fields as public virtual methods.

But unfortunately, starting with this train of thought takes us in the wrong direction, so let's forget all of this historical context, and back up to the beginning: What exactly is an Ecstasy property?

First, it is important to appreciate where an Ecstasy property exists:

A property can be declared inside any class, including module and package classes;
A property can be declared inside a property;
A property can be declared inside a method.

That a property can exist inside a class is not unusual, but it is a bit unusual that a property can exist inside another property, or even inside of a method.

In Ecstasy, everything is an object, so it follows that a property is an object. Objects have types. So what is the type of a property? A property, like a local variable, is an instance of Ref, a reference. If the property is mutable, then (also like a local variable), it is an instance of Var, which extends Ref.

Since a property is an object, and objects are instances of a class, then what is the class of a property? The class of a property is unknowable within Ecstasy. That does not mean that the property does not have a class; it simply means that the class is not visible from within the running code. Let's take a simple example:

  class Person
      {
      Int age;
      }

When we have an instance of Person, we can ask that object's reference for its actual class, and if it was created within the current container, it will return the Person class. But it's also possible that an object reference comes from outside of the container, in which case asking for the actual class will not return the actual class, but will instead return just the interface type through which the object can be viewed; this is the basis for container security, and is a fundamental building block of Ecstasy's strong security model.

When the Ecstasy runtime starts up, and an Ecstasy application is loaded and starts running, it is running in the outermost Ecstasy container, called "container 0", which is the container within which the application's module was loaded, and within which all other containers and objects are created, so one would think that the application's properties would also be created within that "container 0" ... but that would be incorrect. In order for the initial application "container 0" to be created, there had to already be a Container class, and since that class comes from the core Ecstasy module, that means that the core Ecstasy module was already loaded in some container before "container 0" was created. And since it's "turtles the whole way down", it should be obvious that "container 0" is itself actually sitting on top of an infinite stack of turtles, which for purposes of keeping this short, we will simply refer to as "container -1".

"Wait ... what?!?" I can almost hear the WTFs being hurled at computer screens everywhere. But here's the simple truth: Anything outside of the container that the application is loaded within is simply unknowable. So if the application is started in something that we call "container 0", and a container always exists within a container, then we know that there must be some "container -1", if only because otherwise there couldn't be a "container 0". And just to keep this short and as-simple-as-possible, the runtime itself is that unknowable outer container, and the runtime itself is the container that loaded the Ecstasy module, and the runtime itself is the thing that knows how to "new" a class, and to automatically "new" whatever class is automatically used for each property as well. And in reality, that doesn't actually happen -- each property couldn't actually be a new object, right?

As with many things in Ecstasy, the answer is purposefully unknowable. If you ask for a property's reference, you do get back a usable object -- one that you can reflect on, pass around, store in a property somewhere, or whatever it is that you do with objects -- so obviously the property object "exists", by some definition; but like all turtles, it may not have existed before you looked at it, and it may not exist when you're not looking at it.

But here's where things get seriously cool: Since a property does have a class, we can augment that class! Of course, we don't use the extends keyword like when we sub-class (because we don't know what class to extend) ... but we can write a mixin for the property, because we do know the type to mix into! In fact, lots of functionality in Ecstasy is built by writing mixins that can be mixed into properties and local variables, such as futures, lazily calculated values, and watched values.

Furthermore, we can augment a property where we define it, as if it were a class. Here's a silly example:

  module Test
      {
      @Inject Console console;
      Log log = new ecstasy.io.ConsoleLog(console);

      void run()
          {
          log.add("Simple property example!");

          val o = new TestClass();
          for (Int i : 0..5)
              {
              val n = o.x;
              }

          o.&x.foo(); // &x gets the property, instead of de-referencing it
          }

      class TestClass
          {
          Int x
              {
              @Override Int get()
                  {
                  ++count;
                  return super();
                  }

              void foo()
                  {
                  log.add($"Someone accessed this property {count} times!");
                  }

              private Int count;
              }
          }
      }

And when we run it:

++++++ Loading module: Test +++++++

Simple property example!
Someone accessed this property 6 times!

Process finished with exit code 0

So we can augment our property with code where we define the property, we can mix predefined functionality into a property (again, it's not magic, because you could have written those mixins yourself!), and we can even modify a property's behavior on a sub-class (assuming that the property wasn't private), because the sub-class' property's class implicitly extends the super-class' property's class.

Okay, that was a lot of information, but it conveys an important point: A property isn't just some field in a structure. It's a real object, with a real class, and it behaves like a real object, with a real class.

But what about the property's value? Where is it stored? The Ecstasy type system determines which properties require a field for their storage, and automatically includes those fields in the underlying structure that is defined for each class, and thus exists for each object. In other words, all of a property's state is stored in whatever class the property "rolls up" into. Here are two simple examples:

  class Example
      {
      Int x; // this has a field on Example:struct

      Int y.get()
          {
          return 7; // this does not have a field
          }
      }

The type system has rules that determine when a field is required. The compiler uses these rules. The runtime uses these same rules. If a field is required, then the field will exist. If the field is not required, then the field will not exist.

So how is that field accessed? Well normally, we don't even think about that. If there's an object o with a property x, we just dereference the property o.x, and we never think about the field. But conceptually, the field is accessed by the last method in the call chain for the get() method on the property, so if you don't override the get() method, then accessing the property goes straight to the field.
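As a loose analogy only (this is Python, not Ecstasy, and it says nothing about how the runtime actually implements properties), the call chain can be pictured as follows. The names Var, CountingVar, and x_ref are hypothetical stand-ins: the base get() is the end of the chain that touches the field, and an override does its own work before passing control along with super().

```python
class Var:
    """Stand-in for a property object; the value lives in the owner's struct."""

    def __init__(self, struct: dict, name: str):
        self._struct = struct
        self._name = name

    def get(self):
        return self._struct[self._name]      # end of the chain: read the field

    def set(self, value):
        self._struct[self._name] = value     # end of the chain: write the field


class CountingVar(Var):
    """Like the overridden get() in the TestClass example: count, then delegate."""

    def __init__(self, struct: dict, name: str):
        super().__init__(struct, name)
        self.count = 0

    def get(self):
        self.count += 1
        return super().get()


class TestClass:
    def __init__(self):
        self._struct = {"x": 0}                       # the fields of this object
        self.x_ref = CountingVar(self._struct, "x")   # roughly what &x hands back

    @property
    def x(self):
        return self.x_ref.get()                       # o.x just runs the get() chain
```

Without the CountingVar override, reading `o.x` would go straight from Var.get() to the field, which is the "no override" case described above.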

Alternatively, sometimes it is necessary to work with an object's structure directly. A serialization library, for example, may need to access the field values to store them off, and subsequently build a new structure using that stored-off data to re-instantiate the corresponding object. Instead of trying to paste an example here, it makes more sense just to point to the JSON serialization implementation that does exactly this. While it might seem like a common thing (directly accessing a field), it turns out that the only places in the entire Ecstasy code base where fields are being accessed directly are (i) serialization implementations and (ii) tests of the compiler and runtime itself.

So, back to the initial question: What is an Ecstasy property?

A property has a name.
A property represents state with a value, of a type.
A property is contained within a class, a property, or a method.
A property is a container of classes, properties, and methods.
A property is itself a class.
A property can be customized, mixed into, inherited, and overridden.
Properties are virtual.

2020-12-31T00:01:00.001-05:00

Welcome to the Ecstasy Language, the first programming language built for the cloud!

This is an official blog of the Ecstasy Language open source project at xtclang.org.

Table of Contents:

Introduction
Modules Overview
Security Model
Design Priorities
Hierarchical Organization
How to assert yourself more
Conditional Methods
An Introduction to the Ecstasy Type System
A pointed paean for C
The Quest for Equality
Null is no Exception
Literally awesome!
Duck!
A Pane in the Glass
Composition
Hello World! (a working example!)
More Turtles
Coming from Java & Part 2 & Part 3

Blogs:

https://xtclang.blogspot.com/ (this blog) - technical topics
https://ecstasylang.blogspot.com/ - announcements and project-related topics

Repository:

xtclang @ GitHub - Source repository

Twitter:

@xtclang - Official Twitter account for the xtclang project.

Current Status:

Early developer access / prototype runtime

Email Inquiries: info at xtclang dot org

Coming from Java, Part III (Singletons)

2020-01-25T13:30:00.000-05:00

(This entry is the third installment of "Coming from Java". This is Part III; here is a link to Part II.)

There is a common pattern in Java, which is the singleton pattern. Basically, it allows you to create a single instance of a class, and then to be able to find that one same instance from anywhere in your code:

public class Singleton
    {
    public static final Singleton INSTANCE = new Singleton();

    private Singleton()
        {
        // initialization stuff goes here
        }
    }

Creating a singleton in Ecstasy is accomplished by using the static keyword for the class:

static const Singleton
    {
    construct()
        {
        // initialization stuff goes here
        }
    }

There is one tiny detail, though: In Ecstasy, only const classes and service classes can be singletons, and the reason is quite fundamental to the purposes for which Ecstasy was designed.

To begin with, consider the differences between high-end computers that Java was designed for, and the high-end computers that Ecstasy is designed for. At the time that Java was initially being designed in the early 1990s, very few computers had more than one CPU -- Sun hadn't yet even released its first multiprocessor workstation! A server or a high-end workstation might have had 8MB of RAM, and -- still hard to believe! -- the IBM 370 mainframe of the day topped out at 16MB of RAM. That's megabytes -- a new notebook today has a thousand times that much memory!

Over a decade later, with dual-CPU servers now the norm, Java would get its first working memory model specification, specifying the JVM's guarantees for reads and writes occurring across multiple CPUs and among multiple threads.

Ecstasy, in contrast, was designed explicitly to take advantage of computers with potentially many thousands of cores, and with potentially many terabytes of main memory. To accomplish this, the design focused on disentangling threads from each other, and disentangling the memory -- what Java calls "the heap". The rationale is simple: In a modern computer, a single thread of code can perform on the order of 1-10 billion instructions per second, if and only if the thread does not share read/write memory with any other threads. The moment that a thread starts to use read/write memory that is being used by other threads, the performance (and the predictability of the performance) drops like a lead balloon.

To avoid the lead balloon effect, Ecstasy carves out exclusive zones of mutable memory, each with its own single conceptual thread. Each of these is called a service. An Ecstasy service can be thought of as a simple Turing Machine, or a simple von Neumann machine. And an Ecstasy service can be thought of as a boundary for mutability, because all mutation of a service's memory occurs within that service, and only immutable data can permeate that boundary. Services can communicate with other services, but that communication is conceptually asynchronous in nature, the communication is in the form of invocation, and only immutable data is exchanged.
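For readers who think in threads and queues, the idea can be approximated with a small Python sketch. It is a conceptual model only, not how an Ecstasy runtime is built, and every name in it (CounterService, hit, the inbox queue) is hypothetical. The mutable state is touched by exactly one thread, and callers interact with it only by posting an invocation and receiving a future.

```python
import queue
import threading
from concurrent.futures import Future


class CounterService:
    """Owns its mutable state; all mutation happens on its own single thread."""

    def __init__(self):
        self._count = 0                     # only the service thread touches this
        self._inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            call, future = self._inbox.get()
            try:
                future.set_result(call())   # only an immutable int crosses back
            except Exception as exc:
                future.set_exception(exc)

    def _hit(self) -> int:
        self._count += 1
        return self._count

    def hit(self) -> Future:
        """Asynchronous invocation: enqueue the call and return a future."""
        future = Future()
        self._inbox.put((self._hit, future))
        return future
```

A caller would write something like `CounterService().hit().result()`. No lock is needed around the counter, because nothing outside the service thread ever reads or writes it.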

Since a singleton is, by its nature, visible to all code running in an application, it therefore stands to reason that the singleton must be immutable -- so that it can be used by code running in any service -- or the singleton must itself be a service -- so that it can be invoked by code running in any other service.

And here is a straightforward example in Ecstasy:

static service PageCounter
    {
    Int count = 0;
    Int hit()
        {
        return ++count;
        }
    }

Using the singleton is equally simple:

PageCounter.hit();

(Since it is a singleton, the name of the class implies the singleton instance.)

The same example can be constructed in Java, but thread safety is the responsibility of the programmer:

public class PageCounter
    {
    public static final PageCounter INSTANCE = new PageCounter();

    private PageCounter() {}
    
    private int count;
    
    synchronized public void setCount(int count)
        {
        this.count = count;
        }
    
    synchronized public int getCount()
        {
        return count;
        }
    
    synchronized public int hit()
        {
        return ++count;
        }
    }

// how to call the singleton
PageCounter.INSTANCE.hit();

There are a variety of ways to implement the counter in Java in order to make it more concurrent; for example, an atomic integer class can be used, or an atomic updater on a volatile field can be used, and so on. In Ecstasy, on the other hand, the choice of how to make the counter more concurrent is left completely up to the run-time implementation. The choice to allow the run-time to optimize this facet of execution is based on what we learned from Java's own HotSpot JVM -- which is that only the run-time has enough information to know which parts of the application would actually benefit from optimization in the first place, and which optimizations would work best, based on the actual run-time profiling information!

A few miscellaneous notes to wrap up this singular topic:

In Ecstasy, every module, package, and enum is a "static const" class, automatically. That means that modules and packages are all singleton objects, and every enum value is a singleton object.
Ecstasy does not require a memory model for explaining the order of reads and writes of mutable data among threads, because (as explained above) the Ecstasy design does not have mutable shared state among threads. (The Ecstasy design also uses services in lieu of explicit developer-managed threads, but that is a topic for another blog entry.)
There is no "global heap" in Ecstasy, so there is no "stop the world" garbage collection. Ecstasy is automatically garbage-collected, but each service can manage its own memory. The Ecstasy design effectively eliminates the "GC pause" problem, even for programs that use terabytes of RAM.

Coming from Java, Part II

2020-01-23T23:00:00.001-05:00

(This topic is large, so this entry is just the second installment. This is Part II; here is a link to Part I.)

In Object Oriented languages, objects represent the combination of related state and behavior. Java classes declare fields to hold state, and methods to provide behavior. Java fields and methods are nested immediately within the class that contains them. This is an example of a common pattern for a Java class exposing state, stored in fields, via property accessors:

public class Person
    {
    public Person(String name, String phone)
        {
        setName(name);
        setPhone(phone);
        }
    
    private String name;
    private String phone;

    public String getName()
        {
        return name;
        }

    public void setName(String name)
        {
        assert name != null;
        this.name = name;
        }

    public String getPhone()
        {
        return phone;
        }

    public void setPhone(String phone)
        {
        this.phone = phone;
        }
    }

Ecstasy classes do not declare fields; instead, Ecstasy classes have properties that represent object state. A property is like an object, in that it can have its own nested state, and its own nested behavior. For example, to obtain the value of a property, one can invoke the get() method on the property. If the property is writable, then one can modify the value of the property by invoking the set() method on the property. (Of course, it is possible to use the simple dot notation for property access, which means that explicit calls to get() and set() are unnecessary.) Here is the above class, re-written in Ecstasy:

class Person(String name, String? phone);

You could also write it out in long-hand if you prefer; the following code compiles to the same exact result as the above code:

class Person
    {
    construct(String name, String? phone = Null)
        {
        this.name  = name;
        this.phone = phone;
        }
        
    String name;
    String? phone;
    }

It's possible in Java to make the "getter" and "setter" have different access, such as:

public String getName()
    {
    return name;
    }

private void setName(String name)
    {
    assert name != null;
    this.name = name;
    }

To accomplish this in Ecstasy, the equivalent is:

public/private String name;

The first access, "public", specifies that the property shows up in the public type as a Ref<String>; a Ref represents a read-only reference to a value. The second access, "private", specifies that the property shows up in the private type as a Var<String>; a Var represents both read and write access to the value.

Remember, though, that a property is like an object. Let's expand the Java example slightly, to validate that the name is not an empty String:

public void setName(String name)
    {
    assert name != null && name.length() > 0;
    this.name = name;
    }

In Ecstasy, the property contains a method called set(String) that we can override:

public/private String name.set(String name)
    {
    assert:arg name.size > 0;
    super(name);
    }

The above is just short-hand notation for:

public/private String name
    {
    @Override void set(String name)
        {
        assert:arg name.size > 0;
        super(name);
        }    
    }

There are a couple of important points here:

It's not the set() method that is private. The set() method is public, because it is part of the Var interface, as explained above.
Instead, the public Person type (known as Person:public or Person.PublicType) has a property that does not have a set() method, while the private Person type (known as Person:private or Person.PrivateType) has a property that does have a set() method.
In Java, "super" refers to the super-class. In Ecstasy, super is a reference to the function (like a function pointer) that is next in line to invoke in the virtual method's invocation chain. In other words, super is a function.
While it's not directly related, you can read more about the various specializations of the assert statement on this blog. The assert:arg statement produces an IllegalArgument exception if the assertion fails.
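
The idea of a property as a small object with its own get() and set() can be approximated in Java. This is a sketch under stated assumptions: Property and PersonSketch are hypothetical names, and the super.set() call here uses Java's super-class meaning of "super", which only imitates the "next function in the chain" described above.

```java
// Hypothetical Java analogy: a property modeled as an object with get() and set().
final class PersonSketch {
    static class Property<T> {
        private T value;   // plays the role of the unnamed field

        T get() {
            return value;
        }

        void set(T value) {
            this.value = value;
        }
    }

    // Override set() for this one property; super.set() is the next step in the chain.
    final Property<String> name = new Property<String>() {
        @Override
        void set(String value) {
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("name must not be empty");
            }
            super.set(value);
        }
    };

    public static void main(String[] args) {
        PersonSketch person = new PersonSketch();
        person.name.set("Bob");
        System.out.println(person.name.get());   // Bob
        try {
            person.name.set("");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```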

The interesting thing, though, is that we never have to deal with the field. We know it's there, because it has to be in order to hold the value, but the field doesn't have a name, we don't access it, and we don't modify it. Instead, we just call the super function for get() or set(), and at the end of that chain there is some implementation of the method (that we didn't have to write!) that accesses or stores the value for us using the field.

But what if we made it so that we could never reach the end of those method chains?

public/private String name
    {
    String get()
        {
        return "Bob";
        }

    void set(String name)
        {
        assert:arg name.size > 0;
        // do nothing with the name ... do not store it!
        }
    }

In this case, there would be no field for the name property, because it's obvious to the compiler that one is not needed!

So now it should be obvious that fields exist in Ecstasy, but that we never really have to mess around with them. Where are those fields actually held, though?

In one of the examples above, we talked about how the Person class has a public type and a private type, so you probably already guessed that the Person class also has a protected type, and you would be correct!

But the Person class has one more type: the Person:struct type. The Person:struct type has one property for each property of the Person class that needs a field. We call each property on the struct type a "field"; in Ecstasy, a field is just a property on a class' struct type.

The struct type is not user-definable. The struct type is automatically calculated by the compiler at compile-time, and by the linker/loader at run-time. While the public, protected, and private types all refer to the same underlying object -- as if they were three different lenses through which you can view the same object -- the struct, on the other hand, is a separate object that is an implementation of the Struct interface.

For the purpose of this article, this is already way too much low-level information about structs, but the details are important for one reason: To understand constructors.

Constructors are weird. They live in a zone between non-existence and existence. They play by some extraordinary rules in Java, and the same is true in Ecstasy, because they fit into a zone of unknowns. Here's what a constructor looks like in Java, just to pick one at random from our own prototype compiler that was written in Java:

public StringConstant(ConstantPool pool, String sVal)
    {
    super(pool);

    assert sVal != null;
    m_sVal = sVal;
    }

First, in Java, a constructor must call either a different constructor on this class, or a constructor on the super class. Then, it is free to do other stuff, like checking parameters and initializing fields. Fields that aren't explicitly initialized are all set to their defaults, which is easy when null is a sub-class of everything.

Ecstasy is different. Not necessarily simpler or more complicated. Not necessarily better or worse. But it is different for very purposeful reasons:

Construction is treated as a finite state automaton. Eliminating unknowns and improving predictability of execution is extremely important, and that is exactly what a finite state automaton does.
There is a period of time before the object is constructed. The developer gets complete control over that process.
There is a period of time after the object is constructed. The developer gets complete control over that process.
In between the before and the after, the developer is completely absent, and completely out of the picture for the moment of creation. During that moment, all the rules of object instantiation can be verified, and the object is created. We say that "the this becomes existent".

In that period before the object creation, there are two phases that the developer can implement:

The construct(...) function(s) allows the developer to specify what information is needed to initialize the state of the object, and the developer can validate that information and initialize the structure of the object, which is the aforementioned struct.
The assert() function allows the developer to collect, in one place, any assertions (or any other last-second work) that must occur before the object is created.

Here is an example of a constructor, from the Date class, which simply delegates to another constructor using the construct keyword:

construct (Int year, Int month, Int day)
    {
    construct Date(calcEpochOffset(year, month, day));
    }

For both construct(...) and assert(), the this variable is the struct -- not the object, because it has not yet been created! After that code has all completed successfully, the struct is checked by the system to make sure that all necessary fields have been assigned a value, and then the object is instantiated based on the struct. Then -- after the moment of creation, and before the newly created "this" reference is returned to the code that invoked the new operator -- one more step occurs: The corresponding finally(...) function for each previously invoked construct(...) function is executed, so that the object itself gets to see itself (and finish anything that it needs to) before being returned to the code that requested it.

Here's an example from the Array class:

protected construct(ArrayDelegate<Element> delegate)
    {
    this.delegate = delegate;
    }
finally
    {
    if (mutability == Constant)
        {
        makeImmutable();
        }
    }

In this example, the construct(...) function fills in the fields of the struct, but because the Array object does not yet exist at this point, the construct(...) function can not call the makeImmutable() method on the Array -- until the Array actually exists! And that is the purpose of the finally function -- to allow the new Array object to perform behavior that must occur as if it were part of the instantiation of the object, before that object is returned to the code that requested the object to be created.

A class can define an assert() function (with no parameters) as well. Regardless of whether any particular construct(...) function is invoked by the new operator, or by a sub-class -- and note that a sub-class is not required to invoke any construct(...) function on its super-class! -- the assert() function will be invoked before the object is created.
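
The order of the phases described above can be traced with a small Java simulation. It is only a sketch of the sequence (construct, assert, system check, creation, finally), not of how the Ecstasy runtime is implemented; the StagedConstruction and Account names, the map used as a "struct", and the log are all assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the construction phases (illustration only).
final class StagedConstruction {
    static final List<String> log = new ArrayList<>();

    static final class Account {
        final String owner;
        final long balance;

        // The object comes into existence from a completed struct.
        private Account(Map<String, Object> struct) {
            owner = (String) struct.get("owner");
            balance = (Long) struct.get("balance");
        }
    }

    // Like construct(...): only the struct exists, so only fields can be assigned.
    static void construct(Map<String, Object> struct, String owner, long balance) {
        log.add("construct");
        struct.put("owner", owner);
        struct.put("balance", balance);
    }

    // Like assert(): last-second checks before the object is created.
    static void assertPhase(Map<String, Object> struct) {
        log.add("assert");
        if ((Long) struct.get("balance") < 0) {
            throw new IllegalArgumentException("balance must not be negative");
        }
    }

    // Like finally: the object now exists and can act on itself.
    static void finallyPhase(Account account) {
        log.add("finally");
    }

    static Account create(String owner, long balance) {
        Map<String, Object> struct = new HashMap<>();
        construct(struct, owner, balance);
        assertPhase(struct);
        // The "system" verifies that every required field was assigned.
        for (String field : new String[] {"owner", "balance"}) {
            if (struct.get(field) == null) {
                throw new IllegalStateException("unassigned field: " + field);
            }
        }
        log.add("create");
        Account account = new Account(struct);
        finallyPhase(account);
        return account;
    }

    public static void main(String[] args) {
        Account account = create("Ann", 10L);
        System.out.println(account.owner + " " + account.balance);   // Ann 10
        System.out.println(log);   // [construct, assert, create, finally]
    }
}
```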

There are many details regarding the specific order of execution, handling of exceptions, and so on, but this post hopefully has given you a glimpse into how object structure works in Ecstasy, and how Ecstasy objects are created.

(Continue to Part III.)

Coming from Java

2020-01-22T16:30:00.001-05:00

The first question that we get from new developers working on Ecstasy is how it is similar to, and how it is different from, the languages that they already know and use. One of the goals of Ecstasy was to make the language instantly accessible to programmers who already were comfortable with any of the C family of languages, such as C++, Java, and C#. We'll start by looking at one such language, Java, which is one of the most widely used languages today.

(This topic is large, so this entry is just the first installment.)

Here's the pocket translation guide from Java to Ecstasy with respect to the type system:

Java's type system is a combination of primitive (machine) types and class-based types, with a few "hybrid" types, such as arrays, that fit neither category. Ecstasy's type system is simply class-based; there are no primitive types.
Java's null type has one value, null, that is assignment compatible with any reference type. The Ecstasy Nullable enumeration defines the value Null, although the lower-case null is also supported by alias. In Ecstasy, the Null enum value is only assignable to a Nullable type, or a super-type thereof such as Object.
Java's boolean type has two values, true and false. The Ecstasy Boolean enumeration defines the values False and True, although the lower-case false and true are also supported by alias.
Java's char type is a 2-byte unsigned integer that represents a common sub-set of Unicode characters. Ecstasy's Char class represents any Unicode code-point.
Java's int type is a 32-bit unchecked signed integer value; Java additionally has byte, short and long types for 8-bit, 16-bit, and 64-bit unchecked signed integer values. Ecstasy provides both checked and unchecked, and both signed and unsigned implementations for 8-bit, 16-bit, 32-bit, 64-bit, 128-bit, and variable-length integers (conceptually similar to Java's BigInteger class). For example, UInt32 is a checked unsigned 32-bit integer, and @Unchecked Int128 is an unchecked signed 128-bit integer. Additionally, the alias Int maps to the 64-bit signed integer, Int64, and the alias Byte maps to the 8-bit unsigned integer, UInt8. (In Java, the byte type is signed.)
Java also has some proprietary support for decimal values via the BigDecimal class. Ecstasy provides standard 32-bit, 64-bit, 128-bit, and variable-length decimal value support via the Dec32, Dec64, Dec128, and VarDec classes; these are implementations of the IEEE 754-2008 standard for decimal floating point.
Java's float and double represent 32-bit and 64-bit IEEE 754 binary floating point values. Ecstasy provides standard 16-bit, 32-bit, 64-bit, 128-bit, and variable-length IEEE 754 binary floating point values via the Float16, Float32, Float64, Float128, and VarFloat classes. Additionally, Ecstasy provides the ML- and AI-optimized "brain float 16" type, via the BFloat16 class.
Java's primitive type system is based on a 32-bit word size. Ecstasy does not have a primitive type system, and thus does not have a "word size", but in practice, Ecstasy defaults to using 64-bit integer, decimal, and binary floating point values.
In Java, the value "0" is an int. The compiler converts it, if necessary, to other types. In Ecstasy, the value "0" is an IntLiteral, which has the ability (both at compile-time and run-time) to convert to any numeric type. Unlike Java, there is no need for an "l"/"L" suffix on integers to inform the compiler that a value is a long.
In Java, the value "0.0" is a double. In Ecstasy, the value "0.0" is an FPLiteral, which has the ability (both at compile-time and run-time) to convert to any decimal or binary floating point type. Unlike Java, there is no need for an "f"/"F" or "d"/"D" suffix to inform the compiler that a value is a 32-bit or 64-bit value.
Java supports the class, enum, and interface keywords for declaring classes. Ecstasy supports these three keywords, plus: module, package, service, mixin, and typedef.
Classes such as Int64, Float64, Dec64, Char, and String that are used to hold constant values are implemented in Ecstasy using the const keyword instead of the class keyword. Instances of a const class are automatically made immutable as part of their construction; specifically, no reference to an object of a const class becomes visible until after the object is made immutable.
Ecstasy classes such as Nullable and Boolean are enumerations; enumerations are abstract classes that contain enum values, such as False and True. Enum values are singleton const classes.
Ecstasy module and package classes are singleton const classes, and are written like any other classes would be. Declarative modularity was recently introduced into Java via Project Jigsaw, with some similar goals. You can read more about Ecstasy modules on this blog.
Java does not have any language capabilities similar to a service, a mixin, or a typedef in Ecstasy. A service class provides a boundary for concurrent and/or asynchronous behavior, so it can be thought of in the same manner as a Java thread; however, an Ecstasy application may have millions of service objects, while it is unlikely that so many threads would be desirable in any language. An Ecstasy mixin provides cross-cutting functionality; in Java, some combination of boilerplate, delegation, and cut & paste would be used instead. An Ecstasy typedef is a means to provide a name to a type that itself can be expressed using the type algebra of the Ecstasy language. You can read more about class composition on this blog.
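
A few of the Java behaviors mentioned in the list above are easy to observe directly. The following Java sketch covers only the Java side (2-byte char, unchecked int arithmetic, signed byte); the class and method names are made up for the illustration.

```java
// Demonstrates Java-side behaviors from the list above (names are illustrative).
final class JavaTypeQuirks {
    // A Java char is a 16-bit UTF-16 code unit, so one code point can occupy two chars.
    static int charsIn(int codePoint) {
        return new String(Character.toChars(codePoint)).length();
    }

    // Java int arithmetic is unchecked: overflow wraps around silently.
    static int uncheckedIncrement(int n) {
        return n + 1;
    }

    // Checked arithmetic has to be requested explicitly, here via Math.addExact.
    static boolean checkedIncrementOverflows(int n) {
        try {
            Math.addExact(n, 1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Java's byte is signed; masking with 0xFF recovers the unsigned value.
    static int unsignedValue(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(charsIn('A'));                // 1
        System.out.println(charsIn(0x1F600));            // 2
        System.out.println(uncheckedIncrement(Integer.MAX_VALUE) == Integer.MIN_VALUE);   // true
        System.out.println(checkedIncrementOverflows(Integer.MAX_VALUE));                 // true
        System.out.println((byte) 0xFF);                 // -1
        System.out.println(unsignedValue((byte) 0xFF));  // 255
    }
}
```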

To put this into practice, consider this Java example:

package com.mycompany.myproduct.gui;

class Point
        implements Comparable<Point>
    {
    public Point(int x, int y)
        {
        this.x = x;
        this.y = y;
        }

    private final int x;
    private final int y;

    public int getX()
        {
        return x;
        }

    public int getY()
        {
        return y;
        }

    @Override
    public int hashCode()
        {
        return x ^ y;
        }

    @Override
    public boolean equals(Object obj)
        {
        if (obj instanceof Point)
            {
            Point that = (Point) obj;
            return this.x == that.x && this.y == that.y;
            }

        return false;
        }

    @Override
    public String toString()
        {
        return "Point{x=" + x + ", y=" + y + "}";
        }

    @Override
    public int compareTo(Point that)
        {
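        // Integer.compare avoids the overflow that plain subtraction can cause
        int n = Integer.compare(this.x, that.x);
        if (n == 0)
            {
            n = Integer.compare(this.y, that.y);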
            }
        return n;
        }
    }

And here is the corresponding Ecstasy code:

const Point(Int x, Int y);

This particular example is dramatic, because the const class declaration in Ecstasy implies automatic implementations of the Comparable, Hashable, Orderable, and Stringable interfaces. Furthermore, the parameters specified at the class level declare two properties, and a constructor.

Local variable declarations are similar, but the use of the comma as a general purpose separator (as in C) is not permitted. For example, in Java:

int a=0, b=0, c=0;

In Ecstasy, these would likely become separate declarations:

Int a=0;
Int b=0;
Int c=0;

It is also possible (and occasionally necessary) to declare and initialize multiple left-hand-side variables ("L-values"); the above example could be written as:

(Int a, Int b, Int c) = (0, 0, 0);

Note that the left-hand-side is in the form of a tuple, and the right-hand-side has a corresponding tuple type. In this form, the type of each left-hand-side variable can differ, and a type is only specified when declaring a variable. For example, if a function "foo()" exists that returns both an Int and a String, then the above-defined variable "c" and a new String variable can be assigned as follows:

(c, String d) = foo();

This introduces a dramatic difference in Ecstasy: Methods and functions can return more than one value, and those return values can be treated either as individual values, or as a tuple of values.
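
For contrast, here is roughly what the same "return an Int and a String" idea takes in Java, where a method returns exactly one value. The IntAndString holder and the body of foo() are hypothetical, written only to mirror the foo() mentioned above.

```java
// Java needs a holder type to return two values (all names are illustrative).
final class MultiReturn {
    static final class IntAndString {
        final int number;
        final String text;

        IntAndString(int number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    // Stand-in for the foo() mentioned above; the values are arbitrary.
    static IntAndString foo() {
        return new IntAndString(42, "forty-two");
    }

    public static void main(String[] args) {
        // Unpacking is manual: one call, then one assignment per value.
        IntAndString result = foo();
        int c = result.number;
        String d = result.text;
        System.out.println(c + " " + d);   // 42 forty-two
    }
}
```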

Furthermore, method and function parameters can also be provided either as individual values, or as a tuple of values, or as named values. Consider this example in Java that uses multiple delegating constructors:

class ErrorList
    {
    public ErrorList(int maxErrors)
        {
        this(maxErrors, false);
        }

    public ErrorList(boolean abortOnError)
        {
        this(0, abortOnError);
        }

    public ErrorList(int max, boolean abortOnError)
        {
        this.max = max;
        this.abortOnError = abortOnError;
        // ...        
        }

    private int max;
    private boolean abortOnError;
    }

Using default parameter values, the Ecstasy equivalent of this example would not need all of those redundant constructors, each with slightly different signatures:

class ErrorList(Int max=0, Boolean abortOnError=False)
    {
    // ...    
    }

And the class could then be constructed using any combination of named parameters, as in the following example:

ErrorList errs = new ErrorList(abortOnError=True);

For the most part, though, the Ecstasy syntax is designed to maintain a high level of compatibility with Java (and C#) syntax. One area in which the syntax differs is with respect to type assertions and type tests. In Java, the type test uses the relational operator "instanceof", and the type assertion uses the C-style cast syntax, which often requires two sets of parentheses, as in this Java code:

if (x instanceof List)
    {
    ((List) x).add(item);
    }

Ecstasy simplifies this syntax dramatically by employing the dot notation that is already so naturally used for property access and method invocation. The "is" keyword replaces "instanceof", and the "as" keyword replaces the awkward use of parentheses for the type assertion (aka "type casting"):

if (x.is(List))
    {
    x.as(List).add(item);
    }

This approach is far easier to read, because it follows left-to-right, with no precedence concerns. Furthermore, if the compiler determines that the value "x" is not subject to concurrent modification, then type inference obviates the need for the type-assertion altogether:

if (x.is(List))
    {
    x.add(item);
    }

Operator precedence also differs slightly from Java, in order to simplify more operators into left-to-right ordering and to resolve a number of cases in which parentheses were awkwardly required in Java:

Operator        Description             Level   Associativity       
--------------  ----------------------  -----   -------------       
&               reference-of              1                         
                                                                    
++              post-increment            2     left to right       
--              post-decrement                                      
()              invoke a method                                     
[]              access array element                                
?               conditional                                         
.               access object member                                
.new            postfix object creation                             
.as             postfix type assertion                              
.is             postfix type comparison                             
                                                                    
++              pre-increment             3     right to left       
--              pre-decrement                                       
+               unary plus                                          
-               unary minus                                         
!               logical NOT                                         
~               bitwise NOT                                         
                                                                    
?:              conditional elvis         4     right to left       
                                                                    
*               multiplicative            5     left to right       
/                                                                   
%               (modulo)                                            
/%              (divide with remainder)                             
                                                                    
+               additive                  6     left to right       
-                                                                   
                                                                    
<< >>           bitwise                   7     left to right       
>>>                                                                 
&                                                                   
^                                                                   
|                                                                   
                                                                    
..              range/interval            8     left to right       
                                                                    
<  <=           relational                9     left to right       
>  >=                                                               
<=>             order ("star-trek")                                 
                                                                    
==              equality                 10     left to right       
!=                                                                  
                                                                    
&&              conditional AND          11     left to right       
                                                                    
^^              conditional XOR          12     left to right       
||              conditional OR                                      
                                                                    
? :             conditional ternary      13     right to left       
                                                                    
:               conditional ELSE         14     right to left

As you can see, a number of operators are grouped together, which previously each had their own precedence level; this implicitly employs left-to-right precedence for all operators within that grouping. Bitwise operators also have been moved to a significantly higher precedence level, which reduces the need for unnecessarily awkward parenthesization. Additionally, almost all operators map directly to methods, which means that explicit left-to-right behavior can be achieved by replacing a relational operator with the corresponding method invocation.
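
To see why the bitwise operators were moved, it helps to look at the Java behavior that the table is reacting to. In Java, == binds tighter than | and &, so the sketch below needs parentheses to get the grouping that the table above assigns by default (bitwise at level 7, equality at level 10). The class and method names are made up for the illustration.

```java
// Java-side illustration of bitwise vs. equality precedence (names are illustrative).
final class PrecedenceDemo {
    // Java parses a | b == c as a | (b == c), because == binds tighter than |.
    static boolean javaParse(boolean a, boolean b, boolean c) {
        return a | b == c;
    }

    // The grouping that results when the bitwise operator binds tighter than ==.
    static boolean bitwiseFirst(boolean a, boolean b, boolean c) {
        return (a | b) == c;
    }

    // With ints, Java forces the parentheses: "flags & mask == mask" does not
    // compile, because it would apply & to an int and a boolean.
    static boolean hasAll(int flags, int mask) {
        return (flags & mask) == mask;
    }

    public static void main(String[] args) {
        System.out.println(javaParse(true, false, false));      // true
        System.out.println(bitwiseFirst(true, false, false));   // false
        System.out.println(hasAll(0x0B, 0x03));                 // true
    }
}
```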

(Continue to Part II.)

More turtles

2019-10-13T17:30:00.000-04:00

The Ecstasy type system is called a Turtles Type System, because "it's turtles, the whole way down". This is, in many ways, a revolutionary approach to type systems. Object type systems have traditionally had primitive types (like a "periodic table of elements" for the language) from which all other types are built, but in Ecstasy things are a bit different. For example, an Ecstasy integer is composed of an array of bits, each of which is composed of an integer literal (0 or 1), which is in turn composed of a string, which is composed of an array of characters, each of which is composed of an integer. So we're right back where we started, with an integer -- and you can just recurse infinitely, because the types are all turtles.

Making a type system like this actually work is a challenge, so it didn't appear all at once. Recently, support for recursive type definitions was added, in order to support the JSON parsing project. Consider the following Ecstasy code:

/**
 * JSON primitive types are all JSON values except for arrays and objects.
 */
typedef (Nullable | Boolean | IntLiteral | FPLiteral | String) Primitive;

/**
 * JSON types include primitive types, array types, and map types.
 */
typedef (Primitive | Map<String, Doc> | Array<Doc>) Doc;

Here, in two lines of code (which could even be simplified to a single line, if we didn't want to split out primitive JSON values), we see a complete Ecstasy mapping of the JSON specification. That second typedef, though, is a doozy, because it refers to itself. If you stop and read it carefully it makes a lot of sense: A JSON document is either a primitive value, a map of string keys to JSON values (each of which could be an entire recursive document structure), or an array of JSON values (each of which could be an entire recursive document structure).

To keep it simple, consider the following example:

typedef (Int | List<Manifold>) Manifold;

Manifold m1 = 9;
Manifold m2 = [m1];
Manifold m3 = [m2];

console.println(m1);
console.println(m2);
console.println(m3);

When executed, this code will print:

9
[9]
[[9]]

But the amazing thing isn't that it works at all, but rather that it works with full type safety.
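
For comparison, a Java version of the same structure has no way to name the type "an Int or a List of the same thing", so the check falls to run-time code. The sketch below is an analogy only; ManifoldSketch and isManifold are hypothetical names, and Java's Integer stands in for Ecstasy's Int.

```java
import java.util.Collections;
import java.util.List;

// Java analogy for the recursive Manifold typedef (illustration only).
final class ManifoldSketch {
    // Run-time stand-in for the typedef: an Integer, or a List whose elements are all manifolds.
    static boolean isManifold(Object o) {
        if (o instanceof Integer) {
            return true;
        }
        if (o instanceof List) {
            for (Object element : (List<?>) o) {
                if (!isManifold(element)) {
                    return false;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // The static type is just Object; nothing stops a String from slipping in.
        Object m1 = 9;
        Object m2 = Collections.singletonList(m1);
        Object m3 = Collections.singletonList(m2);

        System.out.println(m1);   // 9
        System.out.println(m2);   // [9]
        System.out.println(m3);   // [[9]]

        System.out.println(isManifold(m3));                                   // true
        System.out.println(isManifold(Collections.singletonList("nine")));   // false
    }
}
```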

Hello World!

2019-08-01T18:45:00.001-04:00

In retrospect, the most obvious missing feature of Ecstasy is the prototypical "Hello World!" example.

The earliest adopters / experimenters / hackers who have been playing with Ecstasy for some time now were somehow able to divine the magic incantations necessary to get code compiling and running (sometimes with help from our team), but it's time to make this process much easier.

This won't be a single update; rather, it is a process -- of moving the project from a small team that knows all of the undocumented nooks and crannies, out into the public sphere. The initial experience with Ecstasy should not be as soul-sucking and psychologically scarring as a Google job interview. For a new user, it should be straight-forward to get started, and not some experience like an "obstacle course" or "running the gauntlet".

To that end, we introduce step one, the Hello World:

module HelloWorld
    {
    void run()
        {
        @Inject Console console;
        console.println("Hello World!");
        }
    }

Here's a short explanation of the code, which is found in ./xdk/src/main/resources/xdk/examples/HelloWorld.x:

A module is the unit of compilation, loading, linking, and execution, so we need to write one of those. Don't worry -- as you can see, it's easy.
The xec command (which we'll cover below) looks for a method on the module called "run" that takes no parameters. (The module is a class, so "void run()" on the module is just a normal method.)
Ecstasy code is purposefully incapable of doing any I/O; for security reasons, there is nothing in the language (or in the compiled form of the language) that has any access to any hardware or OS resource. As a result, the code must depend on its container to provide something that implements the Console interface; this is called injection. The behavior of the console that is injected by the TestConnector is to print to stdout.
The declaration "@Inject Console console;" declares a read-only variable called console, and when it is de-referenced, it will always have a value that is a Console. (It is a contract; if the container could not -- or chose not to -- provide a Console, then the creation of the container itself would have failed.)
Hopefully, the line that prints out "Hello World!" is self-explanatory.
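
The injection idea in the explanation above can be pictured in Java terms: the code that says "Hello World!" never creates a console, it is handed one by whatever hosts it. This is a loose analogy only; the Console interface, HelloModule class, and InjectionSketch host below are hypothetical Java names and are not part of the XDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java analogy for an injected Console (illustration only).
final class InjectionSketch {
    interface Console {
        void println(String text);
    }

    static final class HelloModule {
        private final Console console;   // supplied by the host, never created here

        HelloModule(Console console) {
            this.console = console;
        }

        void run() {
            console.println("Hello World!");
        }
    }

    public static void main(String[] args) {
        // One host decides that a Console forwards to stdout.
        new HelloModule(System.out::println).run();

        // Another host could capture the output instead; the module cannot tell.
        List<String> captured = new ArrayList<>();
        new HelloModule(captured::add).run();
        System.out.println(captured);   // [Hello World!]
    }
}
```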

Here are the steps to getting this running:

The git utility is used for downloading project code. Open a terminal window (aka command window aka shell) and type "git" to verify that you have it installed and working. If you don't, then you can get git; if you develop on a Mac, git is already included in the Command Line Tools for XCode.
Java is used to run the current Ecstasy toolchain, and version 11 (or later) of the JDK is required. Open a terminal window, and type "java -version" to verify that you have the necessary version of Java installed. If necessary, you can download the free JDK 11 from the Amazon Corretto project, for example.
We strongly encourage you to download IntelliJ IDEA, if you don't already use it. (Or update it to the latest version, if you already use it.) Since Ecstasy is an open source project on GitHub, you can use the "Community" edition of IDEA. (We do think that it is an IDE worth paying for, so don't be afraid to splurge on the "Ultimate" edition!)
Determine a location to create a local repository for the XVM project. The rest of these instructions will assume Unix style paths and an installation location of ~/Development/xvm, but if you're on Windows, just create an XVM project directory somewhere, e.g. Development\xvm under your user directory.
From the terminal, in that XVM directory, execute: git clone https://github.com/xtclang/xvm.git This will take a few seconds (maybe minutes) to completely clone the project into your XVM directory.
Next, use the Gradle wrapper to build a local copy of the Ecstasy development kit (XDK) with the following command: ./gradlew build (or gradlew.exe build on Windows) This will take a minute or so to completely build the XDK.
The XDK is built under the ./xdk project directory under your XVM directory, specifically ./xdk/build/xdk sub-directory. You can copy the built XDK to a location of your choosing, but for these instructions, we will leave it in the location in which it was built.
To configure the toolchain for your OS, execute the appropriate command in the bin directory of the XDK; for example, on macOS, execute . ./xdk/build/xdk/bin/cfg_macos.sh. (Notice the dot and space at the beginning of the command; this is called a "source command" in Bash. Unfortunately, this does not work with the zsh shell that is now the default on macOS, so you have to run ./xdk/build/xdk/bin/cfg_macos.sh without the preceding source command, and then manually update the PATH to add ./xdk/build/xdk/bin.)
Now you can use the xtc, xec, and xam commands from the terminal. On some operating systems, if these executable files are not signed and/or notarized, you may get an error or a warning the first time that you run them. For example, macOS includes a program called GateKeeper that may need to be configured to allow these programs to be executed.
Each time that you open a new terminal window, you will need to execute the OS-specific configuration script to update the PATH variable; alternatively, you can configure your OS to automatically update the PATH for you, but the complexity of that topic is immense, and beyond the scope of this document. (On macOS and Linux, one normally would create a .profile file in one's home directory and add one line that says e.g. export PATH=$PATH:~/Development/xvm/xdk/build/xdk/bin, but there are pages of conversation to read through on StackOverflow for when this simple approach fails to work for your configuration.)
To compile the HelloWorld example, use the xtc command: xtc ./xdk/build/xdk/examples/HelloWorld.x
The compiler places the compiled .xtc file into the current directory (which in this case is probably where you don't want it, but for the sake of this example, we'll ignore this detail). To execute the program: xec HelloWorld.xtc

And if all went well, you should see:

Hello World!

Composition

2019-07-14T15:30:00.001-04:00

In an OO language, one of the first questions to ask is how classes and types are composed. Sometimes, when looking at a new language, it's easy to get side-tracked by clever syntax, or syntactic "features", but while these are ultimately important, there is nothing more important in a language than being able to describe the shape of what one is building.

Ecstasy provides three basic shapes from which classes are composed:

Classes, which (just like in Java and C#) are useful for defining instantiable combinations of state and behavior.
Interfaces, which (just like in Java and C#) are useful for defining contracts, and may allow default behavior to be defined.
Mixins, which are used to define cross-cutting functionality.

One example of each of these from the core library is the Range class, the Sequential interface, and the Interval mixin. Consider this simple example:

for (Int i : 10..20)
    {
    // do something
    }

The expression "10..20" is a Range; it defines a "from value" and a "to value". The only requirement of a range is that its type must be Orderable, which is the funky interface that allows two objects to be compared for purposes of ordering.

The ability of a type to be ordered is a necessary but insufficient capability for iteration, which is what the for loop requires, and if you examine the Range class closely, you will notice that it does not implement Iterable. What it does, instead, is this:

const Range<Element extends Orderable>
        incorporates conditional Interval<Element extends Sequential>

Translated into English, that reads: "A range is a constant that contains elements, which must be of an orderable type. Additionally, for ranges whose elements are of a sequential type, the range will automatically incorporate the capabilities of an interval."

Think of the Sequential interface as the type that is necessary to support the "++" and "--" operators (pre-/post- increment/decrement). When a range of a sequential type is constructed, the composition of the range incorporates the Interval mixin, which in turn, being iterable, provides an iterator that can be used by the for loop.

A range of a non-sequential type cannot be iterated over, and an attempt to do so is detected by the compiler:

for (String s : "hello".."world")   // compiler error
    {
    // do something
    }
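
For readers who think more easily in another language, the conditional incorporation can be loosely approximated in Python. This is only an analogy, not how Ecstasy implements it: the names SimpleRange, IterableRange, and make_range are hypothetical, "sequential" is approximated as "is an int", and the decision is made at run time rather than by a compiler.

```python
class SimpleRange:
    """A range over any orderable type; it knows only its two bounds."""

    def __init__(self, first, last):
        self.first = first
        self.last = last

    def contains(self, value):
        return self.first <= value <= self.last


class IterableRange(SimpleRange):
    """Adds iteration, which only makes sense for steppable element types."""

    def __iter__(self):
        value = self.first
        while value <= self.last:
            yield value
            value += 1


def make_range(first, last):
    # Assumption: "sequential" is approximated here by "is an int".
    if isinstance(first, int) and isinstance(last, int):
        return IterableRange(first, last)
    return SimpleRange(first, last)
```

A range of strings built this way can still answer contains(), but it has no iterator to offer, which is the run-time shadow of the compiler error shown above.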

In the "const Range" declaration shown above, the keyword used to declare the class was "const". To declare a class (in the abstract sense of the term), Ecstasy provides eight keywords:

module is used to declare a unit of compilation, or a unit of deployment. Java has a related concept, also called a module, and C# uses the term assembly. A module is a singleton const class; see Modules Overview.
package is used to declare a namespace within a module, which is kind of like creating a directory within a file system. Like module, a package is also a singleton const class.
class is used to declare any class that is not specialized as either a const or a service. Classes may be made immutable at run-time, but may not be singletons. For example, see ListMap.
const is used to declare a class that is immutable by the time that it finishes construction. Furthermore, it automatically provides implementations of a number of common interfaces, including Orderable, Hashable, and Stringable. Consts can be singletons, and are always immutable. For example, see Int64, aka Int.
enum is used to declare an enumeration of values. The enumeration itself is an abstract const, and each enum value is a singleton const. For example, see Boolean.
service is used to declare a potentially asynchronous object, conceptually similar to a Java or C# thread, but in many ways, much closer to an Erlang process. Services may be singletons, and may not be immutable. There aren't any good examples of service in the core library, but the services.x test highlights the asynchronous and continuation-based behaviors of the service, using both an explicit Future-style programming model, and the implicit async/await style.
interface defines just the surface area (the API) of a class, and may include default implementations of that API.
mixin declares a cross-cutting composition that can be incorporated into another composition.

Each of these, and the forms of composition available to each, will be covered in more detail in subsequent articles. In the meantime, if you're curious about the raw syntax, see bnf.x, and if you're curious about how the parsing of the syntax works, see parseTypeCompositionComponent() in the Parser. The AST node for type compositions is TypeCompositionStatement.

A pane in the glass

2019-07-13T11:15:00.001-04:00

One of the best visual metaphors for object systems is a pane of glass. Imagine having a pane of glass and a red dry-erase marker; use that marker to draw and fill in some small circles on the glass. Now, take another pane of glass, and using a blue dry-erase marker this time, do the same thing, and then set that second pane on top of the first one. When you look through the panes from above, you see these small circles from both panes, almost as if they were on one pane.

One of these circles represents a virtual behavior. A virtual method, for example. Imagine that the pane of glass is magically subdivided as a grid, and those circles were magically located within the cells of that grid. Now, as you look through those two panes of glass together, those red circles that you see are methods implemented on the base class, and the blue circles are methods implemented on the derived class, because we put the red on the bottom (the base), and the blue overlaid it. And perhaps you might see some purple circles, representing methods that exist on the base class and are overridden on the derived class.

We can repeat the experiment with another pane of glass, and a yellow marker, but at this point, it's getting very difficult to hold and juggle all of this glass, so we need a special holder for these panes of glass. Since this experiment is in our mind's eye, we can instantly build whatever we need to hold these (and many more) panes of glass. We need something almost like the adjustable shelves in an oven -- a sort of "glass shelf system" that allows us to slide any pane-of-glass into, and pull any pane-of-glass out of this holder. We construct it to be free standing, so that we can look through it from above, and we build it with a light source beneath it that helps to provide illumination through our panes of glass. What we have now is our collection of panes of glass, any of which might have those colored circles laid out in a grid, and we can now appreciate our beautiful coloring job when we look down through all of that glass, from above.

It is tedious to build such a thing in our head, but it serves a most excellent purpose, for now we are all looking at the same thing together, and sharing words that have meaning because we are looking at the same thing.

For example, when we use the term method identity, we are referring to a pair (x,y) of coordinates that identify a location (a cell) in the grid on the glass. (In the real world, we know that a method has other means of identifying itself, such as a name and perhaps some information about its parameters, but in our mind's eye, it's far simpler to draw circles on a grid on glass with dry erase markers.)

And when we say that there is no such method, we mean that we look from above through the grid and there is no color in a particular cell -- not on the top pane of glass, but also not on any pane of glass under it.

And when we talk about a virtual method invocation, we mean that for a method identity (a cell location in the grid) in which we can see a color circle from above, we slide out the top pane of glass and see if the cell in question has a circle on this top pane of glass, and if it does, that circle represents the behavior (the code) of that method to execute. If on the other hand, it does not have a circle in that cell, then we slide the glass back into its shelf, we pull out the next piece of glass, we examine it, and we keep repeating this process until we find the first piece of glass that has a circle in that cell.

And when we talk about a super method invocation, what we mean is that during the execution of that code, if the code refers to its super, it is referring to the next circle that we would find if we were to go back and continue pulling out those glass shelves one by one and examining them, as we were when we performed the original virtual method invocation. If in doing so, we get to the last piece of glass and we have not found that circle, we would say that there is no super. If on the other hand we find that circle on a subsequent piece of glass, then that circle represents the super -- the code of the super method to execute.

And when we talk about a method chain, we are referring to that first circle that we found for the virtual method invocation, and also its super, and also its super, until we get to the end and there are no more supers. That sequence of circles is the method chain.
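
The vocabulary so far is small enough to sketch in a few lines of Python. This is a toy model of the metaphor only, not of the Ecstasy runtime: each pane is a dict from a method identity to a string standing in for the code, and the function names are made up for this illustration.

```python
# Each pane maps a method identity (a grid cell) to the "code" drawn there.
# Panes are listed from the top (most derived) down to the bottom (the base).

def method_chain(panes, identity):
    """Return every circle found in this cell, top pane first."""
    return [pane[identity] for pane in panes if identity in pane]


def invoke_virtual(panes, identity):
    """Return the first circle seen when pulling panes out from the top."""
    chain = method_chain(panes, identity)
    if not chain:
        raise LookupError(f"no such method: {identity}")
    return chain[0]


def find_super(panes, identity, current):
    """Return the next circle below `current`, or None if there is no super."""
    chain = method_chain(panes, identity)
    index = chain.index(current)
    return chain[index + 1] if index + 1 < len(chain) else None


# Hypothetical example: a red base pane under a blue derived pane.
red = {"a": "red a", "b": "red b"}
blue = {"b": "blue b", "c": "blue c"}
panes = [blue, red]
```

Here the cell "b" is one of the purple circles: invoke_virtual finds the blue code, its super is the red code, and the red code has no super.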

That's quite a vocabulary that we have developed, but it is invaluable in designing and discussing how an object system works. More importantly, it's fundamental to understanding the next concept, because the concept we're about to describe doesn't yet exist outside of the Ecstasy language. In The Quest for Equality, we introduced a notion of equality that is unlike any that can be found today in other languages. It is neither virtual (since it is based on a function, not a virtual method), nor is it static (since it is based on run-time type information), nor is it dynamic (since the type information reflects the declared compile-time type information). So what is it, then?

Equality is an example of a funky interface. It is like an interface type (a C++ pure virtual class, or a Java/C# interface) in many ways, except that we do not stand over the panes of glass and look for a method in the manner that we described above. No, a funky interface is different, because it knows which pane of glass it refers to.

First, in terms of implementing a funky interface, what it means is that the pane of glass on which the implementation occurs will (and must) contain a colored-in circle for each of the functions (not methods) of the funky interface. (By now, you must be realizing how the funky interface got its name.) So when we slide out the pane of glass for an Orderable implementation, we will see both the equals and the compare function circles colored in.

Second, in terms of using a funky interface, the compile-time type being compared-for-order (the Orderable interface) represents the slot of the pane of glass that we will find that compare function on. But it is not necessary that every pane of glass have that function for the type to be orderable. Instead, we begin at that pane of glass, and check if it implements that funky interface, and if it does not, then we proceed to the next lower pane of glass, until we find the one that does.

Remember when we said that an implementation of a funky interface must contain a colored-in circle for every one of the functions from the funky interface? That is because a funky interface represents a tight coupling of related functions. The Orderable example above is a good one, because both the equals and the compare function use some concept, some definition of equality, and thus if one changes, we must expect that the other changes as well.

But there are a few other obvious examples, such as Hashable: If you change the definition of equals, then you must also make sure that the hash code calculation for two equal objects produces the same result for each, and vice versa. Thus Hashable is a funky interface with those two functions, equals and hashCode.

And since equals shows up in both Hashable and Orderable, if one is to implement those two funky interfaces, then it is clear that filling in any one of those circles on a pane of glass requires filling in them all. And this allows the compiler to detect when a higher pane of glass attempts to fill in just some subset of those circles, which would naturally lead to errors in the running program. In other words, derived types must continue to respect the contracts from the funky interfaces of the base types.
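
The "fill in one, fill in them all" rule is easy to state as a check. The following Python sketch is our own illustration of that rule, not the compiler's actual algorithm; the table FUNKY_INTERFACES and the function missing_functions are hypothetical names, and only the two funky interfaces discussed here are listed.

```python
# The two funky interfaces discussed above, and the functions each one couples.
FUNKY_INTERFACES = {
    "Orderable": {"equals", "compare"},
    "Hashable": {"equals", "hashCode"},
}


def missing_functions(pane_functions, implemented):
    """Return the functions a pane still has to fill in.

    pane_functions: set of function names filled in on one pane of glass
    implemented:    names of the funky interfaces that the type implements
    """
    required = set()
    changed = True
    while changed:
        changed = False
        for name in implemented:
            functions = FUNKY_INTERFACES[name]
            # Touching any function of a funky interface pulls in all of them,
            # and a shared function (equals) spreads the obligation further.
            touched = functions & (pane_functions | required)
            if touched and not functions <= required:
                required |= functions
                changed = True
    return required - pane_functions
```

For a type that is both Orderable and Hashable, a pane that fills in only compare is reported as missing equals and hashCode, while a pane that fills in none of them is left alone.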

Because to do otherwise would be a pane.

If it Quacks

2019-07-11T15:00:00.000-04:00

Ecstasy supports both type tests and type assertions. In most languages, a type assertion is called a cast, which may result in a compiler error or (in the case of languages with run-time type information) a run-time exception. A type assertion in Ecstasy is a run-time, type-safe operation, and importantly, can be expressed as an explicitly-left-associative operation:

String foo(Object o)
    {
    return o.as(String); // could throw TypeMismatch
    }

Languages with run-time type information usually provide an additional, non-asserting means to test for a particular type at run-time, such as the Java instanceof binary operator, or the C# is binary operator. A type test in Ecstasy can be performed as an explicitly-left-associative operation:

String foo(Object o)
    {
    return o.is(String) ? o : "hello, world!";
    }

Normally, an object cannot be of a certain type, unless it is explicitly declared to be of that type. For example, even if the imaginary class FakeString has all of the same properties and methods as the String class, instances of FakeString cannot be cast to a String.

Some languages do support such a thing, however. It's called duck typing, because "if it walks like a duck, and quacks like a duck, then it's a duck". An early prototype of Ecstasy had this feature for all types, and even provided a composition keyword, impersonates, to automate the composition of ducks and duck-like creatures. However, the capability did not mesh well with the design of the class-based portion of the type system, and was ultimately rejected for its unanswerable questions and potential incompatibilities.

Because duck typing is so useful, especially when working across the boundaries of loosely-coupled modules, one aspect of duck typing was explicitly retained: The ability to duck-type an interface. In many ways, an Ecstasy interface is simply a named type, i.e. a type plus a name. (This is not strictly true, but for this conversation, it will suffice.) And thus, in Ecstasy, one can make a Gosling Duck:

interface Duck
    {
    void waddle();
    void quack();
    }

class Gosling
    {
    void waddle() {}
    void quack() {}
    }

Duck foo(Gosling james)
    {
    return james;
    }
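
For comparison, here is roughly the same idea expressed with Python's typing.Protocol, which also limits structural typing to interface-like declarations. It is an analogy rather than a description of Ecstasy: the Rock class is invented for contrast, and Python verifies the match at run time (or in an external type checker), whereas Ecstasy's compiler does the work.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Duck(Protocol):
    def waddle(self) -> None: ...
    def quack(self) -> None: ...


class Gosling:
    # Note: Gosling never mentions Duck.
    def waddle(self) -> None:
        pass

    def quack(self) -> None:
        pass


class Rock:
    # Hypothetical counter-example: no waddle, no quack.
    def sink(self) -> None:
        pass


def foo(james: Gosling) -> Duck:
    return james
```

With these definitions, isinstance(Gosling(), Duck) is True and isinstance(Rock(), Duck) is False, even though neither class declares any relationship to Duck.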

(No ducks were harmed in the making of this blog entry.)

Literally awesome!

2019-07-10T18:15:00.000-04:00

Ecstasy supports a rich set of literals -- too much to cover in a single post, so consider this the first installment. It's important to lay out, up front, why a language supports literals, and what its goals are for the design in doing so.

First, when building a language, literals are often terminal constructs, such that other things in the language can be composed of them.

Second, a literal allows an efficient, human-readable encoding of information. For example, for most of us, it is far easier to read the number 42 than something like:

new Byte(false, false, true, false, true, false, true, false)

Ecstasy's design goals for literals are fairly straightforward:

Common constant types supported by the core runtime library should have a literal form. Examples include: Bits, nibbles, bytes, binary strings, integers, characters, character strings, dates, times, date/times, time durations, etc.
Common complex types supported by the core runtime library should have a literal form. Examples include: Tuples, arrays, lists, sets, and maps (aka directories).
Literal formats should emphasize readability, and the formats should be fairly obvious to a programmer.
It should be easy to work with literal formats using only a text editor.
Literals should make common programming tasks simpler, where possible.

Integers
A "whole number", or an integer, starts with an optional sign, followed by an optional radix indicator (such as "0b" for binary, "0o" for octal, or "0x" for hex), followed by the digits of the appropriate radix, with optional underscores between digits to separate digits as desired. The BNF is in the language specification, but the simple explanation above should suffice. Here are some examples:

0
-1
42
0xFF
0b10_1010_1010_1010_1010
12345678901234567890123456789012345678901234567890

So, what is the type of each of the above? A 32-bit "int"? A 64-bit "int"? No. Each of the above is an IntLiteral, a const class. Just think of IntLiteral as an object that has a good idea how to look on the screen, and simultaneously knows what values of various numeric types it can represent. The benefits are fairly obvious, in terms of support for arbitrary integer sizes (without weird type casting or literal suffixes like "L"), and support for other numeric types whose range may be far beyond the range of any arbitrary fixed-length integer type.
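
To make the "knows what values of various numeric types it can represent" idea concrete, here is a small Python sketch. It leans on Python's own integer literal rules (via int with base 0), which happen to accept every example above but may differ from Ecstasy's in corner cases; parse_int_literal and fits are hypothetical helper names, not part of any Ecstasy API.

```python
def parse_int_literal(text):
    """Parse a literal with optional sign, radix prefix, and underscores."""
    # Base 0 tells int() to honor the 0b / 0o / 0x prefixes itself.
    return int(text, 0)


def fits(value, bits, signed=True):
    """Report whether value is representable in a fixed-width integer."""
    if signed:
        return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return 0 <= value < (1 << bits)


big = parse_int_literal("12345678901234567890123456789012345678901234567890")
```

The literal 0xFF fits an unsigned 8-bit integer but not a signed one, and the 50-digit value fits no 64-bit type at all; in both cases the literal itself is perfectly valid, and only the target type decides whether it can be used.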

Characters
A character is a single-quoted Unicode code point, with predictable support for escapes using the backslash. If necessary to encode Unicode characters in the range up to U+FFFF, the format \u1234 can be used; beyond that range, the format \U12345678 can be used. Here are some examples:

'a'
' '
'\''
'\t'

This literal type is implemented by Char, a const class.

Strings
A (character) string is a double-quote enclosed sequence of characters, supporting the same escapes as are supported for character literals. Here are some examples:

""
"Hello, world!"
"This is an example of \"quotes\" inside \"quotes\""
"Multiple\nlines\nof\ntext."

Multi-line strings are freeform, which means that character escapes are not processed; Unicode escapes, on the other hand, are supported, because they are handled by the earlier "lexer" stage of the compilation. Multi-line strings use a hard left border, defined by the "pipe" ("|") character; the first line of a multi-line string begins with a back-tick ("`") followed by a pipe. Here is an example:

String s = `|This is a test of
            |a "multiline" string
            |containing | and \ and ` and ' and " etc.
            ; // <--- look at this

Like an end-of-line comment, the multi-line string takes everything from the pipe to the end of the line, as-is, which is why the semicolon in the example above has to be placed on the following line.

A template allows a string to be formed dynamically from any valid expression. The format of the template string is the same as a normal string, except prefixed by the dollar sign ("$"); expressions inside the string are prefixed by dollar-sign + open-curly ("${") and suffixed by close-curly ("}"). Here are a few examples:

$"Hello, ${name}!"
$"2 + 2 = ${2 + 2}."
$"Finished in ${timer.elapsed.milliseconds}ms."
$"Finished in ${{timer.stop(); return timer.elapsed;}}"

Templates are handy, and making up good examples is challenging, but we already use templates all over the place. The last example is quite interesting, in that it shows a statement expression (syntactically, a lambda body) inside of the template expression.

Templates can also be used with multi-line strings, which is denoted by using a dollar sign instead of the opening back-tick:

String s = $|# TOML doc
            |[name]
            |first = "{person.firstname}"
            |last = "{person.lastname}"
            ;

Finally, if the string you need to glue into your code is too big and ugly to put into the source file, then don't. Just stick it in its own file in the same directory; for example, in a file named "ugly.txt":

String s = $./ugly.txt;

Yeah. That was easy.

The literal type for all of these forms of string is implemented by String, a const class.

Arrays
An array literal is a square-bracket enclosed list of values.

Here are some examples:

[]
['a', 'b', 'c']
[1, 2, 3]

This literal type is implemented by the Array class, which is variably mutable: Array literals are either Persistent (if they contain any values that are not compile-time constants) or Constant (if they contain only compile-time constants).

Summary
This was just a brief introduction to literals in Ecstasy. Each of these literal forms has many more rules than we covered here, but those rules are there to allow for more expression (readability) in the source code, and not to restrict it. The forms for these literals are designed to make it super easy to write and very pleasant to read.

The rules do make the lexer and the parser more complex, but we look at it this way: The compiler only has to get written twice (one prototype to bootstrap the language, and then the real one written in natural Ecstasy code), so no matter how much work it is to make the language easier to use, we get to amortize that cost across many, many users over many, many years.

Literally.

Null is no Exception

2019-07-09T13:00:00.001-04:00

Since Tony Hoare formalized the concept of "null" in 1965, we have lived with an entire family of languages (including C, C++, Java, and C#) that made it possible for a pointer to contain a purposefully illegal address, for the purpose of representing the lack of a value. That purposefully illegal address is called null. (Or NULL, NIL, nil, etc.)

The reasoning was simple: Without using any additional memory, a pointer could be made to serve two purposes: First, to indicate whether or not a value exists, and secondly, what that value is if and only if it exists.

As type systems advanced, such that pointers became type-safe references instead of arbitrary integers, it became necessary to represent null as a typed value. To make it possible to assign the value null to any reference, these type systems made null a sub-type of all other types; otherwise, null would not be assignment compatible with any type other than the null type itself.

The use of null in all of these languages had an unfortunate side effect: Because null could be assigned to any type, it logically followed that each and every value might be null. That means that every single access to a value requires a null check, which in turn generates an exception in languages like Java (NullPointerException) and C# (NullReferenceException). In C, such code just segfaults (aka "Access Violation" in Windows) and core-dumps. Yay!

To avoid segfaults and exceptions, it became necessary to sprinkle code with lots of these:

if (s != null)
    {
    ...
    }

(Yay!)

There is an elegant solution to this ugliness, which is to make null into its own normal type, and not some magical "subclass of all classes" class, or "subtype of all types" type. In other words, simply by making the null value into an object reference of some normal class, it prevents that reference from being assigned willy-nilly to references of any other random type.

The complete code for the Nullable type (found in Ecstasy's module.x) is:

enum Nullable { Null }

That one line of code declares an enumeration class, called Nullable, with one value, called Null.

(Advanced: From an inheritance point of view, Null extends Nullable implements Object. From a composition point-of-view, as an enum, the class for Nullable incorporates the Enumeration mixin while the Nullable class itself is an abstract enum, and Null is an enum value. An enum value is a singleton const, which automatically implements both the Enum and Const interfaces. See source files: module.x, Const.x, Enum.x, Class.x, and Enumeration.x.)

This approach introduces some new requirements for the language's type system. First, a type system must be able to represent composite types, such as intersection types, union types, and difference types. Ecstasy represents intersection types with the "or" ("|") operator, because the code "(A|B)" reads "either type A or type B", which means that only the intersection of those two types can be assumed. (Apologies to any mathematicians reading this, but the "U" on our keyboard was stuck in the right-side-up position.)

Thus, to declare a type that can hold a value of either Nullable or String, and assign it a predictable value, one could write:

Nullable | String s = "Hello, world!";

This would quickly get old, so a short-hand notation for the "Nullable|" portion is the type-postfix "?"; here is the rewritten form of the above declaration, using the short-hand notation:

String? s = "Hello, world!";

Since the variable "s" is either a String value or a Nullable value, one can not ask it for its size:

Int len = s.size;    // compiler error!

The reason that "s" does not have a size is that its type is "Nullable or String", and the Nullable type does not have a size property. This allows the compiler to know that the size property cannot be requested; this is an example of compile-time type safety. (Run-time type safety is exhibited by throwing a NullPointerException, etc.; Ecstasy has no equivalent to this exception, because such an exception cannot occur! "And there was much rejoicing.")

Compile-time type safety allows the compiler to know when a value might be Null. By checking if the type is a String, the compiler subsequently knows that the value cannot be Null, and specifically that the value is a String, after which it is safe to obtain the String size:

if (s.is(String))
    {
    console.println($"String s is ${s.size} characters long.");
    }

Similarly, if the code explicitly compares to the Null value, then the compiler can know when the value is or is not Null. The above code could be modified slightly by first testing if the value is not Null, so that the compiler subsequently knows (by process of elimination) that the value is a String, after which it is safe to obtain the String size:

if (s != Null)
    {
    console.println($"String s is ${s.size} characters long.");
    }

The postfix "?" operator is a short-circuiting operator that performs the same not-Null test, so the above code could be written instead as:

console.println($"String s is ${s?.size} characters long.");

The short-circuiting "?" operator can be grounded using the else (":") operator. In the following example, if "a" is Null, or if "a.b" is Null, or if "a.b.c" is Null, then the result is the predictable value of "Hello, world!", otherwise the result is the value of "a.b.c":

String s = a?.b?.c? : "Hello, world!";
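
To see how much that one line is doing, here is the same logic spelled out longhand in Python, with None standing in for Null. The classes A and B and the function c_or_default are hypothetical, invented only to give "a.b.c" something to refer to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class B:
    c: Optional[str] = None


@dataclass
class A:
    b: Optional[B] = None


def c_or_default(a: Optional[A], default: str = "Hello, world!") -> str:
    """Mimic `a?.b?.c? : default` by bailing out at the first None."""
    if a is None:
        return default
    if a.b is None:
        return default
    if a.b.c is None:
        return default
    return a.b.c
```

The explicit None tests are deliberate: a shortcut such as Python's "or" would also replace an empty string with the default, which is not what a not-Null test means.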

Ecstasy combines the postfix "?" and the else (":") operator into the elvis ("?:") operator:

String s = a ?: "Hello, world!";

The above code has the same effect as:

String s = a? : "Hello, world!";

As with most binary operators, it is possible to combine the operator with the assignment operator, such that:

x = x ?: y;

... can be rewritten using the elvis assignment operator as:

x ?:= y;

Notice the similarity in the postfix "?" operator, the else (":") operator, and the elvis operator, with the ternary operator; each of the four following lines of code has the same result:

x = x!=Null ? x : y;    
x = x? : y;    
x = x ?: y;
x ?:= y;

That is a lot to wrap one's head around, but there is a simple logic behind it.

Finally, there is a special Null-aware assignment operator that splits a nullable type, such as "String?", into a tuple of Boolean and the (conditional) non-nullable portion of the type (e.g. "String"). This can be used wherever a condition can be used, such as in an "if" or "while" statement. For example, imagine some method or function that can return a nullable string value:

String? foo();

Other than the operator, this example should seem quite familiar by now:

if (String s ?= foo())
    {
    console.println($"String s is ${s.size} characters long.");
    }

In the above example, if the function returns Null, then the result is the tuple (False), which is consumed by the if, causing the "else" branch of the if statement to be executed. Conversely, if the function returns a String value, then the result is the tuple (True, string-value), of which the (True) is consumed by the if, causing the string-value to be consumed by the assignment, and causing the "then" branch of the if statement to be executed.

Thus, it should be obvious that these two statements will have the same result:

s ?= foo();
s = foo()?;
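
The tuple of Boolean and conditional value described above can be mimicked in Python, which may help if the idea of a tuple with a "maybe" second element seems odd. This is a loose model only: split_nullable and describe are hypothetical names, and None stands in for Null.

```python
def split_nullable(value):
    """Turn a maybe-None value into (False,) or (True, value)."""
    return (False,) if value is None else (True, value)


def describe(value):
    # The first element is consumed by the "if"; the rest is only
    # looked at when that first element is True.
    found, *rest = split_nullable(value)
    if found:
        (s,) = rest
        return f"String s is {len(s)} characters long."
    return "no string"
```

Note that an empty string still yields (True, ""), because the test is for Null and nothing else.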

As just another normal value, and thus without any mind-bendingly-crazy type system rules to accommodate some magical null value, Null is simultaneously less troublesome and more useful.

The null is dead. Long live the Null.

The Quest for Equality

2019-06-20T15:00:00.001-04:00

It's shocking how difficult equality is to get right in software. What does equality even mean?

Equality used to be so simple in the days of assembly language, when "bitwise equality" was all that existed, and the only things that could be compared were fixed-size CPU registers. Things got more and more complex, until we ended up with Object Oriented languages, in which each class may have its own idea of what equality means.

On top of that, the most popular languages today each have multiple forms of equality. Python offers "is" versus "==". Java and C# offer "==" and "o1.equals(o2)" -- which can easily differ from "o2.equals(o1)" because it is neither symmetric nor transitive. JavaScript has both an "is" and "==", and just keeps adding equals signs when new meanings are desired; so far, they're up to "===", but anyone who denies the likelihood of a future "=====" operator is just kidding themselves.

Obviously, equality is not a simple problem. Most of the existing solutions are broken, because they cannot provide either symmetric or transitive behavior. Many of these problems are the direct result of having multiple type systems within a single language; that helps to explain why many of these languages provide two forms of equality: One for the primitive "value" type system, and one for the object type (reference-based) system.

It's obvious that in an object type system, a class needs to be able to define equality for that class, replacing any superclass definition of the same, which means that the definition of equality is virtual, as in "virtual method invocation". It's also obvious that such a definition cannot be a single dispatch method, because then "o1.equals(o2)" may provide a completely different answer from "o2.equals(o1)". There have been languages that attempt to address this type of conundrum with multiple dispatch support, but by the same token, one can swat a fly with an atomic bomb.
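
The single dispatch problem is easy to reproduce. The sketch below uses the classic point-and-colored-point example, written in Python with a plain method named equals (as in Java) rather than the "==" operator, because Python's operator has its own reflected-operand rules that would muddy the picture. The class names are ours, chosen only for illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def equals(self, other):
        # A Point only cares about coordinates.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class ColorPoint(Point):
    def __init__(self, x, y, color):
        super().__init__(x, y)
        self.color = color

    def equals(self, other):
        # A ColorPoint also insists on a matching color.
        return (isinstance(other, ColorPoint)
                and super().equals(other)
                and self.color == other.color)


p = Point(1, 2)
cp = ColorPoint(1, 2, "red")
```

Here p.equals(cp) is True while cp.equals(p) is False: the answer depends entirely on which object happens to be on the left.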

Generic types further add complexity to the notion of equality, because each type parameter may contribute its own potentially conflicting notion of equality to the equation.

So what's the answer?

Intent. Intent matters. The developer's intent when they write a line of code is important, but somehow that intent manages to be erased by a language compiler. (It's no mistake that the term type erasure is used to describe a generic type system that is implemented by forgetting what those types actually were.)

Ecstasy captures the developer intent by capturing the compile time type of the references in question, and retaining that information in the compiled code, and using that information for hard problems like equality.

Consider the following example:

Collection<String> c1 = foo();
Collection<String> c2 = bar();
if (c1 == c2)
    {
    // ...
    }

It doesn't matter what the actual runtime type of the object is that c1 or c2 refers to, because the developer clearly explained that they are Collections of String objects. The class of the collection is unknown here at compile time, and instead the Collection interface is used, which defines equality in a certain way. In theory, those collections could contain objects of various sub-classes of String, but when those contents are compared, they will be compared as Strings. Because that was the intent of the developer.

To accomplish this, equality is implemented non-virtually; here is the equals function on Collection:

/**
 * Two collections are equal iff they contain the same values.
 */
static <CompileType extends Collection>
        Boolean equals(CompileType collection1, CompileType collection2)
    {
    // they must be of the same arity
    if (collection1.size != collection2.size)
        {
        return False;
        }

    if (collection1.sortedBy() || collection2.sortedBy())
        {
        // if either is sorted, then both must be of the same order;
        // the collections were of the same arity, so the second iterator
        // shouldn't run out before the first
        Iterator<CompileType.Element> iter1 = collection1.iterator();
        Iterator<CompileType.Element> iter2 = collection2.iterator();
        for (val value1 : iter1)
            {
            assert val value2 := iter2.next();
            if (value1 != value2)
                {
                return False;
                }
            }

        // the collections were of the same arity, so the first iterator
        // shouldn't run out before the second
        assert !iter2.next();
        return True;
        }
    else
        {
        return collection1.containsAll(collection2);
        }
    }

What makes this work is that the compile-time type is passed to the equals function. Not the name of the type. Not the "type value". The type. The actual, for-real, usable-as-a-type-name-in-your-code type. A formal type. (Yes, types are objects. But they're also types.)

So how did this function get called? Well, because the compiler determined, and retained, and used, the compile-time type. Both the "==" operator and the "!=" operator ultimately cause a call to this function to occur.

And what if, instead, the developer had specified these variables as being Lists?

List<String> c1 = foo();
List<String> c2 = bar();
if (c1 == c2)
    {
    // ...
    }

Again, it doesn't matter what the actual runtime types of those objects are, because the developer clearly explained that they must be Lists of String, and the result of the "==" operator is a call to the equals function on List:

/**
 * Two lists are equal iff they are of the same size, and
 * they contain the same values, in the same order.
 */
static <CompileType extends List> Boolean equals(CompileType a1, CompileType a2)
    {
    Int c = a1.size;
    if (c != a2.size)
        {
        return False;
        }

    for (Int i = 0; i < c; ++i)
        {
        if (a1[i] != a2[i])
            {
            return False;
            }
        }

    return True;
    }

Even more obvious is that the equals functions themselves simply use the "==" and "!=" operators on the values within the Collections or Lists, because the compile-time type of those contents has not been erased, and was encoded into the compiled code, and was used in the call to the Collection or List equals function. And this works for arbitrarily deep nesting of generic types, because turtles.

And it's type safe.
And it's symmetric.
And it's transitive.
And it's readable.
And it's obvious.
And it's correct.
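
A loose Java analogue of "the compile-time type picks the equality" is static overload resolution. This is only a sketch under that assumption: the class and the two same() overloads are hypothetical, and Java does not wire anything like this into its own == operator. The same two objects compare differently depending on how the variables were declared:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class CompileTimeDispatch {
    // Chosen when both arguments are declared as Collection: order is ignored.
    static boolean same(Collection<String> a, Collection<String> b) {
        return a.size() == b.size() && a.containsAll(b);
    }

    // Chosen when both arguments are declared as List: order matters.
    static boolean same(List<String> a, List<String> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<String> l1 = Arrays.asList("a", "b");
        List<String> l2 = Arrays.asList("b", "a");
        Collection<String> c1 = l1; // same objects, wider declared type
        Collection<String> c2 = l2;

        System.out.println(same(c1, c2)); // true:  compared as Collections
        System.out.println(same(l1, l2)); // false: compared as Lists
    }
}
```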

One more thing. What if you really want to know if two objects are actually "the same object"? In Ecstasy, as described previously, the reference (Ref) to an object contains the type and the identity of the object that is referred to. Sameness refers to that identity, and the equals function on Ref tests for the equality of that identity:

List<String> c1 = foo();
List<String> c2 = bar();
if (&c1 == &c2)
    {
    // ...
    }

... which is exactly what the Object.equals function does itself:

static <CompileType extends Object>
        Boolean equals(CompileType o1, CompileType o2)
    {
    return &o1 == &o2;
    }
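
For comparison, Java spells the same distinction the other way around: == on references tests identity, and equals() tests equality. A minimal sketch (the class and helper names are illustrative only):

```java
class IdentityDemo {
    // Identity: are these two references pointing at the very same object?
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = new String("turtle");
        String s2 = new String("turtle");

        System.out.println(sameObject(s1, s2)); // false: two distinct objects
        System.out.println(s1.equals(s2));      // true:  equal contents
        System.out.println(sameObject(s1, s1)); // true:  one object
    }
}
```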

A pointed paean for C

2019-06-18T13:00:00.000-04:00

Less than 20% of programmers claim to use C today, and the real number of programmers actually using C is likely far smaller, but C lives on quite pervasively in its influence on C++, Java, C#, and other languages. Starting with Java, most of the "managed runtime" languages purposefully omitted one of the most powerful features in C: The pointer. In its place, these languages provide a concept called a reference, which is a type-safe pointer whose value (a memory address) cannot be obtained or manipulated by the programmer.

Two important things were lost in the process, however:

The reference itself became opaque, in that its only capability in these languages is to be de-referenced; and
Pass-by-reference is no longer possible.
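
The second loss is easy to see in Java. In this sketch (class and method names are invented for the example), rebinding a parameter never reaches the caller's variable, and the usual workaround is to pass some mutable holder by hand:

```java
class PassByValueDemo {
    // Rebinding the parameter changes only this method's local copy
    // of the reference; the caller's variable is untouched.
    static void rebind(String s) {
        s = "changed";
    }

    // The common workaround: pass a mutable holder (a one-element array here).
    static void rebindVia(String[] holder) {
        holder[0] = "changed";
    }

    public static void main(String[] args) {
        String s = "original";
        rebind(s);
        System.out.println(s); // original

        String[] holder = {"original"};
        rebindVia(holder);
        System.out.println(holder[0]); // changed
    }
}
```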

Ecstasy references, on the other hand, are themselves objects (because turtles), and because references are objects, references have references (because turtles). It may make your head hurt to picture this, but in use, it becomes the most obvious and simple concept imaginable. In Ecstasy, a reference is represented by the Ref interface:

A Ref represents a reference to an Ecstasy object. In Ecstasy, "everything is an object", and the only way that one can interact with an object is through a reference to that object. The referent is the object being referred to; the reference (encapsulated in and represented by a Ref object) is the object that refers to the referent.

An Ecstasy reference is conceptually composed of two pieces of information:

A type;

An identity.

The type portion of an Ecstasy reference, represented by the actualType property of the Ref, is simply the set of operations that can be invoked against the referent and the set of properties that it contains. Regardless of the actual operations that the referent object implements, only those present in the type of the reference can be invoked through the reference. This allows references to be purposefully narrowed; an obvious example is when an object only provides a reference to its public members.

The Ref also has a RefType property, which is its type constraint. For example, when a Ref represents a compile time concept such as a variable or a property, the RefType is the compile time type of the reference. The reference may contain additional operations at runtime; the actualType is always a super-set (⊇) of the RefType.

The identity portion of an Ecstasy reference is itself unrepresentable in Ecstasy. In fact, it is this very unrepresentability that necessitates the Ref abstraction in the first place. For example, the identity may be implemented as a pointer, which points to an address in memory at which the state of the object is stored. However, that address could be located on the process' program stack, or allocated via a dynamic memory allocation, or could point into a particular element of an array or a structure that itself is located on the program stack or allocated via a dynamic memory allocation. Or the identity could be a handle, adding a layer of indirection to each of the above. Or the identity could itself be the object, as one would expect for the simplest (the most primitive) of types, such as booleans, bytes, characters, and integers.

To allow the Ecstasy runtime to provide the same behavioral guarantees regardless of how objects are allocated and managed, how they are addressed, and how house-keeping activities potentially affect all of the above, the Ref provides an opaque abstraction that hides the actual identity (and thus the actual underlying implementation) from the program and from the programmer.

Because it is impossible to represent the identity in Ecstasy, the Ref type is itself simply an interface; the actual Ref instances used for parameters, variables, properties, array elements, and so on, are provided by the runtime itself, and exposed to the running code via this interface.

Ref is read-only; the read/write form is the Var interface, which extends Ref. To obtain a Ref or a Var, we use the C address-of operator, "&":

String str = "Hello world!";

// get a read-only reference to the variable
Ref<String> ref = &str;

// alternatively, get a read/write reference to the variable
Var<String> var = &str;

// modify the variable via a reference
var.set("Goodbye, cruel world!");

// that modified the value that is held in the variable!
assert str == var.get();

// which we can also see through the read-only reference
assert ref.get() == str;
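
Java has no address-of operator, so the nearest analogue is an explicit holder object. The sketch below is an assumption-laden imitation, not the Ecstasy API: the Ref and Var interfaces and the Slot class are hypothetical Java stand-ins, and the "variable" has to be declared as a Slot up front, which is exactly the ceremony that "&" removes.

```java
class RefVarSketch {
    // Hypothetical read-only view of a variable.
    interface Ref<T> {
        T get();
    }

    // Hypothetical read/write view; it extends the read-only one.
    interface Var<T> extends Ref<T> {
        void set(T value);
    }

    // The storage itself: stands in for the variable being referenced.
    static final class Slot<T> implements Var<T> {
        private T value;

        Slot(T value) {
            this.value = value;
        }

        @Override
        public T get() {
            return value;
        }

        @Override
        public void set(T value) {
            this.value = value;
        }
    }

    public static void main(String[] args) {
        Slot<String> str = new Slot<>("Hello world!");
        Ref<String> ref = str; // read-only view of the same storage
        Var<String> var = str; // read/write view of the same storage

        var.set("Goodbye, cruel world!");
        System.out.println(ref.get()); // Goodbye, cruel world!
    }
}
```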

The last concept to grasp is this: Objects are of a class, but references are of a type. In most OO languages, the object's class is its type, but one takes a different route when designing a language -- like Ecstasy -- to build portable, containerized, safe, and secure applications in the cloud, versus designing a language -- like C -- to build an operating system.

This concept is unusual coming from the C++ (vtable-based, compile-time types only) family of languages, but -- very importantly! -- this concept does not create any additional cognitive load for the application developer. What it does allow, though, is for a systems developer to dynamically and securely reduce the surface area of an object when sharing that object across a container boundary.

An Introduction to the Ecstasy Type System

2019-06-17T09:00:00.000-04:00

Ecstasy's type system is designed to be an obvious and predictable type system:

Ecstasy has a single type system, which is an object type system, and that's it. In other words, Ecstasy only has objects.
Unlike C# or Java, there is no secondary primitive type system with types like "int" or "boolean" that everything else has to be constructed out of. In Ecstasy, all types are built out of other Ecstasy types. For someone who has used Smalltalk, this concept should be familiar.
There is a single root called Object. For someone who has used C# or Java, this should seem quite familiar, with one little twist: In Ecstasy, Object is an interface, not a class.
The Ecstasy type system is called the Turtles Type System, because the entire type system is bootstrapped on itself, and -- lacking primitives -- solely on itself. An Int, for example, is built out of an Array of Bit, and a Bit is built out of an IntLiteral (i.e. 0 or 1), which is built out of a String, which is an Array of Char, and a Char is built out of an Int. Thus, an Int is built out of many Ints. It's turtles, the whole way down.
The type system is fully generic, and fully reified. This means that a List-of-Int is actually a List-of-Int, and not just a List with some compile-time syntactic sugar.
The type system is covariant. This means that an Array-of-Int is a List-of-Int is a List-of-Number is a List-of-Object is a List is an Object.
The type system is module-based, transitively closed, type-checked, and type safe.
Most type safety checks are performed by the compiler and re-checked by the link-time verifier. Most of the remaining type safety checks are performed by the linker when it transitively closes over the type system. The remaining type safety checks are performed at runtime, for the specific cases in which the types cannot be fully known until runtime.
A class has a type, but a class is not a type. (A class may have many types, but minimally its existence implies at least four types: structural, private, protected, and public.)
The type system explicitly supports actual immutability.
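
To see why "fully reified" and "covariant" are worth calling out, it helps to look at how Java splits the two. This is a Java-only sketch with invented names; it says nothing about how Ecstasy implements either property. Java arrays are reified and covariant, while Java generic lists are erased and invariant unless a wildcard is used:

```java
import java.util.ArrayList;
import java.util.List;

class VarianceDemo {
    // Returns true when storing a String through an Object[] view of an
    // Integer[] is rejected at run time.
    static boolean arrayStoreRejected() {
        Object[] objects = new Integer[1]; // arrays: covariant and reified
        try {
            objects[0] = "not an Integer";
            return false;
        } catch (ArrayStoreException e) {
            return true; // the array remembered its element type
        }
    }

    public static void main(String[] args) {
        System.out.println(arrayStoreRejected()); // true

        List<Integer> ints = new ArrayList<>();
        // List<Number> numbers = ints;         // does not compile: invariant
        List<? extends Number> numbers = ints;  // covariance by wildcard only
        System.out.println(numbers.isEmpty());  // true
    }
}
```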

In software, making something simple is the hardest thing that we do. The Ecstasy Turtles Type System is proof of that, but the benefits are worth it. Using the type system is incredibly easy, and reading the code is a joy.

Conditional Methods

2019-06-13T20:00:00.001-04:00

There's a common pattern in software libraries, which is the conditional result. For example, a language may have an iterator type, something that has a method that returns a type T:

T next();

Now, if there is no "next item" to return, the iterator could return some other value, like -1 or null. Of course, in some cases, -1 and null are perfectly valid values, so this design turns out to be a fairly poor one.

Instead, the iterator type may have to represent the "no next item" as a separate method, which must be called separately:

interface Iterator<T> { // note the ugly brace placement
    boolean hasNext();
    T next();
}

Now the caller can iterate like:

while (iter.hasNext()) {
    T value = iter.next();
}

This could be simplified if a language supported more than one return value, like most languages support more than one parameter to a method:

(boolean, T) next();

The problem is what to return as a value when the boolean value is false. You could return a null value, but the cost to doing this is one billion dollars (payable to Tony Hoare).

Ecstasy addresses this common challenge by using conditional return values. For example:

/**
 * An iterator over a sequence of elements.
 */
interface Iterator<Element>
    {
    /**
     * Get a next element.
     *
     * @return a tuple of (true, nextValue) or (false) if no elements are available
     */
    conditional Element next();
    }

The next() method is allowed to return True and an element, or False if there is no element. Consuming a conditional method is incredibly simple; here's a helper method on the Iterator interface itself:

/**
 * Perform the specified action for all remaining elements in the iterator, allowing for
 * a possibility to stop the iteration at any time.
 *
 * @param process  an action to perform on each element; if the action returns true, the
 *                 iterator is considered "short-circuited", the method returns immediately
 *                 and no more elements are iterated over
 *
 * @return true iff the iterator was short-circuited; otherwise false if the iteration
 *         completed without short-circuiting
 */
Boolean untilAny(function Boolean process(Element))
    {
    while (Element value := next())
        {
        if (process(value))
            {
            return True;
            }
        }
    return False;
    }
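
For readers coming from Java, the closest everyday analogue is probably Optional. The sketch below is an approximation under that assumption: CondIterator and over() are hypothetical names, and unlike a conditional return, Optional allocates a wrapper and cannot carry a null element. It mirrors the untilAny() helper above:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;
import java.util.function.Predicate;

class ConditionalSketch {
    // Hypothetical analogue of an iterator with a conditional next().
    interface CondIterator<E> {
        // Present when there is a next element, empty when there is none.
        Optional<E> next();

        // Stops early and returns true as soon as process accepts an element.
        default boolean untilAny(Predicate<E> process) {
            for (Optional<E> value = next(); value.isPresent(); value = next()) {
                if (process.test(value.get())) {
                    return true;
                }
            }
            return false;
        }
    }

    // Adapts a standard Iterable to the sketch interface.
    static <E> CondIterator<E> over(Iterable<E> items) {
        Iterator<E> it = items.iterator();
        return () -> it.hasNext() ? Optional.of(it.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(over(Arrays.asList(1, 2, 3)).untilAny(n -> n == 2)); // true
        System.out.println(over(Arrays.asList(1, 2, 3)).untilAny(n -> n > 5));  // false
    }
}
```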

There's one other thing, the underlying reason that the next() method can't return a "null", and that is because Ecstasy corrects the billion dollar mistake:

/**
 * The Nullable type is the only type that can contain the value Null.
 *
 * Nullable is an Enumeration whose only value is the singleton enum value {@code Null}.
 */
enum Nullable { Null }

That one line is the actual code that defines the concept of Null in Ecstasy. And here's what it means:

Object   o = Null;    // ok ... Null is an Object
String   s = Null;    // error
Int      i = Null;    // error
Object[] a = Null;    // error

In other words, Null is an object just like "Hello world!" is an object, and all of the type rules apply to Null just like they apply to any other object. (Among other benefits, you can kiss those NullPointerExceptions goodbye.)
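
The contrast with Java is short enough to show. In this Java sketch (names invented for the example), the assignment that Ecstasy rejects compiles without complaint, and the problem only surfaces when the reference is used:

```java
class NullDemo {
    // Returns true when calling a method through the reference fails
    // with a NullPointerException.
    static boolean lengthThrows(String s) {
        try {
            s.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = null; // compiles: every Java reference type admits null
        System.out.println(lengthThrows(s));       // true
        System.out.println(lengthThrows("Hello")); // false
    }
}
```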

It really is that simple.

How to assert yourself more

2019-06-12T19:30:00.000-04:00

Assertions in software are valuable. Coupled with automated testing and continuous integration (CI), assertions help to harden software, and in doing so, provide an early warning system for breaking changes.

But different assertions address different types of requirements. For example, some assertions should only be run while testing, because they are simply too expensive to be used in production. Other assertions are actually runtime checks, so they must always run, and must not be disabled. Lastly, assertions should be easy to write, even easier to read -- and when they fail, they should provide the necessary information to help a developer track down and understand the cause of the failure.

The first thing to know about Ecstasy assertions is that they are checked at runtime, "in production". For example, if it is illegal to proceed using a negative number, then this assertion would check and prevent that condition:

assert n.sign != Negative;

Conceptually, that assertion is similar to (but simpler than) writing code like:

if (n.sign == Negative)
    {
    throw new IllegalState(
            $"Assertion failed: n.sign != Negative, n={n}");
    }

Which would throw an exception with a text message like:

Assertion failed: n.sign != Negative, n=-1

An assertion handles all of that complexity automatically on behalf of the developer. Having detailed information -- the inputs to the assertion -- show up automatically when the assertion fails is invaluable for debugging a failure after the fact -- especially when a problem is not easily reproducible!

Assertions can also be fine-tuned to throw an appropriate exception:

assert throws IllegalState
assert:arg throws IllegalArgument
assert:bounds throws OutOfBounds
assert:TODO throws UnsupportedOperation
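
To get a feel for what those variants save, here is roughly what the same four flavors look like when written by hand in Java. This is a sketch, not a library: the Checks class and its method names are hypothetical, the exception types are the nearest standard Java ones, and the caller has to build the message that an Ecstasy assertion produces automatically.

```java
class Checks {
    // Plain state check, comparable in spirit to a bare assert.
    static void state(boolean ok, String detail) {
        if (!ok) throw new IllegalStateException("Assertion failed: " + detail);
    }

    // Argument check, comparable in spirit to assert:arg.
    static void arg(boolean ok, String detail) {
        if (!ok) throw new IllegalArgumentException("Assertion failed: " + detail);
    }

    // Range check, comparable in spirit to assert:bounds.
    static void bounds(boolean ok, String detail) {
        if (!ok) throw new IndexOutOfBoundsException("Assertion failed: " + detail);
    }

    // Not-yet-implemented marker, comparable in spirit to assert:TODO.
    static void todo(String detail) {
        throw new UnsupportedOperationException("TODO: " + detail);
    }

    public static void main(String[] args) {
        int n = -1;
        try {
            arg(n >= 0, "n >= 0, n=" + n);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Assertion failed: n >= 0, n=-1
        }
    }
}
```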

The syntax for an assertion condition is the same syntax used in an if or while statement, so conditional declarations and assignments are naturally supported:

// the iterator must have at least one item left
assert String s := iterator.next();

A condition isn't even necessary if the assertion is being used to indicate that somehow execution reached some place that should have been impossible:

assert; // just use "assert;" instead of "assert False;"

In each case, the syntax is intended to clearly convey the intent of the developer, with as little excess as possible for the reader of the code to ingest.

And what if the intent of the developer is to only have the assertion execute during testing, and not when the software is running "in production"? Fortunately, Ecstasy provides a simple way to enable an assertion only when testing:

assert:test checkReferentialIntegrity();

(Or only when debugging, by using assert:debug to trigger a breakpoint when the assertion fails.)

Similarly, if something is important to verify at runtime, but only needs to be checked the first time that the code is executed, then assert:once can be used:

assert:once configLoaded;

Lastly, if an assertion needs to be occasionally tested at runtime, but is too expensive to check every time -- such as when the assertion occurs inside a performance-critical loop -- then a sampling-based assertion may be appropriate:

assert:rnd(100) checkConsistency(); // on average, 1 in 100 times
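
The bookkeeping behind a once-only or sampled check is small but easy to get subtly wrong when written by hand. A minimal Java sketch of both ideas follows; the SampledChecks class and its method names are assumptions made for illustration, and nothing here describes how the Ecstasy runtime actually implements assert:once or assert:rnd.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class SampledChecks {
    // One flag per instance, so each call site needs its own instance.
    private final AtomicBoolean alreadyChecked = new AtomicBoolean();

    // Evaluates the condition on the first call only; later calls are no-ops.
    void once(BooleanSupplier condition) {
        if (alreadyChecked.compareAndSet(false, true) && !condition.getAsBoolean()) {
            throw new IllegalStateException("one-time check failed");
        }
    }

    // Evaluates the condition on roughly one call in n, chosen at random.
    static void sampled(int n, BooleanSupplier condition) {
        if (ThreadLocalRandom.current().nextInt(n) == 0 && !condition.getAsBoolean()) {
            throw new IllegalStateException("sampled check failed");
        }
    }

    public static void main(String[] args) {
        SampledChecks checks = new SampledChecks();
        int[] evaluations = {0};
        for (int i = 0; i < 1000; i++) {
            checks.once(() -> { evaluations[0]++; return true; });
        }
        System.out.println(evaluations[0]); // 1
    }
}
```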

But at the end of the day, the advanced options are just that: Advanced, and optional.

Just the way that they should be.

On Hierarchical Organization

2019-04-29T00:00:00.000-04:00

Many a software developer has referenced this saying:

When the only tool that you have is a hammer, every problem begins to look like a nail.

That is not to imply that a hammer is not useful. If there is one conceptual hammer that – more so than any other – has repeatedly proven its merit for managing – and hiding – complexity, that would be the concept of hierarchy. File systems are hierarchies. B*Trees and binary trees and Patricia tries and parse trees are hierarchies. Most documents are internally organized as hierarchies, including the common HTML, XML, and JSON formats. Most graphical user interfaces are modeled as hierarchies. Many programming languages leverage hierarchy to provide nesting, information hiding, scoping, and identity. How is it that such a simple concept can be so universally useful?

First of all, hierarchical organization enables very simple navigation. What this means is that at any arbitrary point – called a node – in a hierarchy, there is a well-known set of operations that are possible, such as navigating from the current node to its parent node, and navigating from the current node to any of its child nodes. If a node does not have a parent, then it is the root node, and if a node does not have any child nodes, then it is a leaf node.

Child nodes are contained within their parent node. Each child is uniquely identifiable by its parent, for example by a name or some other unique attribute. A hierarchy is recursive; at any point in the hierarchy, from that point down is itself a hierarchy. Since a hierarchy has a single root and is recursive, each node in the hierarchy is uniquely identifiable in the hierarchy by combining the identities of each successive node starting with the root and proceeding down to the node; this identity is the absolute path to that node. It is possible to navigate between any two nodes in the same hierarchy by combining zero or more child-to-parent navigations with zero or more uniquely identifiable parent-to-child navigations; the sequence of these steps is a relative path between two nodes.
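
The navigation rules above fit in a few dozen lines. The following Java sketch is one possible rendering, with every name in it (HierarchyDemo, Node, add, absolutePath, relativePathTo) invented for the example; it assumes that children are identified by name, that the root prints as "/", and that both nodes passed to relativePathTo belong to the same hierarchy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class HierarchyDemo {
    static final class Node {
        final String name;
        final Node parent; // null for the root
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        // Adds a child that is uniquely identified, within this parent, by its name.
        Node add(String childName) {
            Node child = new Node(childName, this);
            children.put(childName, child);
            return child;
        }

        boolean isRoot() { return parent == null; }
        boolean isLeaf() { return children.isEmpty(); }

        // Combines the identities of each node from the root down to this one.
        String absolutePath() {
            if (isRoot()) return "/";
            return (parent.isRoot() ? "" : parent.absolutePath()) + "/" + name;
        }

        // Zero or more child-to-parent steps, then zero or more named
        // parent-to-child steps.
        String relativePathTo(Node target) {
            List<Node> from = chain(this);
            List<Node> to = chain(target);
            int common = 0;
            while (common < from.size() && common < to.size()
                    && from.get(common) == to.get(common)) {
                common++;
            }
            List<String> steps = new ArrayList<>();
            for (int i = common; i < from.size(); i++) steps.add("..");
            for (int i = common; i < to.size(); i++) steps.add(to.get(i).name);
            return steps.isEmpty() ? "." : String.join("/", steps);
        }

        // The nodes from the root down to n, inclusive.
        private static List<Node> chain(Node n) {
            LinkedList<Node> path = new LinkedList<>();
            for (; n != null; n = n.parent) path.addFirst(n);
            return path;
        }
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node usr = root.add("usr");
        Node bin = usr.add("bin");
        Node lib = usr.add("lib");

        System.out.println(bin.absolutePath());      // /usr/bin
        System.out.println(bin.relativePathTo(lib)); // ../lib
        System.out.println(usr.relativePathTo(bin)); // bin
    }
}
```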

These basic attributes of a hierarchy combine in amazingly powerful ways. For example, since each node is itself the beginning of a hierarchy of any size, it is possible to refer to that entire sub-hierarchy simply by referring to that one particular node; this effectively hides the recursive complexity contained – or nested – within that node. As a result, it is possible to add, copy, move, or remove a hierarchy of any size simply by adding, copying, moving, or removing the node that is the “root” of that hierarchy.

Using a hierarchy, it is incredibly simple to construct the concept of scope. For example, a scope could include only a specific node, or it could include a specific node and all of its child nodes recursively to its descendent leaf nodes, or it could include a specific node and its ancestor nodes to the root node, or any other combination of inclusion and exclusion that could be described in an unambiguous manner.

These concepts are incredibly simple, yet at the same time incredibly powerful, and are leveraged liberally throughout the XVM, from managing and hiding complexity for developers, to managing memory in an execution context.

Explicit Intent: Enumerating the Priorities of Design

2019-04-16T13:26:00.001-04:00

All designs have priorities, but only some designs begin with the end in mind. When priorities are not explicitly stated, it is easy to chase the priorities that most effectively combine compulsive emotional appeal with apparent ease of implementation, in lieu of the priorities that are most valuable to the intended audience of a design. In order to create a coherent design that serves the intended audience, the Ecstasy specification began with a conscious discussion about priorities, and a series of explicit decisions as to what those priorities would be:

Correctness, aka Predictability. The behavior of a language must be obvious, correct, and predictable. This also incorporates The Principle of Least Surprise.
Security. While generally not a priority for language design, it is self-evident that security is not something that one adds to a system; security is either in the foundation and the substrate of a system, or it does not exist. Specifically, a language must not make possible the access to (or even make detectable the existence of) any resource that is not explicitly granted to the running software.
Composability. High-level computer languages are about composition. Specifically, a language should enable a developer to locate each piece of design and logic in its one best and natural place.
Readability. Code is written once, and referenced many times. What we call “code” should be a thing of beauty, where form follows function.
Lastly, a language must be recursive in its design. There is no other mechanism that predictably folds in complexity and naturally enables encapsulation. It’s turtles, the whole way down.

On Predictability vs. Performance

In the course of researching language design, one preterdominant concern emerges: That of execution performance. While many different concerns are evaluated and balanced against each other in a typical design, and while the goal of performance is often explicitly ranked in a position of lesser importance, in reality there is no one goal more considered, discussed, and pursued. Regardless of the importance assigned to performance as a language goal, performance to the language designer is a flame to a moth.

Perhaps it is simply best to admit, up front, that no language can survive – let alone succeed – without amazing feats of performance. Yet performance as a goal tends to conflict with other goals, particularly with respect to manageability, serviceability, and the quality of the abstractions that are presented to the developer.

Beginning programmers often ask: “Is language A faster than B?” After all, no one wants to be using a slow language, any more than someone would want to buy a slow automobile or a slow computer. Speed is a thrill, and speed in software execution holds no less attraction than a souped-up hot rod with a throaty growl and a body-rumbling ride.

The corollary to that question is “Why is language B slower than A?” The answers to this question tend to be very illuminating. Take any two languages that compile to and execute as native machine code, and compare their performance for a given task. Despite running on the same hardware, and using the same hardware instruction set, one may be dramatically slower than the other: by 10%, or by 100% (half as fast), or even by 1000% (roughly an order of magnitude slower). How is such a difference possible?

The answer lies in translation, specifically in the automated translation of idioms. A language such as C selected idioms that closely represented the underlying machine instructions of the day, which allowed programmers to write simple programs that could be almost transliterated from C to machine code. In other words, the language chose as its abstractions the same set of abstractions that the CPU designers were using, or abstractions that were at most one translation removed from the machine’s abstractions. This allowed for very simple compilers, and made it extremely simple to support localized optimization.

A localized optimization is an optimization in the compiled code that can be made using only information about code that is most local to the code that is being optimized; in other words, information outside of the scope of the small amount of code being optimized is not even considered. Many optimizations in the C language, for example, are extremely local, such that they can be performed without any information about code outside of a particular expression or statement; it is hard to imagine more localized optimizations.

However, there is a trade-off implicit in achieving such simple and direct optimizations: The abstractions provided by the language are constrained by the abstractions provided by the CPU. As one could rightfully surmise, hardware abstractions tend not to be very abstract, and the abstractions provided by hardware instruction sets tend to be only slightly better. In its early days, the C language was jokingly referred to as “assembly with macros”, because as a language, it was only slightly higher level than assembly itself.

Computing efficiency is often stated in terms of a tradeoff between time (CPU) and space (memory); one can often utilize one in order to optimize for the other, subject to the law of diminishing returns. Unfortunately, there is no single simple metric that captures computing efficiency, but if the trade-off between time and space appeared as a graph with time on one axis and space on the other, it might generally resemble the shape of the curve y=1/x, which most closely approaches the origin at (1,1). If there were a single computing efficiency measurement for a programming language, it could arguably be represented by the closest that this trade-off curve approaches the origin (0,0), which distance could be considered the minimum weighted resource cost. To calculate a language’s efficiency for a particular purpose, one would calculate the inverse of the minimum weighted resource cost.
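
As a worked instance of that measurement, take the illustrative curve itself. The derivation below assumes an unweighted Euclidean distance and introduces a hypothetical scale constant k for the general case; neither the weighting nor k comes from the text above.

```latex
% Distance from the origin to a point on y = 1/x, for x > 0
d(x)^2 = x^2 + \frac{1}{x^2}

% Minimize by setting the derivative to zero
\frac{d}{dx}\left(x^2 + \frac{1}{x^2}\right) = 2x - \frac{2}{x^3} = 0
\quad\Longrightarrow\quad x^4 = 1 \quad\Longrightarrow\quad x = 1,\; y = 1

% Minimum cost and the corresponding efficiency
d_{\min} = \sqrt{1^2 + 1^2} = \sqrt{2}, \qquad
\text{efficiency} = \frac{1}{d_{\min}} = \frac{1}{\sqrt{2}} \approx 0.707

% General case y = k/x with k > 0 (k is a hypothetical scale constant)
x = \sqrt{k}, \qquad d_{\min} = \sqrt{2k}, \qquad
\text{efficiency} = \frac{1}{\sqrt{2k}}
```

Under these assumptions, a curve that sits closer to the origin (a smaller k) yields a higher efficiency, which matches the intuition that less time and less space for the same work is better.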

While one can easily speak about efficiency in hypothetical terms, a benchmark is a wonderful servant but a terrible master. The path chosen in the design of the XVM is to consciously avoid limits on potential efficiency by consciously avoiding contracts whose costs are not clearly defined and understood. This approach can be explained in the inverse, by way of example in existing languages and systems: Often, features and capabilities that were considered to be “easy” implementation-wise and “free” efficiency-wise when they were introduced, ultimately emerged as the nemesis to efficiency, due to the inflexibility of the programming contracts that these features and capabilities introduced[1].

To understand this, it is important to think of abstractions as a two-sided coin: On one side, we see the benefit of the abstraction, which allows a programmer to work with ever-larger and more powerful building blocks, while the other side of the coin represents the cost of the abstraction, which is called the contract. Imagine, for example, having to build something in the physical world, out of actual matter. One could conceivably build a structure out of atoms themselves, assembling the necessary molecules and arranging them carefully into perfectly formed crystalline structures. The contracts of the various elements are fairly well understood, but yet we wouldn’t try to build a refrigerator out of individual atoms.

One could imagine that building from individual atoms is the equivalent of building software from individual machine instructions, in two different ways: First, that the refrigerator is composed of atoms, just like all software is executed at some level as machine instructions; and second, that as the size and complexity of the constituent components increase, the minutiae of the contracts of the sub-components must not be surfaced in the contracts of the resulting components – those details must be hidable! This purposeful prevention of the surfacing of minutiae is called encapsulation, and exists as one of the cornerstones of software design. It is why one can use a refrigerator without knowing the number of turns of wire in the cooling pump’s motor, and why one can use a software library without worrying about its impact on the FLAGS register of the CPU.

Ultimately, it is the recursive composition of software that creates challenges for optimization. While low level optimizations are focused on the creation of more efficient low level code, higher level optimizations rely on explicit knowledge of what portions of a component’s contract – or the contracts of the various sub-components – can be safely ignored. In other words, the optimizer must be able to identify which contractual effects are ignored or discarded by the programmer, and then leverage that information to find alternative execution solutions whose contracts manage to cover at least the non-discarded and non-ignored contract requirements. Higher-level optimizations target the elimination of entire aspects of carefully defined behavioral contracts, and as a result, they typically require extensive information from across the entire software system; in other words, high-level optimizations tend to be non-localized to the extreme! No software has been more instrumental in illustrating this concept than Java’s HotSpot virtual machine, whose capabilities include the inlining of potentially polymorphic code by the determination that the potential for dynamic dispatch is precluded, and the elimination of specific memory barriers in multi-threaded programs as the result of escape analysis.

To enable these types of future optimizations, the contracts of the system’s building blocks must be explicit, predictable, and purposefully constrained, which is what was meant by the goal of “consciously avoiding contracts whose costs are not clearly defined and understood.” The contracts in the small must be encapsulatable in the large, which is to say that contracts must be composable in such a way that side-effects are not inadvertently exposed. It has been posited[2] that “all non-trivial abstractions, to some degree, are leaky,” but each such leak is eventually and necessarily realized as a limit to systemic efficiency.

[1] Such contracts are software legacy, meaning that the contracts cannot be altered without disclaiming the past; in other words, altering the contracts will break everything.

[2] https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/