2021/09/05

How to use the Ecstasy Debugger

Until recently, debugging Ecstasy code was an exercise in frustration, because there was no debugger. One could step through the interpreter's operations in a Java debugger, which was not for the faint of heart. The alternative, not surprisingly, was to sprinkle one's code with print statements, such as:

@Inject Console console;
console.println($"The value of i={i}");

Need to see another variable? Change the code, recompile, and run again. Repeat until everything works.

This is, of course, how all programmers used to debug code, back between the invention of the printing press and the invention of the interactive debugger.

So it should come as no surprise that Ecstasy now has an interactive debugger, albeit one that is currently prototyped in text mode.

To enable the debugger, add a single line to your test:

assert:debug;

Or, to break into the debugger only when a condition does not hold (here, when i is 3):

assert:debug i != 3;

That will pop you into the interactive debugger mode:

Call stack frames:              | Variables and watches:
--------------------------------|---------------------------------
>0 TestSimple.run():10          | 0 this = (TestSimple.test.org)
 1 Service TestSimple.test.org  | 1 i = (Int) 3

Now you can type '?' for help, which displays all of the debugger commands, including breakpoint management, code stepping (in/out/thru), watches, and so on.

And yes, it is text mode, so it looks like some legit UNIX tool from 1995. Need a GUI? We'll need to finish project X-wing to build that! 🤣

2021/08/15

It's the little things ...

Ecstasy compiles to an Intermediate Representation (IR), which is a binary format intended to carry a rich set of information that can be used by a compiler back-end to generate the actual native code that will run on a machine's CPU. In the language and compiler field, Intermediate Representation is also known as Intermediate Language (IL), byte code, bit code, p-code, and several other names.

Ecstasy IR was designed to be extremely tight (small), but it was also designed to support extremely large projects. To that end, all of the operators and addressing use 64-bit values; for example, the body of a function can be 2^63-1 bytes long, so the address (in either a relative or absolute form) specified by any jump operation must be able to encode a 64-bit value.

These two requirements are in natural conflict: the first says "keep it small", while the second says "make it large". And this problem applies to many things besides jump addresses: register numbers, parameter counts, references into the constant pool, and so on. We recognized that solving this conflict in a uniform and efficient manner, and doing so up front, would be critical to keeping the design simple and consistent.

Using a byte-by-byte format, such as UTF-8, Portable Object Format (POF) encoded integers, or LEB128, was deemed too computationally inefficient. The loop condition for the variable-length encoding is re-evaluated within a tight loop (once per byte), which completely trashes the CPU's branch predictor and almost certainly kills instruction pipelining and speculative execution. Since these integers are used quite literally everywhere, a better encoding scheme was required.
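To make the per-byte cost concrete, here is a minimal Java sketch of decoding an unsigned LEB128 value. The class and method names are made up for illustration, and the sketch does no validation of malformed or over-long input. Every iteration tests the continuation bit of the byte it just read, so the loop's exit branch depends on the data itself. That is the branch being complained about above.

```java
// Hypothetical helper for illustration: decodes one unsigned LEB128 value from offset 0.
// No validation of malformed or over-long input is performed.
class Leb128Sketch
    {
    static long decodeUnsigned(byte[] in)
        {
        long result = 0;
        int  shift  = 0;
        int  i      = 0;
        while (true)
            {
            int b = in[i++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;   // low 7 bits carry the payload
            if ((b & 0x80) == 0)                    // data-dependent branch, once per byte
                {
                return result;
                }
            shift += 7;
            }
        }

    public static void main(String[] args)
        {
        // 624485 encodes as E5 8E 26 in unsigned LEB128
        byte[] encoded = {(byte) 0xE5, (byte) 0x8E, 0x26};
        System.out.println(decodeUnsigned(encoded));   // 624485
        }
    }
```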

Since a variable-length encoding would be used, the format was carefully designed to make the exact length of the entire value apparent from the first byte alone. This allows all of the branch prediction penalties and possible pipeline stalls to be lumped together at the front edge of the decoding process. Additionally, most CPUs can read even unaligned power-of-two-length integers from memory into a register and sign-extend them (either as part of the read, or with an inexpensive register-based op), so the format was designed to allow for this possible optimization.

The XVM Integer Packing (XIP, pronounced "zip") format employs four encoding modes:

  • Tiny: For a value in the range -64..63 (7 bits), the value can be encoded in one byte. The least significant 7 bits of the value are shifted left by 1 bit, and the 0x1 bit is set to 1. When reading a packed integer, if bit 0x1 of the first byte is 1, then it's Tiny.
  • Small: For a value in the range -4096..4095 (13 bits), the value can be encoded in two bytes. The first byte contains the value 0x2 (010) in the least significant 3 bits, and bits 8-12 of the integer in bits 3-7; the second byte contains bits 0-7 of the integer.
  • Medium: For a value in the range -1048576..1048575 (21 bits), the value can be encoded in three bytes. The first byte contains the value 0x6 (110) in the least significant 3 bits, and bits 16-20 of the integer in bits 3-7; the second byte contains bits 8-15 of the integer; the third byte contains bits 0-7 of the integer.
  • Large: For a value in the range -(2^511)..2^511-1 (up to 512 bits), a value with s significant bits can be encoded in no less than 1+max(1,(s+7)/8) bytes (using integer division); let b be the selected encoding length, in bytes. The first byte contains the value 0x0 in the least significant 2 bits (00), and the 6 least significant bits of (b-2) in bits 2-7. The following (b-1) bytes contain the least significant (b-1)*8 bits of the integer.

The Ecstasy implementation for writing XIP'd integers is found in the DataOutput class, and the implementation for reading XIP'd integers is found in the DataInput class.

The Java implementations for reading and writing XIP'd integers can be found in the PackedInteger class.
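As a rough illustration of the four modes, here is a Java sketch that packs and unpacks a long using the rules above. It is not the PackedInteger class, and the names are hypothetical. It only handles values that fit in 64 bits, so the Large mode never needs more than 8 payload bytes. It also assumes that multi-byte payloads are written most significant byte first, which is how the Small and Medium descriptions order them. Note that unpack() decides everything from the first byte before it reads anything else.

```java
// Sketch only: hypothetical names, limited to values that fit in a Java long.
// Assumes multi-byte payloads are stored most significant byte first.
class XipSketch
    {
    static byte[] pack(long n)
        {
        if (n >= -64 && n <= 63)
            {
            // Tiny: low 7 bits shifted left by 1, with the 0x1 bit set
            return new byte[] {(byte) ((n << 1) | 0x1)};
            }
        if (n >= -4096 && n <= 4095)
            {
            // Small: 010 in bits 0-2, integer bits 8-12 in bits 3-7; then integer bits 0-7
            return new byte[] {(byte) ((((n >> 8) & 0x1F) << 3) | 0x2), (byte) n};
            }
        if (n >= -1048576 && n <= 1048575)
            {
            // Medium: 110 in bits 0-2, integer bits 16-20 in bits 3-7; then bits 8-15, 0-7
            return new byte[] {(byte) ((((n >> 16) & 0x1F) << 3) | 0x6),
                               (byte) (n >> 8), (byte) n};
            }
        // Large: s significant bits (including the sign bit), rounded up to whole bytes
        int    s         = 65 - Long.numberOfLeadingZeros(n ^ (n >> 63));
        int    dataBytes = Math.max(1, (s + 7) / 8);
        int    b         = 1 + dataBytes;
        byte[] out       = new byte[b];
        out[0] = (byte) ((b - 2) << 2);           // 00 in bits 0-1, (b-2) in bits 2-7
        for (int i = 0; i < dataBytes; ++i)
            {
            out[1 + i] = (byte) (n >> (8 * (dataBytes - 1 - i)));
            }
        return out;
        }

    static long unpack(byte[] in)
        {
        int b0 = in[0];                           // sign-extended first byte
        if ((b0 & 0x1) != 0)
            {
            return b0 >> 1;                       // Tiny
            }
        if ((b0 & 0x2) != 0)
            {
            long high = b0 >> 3;                  // signed value of bits 3-7
            return (b0 & 0x4) == 0
                    ? (high << 8) | (in[1] & 0xFF)                              // Small
                    : (high << 16) | ((in[1] & 0xFF) << 8) | (in[2] & 0xFF);   // Medium
            }
        // Large: bits 2-7 hold (b-2), so (b-1) payload bytes follow the first byte
        int  dataBytes = ((b0 >>> 2) & 0x3F) + 1;
        long value     = in[1];                   // sign-extend the most significant byte
        for (int i = 2; i <= dataBytes; ++i)
            {
            value = (value << 8) | (in[i] & 0xFF);
            }
        return value;
        }

    public static void main(String[] args)
        {
        long[] samples = {0, 63, -64, 64, 4095, -4096, 1048575, -1048577,
                          Long.MIN_VALUE, Long.MAX_VALUE};
        for (long n : samples)
            {
            byte[] packed = pack(n);
            System.out.println(n + " -> " + packed.length + " byte(s), round trip ok: "
                    + (unpack(packed) == n));
            }
        }
    }
```

This sketch's encoder always picks the shortest mode. The Large description permits longer encodings ("no less than"), and this decoder would accept those too, as long as the payload fits in 8 bytes.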

In summary: Having a consistent and efficient mechanism to encode arbitrary 64-bit integers in a byte stream is a fundamental boost for an IR designer, because they no longer have to worry about the ugly trade-off between "will this support big enough structures in the real world?" and "will this waste space?"

2021/06/12

What is a type?

Another Coffee Compiler Club call, another concept to explain.

What is a type?

It seems like such a simple question, until you try to answer it. What is a type?

When we designed Ecstasy, we were intent on boiling down types to some pure form, some simple atomic model of constructing everything in our programming universe. Types are, in some way, the periodic table of program elements. The protons, neutrons, and electrons of program existence. If you can't explain the basic building blocks of your universe, how can you explain how anything works?

Let's start with the conclusion: An object's type is the sum of its behavior.

Not having any real background in type theory, I have no idea whether this is an obvious truism or a nutty novel notion. I'm going to assume the latter, only because I've never used a language with a type definition like this, and also because it provides an excellent opportunity to explain the concept.

Most type systems that I've known and used are built around some combination of identity, state, and behavior. Java object types, for example, are based entirely on identity: The identity of a class is its type; the identity includes the name of the class, and its ancestors, in terms of super classes and implemented interfaces. Java types do carry detailed information about state (fields) and behavior (methods), but those details don't define the type; only the identity defines the type. The question "Is some object reference o of type T?" is never answered by what fields the object contains, nor by what methods it has, but rather by its identity, and solely by its identity.
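Here is a small Java example of that identity rule. Two interfaces that declare exactly the same method are still unrelated types, so an object that implements one is not an instance of the other. The names are invented for the example.

```java
class IdentityTypingDemo
    {
    // Two interfaces with identical behavior but different identities (hypothetical names)
    interface Named  { String name(); }
    interface Titled { String name(); }

    static class Book implements Titled
        {
        public String name() { return "Moby-Dick"; }
        }

    public static void main(String[] args)
        {
        Object o = new Book();
        // Book has exactly the method that Named declares, but Java only asks about identity
        System.out.println(o instanceof Titled);   // true
        System.out.println(o instanceof Named);    // false
        }
    }
```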

Of course, this was quite a leap forward from the answer in C (and by extension, C++); in C, the question "Is some object reference o of type T?" always has the answer "Yes". Got a pointer? It turns out that your pointer points to whatever type you tell the compiler it points to. Is it a Cat? Is it a Dog? Is it a Car? Is it a House? The answer is always "Yes!" One must admit that C is quite an agreeable language when it comes to types. (With this in mind, it's also easy to understand how C code is responsible for so many security flaws.)

But let's drop all of these notions on the floor, and start over. Let's start with a made-up syntax for defining a type:

type
{
// things that define what the type is
}

Looks kind of like a C structure. And we often think about types in exactly this way: They have a name (identity), and they have structure (fields).

type Person
{
String firstName;
String lastName;
}

But by our definition, this is completely wrong! We claimed that a type is only the sum of its behavior, and this example has only identity and state instead. So let's fix this, temporarily, and in the ugliest manner possible:

type // if we could name it, we'd call it "Person"
{
String getFirstName();
void setFirstName(String);
String getLastName();
void setLastName(String);
}

Interesting. Ugly, but interesting. We've turned state into behavior. And we turned the identity into a comment. But let's try an experiment: We'll allow a type to have an optional identity (including type name, type parameters, and ancestor types), just to make it easier to describe the types, and we'll create a few types that we can re-use to avoid some of that awful boilerplate:

type Ref<T>
{
T get();
}

type Var<T> : Ref<T>
{
void set(T);
}

type Person
{
Var<String> firstName();
Var<String> lastName();
}

So there is no state, per se, and the identity exists solely as a convenience for us, the reader, but we're almost back to where we started. In fact, if we introduce a short-hand notation for a zero-parameter method that returns a Var<T>, we are back where we started:

type Person
{
String firstName; // this just means "Var<String> firstName()"
String lastName; // this just means "Var<String> lastName()"
}
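If you think in Java, the made-up types above translate fairly directly into interfaces. This is only an analogy, and the names are illustrative. Person is nothing but two methods that return a Var<String>. SimpleVar is one arbitrary way to implement them, and a caller cannot tell whether or how it stores anything.

```java
class BehaviorOnlyTypes
    {
    interface Ref<T> { T get(); }
    interface Var<T> extends Ref<T> { void set(T value); }

    // "Person" is nothing but two methods that each return a Var<String>
    interface Person
        {
        Var<String> firstName();
        Var<String> lastName();
        }

    // one possible implementation; callers only ever see the behavior
    static final class SimpleVar<T> implements Var<T>
        {
        private T value;
        SimpleVar(T value)       { this.value = value; }
        public T get()           { return value; }
        public void set(T value) { this.value = value; }
        }

    static Person newPerson(String first, String last)
        {
        Var<String> firstName = new SimpleVar<>(first);
        Var<String> lastName  = new SimpleVar<>(last);
        return new Person()
            {
            public Var<String> firstName() { return firstName; }
            public Var<String> lastName()  { return lastName; }
            };
        }

    public static void main(String[] args)
        {
        Person p = newPerson("Ada", "Lovelace");
        p.lastName().set("King");
        System.out.println(p.firstName().get() + " " + p.lastName().get());   // Ada King
        }
    }
```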

We're close, but not done, because we have referenced another type, "String". And what is a "String"? It's just another type, that has to be defined in the exact same way as "Person". To define a String, it helps to have an Array. To define an Array, which has a size, it helps to have an Int64. To define an Int64, it helps to have another Array of Bit. To define a Bit, it helps to have an IntLiteral to represent a 0 or 1. And to define an IntLiteral, it helps to have a String.

In other words, the type system forms a closed loop. All types are defined from other types, and types are defined solely by their behavior.

And behavior? Behavior is simply defined as a set of named methods, each taking zero or more typed parameters, and returning zero or more typed results.

So what does this mean?

To oversimplify the conclusion, it means that a mathematician can use set theory to implement a type calculus for such a type system. Really, that's it. You know, like curing cancer, or finding the holy grail.

In Ecstasy, all types are defined like this. Even Type is a Type.

Riddle me this

It should be pretty obvious now why we refer to this as the "Turtles Type System", since it's turtles the whole way down. One of the interesting riddles we encountered early on looked something like this:

type A
{
B foo();
}

type B
{
A foo();
}

Question: What is the difference between an A and a B?

Rules

One of the interesting things with such a type system is how easy it is to construct recursive rules from it. For example, we say that a method m consumes type T if any of the following holds true:

  1. m has a parameter type declared as T;
  2. m has a parameter type that produces T;
  3. m has a return type that consumes T.

Similarly, we say that a method m produces type T if any of the following holds true:

  1. m has a return type declared as T;
  2. m has a return type that produces T;
  3. m has a parameter type that consumes T.

These rules form the basis for checking the legality of things like method variance, such as co-variance and contra-variance, which in turn allows the type system to intelligently enforce type safety.
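Because the rules are mutually recursive, they are easy to misread. Here is a toy Java sketch that applies them literally, with types modeled as plain strings mapped to lists of methods. Two parts of it are assumptions that the rules above do not spell out. First, a type produces (or consumes) T if any of its methods does. Second, a visited set is used so that mutually recursive types, like A and B in the riddle, terminate instead of recursing forever.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A toy model: a type is nothing but a list of methods. All names are hypothetical.
class VarianceSketch
    {
    record Method(String name, List<String> params, List<String> returns) {}

    // named types and their methods; a name with no entry (such as "T") is opaque
    final Map<String, List<Method>> types = new HashMap<>();

    boolean consumes(Method m, String t) { return consumes(m, t, new HashSet<>()); }
    boolean produces(Method m, String t) { return produces(m, t, new HashSet<>()); }

    private boolean consumes(Method m, String t, Set<String> seen)
        {
        for (String p : m.params())
            {
            if (p.equals(t) || typeProduces(p, t, seen)) return true;   // rules 1 and 2
            }
        for (String r : m.returns())
            {
            if (typeConsumes(r, t, seen)) return true;                  // rule 3
            }
        return false;
        }

    private boolean produces(Method m, String t, Set<String> seen)
        {
        for (String r : m.returns())
            {
            if (r.equals(t) || typeProduces(r, t, seen)) return true;   // rules 1 and 2
            }
        for (String p : m.params())
            {
            if (typeConsumes(p, t, seen)) return true;                  // rule 3
            }
        return false;
        }

    // assumption: a type produces (or consumes) T if any of its methods does; every rule is
    // an "or", so revisiting a type already being explored can never add a new answer
    private boolean typeProduces(String type, String t, Set<String> seen)
        {
        if (!seen.add("produces " + type)) return false;
        for (Method m : types.getOrDefault(type, List.of()))
            {
            if (produces(m, t, seen)) return true;
            }
        return false;
        }

    private boolean typeConsumes(String type, String t, Set<String> seen)
        {
        if (!seen.add("consumes " + type)) return false;
        for (Method m : types.getOrDefault(type, List.of()))
            {
            if (consumes(m, t, seen)) return true;
            }
        return false;
        }

    public static void main(String[] args)
        {
        VarianceSketch s = new VarianceSketch();
        s.types.put("Ref<T>", List.of(new Method("get", List.of(), List.of("T"))));
        s.types.put("Var<T>", List.of(new Method("get", List.of(), List.of("T")),
                                      new Method("set", List.of("T"), List.of())));
        s.types.put("A", List.of(new Method("foo", List.of(), List.of("B"))));
        s.types.put("B", List.of(new Method("foo", List.of(), List.of("A"))));

        Method read  = new Method("read",  List.of("Ref<T>"), List.of());  // takes a producer of T
        Method open  = new Method("open",  List.of(), List.of("Var<T>"));  // returns a Var<T>
        Method cycle = new Method("cycle", List.of(), List.of("A"));

        System.out.println(s.consumes(read, "T") + " " + s.produces(read, "T"));  // true false
        System.out.println(s.consumes(open, "T") + " " + s.produces(open, "T"));  // true true
        System.out.println(s.produces(cycle, "T"));                               // false
        }
    }
```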





2021/05/21

What is a Property?

On a recent Cliff Click Coffee Compiler Club call, this question came up: What exactly is an Ecstasy property? It turns out that a property is a very obvious and simple thing, yet explaining it is not so simple.

Developers have different expectations when they hear the word "property", including:

  • It's just a named field in a structure.
  • It's something that has a getter and a setter.

These are logical expectations, because in languages like C++ and Java, "object properties" are just fields in structures, and in Java, the getter and setter methods are a well-known way to expose private fields as public virtual methods.

But unfortunately, starting with this train of thought takes us in the wrong direction, so let's forget all of this historical context, and back up to the beginning: What exactly is an Ecstasy property?

First, it is important to appreciate where an Ecstasy property exists:

  • A property can be declared inside any class, including module and package classes;
  • A property can be declared inside a property;
  • A property can be declared inside a method.

That a property can exist inside a class is not unusual, but it is a bit unusual that a property can exist inside another property, or even inside of a method.

In Ecstasy, everything is an object, so it follows that a property is an object. Objects have types. So what is the type of a property? A property, like a local variable, is an instance of Ref, a reference. If the property is mutable, then (also like a local variable), it is an instance of Var, which extends Ref.

Since a property is an object, and objects are instances of a class, then what is the class of a property? The class of a property is unknowable within Ecstasy. That does not mean that the property does not have a class; it simply means that the class is not visible from within the running code. Let's take a simple example:

class Person
{
    Int age;
}

When we have an instance of Person, we can ask that object's reference for its actual class, and if it was created within the current container, it will return the Person class. But it's also possible that an object reference comes from outside of the container, in which case asking for the actual class will not return the actual class, but will instead return just the interface type through which the object can be viewed; this is the basis for container security, and is a fundamental building block of Ecstasy's strong security model.

When the Ecstasy runtime starts up, and an Ecstasy application is loaded and starts running, the application runs in the outermost Ecstasy container, called "container 0". That is the container within which the application's module was loaded, and within which all other containers and objects are created, so one would think that the application's properties would also be created within "container 0" ... but that would be incorrect. In order for "container 0" to be created, there already had to be a Container class, and since that class comes from the core Ecstasy module, the core Ecstasy module must have been loaded in some container before "container 0" was created. And since it's "turtles the whole way down", "container 0" is itself sitting on top of an infinite stack of turtles, which, to keep this short, we will simply refer to as "container -1".

"Wait ... what?!?" I can almost hear the WTFs being hurled at computer screens everywhere. But here's the simple truth: Anything outside of the container that the application is loaded within is simply unknowable. So if the application is started in something that we call "container 0", and a container always exists within a container, then we know that there must be some "container -1", if only because otherwise there couldn't be a "container 0". And to keep this short and as simple as possible: the runtime itself is that unknowable outer container, the runtime itself is the container that loaded the Ecstasy module, and the runtime itself is the thing that knows how to "new" a class, and how to automatically "new" whatever class is used for each property as well. But surely that doesn't really happen -- each property couldn't actually be a new object, right?

As with many things in Ecstasy, the answer is purposefully unknowable. If you ask for a property's reference, you do get back a usable object -- one that you can reflect on, pass around, store in a property somewhere, or whatever it is that you do with objects -- so obviously the property object "exists", by some definition; but like all turtles, it may not have existed before you looked at it, and it may not exist when you're not looking at it.

But here's where things get seriously cool: Since a property does have a class, we can augment that class! Of course, we don't use the extend keyword like when we sub-class (because we don't know what class to extend) ... but we can write a mixin for the property, because we do know the type to mix into! In fact, lots of functionality in Ecstasy is built by writing mixins that can be mixed into properties and local variables, such as futures, lazily calculated values, and watched values.

Furthermore, we can augment a property where we define it, as if it were a class. Here's a silly example:

module Test
{
    @Inject Console console;
    Log log = new ecstasy.io.ConsoleLog(console);

    void run()
    {
        log.add("Simple property example!");

        val o = new TestClass();
        for (Int i : 0..5) // 0..5 is an inclusive range: six iterations
        {
            val n = o.x;
        }

        o.&x.foo(); // &x gets the property, instead of de-referencing it
    }

    class TestClass
    {
        Int x
        {
            @Override Int get()
            {
                ++count;
                return super();
            }

            void foo()
            {
                log.add($"Someone accessed this property {count} times!");
            }

            private Int count;
        }
    }
}

And when we run it:

++++++ Loading module: Test +++++++

Simple property example!
Someone accessed this property 6 times!

Process finished with exit code 0

So we can augment our property with code where we define the property, we can mix predefined functionality into a property (again, it's not magic, because you could have written those mixins yourself!), and we can even modify a property's behavior on a sub-class (assuming that the property wasn't private), because the sub-class' property's class implicitly extends the super-class' property's class.

Okay, that was a lot of information, but it conveys an important point: A property isn't just some field in a structure. It's a real object, with a real class, and it behaves like a real object, with a real class.
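Java has no equivalent of a property's class, but the idea can be approximated by making the "property" an explicit object. The following sketch is only an analogy, and the Property and CountingProperty names are invented. It mirrors the TestClass example above: get() is overridden to count accesses, and super.get() stands in for the next implementation in the chain, which finally reads the stored value.

```java
// Analogy only: Java has no property classes, so the "property" here is an explicit object.
class PropertyAsObject
    {
    // hypothetical stand-in for a property: an object with overridable get() and set()
    static class Property<T>
        {
        private T value;                     // plays the role of the field
        Property(T initial)  { value = initial; }
        T get()              { return value; }
        void set(T value)    { this.value = value; }
        }

    static class TestClass
        {
        // the property's behavior is customized where the property is declared
        final class CountingProperty extends Property<Integer>
            {
            private int count;

            CountingProperty() { super(0); }

            @Override
            Integer get()
                {
                ++count;
                return super.get();          // next in line: reads the stored value
                }

            String foo()
                {
                return "Someone accessed this property " + count + " times!";
                }
            }

        final CountingProperty x = new CountingProperty();
        }

    public static void main(String[] args)
        {
        TestClass o = new TestClass();
        for (int i = 0; i <= 5; ++i)         // six reads, like the 0..5 loop above
            {
            Integer n = o.x.get();
            }
        System.out.println(o.x.foo());       // Someone accessed this property 6 times!
        }
    }
```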

But what about the property's value? Where is it stored? The Ecstasy type system determines which properties require a field for their storage, and automatically includes those fields in the underlying structure that is defined for each class, and thus exists for each object. In other words, all of a property's state is stored in whatever class the property "rolls up" into. Here are two simple examples:

class Example
{
    Int x;          // this has a field on Example:struct

    Int y.get()
    {
        return 7;   // y does not need a field
    }
}

The type system has rules that determine when a field is required, and both the compiler and the runtime use these same rules: If a field is required, then the field will exist. If the field is not required, then the field will not exist.

So how is that field accessed? Well, normally we don't even think about that. If there's an object o with a property x, we just dereference the property o.x, and we never think about the field. But conceptually, the field is accessed by the last method in the call chain for the property's get() method, so if you don't override get(), then accessing the property goes straight to the field.

Alternatively, sometimes it is necessary to work with an object's structure directly. A serialization library, for example, may need to access the field values to store them off, and subsequently build a new structure using that stored-off data to re-instantiate the corresponding object. Instead of trying to paste an example here, it makes more sense just to point to the JSON serialization implementation that does exactly this. While it might seem like a common thing (directly accessing a field), it turns out that the only places in the entire Ecstasy code base where fields are being accessed directly are (i) serialization implementations and (ii) tests of the compiler and runtime itself.

So, back to the initial question: What is an Ecstasy property?

  • A property has a name.
  • A property represents state with a value, of a type.
  • A property is contained within a class, a property, or a method.
  • A property is a container of classes, properties, and methods.
  • A property is itself a class.
  • A property can be customized, mixed into, inherited, and overridden.
  • Properties are virtual.






2020/12/31

Welcome to the Ecstasy Language, the first programming language built for the cloud!

This is an official blog of the Ecstasy Language open source project at xtclang.org.

Table of Contents:
Blogs:
Repository:
Twitter:
  •  @xtclang - Official Twitter account for the xtclang project.
Current Status:
  • Early developer access / prototype runtime

Email Inquiries: info at xtclang dot org

2020/01/25

Coming from Java, Part III (Singletons)


(This entry is the third installment of "Coming from Java"; here is a link to Part II.)

There is a common pattern in Java called the singleton pattern. Basically, it allows you to create a single instance of a class, and then to find that same instance from anywhere in your code:
public class Singleton
    {
    public static final Singleton INSTANCE = new Singleton();

    private Singleton()
        {
        // initialization stuff goes here
        }
    }
Creating a singleton in Ecstasy is accomplished by using the static keyword for the class:
static const Singleton
    {
    construct()
        {
        // initialization stuff goes here
        }
    }
There is one tiny detail, though: In Ecstasy, only const classes and service classes can be singletons, and the reason is quite fundamental to the purposes for which Ecstasy was designed.

To begin with, consider the differences between high-end computers that Java was designed for, and the high-end computers that Ecstasy is designed for. At the time that Java was initially being designed in the early 1990s, very few computers had more than one CPU -- Sun hadn't yet even released its first multiprocessor workstation! A server or a high-end workstation might have had 8MB of RAM, and -- still hard to believe! -- the IBM 370 mainframe of the day topped out at 16MB of RAM. That's megabytes -- a new notebook today has a thousand times that much memory!

Over a decade later, with dual-CPU servers now the norm, Java would get its first working memory model specification, specifying the JVM's guarantees for reads and writes occurring across multiple CPUs and among multiple threads.

Ecstasy, in contrast, was designed explicitly to take advantage of computers with potentially many thousands of cores, and with potentially many terabytes of main memory. To accomplish this, the design focused on disentangling threads from each other, and disentangling the memory -- what Java calls "the heap". The rationale is simple: In a modern computer, a single thread of code can perform on the order of 1-10 billion instructions per second, if and only if the thread does not share read/write memory with any other threads. The moment that a thread starts to use read/write memory that is being used by other threads, the performance (and the predictability of the performance) drops like a lead balloon.

To avoid the lead balloon effect, Ecstasy carves out exclusive zones of mutable memory, each with its own single conceptual thread. Each of these is called a service. An Ecstasy service can be thought of as a simple Turing Machine, or a simple von Neumann machine. And an Ecstasy service can be thought of as a boundary for mutability, because all mutation of a service's memory occurs within that service, and only immutable data can permeate that boundary. Services can communicate with other services, but that communication is conceptually asynchronous in nature, the communication is in the form of invocation, and only immutable data is exchanged.
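Here is a very rough Java analogy for a service. It is not a description of how the Ecstasy runtime implements services. The idea is to confine the mutable state to a single thread, and to reach that state only through asynchronous calls that return immutable values (here, a boxed Long). The names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Analogy only: mutable state owned by one thread, reached via asynchronous invocation.
class ServiceAnalogy
    {
    static final class CounterService implements AutoCloseable
        {
        private final ExecutorService thread = Executors.newSingleThreadExecutor();
        private long count;                       // touched only by the service's own thread

        CompletableFuture<Long> hit()
            {
            // the caller gets back a future of an immutable value, never the state itself
            return CompletableFuture.supplyAsync(() -> ++count, thread);
            }

        @Override
        public void close()
            {
            thread.shutdown();
            }
        }

    public static void main(String[] args)
        {
        try (CounterService counter = new CounterService())
            {
            CompletableFuture<Long> last = null;
            for (int i = 0; i < 1000; ++i)
                {
                last = counter.hit();             // asynchronous invocation
                }
            System.out.println(last.join());      // 1000: calls run one at a time, in order
            }
        }
    }
```

Because the single-threaded executor runs submitted tasks one at a time, in submission order, count never needs a lock, and callers only ever see immutable results.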

Since a singleton is, by its nature, visible to all code running in an application, it therefore stands to reason that the singleton must be immutable -- so that it can be used by code running in any service -- or the singleton must itself be a service -- so that it can be invoked by code running in any other service.

And here is a straightforward example in Ecstasy:
static service PageCounter
    {
    Int count = 0;
    Int hit()
        {
        return ++count;
        }
    }
Using the singleton is equally simple:
PageCounter.hit();
(Since it is a singleton, the name of the class implies the singleton instance.)

The same example can be constructed in Java, but thread safety is the responsibility of the programmer:
public class PageCounter
    {
    public static final PageCounter INSTANCE = new PageCounter();

    private PageCounter() {}
    
    private int count;
    
    synchronized public void setCount(int count)
        {
        this.count = count;
        }
    
    synchronized public int getCount()
        {
        return count;
        }
    
    synchronized public int hit()
        {
        return ++count;
        }
    }

// how to call the singleton
PageCounter.INSTANCE.hit();
There are a variety of ways to implement the counter in Java in order to make it more concurrent; for example, an atomic integer class can be used, or an atomic updater on a volatile field can be used, and so on. In Ecstasy, on the other hand, the choice of how to make the counter more concurrent is left completely up to the run-time implementation. The choice to allow the run-time to optimize this facet of execution is based on what we learned from Java's own HotSpot JVM -- which is that only the run-time has enough information to know which parts of the application would actually benefit from optimization in the first place, and which optimizations would work best, based on the actual run-time profiling information!
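For comparison, here is a minimal sketch of one of the Java alternatives mentioned above. It uses java.util.concurrent.atomic.AtomicInteger in place of synchronized methods. The class name is made up. The point is that in Java, the programmer, not the runtime, has to pick this strategy.

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicPageCounter
    {
    public static final AtomicPageCounter INSTANCE = new AtomicPageCounter();

    private final AtomicInteger count = new AtomicInteger();

    private AtomicPageCounter() {}

    public int hit()
        {
        return count.incrementAndGet();       // atomic read-modify-write, no lock held
        }

    public int getCount()
        {
        return count.get();
        }

    public static void main(String[] args) throws InterruptedException
        {
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; ++t)
            {
            threads[t] = new Thread(() -> { for (int i = 0; i < 1000; ++i) INSTANCE.hit(); });
            threads[t].start();
            }
        for (Thread t : threads)
            {
            t.join();
            }
        System.out.println(INSTANCE.getCount());   // 4000
        }
    }
```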

A few miscellaneous notes to wrap up this singular topic:
  • In Ecstasy, every module, package, and enum is a "static const" class, automatically. That means that modules and packages are all singleton objects, and every enum value is a singleton object.
  • Ecstasy does not require a memory model for explaining the order of reads and writes of mutable data among threads, because (as explained above) the Ecstasy design does not have mutable shared state among threads. (The Ecstasy design also uses services in lieu of explicit developer-managed threads, but that is a topic for another blog entry.)
  • There is no "global heap" in Ecstasy, so there is no "stop the world" garbage collection. Ecstasy is automatically garbage-collected, but each service can manage its own memory. The Ecstasy design effectively eliminates the "GC pause" problem, even for programs that use terabytes of RAM.

 

2020/01/23

Coming from Java, Part II

(This topic is large, so it is being covered in installments; this is Part II, and here is a link to Part I.)

In object-oriented languages, objects represent the combination of related state and behavior. Java classes declare fields to hold state, and methods to provide behavior; Java fields and methods are nested immediately within the class that contains them. This is an example of a common pattern for a Java class exposing state, stored in fields, via property accessors:
public class Person
    {
    public Person(String name, String phone)
        {
        setName(name);
        setPhone(phone);
        }
    
    private String name;
    private String phone;

    public String getName()
        {
        return name;
        }

    public void setName(String name)
        {
        assert name != null;
        this.name = name;
        }

    public String getPhone()
        {
        return phone;
        }

    public void setPhone(String phone)
        {
        this.phone = phone;
        }
    }
Ecstasy classes do not declare fields; instead, Ecstasy classes have properties that represent object state. A property is like an object, in that it can have its own nested state, and its own nested behavior. For example, to obtain the value of a property, one can invoke the get() method on the property. If the property is writable, then one can modify the value of the property by invoking the set() method on the property. (Of course, it is possible to use the simple dot notation for property access, which means that explicit calls to get() and set() are unnecessary.) Here is the above class, re-written in Ecstasy:
class Person(String name, String? phone = Null);
You could also write it out in long-hand if you prefer; the following code compiles to exactly the same result as the above code:
class Person
    {
    construct(String name, String? phone = Null)
        {
        this.name  = name;
        this.phone = phone;
        }
        
    String name;
    String? phone;
    }
It's possible in Java to make the "getter" and "setter" have different access, such as:
public String getName()
    {
    return name;
    }

private void setName(String name)
    {
    assert name != null;
    this.name = name;
    }
To accomplish this in Ecstasy, the equivalent is:
public/private String name;
The first access, "public", specifies that the property shows up in the public type as a Ref<String>; a Ref represents a read-only reference to a value. The second access, "private", specifies that the property shows up in the private type as a Var<String>; a Var represents both read and write access to the value.

Remember, though, that a property is like an object. Let's expand the Java example slightly, to validate that the name is not an empty String:
public void setName(String name)
    {
    assert name != null && name.length() > 0;
    this.name = name;
    }
In Ecstasy, the property contains a method called set(String) that we can override:
public/private String name.set(String name)
    {
    assert:arg name.size > 0;
    super(name);
    }
The above is just short-hand notation for:
public/private String name
    {
    @Override void set(String name)
        {
        assert:arg name.size > 0;
        super(name);
        }    
    }
There are a couple of important points here:
  • It's not the set() method that is private. The set() method is public, because it is part of the Var interface, as explained above.
  • Instead, the public Person type (known as Person:public or Person.PublicType) has a property that does not have a set() method, while the private Person type (known as Person:private or Person.PrivateType) has a property that does have a set() method.
  • In Java, "super" refers to the super-class. In Ecstasy, super is a reference to the function (like a function pointer) that is next in line to invoke in the virtual method's invocation chain. In other words, super is a function.
  • While it's not directly related, you can read more about the various specializations of the assert statement on this blog. The assert:arg statement produces an IllegalArgument exception if the assertion fails.
The interesting thing, though, is that we never have to deal with the field. We know it's there, because it has to be in order to hold the value, but the field doesn't have a name, we don't access it, and we don't modify it. Instead, we just call the super function for get() or set(), and at the end of that chain there is some implementation of the method (that we didn't have to write!) that accesses or stores the value for us using the field.

But what if we made it so that we could never reach the end of those method chains?
public/private String name
    {
    @Override String get()
        {
        return "Bob";
        }

    @Override void set(String name)
        {
        assert:arg name.size > 0;
        // do nothing with the name ... do not store it!
        }
    }
In this case, there would be no field for the name property, because it's obvious to the compiler that one is not needed!
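For comparison, a Java class whose accessors never store anything needs no field either, and standard reflection (Class.getDeclaredFields()) confirms that it declares none. The class name AlwaysBob is hypothetical. The difference is that in Java the programmer decides to omit the field by hand, while in Ecstasy the compiler works it out.

```java
// Hypothetical Java analogue of the "Bob" property: the accessors never
// store the value, so the class declares no field at all.
class AlwaysBob {
    String getName() {
        return "Bob";
    }

    void setName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        // do nothing with the name ... do not store it!
    }

    // Counts the fields this class declares, using standard reflection.
    static int declaredFieldCount() {
        return AlwaysBob.class.getDeclaredFields().length;
    }
}
```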

So now it should be obvious that fields exist in Ecstasy, but that we never really have to mess around with them. Where are those fields actually held, though?

In one of the examples above, we talked about how the Person class has a public type and a private type, so you probably already guessed that the Person class also has a protected type, and you would be correct!

But the Person class has one more type: the Person:struct type. The Person:struct type has one property for each property of the Person class that needs a field. We call each property on the struct type a "field"; in Ecstasy, a field is just a property on a class' struct type.

The struct type is not user-definable. The struct type is automatically calculated by the compiler at compile-time, and by the linker/loader at run-time. While the public, protected, and private types all refer to the same underlying object -- as if they were three different lenses through which you can view the same object -- the struct, on the other hand, is a separate object that is an implementation of the Struct interface.

For the purpose of this article, this is already way too much low-level information about structs, but the details are important for one reason: To understand constructors.

Constructors are weird. They live in a zone between non-existence and existence. They play by some extraordinary rules in Java, and the same is true in Ecstasy, because they fit into a zone of unknowns. Here's what a constructor looks like in Java, just to pick one at random from our own prototype compiler that was written in Java:
public StringConstant(ConstantPool pool, String sVal)
    {
    super(pool);

    assert sVal != null;
    m_sVal = sVal;
    }
First, in Java, a constructor must begin by calling either a different constructor on this class, or a constructor on the super class (if it calls neither, the compiler inserts an implicit super() call). Then, it is free to do other stuff, like checking parameters and initializing fields. Fields that aren't explicitly initialized are all set to their defaults (zero, false, or null), which is easy when null can be assigned to every reference type.

Ecstasy is different. Not necessarily simpler or more complicated. Not necessarily better or worse. But it is different for very purposeful reasons:
  • Construction is treated as a finite state automaton. Eliminating unknowns and improving predictability of execution is extremely important, and that is exactly what a finite state automaton does.
  • There is a period of time before the object is constructed. The developer gets complete control over that process.
  • There is a period of time after the object is constructed. The developer gets complete control over that process.
  • In between the before and the after, the developer is completely absent, and completely out of the picture for the moment of creation. During that moment, all the rules of object instantiation can be verified, and the object is created. We say that "the this becomes existent".
In that period before the object creation, there are two phases that the developer can implement:
  1. The construct(...) functions allow the developer to specify what information is needed to initialize the state of the object, and the developer can validate that information and initialize the structure of the object, which is the aforementioned struct.
  2. The assert() function allows the developer to collect, in one place, any assertions (or any other last-second work) that must occur before the object is created.
Here is an example of a constructor, from the Date class, which simply delegates to another constructor using the construct keyword:
construct (Int year, Int month, Int day)
    {
    construct Date(calcEpochOffset(year, month, day));
    }
For both construct(...) and assert(), the this variable is the struct -- not the object, because it has not yet been created! After that code has all completed successfully, the struct is checked by the system to make sure that all necessary fields have been assigned a value, and then the object is instantiated based on the struct. Then -- after the moment of creation, and before the newly created "this" reference is returned to the code that invoked the new operator -- one more step occurs: The corresponding finally(...) function for each previously invoked construct(...) function is executed, so that the object itself gets to see itself (and finish anything that it needs to) before being returned to the code that requested it.
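Java has no struct phase, so the following is only a hypothetical model of the sequence just described, written in Java. DemoDate, DemoDateStruct, and create() are invented for illustration and are not how the Ecstasy runtime works: the construct and assert steps touch only the struct, the "system" checks that every required field was assigned, and only then is the object created and its finally step run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Date:struct; a null field means "not yet assigned".
final class DemoDateStruct {
    Integer epochDay;
}

// Hypothetical model of the construction sequence described above.
final class DemoDate {
    final int epochDay;
    final List<String> phases = new ArrayList<>(); // steps run after creation

    private DemoDate(DemoDateStruct struct) {
        this.epochDay = struct.epochDay;
    }

    // construct(...): may only fill in the struct, because "this" does not exist yet.
    private static void construct(DemoDateStruct s, int epochDay) {
        s.epochDay = epochDay;
    }

    // assert(): last-second checks, still made against the struct.
    private static void assertStruct(DemoDateStruct s) {
        if (s.epochDay != null && s.epochDay < 0) {
            throw new IllegalArgumentException("epoch day must not be negative");
        }
    }

    // finally: runs on the real object before the caller receives it.
    private void finallyStep() {
        phases.add("finally");
    }

    // Plays the role of the new operator plus the runtime's own checks.
    static DemoDate create(int epochDay) {
        DemoDateStruct s = new DemoDateStruct();
        construct(s, epochDay);
        assertStruct(s);
        // The "system" verifies that every required field has been assigned ...
        if (s.epochDay == null) {
            throw new IllegalStateException("unassigned field: epochDay");
        }
        // ... and only then does the object come into existence.
        DemoDate d = new DemoDate(s);
        d.phases.add("created");
        d.finallyStep();
        return d;
    }
}
```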

Here's an example from the Array class:
protected construct(ArrayDelegate<Element> delegate)
    {
    this.delegate = delegate;
    }
finally
    {
    if (mutability == Constant)
        {
        makeImmutable();
        }
    }
In this example, the construct(...) function fills in the fields of the struct, but because the Array object does not yet exist at this point, the construct(...) function cannot call the makeImmutable() method on the Array -- until the Array actually exists! And that is the purpose of the finally function -- to allow the new Array object to perform behavior that must occur as if it were part of the instantiation of the object, before that object is returned to the code that requested the object to be created.
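Java constructors are allowed to call methods on this (at the risk of exposing a half-built object), so Java code that wants a similar guarantee typically hides the constructor behind a factory method. The hypothetical DemoArray below is our own sketch of that Java idiom, not Ecstasy's Array: the private constructor only assigns state, and create() performs the finally-style step on the finished object before anyone else receives it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical Java analogue of the Array example above.
final class DemoArray {
    enum Mutability { MUTABLE, CONSTANT }

    private final Mutability mutability;
    private List<String> delegate;

    // Like construct(...): only assigns state.
    private DemoArray(List<String> items, Mutability mutability) {
        this.delegate = new ArrayList<>(items);
        this.mutability = mutability;
    }

    private void makeImmutable() {
        delegate = Collections.unmodifiableList(delegate);
    }

    // Like new plus finally: the object now exists, so it may call its own
    // methods before the caller ever receives the reference.
    static DemoArray create(List<String> items, Mutability mutability) {
        DemoArray array = new DemoArray(items, mutability);
        if (array.mutability == Mutability.CONSTANT) {
            array.makeImmutable();
        }
        return array;
    }

    void add(String item) {
        delegate.add(item); // throws UnsupportedOperationException once immutable
    }

    int size() {
        return delegate.size();
    }
}
```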

A class can define an assert() function (with no parameters) as well. Regardless of whether any particular construct(...) function is invoked by the new operator, or by a sub-class -- and note that a sub-class is not required to invoke any construct(...) function on its super-class! -- the assert() function will be invoked before the object is created.

There are many details regarding the specific order of execution, handling of exceptions, and so on, but this post hopefully has given you a glimpse into how object structure works in Ecstasy, and how Ecstasy objects are created.

(Continue to Part III.)

2020/01/22

Coming from Java

The first question that we get from new developers working on Ecstasy is how it is similar to, and how it is different from, the languages that they already know and use. One of the goals of Ecstasy was to make the language instantly accessible to programmers who already were comfortable with any of the C family of languages, such as C++, Java, and C#. We'll start by looking at one such language, Java, which is one of the most widely used languages today.

(This topic is large, so this entry is just the first installment.)

Here's the pocket translation guide from Java to Ecstasy with respect to the type system:
  • Java's type system is a combination of primitive (machine) types and class-based types, with a few "hybrid" types, such as arrays, that fit neither category. Ecstasy's type system is simply class-based; there are no primitive types.
  • Java's null type has one value, null, that is assignment compatible with any reference type. The Ecstasy Nullable enumeration defines the value Null, although the lower-case null is also supported by alias. In Ecstasy, the Null enum value is only assignable to a Nullable type, or a super-type thereof such as Object.
  • Java's boolean type has two values, true and false. The Ecstasy Boolean enumeration defines the values False and True, although the lower-case false and true are also supported by alias.
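  • Java's char type is a 16-bit unsigned integer that holds a single UTF-16 code unit, so it can directly represent only the characters of Unicode's Basic Multilingual Plane; any other character requires a pair of chars. Ecstasy's Char class represents any Unicode code-point.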
  • Java's int type is a 32-bit unchecked signed integer value; Java additionally has byte, short and long types for 8-bit, 16-bit, and 64-bit unchecked signed integer values. Ecstasy provides both checked and unchecked, and both signed and unsigned implementations for 8-bit, 16-bit, 32-bit, 64-bit, 128-bit, and variable-length integers (conceptually similar to Java's BigInteger class). For example, UInt32 is a checked unsigned 32-bit integer, and @Unchecked Int128 is an unchecked signed 128-bit integer. Additionally, the alias Int maps to the 64-bit signed integer, Int64, and the alias Byte maps to the 8-bit unsigned integer, UInt8. (In Java, the byte type is signed.)
  • Java supports decimal values only through library classes, most notably BigDecimal, rather than through standard decimal types. Ecstasy provides standard 32-bit, 64-bit, 128-bit, and variable-length decimal value support via the Dec32, Dec64, Dec128, and VarDec classes; these are implementations of the IEEE 754-2008 standard for decimal floating point.
  • Java's float and double represent 32-bit and 64-bit IEEE 754 binary floating point values. Ecstasy provides standard 16-bit, 32-bit, 64-bit, 128-bit, and variable-length IEEE 754 binary floating point values via the Float16, Float32, Float64, Float128, and VarFloat classes. Additionally, Ecstasy provides the ML- and AI-optimized "brain float 16" type, via the BFloat16 class.
  • Java's primitive type system is based on a 32-bit word size. Ecstasy does not have a primitive type system, and thus does not have a "word size", but in practice, Ecstasy defaults to using 64-bit integer, decimal, and binary floating point values.
  • In Java, the value "0" is an int. The compiler converts it, if necessary, to other types. In Ecstasy, the value "0" is an IntLiteral, which has the ability (both at compile-time and run-time) to convert to any numeric type. Unlike Java, there is no need for an "l"/"L" suffix on integers to inform the compiler that a value is a long.
  • In Java, the value "0.0" is a double. In Ecstasy, the value "0.0" is an FPLiteral, which has the ability (both at compile-time and run-time) to convert to any decimal or binary floating point type. Unlike Java, there is no need for an "f"/"F" or "d"/"D" suffix to inform the compiler that a value is a 32-bit or 64-bit value.
  • Java supports the class, enum, and interface keywords for declaring classes. Ecstasy supports these three keywords, plus: module, package, service, mixin, and typedef.
  • Classes such as Int64, Float64, Dec64, Char, and String that are used to hold constant values are implemented in Ecstasy using the const keyword instead of the class keyword. Instances of a const class are automatically made immutable as part of their construction; specifically, no reference to an object of a const class becomes visible until after the object is made immutable.
  • Ecstasy classes such as Nullable and Boolean are enumerations; enumerations are abstract classes that contain enum values, such as False and True. Enum values are singleton const classes.
  • Ecstasy module and package classes are singleton const classes, and are written like any other classes would be. Declarative modularity was recently introduced into Java via Project Jigsaw, with some similar goals. You can read more about Ecstasy modules on this blog.
  • Java does not have any language capabilities similar to a service, a mixin, or a typedef in Ecstasy. A service class provides a boundary for concurrent and/or asynchronous behavior, so it can be thought of in the same manner as a Java thread; however, an Ecstasy application may have millions of service objects, while it is unlikely that so many threads would be desirable in any language. An Ecstasy mixin provides cross-cutting functionality; in Java, some combination of boilerplate, delegation, and cut & paste would be used instead. An Ecstasy typedef is a means to provide a name to a type that itself can be expressed using the type algebra of the Ecstasy language. You can read more about class composition on this blog.
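Several of the Java-side points above can be checked directly with nothing but the Java standard library. In this sketch, JavaTypeFacts is a name invented for the example; each method isolates one point: UTF-16 chars, unchecked int overflow versus the opt-in Math.addExact, the signed byte, binary versus decimal fractions, and literal suffixes.

```java
import java.math.BigDecimal;

// Demonstrates several Java behaviors from the translation guide above.
class JavaTypeFacts {
    // char is a 16-bit UTF-16 code unit: U+1F600 needs two chars but is one code point.
    static int[] emojiLengths() {
        String s = "\uD83D\uDE00";
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    // int arithmetic is unchecked: it silently wraps around on overflow ...
    static int uncheckedAdd(int a, int b) {
        return a + b;
    }

    // ... unless the program opts in to checking, e.g. with Math.addExact.
    static boolean checkedAddOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Java's byte is signed, so 0xFF becomes -1 unless it is widened as unsigned.
    static int signedByte() {
        return (byte) 0xFF;
    }

    static int unsignedByte() {
        return Byte.toUnsignedInt((byte) 0xFF);
    }

    // Binary floating point cannot represent 0.1 exactly; BigDecimal can.
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    static boolean decimalSumIsExact() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"))
                .compareTo(new BigDecimal("0.3")) == 0;
    }

    // Integer literals are ints, so this product overflows before it is widened to long.
    static long intLiteralProduct() {
        return 1_000_000 * 1_000_000;
    }

    // The L suffix makes the literal a long, so the product is computed in 64 bits.
    static long longLiteralProduct() {
        return 1_000_000L * 1_000_000;
    }

    // 0.1f is a 32-bit float; widened to double, it differs from the double 0.1.
    static boolean floatAndDoubleLiteralsDiffer() {
        return 0.1f != 0.1;
    }
}
```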
To put this into practice, consider this Java example:
package com.mycompany.myproduct.gui;

class Point
        implements Comparable<Point>
    {
    public Point(int x, int y)
        {
        this.x = x;
        this.y = y;
        }

    private final int x;
    private final int y;

    public int getX()
        {
        return x;
        }

    public int getY()
        {
        return y;
        }

    @Override
    public int hashCode()
        {
        return x ^ y;
        }

    @Override
    public boolean equals(Object obj)
        {
        if (obj instanceof Point)
            {
            Point that = (Point) obj;
            return this.x == that.x && this.y == that.y;
            }

        return false;
        }

    @Override
    public String toString()
        {
        return "Point{x=" + x + ", y=" + y + "}";
        }

    @Override
    public int compareTo(Point that)
        {
        // Integer.compare avoids the overflow that subtraction can cause
        int n = Integer.compare(this.x, that.x);
        if (n == 0)
            {
            n = Integer.compare(this.y, that.y);
            }
        return n;
        }
    }
And here is the corresponding Ecstasy code:
const Point(Int x, Int y);
This particular example is dramatic, because the const class declaration in Ecstasy implies automatic implementations of the Comparable, Hashable, Orderable, and Stringable interfaces. Furthermore, the parameters specified at the class level declare two properties and a constructor.
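Java 16 added records, which close part of that gap. In the following sketch (our own, for comparison), the record generates the constructor, the x() and y() accessors, equals(), hashCode(), and toString(), but the ordering still has to be written by hand. A Comparator chain is used instead of subtraction, which can overflow for extreme values.

```java
import java.util.Comparator;

// A Java 16+ record generates the constructor, accessors x() and y(),
// equals(), hashCode(), and toString(); ordering must still be written by hand.
record Point(int x, int y) implements Comparable<Point> {
    // Compare by x, then by y, without the overflow risk of subtraction.
    private static final Comparator<Point> ORDER =
            Comparator.comparingInt(Point::x).thenComparingInt(Point::y);

    @Override
    public int compareTo(Point that) {
        return ORDER.compare(this, that);
    }
}
```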

Local variable declarations are similar, but the use of the comma as a general purpose separator (as in C) is not permitted. For example, in Java:
int a=0, b=0, c=0;
In Ecstasy, these would likely become separate declarations:
Int a=0;
Int b=0;
Int c=0;
It is also possible (and occasionally necessary) to declare and initialize multiple left-hand-side variables ("L-values"); the above example could be written as:
(Int a, Int b, Int c) = (0, 0, 0);
Note that the left-hand-side is in the form of a tuple, and the right-hand-side has a corresponding tuple type. In this form, the type of each left-hand-side variable can differ, and a type is only specified when declaring a variable. For example, if a function "foo()" exists that returns both an Int and a String, then the above-defined variable "c" and a new String variable can be assigned as follows:
(c, String d) = foo();
This introduces a dramatic difference in Ecstasy: Methods and functions can return more than one value, and those return values can be treated either as individual values, or as a tuple of values.
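Java methods return a single value, so the usual workaround is to package several results into one object. Here is a sketch, in Java 16+, of what (c, String d) = foo(); tends to turn into; foo(), IntAndString, MultiReturn, and describe() are hypothetical names used only for illustration.

```java
// Java methods return one value, so a hypothetical foo() that yields
// both an int and a String must package them, here as a small record.
record IntAndString(int count, String label) {}

class MultiReturn {
    static IntAndString foo() {
        return new IntAndString(42, "answer");
    }

    // Unpacking takes one statement per value.
    static String describe() {
        IntAndString result = foo();
        int c = result.count();
        String d = result.label();
        return d + "=" + c;
    }
}
```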

Furthermore, method and function parameters can also be provided either as individual values, or as a tuple of values, or as named values. Consider this example in Java that uses multiple delegating constructors:
class ErrorList
    {
    public ErrorList(int maxErrors)
        {
        this(maxErrors, false);
        }

    public ErrorList(boolean abortOnError)
        {
        this(0, abortOnError);
        }

    public ErrorList(int max, boolean abortOnError)
        {
        this.max = max;
        this.abortOnError = abortOnError;
        // ...        
        }

    private int max;
    private boolean abortOnError;
    }
Using default parameter values, the Ecstasy equivalent of this example would not need all of those redundant constructors, each with slightly different signatures:
class ErrorList(Int max=0, Boolean abortOnError=False)
    {
    // ...    
    }
And the class could then be constructed using any combination of named parameters, as in the following example:
ErrorList errs = new ErrorList(abortOnError=True);
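Java has neither default nor named parameters, so besides the telescoping constructors shown earlier, a common substitute is a builder. The following is a hypothetical Java sketch (the builder(), max(), abortOnError(), and build() methods are our own names) where the builder's initial field values play the role of Int max=0 and Boolean abortOnError=False.

```java
// Without default or named parameters, a common Java substitute is a builder.
class ErrorList {
    private final int max;
    private final boolean abortOnError;

    private ErrorList(Builder b) {
        this.max = b.max;
        this.abortOnError = b.abortOnError;
    }

    int max() { return max; }
    boolean abortOnError() { return abortOnError; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private int max = 0;                  // default, like Int max=0
        private boolean abortOnError = false; // default, like Boolean abortOnError=False

        Builder max(int max) { this.max = max; return this; }
        Builder abortOnError(boolean abort) { this.abortOnError = abort; return this; }
        ErrorList build() { return new ErrorList(this); }
    }
}
```

The Java counterpart of new ErrorList(abortOnError=True) then reads ErrorList.builder().abortOnError(true).build().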
For the most part, though, the Ecstasy syntax is designed to maintain a high level of compatibility with Java (and C#) syntax. One area in which the syntax differs is with respect to type assertions and type tests. In Java, the type test uses the relational operator "instanceof", and the type assertion uses the C-style cast syntax, which often requires two sets of parentheses, as in this Java code:
if (x instanceof List)
    {
    ((List) x).add(item);
    }
Ecstasy simplifies this syntax dramatically by employing the dot notation that is already so naturally used for property access and method invocation. The "is" keyword replaces "instanceof", and the "as" keyword replaces the awkward use of parentheses for the type assertion (aka "type casting"):
if (x.is(List))
    {
    x.as(List).add(item);
    }
This approach is far easier to read, because it follows left-to-right, with no precedence concerns. Furthermore, if the compiler determines that the value "x" is not subject to concurrent modification, then type inference obviates the need for the type-assertion altogether:
if (x.is(List))
    {
    x.add(item);
    }
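Java 16 moved a little in the same direction with pattern matching for instanceof, which binds a typed variable as part of the test. The Java sketch below contrasts the two Java forms; TypeTest and its methods are hypothetical, and StringBuilder stands in for List only to sidestep Java generics. Note that Java introduces a new variable, whereas Ecstasy narrows the type of x itself (when it can prove x is not concurrently modified).

```java
class TypeTest {
    // Classic Java: test with instanceof, then cast inside a second set of parentheses.
    static boolean appendClassic(Object x, String item) {
        if (x instanceof StringBuilder) {
            ((StringBuilder) x).append(item);
            return true;
        }
        return false;
    }

    // Java 16+ pattern matching: the test introduces a typed variable sb,
    // so no separate cast is needed.
    static boolean appendPattern(Object x, String item) {
        if (x instanceof StringBuilder sb) {
            sb.append(item);
            return true;
        }
        return false;
    }
}
```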
Operator precedence also differs slightly from Java, in order to simplify more operators into left-to-right ordering and to resolve a number of cases in which parentheses were awkwardly required in Java:
Operator        Description             Level   Associativity       
--------------  ----------------------  -----   -------------       
&               reference-of              1                         
                                                                    
++              post-increment            2     left to right       
--              post-decrement                                      
()              invoke a method                                     
[]              access array element                                
?               conditional                                         
.               access object member                                
.new            postfix object creation                             
.as             postfix type assertion                              
.is             postfix type comparison                             
                                                                    
++              pre-increment             3     right to left       
--              pre-decrement                                       
+               unary plus                                          
-               unary minus                                         
!               logical NOT                                         
~               bitwise NOT                                         
                                                                    
?:              conditional elvis         4     right to left       
                                                                    
*               multiplicative            5     left to right       
/                                                                   
%               (modulo)                                            
/%              (divide with remainder)                             
                                                                    
+               additive                  6     left to right       
-                                                                   
                                                                    
<< >>           bitwise                   7     left to right       
>>>                                                                 
&                                                                   
^                                                                   
|                                                                   
                                                                    
..              range/interval            8     left to right       
                                                                    
<  <=           relational                9     left to right       
>  >=                                                               
<=>             order ("star-trek")                                 
                                                                    
==              equality                 10     left to right       
!=                                                                  
                                                                    
&&              conditional AND          11     left to right       
                                                                    
^^              conditional XOR          12     left to right       
||              conditional OR                                      
                                                                    
? :             conditional ternary      13     right to left       
                                                                    
:               conditional ELSE         14     right to left       
As you can see, a number of operators that each have their own precedence level in Java are grouped together; all operators within a group share a single level and are applied according to that level's associativity, which is left to right for most groups. Bitwise operators have also been moved to a significantly higher precedence level (above the relational and equality operators), which reduces the need for awkward parentheses. Additionally, almost all operators map directly to methods, which means that explicit left-to-right behavior can be achieved by replacing a relational operator with the corresponding method invocation.
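Here is one concrete Java case behind that change. Because Java places & below == in precedence, a & b == c parses as a & (b == c). The hypothetical Precedence class below shows that, with booleans, the two groupings really do produce different answers, and that with ints the parentheses are mandatory. The parenthesized form corresponds to the grouping the table above gives Ecstasy, where the bitwise level (7) binds tighter than equality (10).

```java
class Precedence {
    // In Java, & binds more loosely than ==, so this means a & (b == c).
    static boolean javaParse(boolean a, boolean b, boolean c) {
        return a & b == c;
    }

    // The bitwise-first grouping, which Java only gives you with parentheses.
    static boolean bitwiseFirst(boolean a, boolean b, boolean c) {
        return (a & b) == c;
    }

    // With ints, n & 1 == 0 would mean n & (1 == 0), which does not compile
    // in Java, so the parentheses here are mandatory.
    static boolean isEven(int n) {
        return (n & 1) == 0;
    }
}
```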

(Continue to Part II.)