2019/04/29

On Hierarchical Organization


Many a software developer has referenced this saying:
When the only tool that you have is a hammer, every problem begins to look like a nail.
That is not to imply that a hammer is not useful. If there is one conceptual hammer that – more so than any other – has repeatedly proven its merit for managing – and hiding – complexity, that would be the concept of hierarchy. File systems are hierarchies. B*Trees and binary trees and Patricia tries and parse trees are hierarchies. Most documents are internally organized as hierarchies, including the common HTML, XML, and JSON formats. Most graphical user interfaces are modeled as hierarchies. Many programming languages leverage hierarchy to provide nesting, information hiding, scoping, and identity. How is it that such a simple concept can be so universally useful?
First of all, hierarchical organization enables very simple navigation. What this means is that at any arbitrary point – called a node – in a hierarchy, there is a well-known set of operations that are possible, such as navigating from the current node to its parent node, and navigating from the current node to any of its child nodes. If a node does not have a parent, then it is the root node, and if a node does not have any child nodes, then it is a leaf node.
Child nodes are contained within their parent node. Each child is uniquely identifiable by its parent, for example by a name or some other unique attribute. A hierarchy is recursive; at any point in the hierarchy, from that point down is itself a hierarchy. Since a hierarchy has a single root and is recursive, each node in the hierarchy is uniquely identifiable in the hierarchy by combining the identities of each successive node starting with the root and proceeding down to the node; this identity is the absolute path to that node. It is possible to navigate between any two nodes in the same hierarchy by combining zero or more child-to-parent navigations with zero or more uniquely identifiable parent-to-child navigations; the sequence of these steps is a relative path between two nodes.
These basic attributes of a hierarchy combine in amazingly powerful ways. For example, since each node is itself the beginning of a hierarchy of any size, it is possible to refer to that entire sub-hierarchy simply by referring to that one particular node; this effectively hides the recursive complexity contained – or nested – within that node. As a result, it is possible to add, copy, move, or remove a hierarchy of any size simply by adding, copying, moving, or removing the node that is the “root” of that hierarchy.
Using a hierarchy, it is incredibly simple to construct the concept of scope. For example, a scope could include only a specific node, or it could include a specific node and all of its child nodes recursively to its descendant leaf nodes, or it could include a specific node and its ancestor nodes to the root node, or any other combination of inclusion and exclusion that could be described in an unambiguous manner.
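For readers who prefer code, here is a minimal sketch of these operations in Java (purely illustrative; the Node class below is hypothetical and is not part of the XVM): parent/child navigation, the absolute path of a node, and one simple form of scope, namely a node plus all of its descendants.

  import java.util.ArrayList;
  import java.util.List;

  final class Node {
      final String name;
      final Node parent;                        // null for the root node
      final List<Node> children = new ArrayList<>();

      Node(String name, Node parent) {
          this.name = name;
          this.parent = parent;
          if (parent != null) {
              parent.children.add(this);        // each child is contained by its parent
          }
      }

      String absolutePath() {                   // identities from the root down to this node
          return parent == null ? "/" + name : parent.absolutePath() + "/" + name;
      }

      List<Node> subtreeScope() {               // this node and all of its descendants
          List<Node> scope = new ArrayList<>();
          scope.add(this);
          for (Node child : children) {
              scope.addAll(child.subtreeScope());
          }
          return scope;
      }
  }

Because every Node is itself the root of a sub-hierarchy, operating on a single Node is equivalent to operating on everything nested within it.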
These concepts are incredibly simple, yet at the same time incredibly powerful, and are leveraged liberally throughout the XVM, from managing and hiding complexity for developers, to managing memory in an execution context.

2019/04/16

Explicit Intent: Enumerating the Priorities of Design


All designs have priorities, but only some designs begin with the end in mind. When priorities are not explicitly stated, it is easy to chase the priorities that most effectively combine compulsive emotional appeal with apparent ease of implementation, in lieu of the priorities that are most valuable to the intended audience of a design. In order to create a coherent design that serves the intended audience, the Ecstasy specification began with a conscious discussion about priorities, and a series of explicit decisions as to what those priorities would be:
  1. Correctness, aka Predictability. The behavior of a language must be obvious, correct, and predictable. This also incorporates The Principle of Least Surprise.
  2. Security. While generally not a priority for language design, it is self-evident that security is not something that one adds to a system; security is either in the foundation and the substrate of a system, or it does not exist. Specifically, a language must not make possible the access to (or even make detectable the existence of) any resource that is not explicitly granted to the running software.
  3. Composability. High-level computer languages are about composition. Specifically, a language should enable a developer to locate each piece of design and logic in its one best and natural place.
  4. Readability. Code is written once, and referenced many times. What we call “code” should be a thing of beauty, where form follows function.
  5. Lastly, a language must be recursive in its design. There is no other mechanism that predictably folds in complexity and naturally enables encapsulation. It’s turtles, the whole way down.

On Predictability vs. Performance

In the course of researching language design, one predominant concern emerges: that of execution performance. While many different concerns are evaluated and balanced against each other in a typical design, and while the goal of performance is often explicitly ranked in a position of lesser importance, in reality there is no one goal more considered, discussed, and pursued. Regardless of the importance assigned to performance as a language goal, performance is to the language designer what a flame is to a moth.
Perhaps it is simply best to admit, up front, that no language can survive – let alone succeed – without amazing feats of performance. Yet performance as a goal tends to conflict with other goals, particularly with respect to manageability, serviceability, and the quality of the abstractions that are presented to the developer.
Beginning programmers often ask: “Is language A faster than B?” After all, no one wants to be using a slow language, any more than someone would want to buy a slow automobile or a slow computer. Speed is a thrill, and speed in software execution holds no less attraction than a souped-up hot rod with a throaty growl and a body-rumbling ride.
The corollary to that question is "Why is language B slower than A?" The answers to this question tend to be very illuminating. Take any two languages that compile to and execute as native machine code, and compare their performance for a given task. Despite running on the same hardware, and using the same hardware instruction set, one may be dramatically slower than the other: slower by 10%, or by 100% (half as fast), or even by 1000% (an order of magnitude slower). How is such a difference possible?
The answer lies in translation, specifically in the automated translation of idioms. A language such as C selected idioms that closely represented the underlying machine instructions of the day, which allowed programmers to write simple programs that could be almost transliterated from C to machine code. In other words, the language chose as its abstractions the same set of abstractions that the CPU designers were using, or abstractions that were at most one translation removed from the machine’s abstractions. This allowed for very simple compilers, and made it extremely simple to support localized optimization.
A localized optimization is an optimization in the compiled code that can be made using only information about code that is most local to the code that is being optimized; in other words, information outside of the scope of the small amount of code being optimized is not even considered. Many optimizations in the C language, for example, are extremely local, such that they can be performed without any information about code outside of a particular expression or statement; it is hard to imagine more localized optimizations.
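As a tiny illustration (shown here in Java rather than C, but the principle is identical), the following constant expression can be optimized using no information beyond the expression itself:

  // A purely local optimization: every operand is known within the expression,
  // so the compiler can fold it down to the single constant 86400.
  final class Constants {
      static final int SECONDS_PER_DAY = 60 * 60 * 24;
  }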
However, there is a trade-off implicit in achieving such simple and direct optimizations: The abstractions provided by the language are constrained by the abstractions provided by the CPU. As one could rightfully surmise, hardware abstractions tend not to be very abstract, and the abstractions provided by hardware instruction sets tend to be only slightly better. In its early days, the C language was jokingly referred to as “assembly with macros”, because as a language, it was only slightly higher level than assembly itself.
Computing efficiency is often stated in terms of a tradeoff between time (CPU) and space (memory); one can often utilize one in order to optimize for the other, subject to the law of diminishing returns. Unfortunately, there is no single simple metric that captures computing efficiency, but if the trade-off between time and space appeared as a graph with time on one axis and space on the other, it might generally resemble the shape of the curve y=1/x, which most closely approaches the origin at (1,1). If there were a single computing efficiency measurement for a programming language, it could arguably be represented by the closest that this trade-off curve approaches the origin (0,0), which distance could be considered the minimum weighted resource cost. To calculate a language’s efficiency for a particular purpose, one would calculate the inverse of the minimum weighted resource cost.
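To make that hypothetical measurement slightly more concrete (this formalization is illustrative only, and is not part of any XVM specification), model the trade-off curve as t·s = c for some constant c; in LaTeX notation, the minimum weighted resource cost and the corresponding efficiency would then be:

  d_{\min} \;=\; \min_{t \cdot s = c} \sqrt{t^2 + s^2} \;=\; \sqrt{2c}, \qquad \text{attained at } t = s = \sqrt{c}

  \text{efficiency} \;=\; \frac{1}{d_{\min}} \;=\; \frac{1}{\sqrt{2c}}

For the unit curve t·s = 1, the closest approach to the origin is at (1,1), matching the description above.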
While one can easily speak about efficiency in hypothetical terms, a benchmark is a wonderful servant but a terrible master. The path chosen in the design of the XVM is to consciously avoid limits on potential efficiency by consciously avoiding contracts whose costs are not clearly defined and understood. This approach can be explained in the inverse, by way of example in existing languages and systems: Often, features and capabilities that were considered to be “easy” implementation-wise and “free” efficiency-wise when they were introduced, ultimately emerged as the nemesis to efficiency, due to the inflexibility of the programming contracts that these features and capabilities introduced[1].
To understand this, it is important to think of abstractions as a two-sided coin: On one side, we see the benefit of the abstraction, which allows a programmer to work with ever-larger and more powerful building blocks, while the other side of the coin represents the cost of the abstraction, which is called the contract. Imagine, for example, having to build something in the physical world, out of actual matter. One could conceivably build a structure out of atoms themselves, assembling the necessary molecules and arranging them carefully into perfectly formed crystalline structures. The contracts of the various elements are fairly well understood, and yet we wouldn’t try to build a refrigerator out of individual atoms.
One could imagine that building from individual atoms is the equivalent of building software from individual machine instructions, in two different ways: First, that the refrigerator is composed of atoms, just like all software is executed at some level as machine instructions; and second, that as the size and complexity of the constituent components increase, the minutiae of the contracts of the sub-components must not be surfaced in the contracts of the resulting components – those details must be hidable! This purposeful prevention of the surfacing of minutiae is called encapsulation, and exists as one of the cornerstones of software design. It is why one can use a refrigerator without knowing the number of turns of wire in the cooling pump’s motor, and why one can use a software library without worrying about its impact on the FLAGS register of the CPU.
Ultimately, it is the recursive composition of software that creates challenges for optimization. While low level optimizations are focused on the creation of more efficient low level code, higher level optimizations rely on explicit knowledge of what portions of a component’s contract – or the contracts of the various sub-components – can be safely ignored. In other words, the optimizer must be able to identify which contractual effects are ignored or discarded by the programmer, and then leverage that information to find alternative execution solutions whose contracts manage to cover at least the non-discarded and non-ignored contract requirements. Higher-level optimizations target the elimination of entire aspects of carefully defined behavioral contracts, and as a result, they typically require extensive information from across the entire software system; in other words, high-level optimizations tend to be non-localized to the extreme! No software has been more instrumental in illustrating this concept than Java’s Hotspot virtual machine, whose capabilities include the inlining of potentially polymorphic code by the determination that the potential for dynamic dispatch is precluded, and the elimination of specific memory barriers in multi-threaded programs as the result of escape analysis.
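To see the non-local nature of such an optimization in miniature (a sketch in Java; actual JIT decisions depend on run-time profiling and class loading), consider a virtual call that HotSpot can devirtualize and inline only because it knows that, across the entire running program, a single implementation of the interface has been loaded:

  interface Shape {
      double area();
  }

  final class Circle implements Shape {
      private final double r;
      Circle(double r) { this.r = r; }
      public double area() { return Math.PI * r * r; }
  }

  final class Summer {
      static double totalArea(Shape[] shapes) {
          double sum = 0;
          for (Shape s : shapes) {
              sum += s.area();   // inlinable only while Circle is the sole Shape loaded
          }
          return sum;
      }
  }

No analysis local to totalArea could ever justify that optimization on its own; and if a second Shape implementation is loaded later, the whole-program assumption is invalidated, and the JIT must deoptimize and recompile.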
To enable these types of future optimizations, the contracts of the system’s building blocks must be explicit, predictable, and purposefully constrained, which is what was meant by the goal of “consciously avoiding contracts whose costs are not clearly defined and understood.” The contracts in the small must be encapsulatable in the large, which is to say that contracts must be composable in such a way that side-effects are not inadvertently exposed. It has been posited[2] that “all non-trivial abstractions, to some degree, are leaky,” but each such leak is eventually and necessarily realized as a limit to systemic efficiency.


[1] Such contracts are software legacy, meaning that the contracts cannot be altered without disclaiming the past; in other words, altering the contracts will break everything.

2019/04/14

On God, Turtles, Balloons, and Sandboxes


Wikipedia defines a software sandbox as follows[1]:
In computer security, a sandbox is a security mechanism for separating running programs. It is often used to execute untested or untrusted programs or code, possibly from unverified or untrusted third parties, suppliers, users or websites, without risking harm to the host machine or operating system. A sandbox typically provides a tightly controlled set of resources for guest programs to run in, such as scratch space on disk and memory. Network access, the ability to inspect the host system or read from input devices are usually disallowed or heavily restricted.
In the sense of providing a highly controlled environment, sandboxes may be seen as a specific example of virtualization. Sandboxing is frequently used to test unverified programs that may contain a virus or other malicious code, without allowing the software to harm the host device.
In the physical world, in which children play with sand, there are two common styles of sandbox. The first is constructed from four equally sized wooden planks, each stood on its long edge to form a square box, fastened in the corners, and then filled with sand. The second style is typified by a large green plastic turtle, whose “turtle shell” is the removable top that keeps the rain out, and whose “body” is the hollow bowl that keeps the sand in. Both styles hold sand and allow a child to dig tunnels and build sand-castles, but there is one major difference: When a child tunnels too deeply in the wooden-sided sandbox, the tunnel burrows past the sand and into the soil beneath, while the tunnel depth in the turtle sandbox is strictly limited by the plastic bowl.
Software sandboxes tend to mirror these physical types, in that the dirt often lies beneath. In other words, the sandbox attempts to protect the resources of the system, but a determined programmer will eventually be able to find a way through. The only way that a language runtime as a sandbox can ensure the protection of the underlying resources of a system is for the sandbox itself to completely lack the ability to access those resources. Thus, the purpose of the sandbox is to defend against privilege escalation:
Privilege escalation is the act of exploiting a bug, design flaw or configuration oversight in an operating system or software application to gain elevated access to resources that are normally protected from an application or user. The result is that an application with more privileges than intended by the application developer or system administrator can perform unauthorized actions[2].
As a language runtime designer, it is not sufficient to simply distrust the application code itself; one must distrust the entire graph of code that is reachable by the application code, including all third-party libraries, the language's own published runtime libraries, and any accessible internal libraries that ship with the runtime. Or, put another way, if there is a possible attack vector that is reachable, it will eventually be exploited. To truly seal the bottom of the sandbox, it is necessary to disallow resource access through the sandbox altogether, and to enforce that limit via transitive closure.
But what good is a language that lacks the ability to work with disks, file systems, networks, and network services? Such a language would be fairly worthless. Ecstasy addresses this requirement by employing dependency injection, which is a form of inversion of control. To comprehend this, it is important to imagine the container functionality not as a sandbox, but as a balloon, and our own universe as the primordial example.
Like an inflated balloon, the universe defines both a boundary and a set of contents. The boundary is defined not so much by a location, but rather by its impermeability – much like the bottom of the green plastic turtle sandbox. In other words, the content of the universe is fixed[3], and nothing from within can escape, and nothing from without can enter. From the point of view of someone within our actual universe, such as you the reader, there is no boundary to the universe, and the universe is seemingly infinite.
However, from outside of the universe, the balloon barrier is quite observable, as is the creation and destruction of the balloon. Religiously speaking, one plays the part of God when inflating a balloon, with complete control over what goes through that one small – and controllable – opening of the balloon.
It is this opening through which dependency injection of resources can occur. When an application needs access to a file system, for example, it supplicates the future creator of its universe by enumerating its requirement as part of its application definition. These requirements are collected by the compiler and stored in the resulting application binary; any attempt to create a container for the application will require a file system resource to be provided.
And there are two ways in which such a resource can be obtained. First of all, the resource is defined by its interface, so any implementation of that interface, such as a mock file system or a fully emulated – yet completely fake! – file system would do. The second way that the resource can be obtained is for the code that is creating the container to have asked for it in the same manner – to declare a dependency on that resource, and in doing so, force its own unknown creator to provide the file system as an answer to prayer.
As the saying goes, it’s turtles all the way down. In this case, the outermost container to be created is the root of the container hierarchy, which means that if it requires a file system, then the language runtime must inject something that provides the interface of a file system, and the resource that is injected might even be a representation of the actual file system available to the operating system process that is hosting the language runtime.
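To make the shape of this mechanism concrete, here is a conceptual sketch in Java (not Ecstasy, and not any real API; the FileSystem, Container, and GuestApp names are invented solely for illustration), in which a guest can obtain resources only through whatever its creator chooses to inject:

  import java.util.Map;

  interface FileSystem {                             // the guest only ever sees the interface
      String read(String path);
  }

  final class Container {
      private final Map<Class<?>, Object> injected;  // everything the creator chose to provide
      Container(Map<Class<?>, Object> injected) {
          this.injected = Map.copyOf(injected);
      }
      <T> T inject(Class<T> type) {                  // the guest's only source of resources
          return type.cast(injected.get(type));
      }
  }

  final class GuestApp {
      void run(Container creatorProvided) {
          FileSystem fs = creatorProvided.inject(FileSystem.class);
          System.out.println(fs.read("/etc/motd")); // real, mock, or filtered: the guest cannot tell
      }
  }

The creator of the container decides what “file system” means inside of it: a real one, a mock, or a filtered view; the guest has no other path to the resource, and no way to tell the difference.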
And here we have a seemingly obvious contradiction: What is the difference between a language that attempts to protect resources by hiding them at the bottom of a sandbox container, versus a language that provides access to those same resources by injecting them into a container? There are several differences, but let’s start with an obvious truth: Perfection in the design of security is difficult to achieve, and even harder to prove the correctness of, so it is important to understand that this design does not itself guarantee security. Rather, this design seeks to guarantee that only one opening in the balloon – and anything that is injected through that opening – needs to be protected, and the reason is self-evident: Transitive closure. By having nothing naturally occurring in the language runtime that represents an external resource, there is simply no surface area within the language runtime – other than the injected dependencies themselves – that is attackable.
Second, the separation of interface and implementation in the XVM means that the implementation of the resource is not visible within the container into which it is injected. While this pre-introduces a number of language and runtime concepts, the container implementation only makes visible the surface area of the resource injection interface – not of the implementation! This holds true even with introspection, and furthermore the injected resources are required to be either fully immutable, or completely independent services.
Third, this design precludes the possibility of native code within an Ecstasy application; native functionality can only exist outside of the outermost container and thus outside of the language runtime itself, and can only be exposed within the language runtime via a resource injected into a container, subject to all of the constraints already discussed.
Lastly, as has been described already, the functionality that is injected is completely within the control of the injector, allowing the requested functionality to be constrained in any arbitrary manner that the injector deems appropriate.
While it is possible to introduce security bugs via injection, the purpose of this design is to minimize the scope of potential security bugs to the design of the relatively small number of interfaces that will be supported for resource injection, and to the various injectable implementations of those interfaces.

2019/04/08

Ecstasy modules

We talk about modularity and reusability as core tenets of the design of the Ecstasy language, but what does it actually mean that a language supports modularity? What does effective software reusability actually require? What are the aspects of a language that make modularity possible, and reusability simple?

It should be self-evident that no language starts off with a design plan that says: "Minimize reusability. Encourage monolithic architecture. Prevent modularity." To the contrary, most languages make claims about enabling reusability, and many languages claim to support modularity. And yet, when it comes to modularity and reusability in existing languages, the actual results seem to fall woefully short.

The Ecstasy language is built on an explicit notion of modularity, and reuse is an explicit goal of its design. Let's talk about what that translates to in the real world.

Big and Little

The first requirement that we had for modules in Ecstasy is that they had to work really well "in the small", supporting a module containing as little as a few lines of code (perhaps a single function), but also "in the large", supporting massive applications with tens of thousands of classes, and tens of millions of lines of code. From experience, attempting to address such dramatically different extremes on a single scale results in significant trade-offs at best, or turns out to be an altogether quixotic quest at worst.

Let's start with a minimalist example:

  module MyLittleModule {}

Strangely enough, that is an entire module. Admittedly, it doesn't do much, but it can be compiled, and it can be loaded as part of an application. Perhaps a better example would include something of use:

  module TaxCalculation.example.com
    {
    static Dec calcTax(Dec amount)
      {
      return amount * 0.05;
      }
    }

In this second example, the module TaxCalculation contains a single function, which unsurprisingly calculates some tax. Why would someone write such a module?
  • By splitting an application into more than one module, work can proceed in parallel across these multiple modules, naturally separated by module boundaries.
  • Organizational separation of responsibilities can map well to module-based development, where each module is the responsibility of a specific part of an organization.
  • Modules may be produced by different organizations altogether; by separating functionality into multiple modules, it is possible to cleanly delineate the work of one organization from that of another.
  • In Ecstasy, modules represent both units-of-compilation and units-of-deployment; by separating functionality into multiple modules, updates to functionality located in a single module can be written, built, versioned, and deployed independently of other modules.
  • By allowing a module to be as small as necessary, the risks naturally associated with version changes are reduced, because each module can fulfill a deliberately minimal set of responsibilities.
Many of the other aspects of Ecstasy modules are designed to support the complexities inherent in building, maintaining, co-existing with, and consuming large modules, but it is important to remember that a module can be as simple as a single source file, and as simple as a single line of code.

A Module: A Class of its Own

In the minimalist example above, we introduced the keyword "module". In Ecstasy, a module is a form of a class. As a class, it can contain methods, properties, typedefs, functions, constants, and other classes. Additionally:
  • A module is a const class, so once its construction completes, it cannot be modified.
  • A module is a singleton, so its singleton instance can be accessed from anywhere within the module by using the simple (unqualified) name of the module.
  • A module automatically provides an implementation of the Ecstasy Module interface.
  • A module also has the ability to contain another special form of class, called a package; only modules and packages may contain packages.
In many ways, a module is just like a package, except that a module's name can be qualified to include a domain name corresponding to the organization that provides the module. By allowing a qualified name, Ecstasy naturally supports distributed module repository systems, cryptographically-secure module signatures, and name-spacing that mirrors the Internet itself.

Module Name-Spacing

Code within a module can only refer to and use names within the same module. In other words, while modules themselves may have qualified names, code within a module never uses qualified names to reference other modules.

Instead, module dependencies are mounted within the module that needs them, just like network drives can be mounted within a local file system. To mount a module within another module, the package keyword is used. For example, using the example module that we introduced above:

  module MyApp.example.com
    {
    // mount the entire module "TaxCalculation.example.com"
    // (and its hierarchical namespace) as package "taxrate"
    package taxrate import TaxCalculation.example.com;

    const Invoice
      {
      // ...
      Dec subtotal;
      Dec tax.get()
        {
        return taxrate.calcTax(subtotal);
        }
      Dec total.get()
        {
        return subtotal + tax;
        }
      }
    }

(Note that while this example conveniently shows the Invoice class contained directly within the module, the Invoice class would typically be split out as its own source file. A file may contain as little as a single class, or at the other extreme, a single file could contain the entire module, even if the module is enormous. As a rule of thumb, multiple classes should only be placed into a single file if doing so improves readability and maintainability.)

Module Embedding

If a module is only being used in one application, there may be no reason to use it as a separate unit of deployment. In this case, it is easy to physically include it in the resulting module as part of the compilation process:

  package taxrate import:embedded TaxCalculation.example.com;

Module embedding allows multiple modules developed independently to be delivered as a single module.

Module Optionality

When a module is mounted as a package, it is assumed to be required; however, in some cases, the module may be optional. There are two flavors of optional. The first indicates that the runtime should use its best efforts to obtain the module:

  package helpers import:desired HandyHelpers.example.com;

The second allows one module to indicate a supported (but not required) module; in this case, the runtime will only load that module if it is desired or required by another module that is being loaded:

  // we only support version 3 or later for this module
  package fmts import:optional Formats.example.com v:3;

By allowing module dependencies to be optional, Ecstasy modules can include support for third-party modules that may or may not be used in a particular environment. In short, while coupling itself may be unavoidable, a coupling at the source code level can exist without creating a runtime dependency.

Module Versioning

A module's version is not part of its source code; the version can be assigned to (or "stamped onto") a module after it has been built. The version information is intended to support Continuous Integration (CI) models, Software Development Life-Cycle (SDLC), and the long-term maintenance requirements of complex applications:
  • A build can be marked as a development, CI, alpha / beta pre-release, release candidate (RC), or GA release build.
  • A build can be marked as a next logical version of another version; for example, version 2 is the next logical version after version 1.
  • A build can be marked as the revision of another version; for example, version 1.1 is a revision of version 1.
  • The revision tree supports any arbitrary depth, making it possible to patch any existing version; for example, after releasing version 1.0 and version 1.1, it would be possible to release  version 1.0.1 to fix a problem with version 1 for a customer who is unwilling to adopt version 1.1, and then subsequently to release 1.0.0.1 to fix a more specific problem with version 1 for a customer who is unwilling to adopt version 1.0.1.
  • Ecstasy module versioning explicitly supports Semantic Versioning 2.0.0, and adherence to that specification is encouraged. However, Ecstasy does not require conformance to the rules of Semantic Versioning; organizations and developers are free to support a versioning model of their choice.

Module versioning allows an application to specify which versions of other modules it requires or supports, to any arbitrary level of complexity; consider the example:

  package json import JsonUtils.example.com v:2
      avoid v:2.0.1
      prefer v:3, v:2.1;

The syntax supports a list of clauses using the keywords allow, avoid, and prefer. The result is a rich yet comprehensible means to declare a dependency on a separately-versioned module, while retaining a level of control over the versions of the other module to use if and when necessary.

Module Packaging

Multiple versions of a module can be combined into a single module file, allowing any number of versions (including patches and pre-release versions) to be delivered in a single file. This allows an organization to provide support for older versions of a module, while continuing to deliver newer versions of the same. It also dramatically reduces the entropy related to managing repositories full of different supported versions, and is designed to enable CI automation for testing against all available supported versions of a module.

The packaging of multiple versions of a module together into a single file is very space-efficient; only the "deltas" require additional space.

All of the version metadata is included in the module file, and is available for use by build, CI, testing, repository, and deployment tools.

Conditionality

One cannot support the notions of module optionality and module versioning in a type system that employs static typing (and type safety with transitive closure), unless there exists a means to adapt to the presence or absence of a module, a version thereof, or particular components contained therein. In other words, when considering the above optionality and versioning examples, it must be possible to build a module in such a way that it still works correctly in the absence of the desired HandyHelpers.example.com module, and that it integrates with the Formats.example.com module when it is present, and that it can gracefully handle the absence of the same.

To accomplish this, Ecstasy supports link-time conditionality that can be easily expressed using existing and obvious language constructs:
  • It is possible to declare data types such that they exist conditionally;
  • It is possible to declare aspects of a data type such that they exist conditionally; and
  • It is possible to define logic that exists conditionally.
Using the Formats.example.com example from above, if that module were to provide a Formatter class, logic can conditionally take advantage of that optional component:

  String to<String>()
    {
    if (fmts.Formatter.present)
      {
      return Formatter.format(this);
      }
    return super();
    }

Unlike a pre-processor approach to conditional logic, Ecstasy compiles and checks each and every combination of conditions, ensuring that all of the language rules are always enforced, and compiling all of the potential combinations into the same compiled module file. As a result, the conditions can be evaluated at link-time, based on the exact information that is available when a set of modules of specific versions are selected to be used together.

Conditionality is tested using the ordinary "if" statement; the conditions that can be tested include:
  • "name".defined is used to test if a particular named option, such as "debug" or "test", is specified for the modules being loaded;
  • identity.present is used to test if a particular module, package, class, property, method (etc.) is available;
  • module.versionMatches(ver) is used to test if a version of a module (or a revision thereof) is available.
A benefit of this design is that unit and functional tests, debug builds, production builds, patches, and unreleased versions may all be included in the same compiled module file. That same file can have dependencies, some of which are optional, on any number of other modules, and it can even contain any necessary work-arounds for dealing with inconsistencies in specific versions of those same modules. These are the types of things that, until now, had to be done by hand; while the inherent complexity remains, the complexity has been encapsulated in a manner that allows solutions to be automated, and the remaining non-automatable challenges to be solved.

(Conditionality is a powerful capability, but as a rule of thumb, its use should be minimized to the extent possible. Even with the powerful organizational and automation capabilities that have been described here, the combination of variability and complexity still exists, and will inflict tremendous pain on those who underestimate the cost of entropy.)

The Ecstasy Core Module

Ecstasy's own type system is provided as the module Ecstasy.xtclang.org, which is automatically imported into every module as the ecstasy package. The Ecstasy core module has no dependencies on other modules, but all modules have a dependency on it. It contains all of the primordial types, which represent both the basic building blocks from which new types can be formed, and the means by which the runtime interacts with application code, such as exceptions.

(The Ecstasy core module is also dependent on the Ecstasy core module, which introduces a recursive dependency. As a result, not only is the class ecstasy.text.String available inside of every module, but so is ecstasy.ecstasy.text.String and ecstasy.ecstasy.ecstasy.ecstasy.ecstasy.text.String. Not surprisingly, the type system is referred to as the Turtle Type System, because it is turtles, all the way down.)

Transitive Closure

The module is the unit of transitive closure for the type system. Everything that a module references must be present within that module, either by importing another module, or by placing the necessary components within the module itself.

When a module is built, it contains a detailed fingerprint of all of its dependencies on other modules. The fingerprint defines the exact set of modules, packages, classes, properties, methods, and so on that are used by the dependent module.

When a module is loaded, each of the modules that it depends upon is loaded along with it. The loader is responsible for selecting the version of each module that will be loaded, for selecting the combination of conditions that will apply to the modules, for resolving all of the dependencies among the various modules, and for verifying that the result is correct according to the rules defined by the XVM.
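
The following is a conceptual sketch in Java of that recursive resolution step; it is not the XVM's actual loader, and the Fingerprint, ModuleFile, and Loader types are invented for illustration, with version selection, condition evaluation, and verification noted only as comments where a real loader would apply them:

  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  record Fingerprint(String moduleName) {}             // simplified: identity only

  record ModuleFile(String name, List<Fingerprint> dependsOn) {}

  final class Loader {
      private final Map<String, ModuleFile> repository; // available modules, by name
      private final Map<String, ModuleFile> loaded = new LinkedHashMap<>();

      Loader(Map<String, ModuleFile> repository) {
          this.repository = repository;
      }

      void load(String name) {
          if (loaded.containsKey(name)) {
              return;                                   // already resolved
          }
          ModuleFile module = repository.get(name);     // a real loader would select a version here
          if (module == null) {
              throw new IllegalStateException("unresolved module: " + name);
          }
          loaded.put(name, module);
          for (Fingerprint dep : module.dependsOn()) {  // every fingerprint must itself resolve
              load(dep.moduleName());
          }
          // A real loader would also choose the applicable combination of
          // link-time conditions and verify the result against the XVM's rules.
      }
  }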

Summary

Modules can be as small as an individual component, as reusable as a library, and as complete as an entire application. Modules support a wide range of granularity, and promote both composition and reuse. Modules contain extensive information in order to support versioning, in order to define dependencies in a non-ambiguous manner, and in order to provide supporting information to various development and management tools.

2019/04/01

Signs of Spring

Here in New England, we are getting our first taste of spring, with temperatures reaching almost 70F (21C). The birds are returning, and the grass is starting to wake from its winter nap, with a beautiful green color beginning to peek out everywhere you look, and with the wildlife busy building nests and popping out the next generation. The plants have put their winter coats away for the year, and are rushing to get their flowers out and busy, making this a gorgeous time of the year for one and all.

We've been busy, too. For the past several years, we've been designing, prototyping, validating, and documenting the first programming language designed for the cloud: the Ecstasy programming language. We're not yet ready to make it generally available, but we're getting closer every day, and now seems like the right time to start introducing it.

I'd like to start by talking about the why: Specifically, why would anyone in their right mind build a new programming language from scratch? Or, asked differently: What are the problems that Ecstasy is designed to solve that are inefficient or impossible to solve with existing languages?

This, in turn, is quite a hard question to answer in a short space, because -- as computer science explains -- that which can be performed by any one Turing complete language can also be performed using any other Turing complete language. However, we can begin to describe the challenges that Ecstasy is designed to address, and how it was designed to solve them.

Scalable

There are a lot of meanings of the term scale as it is used in computer science and software. In the general sense, a scalable solution is one that works well in the relatively-small, and it continues to work well in the relatively-large. For a language to be considered scalable, it must work well for a small throw-away project -- perhaps even only a few lines of code! -- yet it must also be able to support the needs and processes of a large team building and maintaining a huge code base over a period of years or decades. From a runtime perspective, a scalable language must work well in a constrained device, yet it must also be able to efficiently utilize the capabilities of a large server cluster.

Ecstasy is a strongly typed language, and uses a modular type system with verified transitive closure. What this means in practice is that Ecstasy's design minimizes the number of potential surprises from code changes, and thus the number of flaws that can only be detected as runtime failures. This topic is expansive enough to require its own volume, but in general, Ecstasy's design emphasizes predictability above all other goals, and that predictability helps the language scale to large teams and to large projects.

Another design emphasis is on readability, meaning that when a choice is required between making the language easier-to-write versus easier-to-read, the choice has almost always been to make the language easier to read. This means that the Ecstasy language is not intended to win any terseness awards, and its style encourages the use of vertical space, explicit constructs, and rich documentation integrated as part of the code.

The Ecstasy language also includes fundamental support for the entire software development life cycle, including versioning and upgrades, mocking and testing, and dependency management -- including conditional dependencies.

From a runtime perspective, the Ecstasy language is designed to produce libraries and applications that can be (a) compiled to native executable formats, (b) executed in JIT/dynamically-optimizing runtime environments, or even (c) completely embedded into other languages, applications, and platforms! The language does not prescribe an explicit threading model, nor does it directly expose any of the raw capabilities of the underlying platform, allowing the language to scale from a single core, low-power CPU, all the way up to large SMP servers.

Memory management is fully automated by the language runtime, but it is not bound to a particular data structure (such as a heap), nor to a particular management approach (such as garbage collection or reference counting). The Ecstasy virtual machine (XVM) architecture was designed explicitly to efficiently support terabytes and even petabytes of application memory, without requiring any "stop the world" pauses.

So, why does cloud computing require a scalable language? First, because the cloud is connected, and the clients connecting to applications in the cloud range from low-power IoT devices, to phones and tablets, to PCs and web browsers, and even to other applications running in the cloud; a language built for the cloud needs to scale well across that range of requirements. Second, because quality is important: A language needs to be able to support the organic growth of applications over time, from prototype to MVP to production and beyond, all while supporting a growing development team and the accompanying development process automation. Lastly, because demand is bursty: A language for the cloud needs to scale because there is a potential for millions and even billions of connected clients.

Secure

The Ecstasy language is secure by design, and not through the use of layers of complex runtime logic and security checks. Security is a topic far more complex than can be contained in this article, let alone in a tome, but the Ecstasy principle is exquisitely simple: It must be possible to fully defend an environment from Ecstasy code running within that environment, even (or perhaps especially!) when the environment is itself built in the Ecstasy language.

To accomplish this, Ecstasy uses a container model, but unlike an operating system container, an Ecstasy container has absolutely no surface area from within the container -- not even an API! Code running within an Ecstasy container has no access to any operating system or hardware resources -- it doesn't even have visibility to the fact that an operating system or any hardware exists!

The resources that an Ecstasy library or application requires are enumerated within the compiled form of the Ecstasy code, and when the code is loaded, the host environment can decide precisely what to provide for each of those requested resources. All resources are provided via resource injection, and from the application's point of view, the resources are provided as if out of thin air! The resources (and their injection) are completely opaque to the application, even if the resources are implemented entirely in the Ecstasy language. Furthermore, resources are automatically limited in their surface area to the exact programming interfaces defined by the host.

Lastly, the Ecstasy runtime can only host Ecstasy code; Ecstasy cannot host native code. The only way to have "native" capabilities in an Ecstasy runtime is to inject a resource from outside of the Ecstasy runtime that in turn makes use of native code.

So, why does cloud computing require a secure language? First, because client devices and the servers in the cloud itself must never trust the code that they are hosting. Second, because a developer should never trust a language that doesn't assume the first.

Portable

The requirement for portability is pretty much a given in the cloud, but like so many of these terms, portability has multiple meanings associated with it. Like many modern languages, Ecstasy is a Turing-complete language with a well-defined intermediate compilation form that can be targeted to almost any operating system and hardware environment. It is compilable to a machine-dependent form ahead-of-time (AOT), it is just-in-time (JIT) compilable, and it is adaptively re-compilable (runtime profile-driven optimizations).

Code written in Ecstasy is also portable to the client. Ecstasy was designed to support compilation to WebAssembly for browser deployment, and to ARM and x86 for client device deployment, in addition to the obvious cloud-hosted deployment model.

Ecstasy is designed to support another form of portability: Application portability. By designing around a strict container model and an event-driven programming model, and by designing for the possibility of a fully managed runtime, Ecstasy applications can conceptually be lifted from one server or virtual machine and moved to another.

So why does cloud computing require a portable language? First, because the code needs to run across a variety of development environments and cloud provider platforms. Second, because applications themselves need to be able to migrate within a cloud and across cloud boundaries. Lastly, because the same code and libraries should be easily re-usable on client devices, in IoT applications, in a web browser, and on any other computer.

Reusable

It should be obvious by this point that reusability was high on the requirements list in the design of the Ecstasy programming language. Ecstasy approaches the challenge of reusability in several ways, but the most obvious is through its support for modular programming.

A module is the unit of reuse in Ecstasy. In Ecstasy, a module can be as simple as a few lines of code, or as complex as the world's biggest applications. When one module uses another, it creates a dependency that can be easily defined as optional, desired, or required. Each module contains all of the information to explain what module dependencies it has, down to the individual specific parts of those modules that it uses! Modules can be developed and versioned completely independently, but the dependencies can be specified in terms of versions -- including which versions to avoid.

Modules each have their own identity, including information about where they come from. Modules can be signed, and -- for execution in some environments -- even fully encrypted! Module repositories allow modules to be managed and made available online or offline, supporting capabilities such as caching and download-on-demand from the module provider or a module repository service. For the purpose of prepackaging, a module can even completely embed all of the modules that it depends on -- all within a single file!

Because a module is intended for wide reuse, it can carry all of its versions -- including patches and even pre-release versions -- inside a single, compact file. To manage the complexity of interactions among multiple third-party modules, each module carries all of its integration logic that allows it to support any of its optional dependencies, yet without the use of a language preprocessor, runtime reflection, or other convoluted and error-prone approaches to avoiding hard dependencies.

When working with an untrusted module, the module can be loaded and used in its own isolated container, relying on the secure design of the Ecstasy language and runtime. Not surprisingly, this is how Ecstasy cloud applications are intended to be hosted by default.

So why does cloud computing require reusability, and specifically with the type of solution offered in Ecstasy? First, because developers need to be able to safely reuse their own work. Second, because developers more and more rely on components, libraries, and services developed by others, and must be able to consume those capabilities in a secure, reliable, and predictable manner. Lastly, because without an explicit design for reusability, things get messy as the number of reused components and the complexity of their interactions grow.

Other

In the interest of not making this introductory article into its own book, many of the design elements that make Ecstasy the obvious language choice for the cloud must be unfairly glossed over, but we'll come back to them in later articles:

  • Nestable - The Ecstasy container model is recursive, allowing any Ecstasy application running in a container to create its own containers, and so on.
  • Simple - The Ecstasy language was designed to be easy to pick up by anyone who knows C++, Java, C#, or Python. Most of the time, you won't even realize that you're using a different language -- except that some of those little things that bothered you have been fixed.
  • Compact - The compiled form (IR) of an Ecstasy module is amazingly compact, even when carrying multiple versions and dependencies.
  • Service-Based and Event-Driven - Instead of exposing operating system threads, mutexes, semaphores, and other inflexible, error-prone constructs, Ecstasy supports a simple, well-defined service model, using an event-driven programming model with full support for both lambdas and continuations, and full support for both futures/promises and async/await.
  • Managed - All of the Ecstasy capabilities are designed to be hostable in a fully managed environment, whether in the cloud or in the palm of your hand. Resources like CPU, memory, storage, and network can be completely metered and managed. Operating system services can be carefully managed as well, preventing unauthorized and undesirable behavior.

Free and Open Source

Undoubtedly, the largest change in software in the past few decades has been cultural and not technological. The term “open” has long been used for marketing commercial software that had some slight yet often-only-theoretical potential for interoperability with other commercial software. Today, most core software components, libraries, operating systems and applications are available in complete source code form for use under an open source or software libre license, and many of the specifications and standards – including languages and execution systems – that enable interoperability are similarly open and available. From an economic standpoint, it would appear that the demand for a fundamental set of software standards and components being available as a public good eventually outweighed the cost of creating and managing that public good (even in some cases lacking any consistent centralized authority!), and the cost of reverting to private goods for that fundamental set of software standards and components is unacceptable for all but the most especial of requirements.

It is in this spirit that the Ecstasy specification (XTC/XVM) and source code will be made available, with its ideas and concepts often inspired by others’ open work, and – if any prove worthwhile – its own ideas and concepts freely available for re-use and recycling as the reader sees fit. To accomplish this, the source code will be made available under both a permissive open source license (Apache v2) and a software libre license (GPL), and the specifications will be made available under the permissive Creative Commons CC-BY-4.0 license.