2019/06/18

A pointed paean for C

Fewer than 20% of programmers claim to use C today, and the real number of programmers actually using C is likely far smaller, but C lives on quite pervasively in its influence on C++, Java, C#, and other languages. Starting with Java, most of the "managed runtime" languages purposefully omitted one of the most powerful features in C: the pointer. In its place, these languages provide a concept called a reference, which is a type-safe pointer whose value (a memory address) cannot be obtained or manipulated by the programmer.

Two important things were lost in the process, however:
  • The reference itself became opaque, in that its only capability in these languages is to be de-referenced; and
  • Pass-by-reference is no longer possible.
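
For anyone who has not written C lately, the pass-by-reference idiom that pointers make possible looks roughly like the sketch below. The function names are made up for illustration; the point is only that `add_one` receives the address of the caller's variable, while `add_one_to_copy` receives a copy of its value.

```c
#include <assert.h>

/* Hypothetical helper: receives the ADDRESS of the caller's variable,
   so the write below is visible to the caller. */
static void add_one(int *counter)
{
    *counter += 1;
}

/* Hypothetical helper: receives a COPY of the caller's value,
   so the caller's variable is left untouched. */
static void add_one_to_copy(int counter)
{
    counter += 1;
    (void)counter; /* the modified copy is simply discarded */
}

/* Returns the caller-side value after both calls. */
static int demo_pass_by_reference(void)
{
    int hits = 41;
    add_one_to_copy(hits);  /* hits is still 41 */
    add_one(&hits);         /* hits is now 42 */
    return hits;
}
```

In Java, only the second form exists for an `int`: the callee can never reach back into the caller's variable.
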
Ecstasy references, on the other hand, are themselves objects (because turtles), and because references are objects, references have references (because turtles). It may make your head hurt to picture this, but in use, it becomes the most obvious and simple concept imaginable. In Ecstasy, a reference is represented by the Ref interface:

A Ref represents a reference to an Ecstasy object. In Ecstasy, "everything is an object", and the only way that one can interact with an object is through a reference to that object. The referent is the object being referred to; the reference (encapsulated in and represented by a Ref object) is the object that refers to the referent.

An Ecstasy reference is conceptually composed of two pieces of information:
  • A type;
  • An identity.

The type portion of an Ecstasy reference, represented by the actualType property of the Ref, is simply the set of operations that can be invoked against the referent and the set of properties that it contains. Regardless of the actual operations that the referent object implements, only those present in the type of the reference can be invoked through the reference. This allows references to be purposefully narrowed; an obvious example is when an object only provides a reference to its public members.

The Ref also has a RefType property, which is its type constraint. For example, when a Ref represents a compile-time concept such as a variable or a property, the RefType is the compile-time type of the reference. The reference may contain additional operations at runtime; the actualType is always a superset (⊇) of the RefType.

The identity portion of an Ecstasy reference is itself unrepresentable in Ecstasy. In fact, it is this very unrepresentability that necessitates the Ref abstraction in the first place. For example, the identity may be implemented as a pointer, which points to an address in memory at which the state of the object is stored. However, that address could be located on the process' program stack, or allocated via a dynamic memory allocation, or could point into a particular element of an array or a structure that itself is located on the program stack or allocated via a dynamic memory allocation. Or the identity could be a handle, adding a layer of indirection to each of the above. Or the identity could itself be the object, as one would expect for the simplest (the most primitive) of types, such as booleans, bytes, characters, and integers.
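
To make the list of possibilities above a bit more concrete, here is a small C sketch of an identity that may be a pointer, a handle, or the value itself. Everything in it (the `Identity` struct, the `IdentityKind` tags, `identity_get`) is hypothetical and invented for this illustration; it is not how any Ecstasy runtime is actually implemented, only a picture of why callers are better off not knowing which case they have.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the three identity strategies named in the text. */
typedef enum { ID_POINTER, ID_HANDLE, ID_INLINE } IdentityKind;

typedef struct {
    IdentityKind kind;
    union {
        const int64_t  *pointer; /* address of the object's state          */
        const int64_t **handle;  /* address of a slot holding that address */
        int64_t         value;   /* the identity IS the value              */
    } as;
} Identity;

/* The only operation callers get: dereference.
   The representation stays hidden behind this function. */
static int64_t identity_get(Identity id)
{
    switch (id.kind) {
    case ID_POINTER: return *id.as.pointer;
    case ID_HANDLE:  return **id.as.handle;
    case ID_INLINE:  return id.as.value;
    }
    return 0; /* not reached for valid kinds */
}

/* Returns 1 if all three representations yield the same referent value. */
static int demo_identity(void)
{
    int64_t        on_stack = 7;          /* state living on the program stack */
    const int64_t *slot     = &on_stack;  /* one extra level of indirection    */

    Identity a = { .kind = ID_POINTER, .as.pointer = &on_stack };
    Identity b = { .kind = ID_HANDLE,  .as.handle  = &slot };
    Identity c = { .kind = ID_INLINE,  .as.value   = 7 };

    return identity_get(a) == 7 && identity_get(b) == 7 && identity_get(c) == 7;
}
```

Code that only ever calls `identity_get` behaves the same in all three cases, which is the kind of guarantee an opaque reference is meant to give.
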

To allow the Ecstasy runtime to provide the same behavioral guarantees regardless of how objects are allocated and managed, how they are addressed, and how house-keeping activities potentially affect all of the above, the Ref provides an opaque abstraction that hides the actual identity (and thus the actual underlying implementation) from the program and from the programmer.

Because it is impossible to represent the identity in Ecstasy, the Ref type is itself simply an interface; the actual Ref instances used for parameters, variables, properties, array elements, and so on, are provided by the runtime itself, and exposed to the running code via this interface.

Ref is read-only; the read/write form is the Var interface, which extends Ref. To obtain a Ref or a Var, we use the C address-of operator, "&":

String str = "Hello world!"; 

// get a read-only reference to the variable
Ref<String> ref = &str;

// alternatively, get a read/write reference to the variable
Var<String> var = &str;

// modify the variable via a reference
var.set("Goodbye, cruel world!"); 

// that modified the value that is held in the variable!
assert str == var.get();

// which we can also see through the read-only reference
assert ref.get() == str;
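
For comparison, here is roughly the same sequence written with plain C pointers. This is a sketch rather than a claim of equivalence: `demo_variable_reference` is a made-up name, and a pointer-to-const stands in (loosely) for the read-only Ref.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the write through `var` is visible via `str` and `ref`. */
static int demo_variable_reference(void)
{
    const char *str = "Hello world!";

    /* read-only view of the variable: *ref can be read but not assigned */
    const char *const *ref = &str;

    /* read/write view of the same variable */
    const char **var = &str;

    /* modify the variable through the read/write pointer */
    *var = "Goodbye, cruel world!";

    /* all three names now observe the same new value */
    return str == *var && *ref == str
        && strcmp(str, "Goodbye, cruel world!") == 0;
}
```

The difference is that in C, `ref` and `var` are raw addresses that can be printed, cast, and incremented, whereas a Ref or a Var offers only the operations of its interface.
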

The last concept to grasp is this: Objects are of a class, but references are of a type. In most OO languages, the object's class is its type, but one takes a different route when designing a language -- like Ecstasy -- to build portable, containerized, safe, and secure applications in the cloud, versus designing a language -- like C -- to build an operating system.

This concept is unusual for anyone coming from the C++ (vtable-based, compile-time types only) family of languages, but -- very importantly! -- it does not create any additional cognitive load for the application developer. What it does allow, though, is for a systems developer to dynamically and securely reduce the surface area of an object when sharing that object across a container boundary.

2 comments:

  1. I do miss a good (C) pointer (type) and some obfuscated/unsafe arithmetic on it ! :)

    my favorite:

    typedef struct {
    unsigned int l;
    char b[0];
    } Buffer, *BufferP;

  2. Indeed. Even more fun when you hand out the pointer to "b", and then you have to subtract from that pointer to get back to the length field. Not that I would ever do such a thing ... ;-)
