2019/07/09

Null is no Exception

Since Tony Hoare introduced the null reference in 1965, we have lived with an entire family of languages (including C, C++, Java, and C#) in which a pointer can contain a purposefully illegal address to represent the lack of a value. That purposefully illegal address is called null (or NULL, NIL, nil, etc.).

The reasoning was simple: without using any additional memory, a pointer could serve two purposes: first, to indicate whether or not a value exists, and second, to identify that value if and only if it exists.

As type systems advanced, and pointers became type-safe references instead of arbitrary integers, it became necessary to represent null as a typed value. To make it possible to assign the value null to any reference, these type systems made null a subtype of all other types; otherwise, null would not be assignment-compatible with any type other than the null type itself.


The use of null in all of these languages had an unfortunate side effect: Because null could be assigned to any type, it logically followed that each and every value might be null. That means that every single access to a value requires a null check. In languages like Java and C#, the runtime performs that check implicitly and throws an exception when it fails (NullPointerException and NullReferenceException, respectively). In C, such code typically just segfaults (aka "Access Violation" on Windows) and core-dumps. Yay!

To avoid segfaults and exceptions, it became necessary to sprinkle code with lots of these:
if (s != null)
    {
    ...
    }
(Yay!)

There is an elegant solution to this ugliness, which is to make null into its own normal type, and not some magical "subclass of all classes" class, or "subtype of all types" type. In other words, simply making the null value an object reference of some normal class prevents that reference from being assigned willy-nilly to references of any other random type.

The complete code for the Nullable type (found in Ecstasy's module.x) is:
enum Nullable { Null }
That one line of code declares an enumeration class, called Nullable, with one value, called Null.
(Advanced: From an inheritance point of view, Null extends Nullable extends Object. From a composition point of view, as an enum, the class for Nullable incorporates the Enumeration mixin, while the Nullable class itself is an abstract enum and Null is an enum value. An enum value is a singleton const, which automatically implements both the Enum and Const interfaces. See source files: module.x, Const.x, Enum.x, Class.x, and Enumeration.x.)
This approach introduces a new requirement for the language's type system: it must be able to represent composite types, such as intersection types, union types, and difference types. Ecstasy represents intersection types (called union types in some other languages) with the "or" ("|") operator, because the code "(A|B)" reads "either type A or type B", which means that only the intersection of those two types can be assumed. (Apologies to any mathematicians reading this, but the "U" on our keyboard was stuck in the right-side-up position.)

Thus, to declare a variable that can hold a value of either Nullable or String, and to assign it a predictable value, one could write:
Nullable | String s = "Hello, world!";
Writing that out would quickly get old, so a shorthand notation for the "Nullable|" portion is the type-postfix "?"; here is the same declaration rewritten using the shorthand notation:
String? s = "Hello, world!";
Since the variable "s" is either a String value or a Nullable value, one cannot ask it for its size:
Int len = s.size;    // compiler error!
The reason that "s" does not have a size is that its type is "Nullable or String", and the Nullable type does not have a size property. This allows the compiler to know that the size property cannot be requested; this is an example of compile-time type safety. (Languages like Java instead rely on run-time type safety, by throwing a NullPointerException or similar; Ecstasy has no equivalent to this exception, because such an exception cannot occur! And there was much rejoicing.)
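The same design can be approximated in TypeScript for experimentation. This is a hypothetical model, not Ecstasy code: the class Nullable, its single instance Null, the alias NullableString, and the function sizeOf are all names chosen for this sketch. Because Nullable is an ordinary class, a Null can only be stored where the declared type allows it, and the compiler rejects any use of a property that Nullable lacks.

```typescript
// Hypothetical TypeScript model of Ecstasy's `enum Nullable { Null }`:
// an ordinary class with a single instance, not a subtype of every type.
class Nullable {
  // A private member makes the class nominal, so no string is ever a Nullable.
  private readonly label: string = "Null";
  static readonly Null: Nullable = new Nullable();
  private constructor() {}
  toString(): string {
    return this.label;
  }
}
const Null = Nullable.Null;

// Stand-in for the Ecstasy type `Nullable | String`, i.e. `String?`.
type NullableString = Nullable | string;

function sizeOf(s: NullableString): number | Nullable {
  // Writing `s.length` here would not compile: Nullable has no length property.
  if (s instanceof Nullable) {
    return s; // pass the Null through unchanged
  }
  return s.length; // s has been narrowed to string
}

const greeting: NullableString = "Hello, world!";
const nothing: NullableString = Null;
```

The private field matters: TypeScript compares classes structurally, so without it a plain string would also satisfy the Nullable type.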

Compile-time type safety allows the compiler to know when a value might be Null. By checking whether the value is a String, the compiler subsequently knows that the value cannot be Null, and specifically that the value is a String, after which it is safe to obtain the String size:
if (s.is(String))
    {
    console.println($"String s is ${s.size} characters long.");
    }
Similarly, if the code explicitly compares the value to Null, then the compiler can know when the value is or is not Null. The above code could be modified slightly to first test whether the value is not Null, so that the compiler subsequently knows (by process of elimination) that the value is a String, after which it is safe to obtain the String size:
if (s != Null)
    {
    console.println($"String s is ${s.size} characters long.");
    } 
The postfix "?" operator is a short-circuiting operator that performs the same not-Null test, so the above code could instead be written as follows (if "s" is Null, the entire statement is skipped):
console.println($"String s is ${s?.size} characters long.");
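TypeScript compiled with strictNullChecks takes a comparable approach with its built-in null, which has a type of its own that is not assignable to string, and its flow analysis narrows a "string | null" after either of the tests shown above. The sketch below is an analog rather than Ecstasy code; the function names are illustrative, and each function returns the line it would print, or null if nothing would be printed. It also shows one real difference: TypeScript's "?." yields undefined and evaluation of the enclosing expression continues, whereas in Ecstasy a Null "s" causes the whole println statement to be skipped.

```typescript
// Under strictNullChecks, `null` has its own type and is not assignable to string.
// Each function returns the line it would print, or null if nothing is printed.

function byTypeTest(s: string | null): string | null {
  if (typeof s === "string") {
    // The type test narrows s to string, so .length is allowed.
    return `String s is ${s.length} characters long.`;
  }
  return null;
}

function byNullTest(s: string | null): string | null {
  if (s !== null) {
    // By elimination, s must be a string here.
    return `String s is ${s.length} characters long.`;
  }
  return null;
}

function byOptionalChain(s: string | null): string {
  // Unlike Ecstasy's short-circuit, ?. only ends this subexpression:
  // the template is still built, with "undefined" in place of the size.
  return `String s is ${s?.length} characters long.`;
}
```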
The short-circuiting "?" operator can be grounded using the else (":") operator. In the following example, if "a" is Null, or if "a.b" is Null, or if "a.b.c" is Null, then the result is the predictable value of "Hello, world!", otherwise the result is the value of "a.b.c":
String s = a?.b?.c? : "Hello, world!";
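In TypeScript, the closest analog of this grounding is the nullish coalescing operator "??". The interfaces A and B and the function pick are hypothetical stand-ins for the a.b.c chain in the text; note that "??" treats undefined as well as null as missing.

```typescript
// Hypothetical shapes for the a.b.c chain; any link in the chain may be null.
interface B {
  c: string | null;
}
interface A {
  b: B | null;
}

function pick(a: A | null): string {
  // ?. stops at the first null link (yielding undefined),
  // and ?? substitutes the fallback for null or undefined.
  return a?.b?.c ?? "Hello, world!";
}
```

Unlike "||", the "??" operator does not replace other falsy values, so pick({ b: { c: "" } }) returns the empty string rather than the fallback.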
Ecstasy combines the postfix "?" and the else (":") operators into the elvis ("?:") operator:
String s = a ?: "Hello, world!";
The above code has the same effect as:
String s = a? : "Hello, world!";
As with most binary operators, it is possible to combine the operator with the assignment operator, such that:
x = x ?: y;
... can be rewritten using the elvis assignment operator as:
x ?:= y;
Notice the similarity of the postfix "?" operator, the else (":") operator, and the elvis operator to the ternary operator; each of the following four lines of code has the same result:
x = x!=Null ? x : y;    
x = x? : y;    
x = x ?: y;
x ?:= y;
That is a lot to wrap one's head around, but there is a simple logic behind it.
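Three of those four lines have direct TypeScript counterparts, with "??" and "??=" playing the roles of the elvis and elvis assignment operators (TypeScript has no equivalent of the "x? : y" form). In this sketch, each hypothetical function returns the value that x would hold after the corresponding line.

```typescript
// Each function mirrors one of the Ecstasy lines, with null standing in for Null.
function ternary(x: string | null, y: string): string {
  return x !== null ? x : y; // x = x!=Null ? x : y;
}

function elvis(x: string | null, y: string): string {
  return x ?? y; // x = x ?: y;
}

function elvisAssign(x: string | null, y: string): string {
  x ??= y; // x ?:= y; assigns only when x is null (or undefined)
  return x;
}
```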


Finally, there is a special Null-aware assignment operator that splits a value of a nullable type, such as "String?", into a tuple of a Boolean and (conditionally) a value of the non-nullable portion of the type (e.g. "String"). This can be used wherever a condition can be used, such as in an "if" or "while" statement. For example, imagine some method or function that can return a nullable string value:
String? foo();
Other than the operator, this example should seem quite familiar by now:
if (String s ?= foo())
    {
    console.println($"String s is ${s.size} characters long.");
    } 
In the above example, if the function returns Null, then the result is the tuple (False), which is consumed by the if, causing the "else" branch of the if statement (if there is one) to be executed. Conversely, if the function returns a String value, then the result is the tuple (True, string-value): the True is consumed by the if, the string-value is consumed by the assignment to "s", and the "then" branch of the if statement is executed.

Thus, it should be obvious that these two statements will have the same result:
s ?= foo();
s = foo()?;
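TypeScript has no counterpart to "?=", but the behavior described above can be imitated with a tuple whose first element is a boolean and whose second element exists only when that boolean is true. The type Conditional and the functions split, foo, and report are hypothetical names for this sketch; foo stands in for the "String? foo()" declared earlier.

```typescript
// A Boolean plus a value that is present only when the Boolean is true,
// mirroring the (False) and (True, string-value) tuples described in the text.
type Conditional<T> = [false] | [true, T];

// Hypothetical helper imitating `?=`: split a nullable value into that tuple.
function split<T>(value: T | null): Conditional<T> {
  if (value === null) {
    return [false];
  }
  return [true, value];
}

// Hypothetical stand-in for `String? foo()`.
function foo(available: boolean): string | null {
  return available ? "Hello, world!" : null;
}

// Rough equivalent of `if (String s ?= foo()) { ... }`.
function report(available: boolean): string {
  const result = split(foo(available));
  if (result[0]) {
    // The tuple is narrowed to [true, string], so result[1] is a string.
    const s = result[1];
    return `String s is ${s.length} characters long.`;
  }
  return "foo() returned null";
}
```

In Ecstasy, the split and the narrowing both happen inside the single "if (String s ?= foo())" condition; the sketch has to spell them out as separate steps.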
As just another normal value, and thus without any mind-bendingly-crazy type system rules to accommodate some magical null value, Null is simultaneously less troublesome and more useful.

The null is dead. Long live the Null.

6 comments:

  1. Funny, it looks exactly like Ceylon

    Ceylon-lang.org

    1. Interesting that you would point that out. I very recently emailed Gavin saying something quite similar. I was quite amazed to see some of the decisions in Ceylon were pulled magically straight out of my head (just kidding, of course!), but that is the way it is in software: sometimes an idea is so good that multiple people find the same one.

  2. In a statement such as:

    console.println($"String s is ${s?.size} characters long.");

    If s is null, then I suppose nothing should happen. But how large is the scope of code that doesn't happen if s is null? Does the implicit "else" code happen at the statement level, so the println is skipped and nothing before or after it? Or does the println print null? Or is the whole surrounding code block within some pair of delimiters skipped?

  3. The AST nodes each have the ability to grant or withhold permission for a given branch to short-circuit in this manner. That means that "dangerous" places to short-circuit can be disallowed completely at compile-time. By default, expressions allow short-circuiting only if their enclosing AST node allows (and thus handles) the short-circuiting, so in this example, the entire line of code would be skipped if s were null.

  4. The ? type constructor, the operators, and the compile-time non-null inference are straight from Kotlin. Do you also notice, given a union type of any kind, that if a run-time type test has been made, the compiler can infer the narrowed type?

    1. Hi John - Apparently, the "?" syntax emerged somewhat simultaneously in several different languages, including Gavin King's Ceylon. Long (i.e. at least a decade) before I ever saw it used, I had planned to use it as the "Nullable" type suffix myself, but that said, some of the operators that we chose in Ecstasy were definitely inspired by programmers pointing us at things they had seen in other languages, and I'm guessing that Kotlin was one of those. For example, the "a?.b?.c?.d? : e" syntax was inspired by another language (I would credit which one if I could remember now who suggested it and why). Similarly, our type inference was conceived long before the Kotlin language existed, and includes support for all of the different relational types that Ecstasy has. (There's still a project in our internal R&D called the "assumptions" project -- because the planned data structure that carries inference data is named the Assumption class; that is the follow-on to the current type inference implementation, which doesn't yet support property type narrowing because it lacks a cascading invalidation mechanism. Since I now work in the Ecstasy code base 90+% of the time, it's personally quite irritating to not have the inference on values stored in properties, i.e. those values have to be loaded into registers first -- but this choice made it much easier to ensure code and runtime type correctness.)

      I've never used Kotlin or Ceylon myself, but over the past few months I have had a chance to look at some of what they did, and I couldn't believe how many choices that they had made were very similar to things that we had done. I will take that as a positive sign.

