2019/06/20

The Quest for Equality

It's shocking how difficult equality is to get right in software. What does equality even mean?

Equality used to be so simple in the days of assembly language, when "bitwise equality" was all that existed, and the only things that could be compared were fixed-size CPU registers. Things got more and more complex, until we ended up with Object Oriented languages, in which each class may have its own idea of what equality means.

On top of that, the most popular languages today each have multiple forms of equality. Python offers "is" versus "==". Java and C# offer "==" and "o1.equals(o2)" -- which can easily differ from "o2.equals(o1)" because it is neither symmetric nor transitive. Javascript has both an "is" and "==", and just keeps adding equals signs when new meanings are desired; so far, they're up to "===", but anyone who denies the likelihood of a future "=====" operator is just kidding themselves.

Obviously, equality is not a simple problem. Most of the existing solutions are broken, because they cannot provide either symmetric or transitive behavior. Many of these problems are the direct result of having multiple type systems within a single language; that helps to explain why many of these languages provide two forms of equality: One for the primitive "value" type system, and one for the object type (reference-based) system.

It's obvious that in an object type system, a class needs to be able to define equality for that class, replacing any superclass definition of the same, which means that the definition of equality is virtual, as in "virtual method invocation". It's also obvious that such a definition cannot be a single dispatch method, because then "o1.equals(o2)" may provide a completely different answer from "o2.equals(o1)". There have been languages that attempt to address this type of conundrum with multiple dispatch support, but by the same token, one can swat a fly with an atomic bomb.

Generic types further add complexity to the notion of equality, because each type parameter may contribute its own potentially conflicting notion of equality to the equation.

So what's the answer?

Intent. Intent matters. The developer's intent when they write a line of code is important, but somehow that intent manages to be erased by a language compiler. (It's no mistake that the term type erasure is used to describe a generic type system that is implemented by forgetting what those types actually were.)

Ecstasy captures the developer intent by capturing the compile time type of the references in question, and retaining that information in the compiled code, and using that information for hard problems like equality.

Consider the following example:
Collection<String> c1 = foo();
Collection<String> c2 = bar();
if (c1 == c2)
    {
    // ...
    }
It doesn't matter what the actual runtime type of the object is that c1 or c2 refers to, because the developer clearly explained that they are Collections of String objects. The class of the collection is unknown here at compile time, and instead the Collection interface is used, which defines equality in a certain way. In theory, those collections could contain objects of various sub-classes of String, but when those contents are compared, they will be compared as Strings. Because that was the intent of the developer.

To accomplish this, equality is implemented non-virtually; here is the equals function on Collection:

/**
 * Two collections are equal iff they are they contain the same values.
 */
static <CompileType extends Collection>
        Boolean equals(CompileType collection1, CompileType collection2)
    {
    // they must be of the same arity
    if (collection1.size != collection2.size)
        {
        return False;
        }

    if (collection1.sortedBy() || collection2.sortedBy())
        {
        // if either is sorted, then both must be of the same order;
        // the collections were of the same arity, so the second iterator
        // shouldn't run out before the first
        Iterator<CompileType.Element> iter1 = collection1.iterator();
        Iterator<CompileType.Element> iter2 = collection2.iterator();
        for (val value1 : iter1)
            {
            assert val value2 := iter2.next();
            if (value1 != value2)
                {
                return False;
                }
            }

        // the collections were of the same arity, so the first iterator
        // shouldn't run out before the second
        assert !iter2.next();
        return True;
        }
    else
        {
        return collection1.containsAll(collection2);
        }
    }
What makes this work is that the compile-time type is passed to the equals function. Not the name of the type. Not the "type value". The type. The actual, for-real, usable-as-a-type-name-in-your-code type. A formal type. (Yes, types are objects. But they're also types.)

So how did this function get called? Well, because the compiler determined, and retained, and used, the compile-time type. Both the "==" operator and the "!=" operator ultimately cause a call to this function to occur.

And what if, instead, the developer had specified these variables as being Lists?
List<String> c1 = foo();
List<String> c2 = bar();
if (c1 == c2)
    {
    // ...
    }
Again, it doesn't matter what the actual runtime types of those object are, because the developer clearly explained that they must be Lists of String, and the result of the "==" operator is a call to List:
/**
 * Two lists are equal iff they are of the same size, and
 * they contain the same values, in the same order.
 */
static <CompileType extends List> Boolean equals(CompileType a1, CompileType a2)
    {
    Int c = a1.size;
    if (c != a2.size)
        {
        return False;
        }

    for (Int i = 0; i < c; ++i)
        {
        if (a1[i] != a2[i])
            {
            return False;
            }
        }

    return True;
    }
Even more obvious is that the equals functions themselves simply use the "==" and "!=" operators on the values with the Collections or Lists, because the compile time type of those contents has not been erased, and was encoded into the compiled code, and was used in the call to the Collection or List equals function. And this works for arbitrarily deep nesting of generic types, because turtles.

And it's type safe.
And it's symmetric.
And it's transitive.
And it's readable.
And it's obvious.
And it's correct.

One more thing. What if you really want to know if two objects are actually "the same object"? In Ecstasy, as described previously, the reference (Ref) to an object contains the type and the identity of the object that is referred to. Sameness refers to that identity, and the equals function on Ref tests for the equality of that identity:

List<String> c1 = foo();
List<String> c2 = bar();
if (&c1 == &c2)
    {
    // ...
    }
... which is exactly what the Object.equals function does itself:
static <CompileType extends Object>
        Boolean equals(CompileType o1, CompileType o2)
    {
    return &o1 == &o2;
    }

4 comments:

  1. This approach isn't unique to Ecstasy, here is a simple Collection trait in Rust, and two types implementing Collection are compared via Collection's definition of equality (posted on Rust Playground):

    https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=289ca8bc342ee6c1d0b9be7ade7ccae4

    ReplyDelete
    Replies
    1. Hi Bill -

      The link that you provided did not work for me. It took me to the default Rust "Hello, world!" example.

      ...two types implementing Collection are compared via Collection's definition of equality...

      When I went and looked today, I was not able to divine the workings of the Rust language in this area from reading various examples and library source code.

      Having read some of "Programming Rust", I do not believe that your comment is correct.

      It is not about being able to compare two collections for equality; it is about picking the correct definition of equality.

      Rust has virtual behavior, and Rust has compile time (static) function selection, but it does not have runtime type selection (function selection) based on compile time types.

      Like the article said, we've never seen this approach used before, but if it has been used (or other approaches with similar advantages), I'd personally appreciate pointers to any of that work.

      Peace,

      Cameron.

      Delete
  2. Well done. I have always wondered why equality is such a mess in many popular languages.

    ReplyDelete
    Replies
    1. Same here! It is a hard problem, but hopefully now that we've shown a decent solution, other languages can start to re-think their approaches.

      Delete

All comments are subject to the Ecstasy code of conduct. To reduce spam, comments on old posts are queued for review before being published.