All designs have priorities, but only some designs begin
with the end in mind. When priorities are not explicitly stated, it is easy to
chase the priorities that most effectively combine compulsive emotional appeal
with apparent ease of implementation, in lieu of the priorities that are most
valuable to the intended audience of a design. In order to create a coherent
design that serves the intended audience, the Ecstasy specification began with a
conscious discussion about priorities, and a series of explicit decisions as to
what those priorities would be:
- Correctness, aka Predictability. The behavior of a language must be obvious, correct, and predictable. This also incorporates The Principle of Least Surprise.
- Security. While generally not a priority for language design, it is self-evident that security is not something that one adds to a system; security is either in the foundation and the substrate of a system, or it does not exist. Specifically, a language must not make it possible to access (or even to detect the existence of) any resource that is not explicitly granted to the running software.
- Composability. High-level computer languages are about composition. Specifically, a language should enable a developer to locate each piece of design and logic in its one best and natural place.
- Readability. Code is written once, and referenced many times. What we call “code” should be a thing of beauty, where form follows function.
- Recursion. Lastly, a language must be recursive in its design. There is no other mechanism that predictably folds in complexity and naturally enables encapsulation. It’s turtles, the whole way down.
On Predictability vs. Performance
In the course of researching language design, one predominant
concern emerges: that of execution performance. While many different concerns
are evaluated and balanced against each other in a typical design, and while
the goal of performance is often explicitly ranked in a position of lesser
importance, in reality there is no one goal more considered, discussed, and
pursued. Regardless of the importance assigned to performance as a language goal, performance is to the language designer what a flame is to a moth.
Perhaps it is simply best to admit, up front, that no
language can survive – let alone succeed – without amazing feats of
performance. Yet performance as a goal tends to conflict with other goals,
particularly with respect to manageability, serviceability, and the quality of
the abstractions that are presented to the developer.
Beginning programmers often ask: “Is language A faster than B?” After all, no one wants to be using a slow language, any more
than someone would want to buy a slow automobile or a slow computer. Speed is a
thrill, and speed in software execution holds no less attraction than a
souped-up hot rod with a throaty growl and a body-rumbling ride.
The corollary to that question is “Why is language B slower than A?” The answers to this question tend to be very illuminating. Take
any two languages that compile to and execute as native machine code, and
compare their performance for a given task. Despite running on the same
hardware, and using the same hardware instruction set, one may be dramatically
slower than the other: by 10%, by 100% (half as fast), or even by 1000% (an order of magnitude slower). How is such a difference possible?
The answer lies in translation, specifically in the
automated translation of idioms. A language such as C selected idioms that
closely represented the underlying machine instructions of the day, which
allowed programmers to write simple programs that could be almost transliterated from C to machine code.
In other words, the language chose as its abstractions the same set of
abstractions that the CPU designers were using, or abstractions that were at
most one translation removed from the machine’s abstractions. This allowed for
very simple compilers, and made it extremely simple to support localized
optimization.
A localized optimization is an optimization in the compiled code that can be made using only information local to the code being optimized; in other words, information outside of that small scope is not even considered. Many optimizations in the C
language, for example, are extremely local, such that they can be performed
without any information about code outside of a particular expression or
statement; it is hard to imagine more localized optimizations.
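To make this idea concrete, here is a minimal sketch of two classically localized optimizations. It is shown in Java purely for familiarity; the same point applies equally to C, and the exact rewrites any particular compiler performs are an assumption, not a guarantee:

```java
public class LocalOptimizations {
    public static void main(String[] args) {
        // Constant folding: the compiler can rewrite this expression using only
        // the expression itself. 24 * 60 * 60 becomes the literal 86400 at
        // compile time; nothing outside this statement is consulted.
        int secondsPerDay = 24 * 60 * 60;

        // Strength reduction is similarly local: a multiplication by a power of
        // two may be emitted as a shift (n << 1), again using only this statement.
        int n = args.length; // a runtime value that cannot be folded away
        int doubled = n * 2;

        System.out.println(secondsPerDay + " " + doubled);
    }
}
```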
However, there is a trade-off implicit in achieving such
simple and direct optimizations: The abstractions provided by the language are
constrained by the abstractions provided by the CPU. As one could rightfully surmise,
hardware abstractions tend not to be very abstract, and the abstractions
provided by hardware instruction sets tend to be only slightly better. In its
early days, the C language was jokingly referred to as “assembly with macros”,
because as a language, it was only slightly higher level than assembly itself.
Computing efficiency is often stated in terms of a tradeoff
between time (CPU) and space (memory); one can often utilize one in order to
optimize for the other, subject to the law of diminishing returns.
Unfortunately, there is no single simple metric that captures computing
efficiency, but if the trade-off between time and space appeared as a graph
with time on one axis and space on the other, it might generally resemble the
shape of the curve y=1/x,
which most closely approaches the origin at (1,1). If there were a single computing efficiency measurement for a programming language, it could arguably be represented by the distance of closest approach between this trade-off curve and the origin (0,0); that distance could be considered the minimum weighted resource cost. To calculate a language’s efficiency for a particular purpose, one would take the inverse of the minimum weighted resource cost.
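For completeness, the claim that this idealized curve most closely approaches the origin at (1,1) is easy to verify with a little calculus (a sketch based only on the curve as given above):

```latex
% Distance from the origin to a point (x, 1/x) on the curve, for x > 0:
d(x) = \sqrt{x^2 + \frac{1}{x^2}}

% Minimizing d(x)^2 (same minimizer, simpler algebra):
\frac{d}{dx}\left(x^2 + x^{-2}\right) = 2x - 2x^{-3} = 0
\quad\Longrightarrow\quad x^4 = 1 \quad\Longrightarrow\quad x = 1

% The closest approach is therefore at (1, 1), at distance
d(1) = \sqrt{1^2 + 1^2} = \sqrt{2}
```

On this idealized curve, that distance of √2 would be the minimum weighted resource cost.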
While one can easily speak about efficiency in hypothetical
terms, a benchmark is a wonderful servant but a terrible master. The path
chosen in the design of the XVM is to consciously avoid limits on potential
efficiency by consciously avoiding contracts whose costs are not clearly defined
and understood. This approach can be explained in the inverse, by way of
example in existing languages and systems: Often, features and capabilities
that were considered to be “easy” implementation-wise and “free”
efficiency-wise when they were introduced, ultimately emerged as the nemesis to
efficiency, due to the inflexibility of the programming contracts that these
features and capabilities introduced[1].
To understand this, it is important to think of abstractions
as a two-sided coin: On one side, we see the benefit of the abstraction, which
allows a programmer to work with ever-larger and more powerful building blocks,
while the other side of the coin represents the cost of the abstraction, which
is called the contract. Imagine, for
example, having to build something in the physical world, out of actual matter.
One could conceivably build a structure out of atoms themselves, assembling the
necessary molecules and arranging them carefully into perfectly formed crystalline
structures. The contracts of the various elements are fairly well understood, and yet we wouldn’t try to build a refrigerator out of individual atoms.
One could imagine that building from individual atoms is the
equivalent of building software from individual machine instructions, in two
different ways: First, that the refrigerator is composed of atoms, just like
all software is executed at some level as machine instructions; and second,
that as the size and complexity of the constituent components increase, the minutiae
of the contracts of the sub-components must not be surfaced in the contracts of
the resulting components – those details must be hidable! This purposeful
prevention of the surfacing of minutiae is called encapsulation, and exists as one of the cornerstones of software
design. It is why one can use a refrigerator without knowing the number of
turns of wire in the cooling pump’s motor, and why one can use a software
library without worrying about its impact on the FLAGS register of the CPU.
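In code, the principle looks like the following Java sketch; every name here is invented for illustration, with the interface playing the role of the published contract:

```java
// The interface is the contract: it states what can be asked for, not how it is done.
interface Refrigerator {
    void setTemperature(double celsius);
}

// The minutiae live behind the contract, and can change without breaking callers.
class CompressorRefrigerator implements Refrigerator {
    private int pumpMotorWireTurns = 480; // hidden minutia: callers never see it
    private double coolantPressure;       // hidden minutia: free to change later

    @Override
    public void setTemperature(double celsius) {
        // Implementation detail only; deliberately not surfaced in the contract.
        coolantPressure = pumpMotorWireTurns * 0.01 + celsius;
    }
}

public class EncapsulationDemo {
    public static void main(String[] args) {
        Refrigerator fridge = new CompressorRefrigerator();
        fridge.setTemperature(4.0); // the caller programs against the contract alone
    }
}
```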
Ultimately, it is the recursive composition of software that
creates challenges for optimization. While low level optimizations are focused
on the creation of more efficient low level code, higher level optimizations
rely on explicit knowledge of what portions of a component’s contract – or the
contracts of the various sub-components – can be safely ignored. In other
words, the optimizer must be able to identify which contractual effects are
ignored or discarded by the programmer, and then leverage that information to
find alternative execution solutions whose contracts manage to cover at least
the non-discarded and non-ignored contract requirements. Higher-level optimizations
target the elimination of entire aspects of carefully defined behavioral
contracts, and as a result, they typically require extensive information from
across the entire software system; in other words, high-level optimizations
tend to be non-localized to the extreme! No software has been more instrumental
in illustrating this concept than Java’s HotSpot virtual machine, whose
capabilities include the inlining of potentially polymorphic code by the
determination that the potential for dynamic dispatch is precluded, and the
elimination of specific memory barriers in multi-threaded programs as the
result of escape analysis.
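Both HotSpot techniques can be sketched in a few lines of Java. This is a simplified illustration of the kind of code that can benefit, not a guarantee of what any particular JVM will do:

```java
public class HotspotSketch {
    interface Shape { double area(); }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    static double sum(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            // s.area() is written as a polymorphic (virtual) call, but if the
            // JIT observes that only Circle ever reaches this call site, it can
            // devirtualize and inline it. That decision depends on knowledge of
            // every loaded class: a decidedly non-localized optimization.
            total += s.area();
        }
        return total;
    }

    static double noEscape() {
        // This object never escapes the method, so escape analysis may let the
        // JIT elide the heap allocation entirely (scalar replacement) and remove
        // related synchronization and memory-barrier costs.
        Circle c = new Circle(2.0);
        return c.area();
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1), new Circle(3) };
        System.out.println(sum(shapes) + " " + noEscape());
    }
}
```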
To enable these types of future optimizations, the contracts
of the system’s building blocks must be explicit, predictable, and purposefully
constrained, which is what was meant by the goal of “consciously avoiding
contracts whose costs are not clearly defined and understood.” The contracts in
the small must be encapsulatable in the large, which is to say that contracts
must be composable in such a way that side-effects are not inadvertently
exposed. It has been posited[2]
that “all non-trivial abstractions, to some degree, are leaky,” but each such
leak is eventually and necessarily realized as a limit to systemic efficiency.
[1] Such contracts are software legacy, meaning that the contracts cannot be altered without disclaiming the past; in other words, altering the contracts will break everything.
[2] Joel Spolsky, “The Law of Leaky Abstractions” (2002).
Comments
Regarding performance:
- In Java there is Project Valhalla, which aims to improve performance by making the memory layout of objects more dense; would such optimizations be possible in XTC?
- Given that native code is not allowed, will it be possible to make use of GPU programming, e.g. to speed up matrix multiplication for neural networks using CUDA?
- In Java, dynamic bytecode generation is sometimes used for performance optimization, e.g. to speed up SpEL expressions; would such optimizations be impossible in XTC?
Yes, indeed. The Valhalla concepts are not just "possible"; they were designed into the XVM from the start.
And yes, indeed, enabling CUDA (etc.) is part of the design. If you look at the instruction set, you will see that 2d matrix operations (for example) are part of the fundamental design.
Lastly, while it is *very* easy in Ecstasy to load dynamic code on the fly (because Ecstasy is explicitly designed to do that securely!), we don't really use the "optimization sites" concept that Java does. So, to be honest, I'm not sure how best to answer this particular question, because we have several different routes for such an optimization, and I'm not sure which would be a good fit for bytecode-generation-based optimizations.