Tag Archives: performance

The Cost of Compile Time in D

When I was creating my presentation for dconf online 2022, I was looking at alternatives to building constraints. If you watched my talk, you can see the fruit of that experiment in my strawman library (which is very much a proof-of-concept, and not ready for real use).

But it got me thinking — how much more expensive are these strawman constraints than the current Phobos range constraints? But even before I went that far, I started looking at some of the phobos constraints, and realized even there, we can achieve some savings.

Consider the constraint for isInputRange:

enum bool isInputRange(R) =
    is(typeof(R.init) == R)
    && is(ReturnType!((R r) => r.empty) == bool)
    && (is(typeof((return ref R r) => r.front)) ||
        is(typeof(ref (return ref R r) => r.front)))
    && !is(ReturnType!((R r) => r.front) == void)
    && is(typeof((R r) => r.popFront));

Let’s focus on one aspect of this, the use of the ReturnType template. What does that do? Essentially, it takes the parameter (in this case a lambda function) and evaluates to the return type of the callable.

But…. we have that as part of the language, don’t we? Yeah, it’s called typeof. typeof gives you the “type of” an expression. And it’s a direct link into the compiler’s semantic analysis — no additional semantic computation is needed.

To see what we are comparing against, let’s take a look at the ReturnType template (and its dependencies):

template ReturnType(alias func)
if (isCallable!func)
{
    static if (is(FunctionTypeOf!func R == return))
        alias ReturnType = R;
    else
        static assert(0, "argument has no return type");
}

template FunctionTypeOf(alias func)
if (isCallable!func)
{
    static if ( (is(typeof(& func) Fsym : Fsym*) && is(Fsym == function)) || is(typeof(& func) Fsym == delegate))
    {
        alias FunctionTypeOf = Fsym; // HIT: (nested) function symbol
    }
    else static if (is(typeof(& func.opCall) Fobj == delegate) || is(typeof(& func.opCall!()) Fobj == delegate))
    {
        alias FunctionTypeOf = Fobj; // HIT: callable object
    }
    else static if (
            (is(typeof(& func.opCall) Ftyp : Ftyp*) && is(Ftyp == function)) ||
            (is(typeof(& func.opCall!()) Ftyp : Ftyp*) && is(Ftyp == function))
        )
    {
        alias FunctionTypeOf = Ftyp; // HIT: callable type
    }
    else static if (is(func T) || is(typeof(func) T))
    {
        static if (is(T == function))
            alias FunctionTypeOf = T;    // HIT: function
        else static if (is(T Fptr : Fptr*) && is(Fptr == function))
            alias FunctionTypeOf = Fptr; // HIT: function pointer
        else static if (is(T Fdlg == delegate))
            alias FunctionTypeOf = Fdlg; // HIT: delegate
        else
            static assert(0);
    }
    else
        static assert(0);
}

template isCallable(alias callable)
{
    // 20 lines of code
}

template isSomeFunction(alias T)
{
    // 15 lines of code
}

Whoa, that’s a lot of code to tell me what the type of something is! Why is it so complex? The reason is because in order to determine the return type of something, we have to use the typeof primitive, but this needs a valid expression. For a callable, that means we need a valid set of parameters. All of that needs to be introspected by the library, which is simply given a symbol and doesn’t know anything about that symbol without context.

However we have context! We know exactly how to call the lambda function we have constructed, with an R! Why do we need this complexity for something that should be a simple call? As most well-versed in writing generic library code know, this is not an easy thing to do (sometimes generic types can’t be easily constructed, or you might have issues with disabled copying, etc.). In addition, ReturnType is built to handle all sorts of callable things, not just lambda functions.

But isInputRange doesn’t actually need to construct, or even have a valid R for generating the expression, all it needs is an already existing R to call methods on it. We can do this using a reinterpret cast of null to an R* and now we have an “already made” R. Yes, this would crash if actually run, but we don’t ever need to run it, we just need to get its type! And so, here is an equivalent isInputRange template that does not use ReturnType:

enum isInputRange(R) =
    is(typeof(R.init) == R)
    && is(typeof(() { return (*cast(R*)null).empty; }()) == bool)
    && (is(typeof((return ref R r) => r.front)) ||
        is(typeof(ref (return ref R r) => r.front)))
    && !is(typeof(() { return (*cast(R*)null).front; }()) == void)
    && is(typeof((R r) => r.popFront));

The difference here is we have a no-argument lambda, and so we don’t have to rely on library tricks or introspection to know how to call it (and as you can see, we call it with no parameters as expected).

Measuring the results

Given an isInputRange template that is completely independent of std.traits, what is the result? How much does it save?

To test this, I wrote a program generator that created 10000 identical but independently named input ranges, that are tested like this:

struct S0 { int front; void popFront() {}; bool empty = false; }
static assert(isInputRange!S0);
struct S1 { int front; void popFront() {}; bool empty = false; }
static assert(isInputRange!S1);
...
struct S9999 { int front; void popFront() {}; bool empty = false; }
static assert(isInputRange!S9999);

Running on my Linux system, using DMD 2.101.2, I get the following results:

COMMANDTIMEMEMORY USAGE
dmd -version=usePhobos2.75s1.755G
dmd -version=useTypeof1.47s621M

Looking at the savings, it’s quite significant — almost 50% time savings, and over 65% memory savings. Note that each call to ReturnType is unique, and so it will execute its own semantic analysis. Using the compiler’s -vtemplates switch, we can see that using the current Phobos adds quite a few dependent templates. For each usage of isInputRange, we see:

  • 2 distinct instantiations of ReturnType
  • 4 instantiations of isCallable (2 distinct)
  • 2 distinct instantiations of FunctionTypeOf
  • 2 distinct instantiations of isSomeFunction

All that adds up to an additional 8 distinct template instantiations, and 10 total instantiations. A distinct template instantiation will run semantic analysis, but a non-distinct one will just find the existing template in the symbol table and return it.

Using the measurement numbers we can somewhat extrapolate that each ReturnType instantiation adds 64 microseconds, and consumes 56.7K of RAM. The RAM consumption comes from storing the additional template instantiation symbols in the symbol table.

Conclusion

Such small savings, why is it important? It’s important because this is a perfect example of “death by 1000 paper cuts”. Each little template instantiation gives us a bit of convenience, but adds a tiny cost. These costs can add up significantly, and produce an overall compiler experience that is frustratingly slow, or worse, runs out of memory (yes, I have had this happen)! For something such as isInputRange, which almost nobody ever looks at or needs to, the cost is not well spent — especially considering how short and readable the alternative is!

When you reach for something in std.traits, consider what the compile-time cost might be, and don’t always assume that a small call will be efficient. Are you writing something people have to understand easily? If not, make the messy details as complex as needed to avoid such costs. If you can write the same thing using builtins, it will run faster, and it might even work better. I like to prefer compiler builtins such as typeof, is expressions and __traits to std.traits whenever possible, as long as the cognitive load of the resulting code isn’t too great (and yes, it can be).

I do plan to submit a PR to streamline everything I can about the range traits, maybe we can all pitch in and see where some of this interdependent fat can be trimmed all throughout Phobos!