Introduction
The basic functions of Racket are implemented
in C and are called primitives, for example cons,
car, +,
*, and many more (they are about
1500). In this article we will see how to add a new primitive.
In general, it’s not usual to add a new
primitive, but this seems like an interesting exercise to understand how the
other Racket primitives are implemented. Moreover, the new primitive is useful
for the secret hidden type system of the optimizer of Racket (which is
unrelated to the type system of TypedRacket).
The new primitive
The new primitive will be
(strict-true? v)
Returns #t
if v is #t
and otherwise returns #f.
In general, it is not good Racket code style
to compare something with #t. Usually
the code should only see that the value is not #f.
As it’s a bad style primitive, I will hide it to make it inaccessible.
(Although it is hidden, the internal test may
call it by importing it from #%kernel.
Anyway, to make some examples for the article, I will export it renamed as ugly-strict-true? that clearly express my
opinion.)
The new primitive is similar to
(not v)
Returns #t
if v is #f
and returns #f otherwise.
So the plan is to see how not is implemented, and copy its implementation.
You can look at the details and the complete code in github.
You can look at the details and the complete code in github.
Disable cstartup.inc
The first step to add a primitive is to make
Racket ignore the cstartup.inc
file that is actually a .zo
file with the bytecode encoded in a strange way. The content of the .zo
files depends on the position of each primitive, so when you add or remove a
primitive they become obsolete. In the same step we change the version number
so all the .zo files are recalculated. (When
the file cstartup.inc
disabled everything is slower, so remember to enable it after finishing the changes.)
To disable this cstarup.inc
file we have to edit the file schminc.h and put
#define USE_COMPILED_STARTUP 0
The version number is changed in schminc.h
and base/info.rkt
Adding the primitive
First you have to write the C implementation,
which in this case goes in the file bool.c.
There is nothing surprising here. We just look if the first argument is scheme_true, that is the internal name
of the variable that has the value #t.
static Scheme_Object *strict_true_p_prim(int
argc, Scheme_Object *argv[])
{
return (SAME_OBJ(argv[0], scheme_true) ? scheme_true: scheme_false);
}
{
return (SAME_OBJ(argv[0], scheme_true) ? scheme_true: scheme_false);
}
Then we create a primitive to allow us to use
the function in defined in C in the Racket code, and we add it the environment
to be recognized for compiling the new programs in Racket.
p = scheme_make_folding_prim(strict_true_p_prim,
"strict-true?", 1, 1, 1);
SCHEME_PRIM_PROC_FLAGS(p) | = scheme_intern_prim_opt_flags (SCHEME_PRIM_IS_OMITABLE);
scheme_add_global_constant( "strict-true?", p, env);
SCHEME_PRIM_PROC_FLAGS(p) | = scheme_intern_prim_opt_flags (SCHEME_PRIM_IS_OMITABLE);
scheme_add_global_constant( "strict-true?", p, env);
This definition expresses many details about
the primitive. Luckily it is like not
and we can copy it.
Some of the following examples seem
ridiculous, but after the original code suffers several steps Racket expansion,
macros and inlining, sometimes the result looks like this.
It accepts a single argument
So we don’t need to check the number of
arguments. So
(strict-true? 1 2)
fails at run time.
(strict-true? 1 2)
fails at run time.
It returns a single value.
This enables some optimizations, for example
(lambda (x) (car (cons (strict-true? x)
x)))
==>
(lambda (x) (strict-true? x))
==>
(lambda (x) (strict-true? x))
instead of
==>
(lambda (x) (values (strict-true? x)))
(lambda (x) (values (strict-true? x)))
in which values
is used to check that the result has exactly one value.
And it also marks
(lambda (x)
(display x)
(strict-true? x))
(display x)
(strict-true? x))
as a function that returns a single value,
which helps the interpreter/JIT to avoid generating code for the case of
multiple values.
We mark it as folding,
So during optimization it can participate in
the constant folding, in expressions like (strict-true?
#t) ==> #t.
Or more complicated cases, that use the
interaction with other optimizations, such as
(lamba (v)
(when (vector? v)
(strict-true? (box? v))))
==>
(lamba (v)
(when (vector? v)
(strict-true? #f)))
==>
(lambda (v)
(when (vector? v)
#f)
(when (vector? v)
(strict-true? (box? v))))
==>
(lamba (v)
(when (vector? v)
(strict-true? #f)))
==>
(lambda (v)
(when (vector? v)
#f)
We mark it as OMITABLE,
So will automatically have some reductions as
(lambda (x)
(strict-true? x)
(display x))
==>
(lambda (x)
(display x))
(strict-true? x)
(display x))
==>
(lambda (x)
(display x))
It’s not marked as UNARY_INLINED
The attentive reader will note that it is not
marked as UNARY_INLINED, that
is another flag that not
has, but we'll add it soon.
Enable cstartup.inc again
Now we can recalculate the file cstartup.inc
using
make cstartup
and re-enable it modifiying the file schminc.h
#define USE_COMPILED_STARTUP 1
While I'm preparing new code, I like to keep
these steps in different commits in case I have to make a rebase and recalculate
cstartup.c, but the final version is better
to squash them into one.
JIT
If the primitive is simple, it is possible to
add it to the list of primitives that are processed by the JIT so that the running
code is compiled to machine code rather than interpreted from the bytecode.
Racket uses GNU lightning with several conventions and auxiliary functions to
facilitate writing code.
First we have to add a flag in the definition
of strict-true? in bool.c
p = scheme_make_folding_prim(strict_true_p_prim,
"strict-true?", 1, 1, 1);
SCHEME_PRIM_PROC_FLAGS(p) |= scheme_intern_prim_opt_flags(SCHEME_PRIM_IS_UNARY_INLINED
| SCHEME_PRIM_IS_OMITABLE);
scheme_add_global_constant("strict-true?", p, env);
SCHEME_PRIM_PROC_FLAGS(p) |= scheme_intern_prim_opt_flags(SCHEME_PRIM_IS_UNARY_INLINED
| SCHEME_PRIM_IS_OMITABLE);
scheme_add_global_constant("strict-true?", p, env);
And then we add in jitinline.c
else if (IS_NAMED_PRIM(rator,
"strict-true?")) {
generate_inlined_constant_test(jitter, app, scheme_true, NULL, for_branch, branch_short, dest);
return 1;
}
generate_inlined_constant_test(jitter, app, scheme_true, NULL, for_branch, branch_short, dest);
return 1;
}
This case was easy because we already have
the function generate_inlined_constant_test
that generates the JIT code to check if the value is equal to the chosen
constant, in this case scheme_true.
But generally writing the implementation of a primitive for the JIT is very
very painful.
To see the effect of this change, we can
compare how long it takes the following program to run before and after
enabling the JIT version of strict-true?.
We compare it with the equivalent version for not.that
in both case use the JITed code. (We use a mutable variable x, so that the optimizer is not sure about
the value of x and it have to check
the value in each cycle, and there is no risk that a clever optimization removes
the check.)
#lang racket / base
(define x 'something-else)
(set! x x)
(time
(for/or ([i (in-range 1000000000)])
(not x)))
(time
(for/or ([i (in-range 1000000000)])
(ugly-strict-true? x)))
(define x 'something-else)
(set! x x)
(time
(for/or ([i (in-range 1000000000)])
(not x)))
(time
(for/or ([i (in-range 1000000000)])
(ugly-strict-true? x)))
cpu time
(ms)
|
Before:
|
After:
|
not
|
4109
|
4109
|
ugly-strict-true?
|
9031
|
4110
|
These times include time of the the loop, if
we could deduct its time the difference would be even more remarkable. Usually,
I like to repeat the measurement several times to get an informal version of a statistics,
but in this case each test takes about 5-10 seconds and the improvement is x2
so it’s not necessary to repeat it to be sure.
Global variable
With the changes that we have already made
the primitive is completely ready. Now we can add some specific optimizations
for strict-true?.
Some optimizations can recognize strict-true?
by name, but for others it is necessary to put strict-true?
in a global variable so that it can be recognize or use from other files.
In schpriv.h
we add
extern Scheme_Object
*scheme_strict_true_p_proc;
There is no official rule for the name of the
global variables, but generally if it is a primitive the name is scheme_<name>_proc,
where in the name of the - changes
to _ and ? to _p
and things like that, with a few exceptions.
In the definition in bool.c
we have to save the primitive in the variable and register the variable with the
garbage collector.
READ_ONLY
Scheme_Object *scheme_strict_true_p_proc;
REGISTER_SO(scheme_strict_true_p_proc);
p = scheme_make_folding_prim(strict_true_p_prim, "strict-true?", 1, 1, 1);
REGISTER_SO(scheme_strict_true_p_proc);
p = scheme_make_folding_prim(strict_true_p_prim, "strict-true?", 1, 1, 1);
SCHEME_PRIM_PROC_FLAGS(p) |=
scheme_intern_prim_opt_flags (SCHEME_PRIM_IS_UNARY_INLINED
|
SCHEME_PRIM_IS_OMITABLE);
scheme_strict_true_p_proc = p;
scheme_add_global_constant("strict-true?",
p, env);
Optimizations
There are several interesting optimizations that
we can add now. To make this article short I will not put the code here and I’ll
left as an exercise for the interested reader to search for the details in the commits.
All the changes are in functions in optimize.c.
Relevant predicate
We add strict-true?
to the list of relevant predicates, which are the ones that are tracked by the Racket
optimizer. These predicates are almost disjoint (I don’t know the technical
term), so this automatically enables optimizations such as
(strict-true? (vector 1 2 3)) ==> #f
and also
(vector? (if x
(begin (display x) #t)
#t))
==>
(begin
(if x
(begin (display x) #t)
#t)
#f)
(begin (display x) #t)
#t))
==>
(begin
(if x
(begin (display x) #t)
#t)
#f)
Replace a variable by a constant
If the optimizer deduces that a variable
verifies strict-true?
then it replaces it by #t.
(lambda (x)
(when (strict-true? x)
x))
==>
(lambda (x)
(when (strict-true? x)
#t))
(when (strict-true? x)
x))
==>
(lambda (x)
(when (strict-true? x)
#t))
Add boolean? as a relevant predicate
We can also add to the primitive boolean? to the relevant predicates.
The problem is that boolean?
includes strict-true? and
not (not
is a predicate, is more evident if you write it as not?
or strict-false? but for historical
reasons we call it not). So
we have to add code to handle unions and intersections and see if a variable
that satisfies one predicate can satisfy a different predicate. Whith this
change we have reductions like:
(lambda (f)
(boolean? (if (f)
(begin (display 1) #t)
(begin (display 2) #f)))
==>
(lambda (f)
(if (f)
(begin (display 1) #t)
(begin (display 2) #f))
#t)
(boolean? (if (f)
(begin (display 1) #t)
(begin (display 2) #f)))
==>
(lambda (f)
(if (f)
(begin (display 1) #t)
(begin (display 2) #f))
#t)
Another example that seems to be most
interesting is
(lambda (x)
(when (boolean? x)
(if x (box x) (vector x))))
==>
(lambda (x)
(when (boolean? x)
(If x (box #t) (vector #f))))
(when (boolean? x)
(if x (box x) (vector x))))
==>
(lambda (x)
(when (boolean? x)
(If x (box #t) (vector #f))))
Predicates are Boolean
We can also mark the result of many common
predicates as a Boolean, then we have
(lambda (v)
(let ([x (vector? v)])
(if x (box x) (vector x))))
==>
(lambda (v)
(let ([x (vector? v)])
(if x (box #t) (vector #f))))
(let ([x (vector? v)])
(if x (box x) (vector x))))
==>
(lambda (v)
(let ([x (vector? v)])
(if x (box #t) (vector #f))))
Optimization for or
The macro or
is a very usual and interesting case. The expansion (without optimization) is
(lambda (x)
(or (symbol? x) 7)
==>
(lambda (x)
(let ([temp (symbol? x)])
(if temp temp 7)))
(or (symbol? x) 7)
==>
(lambda (x)
(let ([temp (symbol? x)])
(if temp temp 7)))
With the new enhancements, the optimizer
realizes that temp is
Boolean and optimizes
==>
(lambda (x)
(let ([temp (symbol? x)])
(if temp #t 7)))
(lambda (x)
(let ([temp (symbol? x)])
(if temp #t 7)))
It would be good to move the expression that
defines temp to the
place of use and get a result like
==>
(lambda (x)
(if (symbol? x) #t 7))
(lambda (x)
(if (symbol? x) #t 7))
but the optimizer only moves expressions that
define variables used once, and the optimizer does not realize that after the
replacement of temp by #t, the variable temp
appears only once.
Luckily, there was an ad hoc optimization for
expressions like the produced by or,
but it was applied only when they were in a Boolean context, such as (if (or ...) ... ...) or (not (or ...)). With some changes we can
extend to (or <boolean?> ...).
Conclusion
With these changes, we would have a new primitive in Racket. It is not very useful for use directly, but it helps implement some optimizations for usual idioms, particularly for expressions that combine or and an expression with a Boolean result.
You can look at the details and the complete code in github.