Introduction
In the
previous article we discussed what kind of things appeared in the tests parts of the
if in
Racket bytecode. This code is generated by several steps of macro expansions and several
steps of optimization, so some unexpected thing may appear.
In particular, it was strange that there were things like
(if #t ? ?),
(if # f ? ?) and
(if (not ?) ? ?). Not only because they look easy to optimize, but the optimizer explicitly
takes them into account and tries to eliminate them.
The optimizer is important because it allows you to write clear and
nice
code, and the code is run efficiently without you worrying about inlining the
obvious optimizations (and the not so obvious one). (Thus 99% of the time you
can work in "cute code" mode, and use a 1% of ugly code just for critical parts that really need speed.)
Overall the optimizer works very well and looking at the optimized code
it's difficult to spot unoptimized parts. Moreover, many times a small test program is
optimized to just (print #t) or
an equivalent code.
Looking for an example
The first step is to find the spot from where these strange constructions
come. After adding a
displayln here
and there, the
program of the previous article shows the weird things and in
which file they were originally. One of the first cases is in the file
mred\private\check.rkt .
Using
decompile we can see that the problem is in the function
check-non-negative-integer/false,
which is defined (without expansions or optimizations) as:
#lang racket
(define (-check-non-negative-integer who i false-ok?)
(when (or i (not false-ok?))
(unless (and (integer? i) (exact? i) (not (negative? i)))
(raise-argument-error [...]))))
(define (check-non-negative-integer who i)
(-check-non-negative-integer who i #f))
(define (check-non-negative-integer/false who i)
(-check-non-negative-integer who i #t))
After several simplifications and expansions by hand, the
example is much clearer.
#lang racket
{define (check my-false)
(if (if (random 1) #t my-false) 2 3)}
(check #f)
After optimizing it, in the bytecode we get (decompiled and cleaned version to maintain clarity)
#lang racket
{define check
(#%closed check
{lambda (my-false)
(if (if (random 1) #t my-false) 2 3)})}
(print-values (if (random 1) (if #t 2 3) 3))
The example is still a little tangled, but unfortunately I could not
simplify it more :(.
The example uses an auxiliary function, the
#f has to be an argument, but
every time I tried to shorten the example, the problem just disappears. In the
decompiled code, we see that there is a
#t as a
condition of an
if, and it is
not optimized. If we replace the bold
#t by
#f or
(not (random 4))
we find the same kind of problem.
(On the other hand,
(random 1)
is always
0, but it
has side effects (it changes the internal state of the random number generator)
so it is difficult to optimize. There are some tricks to make the
optimizer unhappy and be able to get optimized code that is not a constant. I prefer to
use
(random 7)
because using different values of
7 I can tell where
each part did came from. In general, the test in the Racket codebase use
(read) instead, that
also has side effects and is difficult to optimize.)
The origin of the problem
One of the most curious part is that the
(if (if ? #t ?) ? ?)
in the initial code becomes
(if ? (if #t ? ?) ?).
Well, there is a part of the
optimizer that transforms
(if (if M N #f) M2 K)
into
(if M (if N M2 K) K),
when
K is a simple
value that can be duplicated without problems. All things inside
(if (if M N #f) M2 K)
are already optimized. In particular, if
M was originally
of the form
#t,
#f or
(not ?) then in
the central
if the following
optimizations should have been already applied.
(if #t X Y) --> X
(if #f X Y) --> Y
(if (not P) X Y) --> (if P Y X)
Moreover, in the same expression
N and
#f are optimized
for a Boolean context, because they only will be used to decide what the outside
if does. In this context there are some additional optimizations, but not all
the optimizations that are applied directly to the test part of an
if construction
.
The new expression
(if N M2 K) also has
all its parts separately optimized .
Moreover, as we already said, the
N is optimized in a Boolean context. The problem is that
the expression is never optimized together, so if
N is of the form
#t,
#f or
(not ?),
the whole expression
(if N M2 K) is left unoptimized. This may
be the cause of the
(if #t ? ?),
(if #f ? ?) and
(if (not ?) ? ?)
we found.
Old and new optimizations
During the optimization, the
if are subject to many optimizations
steps. The details are
quite long. This is a list of the optimizations that are in Racket and some
additional optimization
that I tested in my repository.
a: [boolean] (if M #t #f) --> M
b: [boolean] (if v v X) --> (if v #t X)
c: [new] (if v X v) --> (if v X #f)
d: (if (not P) X Y) --> (if P Y X)
d: (if #t X Y) --> X
e: (if #f X Y) --> Y
f: (if v v #f) --> v
g: (if <omitable> v v) --> v
h: (if (if M N #f) P K) => (if M (if N P K) K)
i: [new] (if M (if (not Q) P K) K) => (if M (if Q K P) K)
j: [new] (if M (if #t P K) K) => (if M P K)
k: [new] (if M (if #f P K) K) => (if M K K)
The list has many simplifications and I hope I have not forgotten any
of them.
The fist optimizations
are applied only to expressions in a Boolean context and are marked with [boolean] .
I tried some additional optimizations in my repository. The new optimizations I tested are marked with
[new].
The
v indicate local variables and the
K constants that can be duplicated. The
"d" and "j"
optimizations are slightly more general, because they are applied also when the
#t
is replaced by any
value other than
#f. The new optimizations
"i", "j" and "k" are applied only to the
results of "h", not to all the expressions. (I could have added a copy of "f" and
"g" to
further optimize the results of "h", but there were too many things getting copied and I was
getting bored.)
Another idea was to change the order of "a", "b" and "c", so we
first add the
#t and
#f and then the
"a" optimization can be applied in more cases to reduce the amount of
if in the final
code .
c: [new] (if v X v) --> (if v X #f)
b: [boolean] (if v v X) --> (if v #t X)
a: [boolean] (if M #t #f) --> M
For these optimizations, the relevant part of the code after
the reorder is:
/* Convert (if <id> expr <id>) to (if <id> expr #f) */
if (SAME_TYPE(SCHEME_TYPE(t), scheme_local_type)
&& SAME_TYPE(SCHEME_TYPE(fb), scheme_local_type)
&& (SCHEME_LOCAL_POS(t) == SCHEME_LOCAL_POS(fb))) {
b->fbranch = fb = scheme_false;
}
if (context & OPT_CONTEXT_BOOLEAN) {
/* Convert (if <id> <id> expr) to (if <id> #t expr) */
if (SAME_TYPE(SCHEME_TYPE(t), scheme_local_type)
&& SAME_TYPE(SCHEME_TYPE(tb), scheme_local_type)
&& (SCHEME_LOCAL_POS(t) == SCHEME_LOCAL_POS(tb))) {
b->tbranch = tb = scheme_true;
}
/* For test position, convert (if <expr> #t #f) to <expr> */
if (SAME_OBJ(tb, scheme_true) && SAME_OBJ(fb, scheme_false))
return scheme_optimize_expr(t, info, context);
}
Versions to compile and test
The optimizer code is in C. It takes a long time to compile, so you have to be patient and try not to make
mistakes, because an error usually means a few extra hours recompiling.
To measure the differences we will count how many times some combinations
of
if appear in
the bytecode of the version 6.0.0.1 of Racket (git head, in January of this year, the
results may vary depending on the exact version, hopefully only a little) .
In general, to analyze the values that appear in the
if
constructions, they were classified in
#f,
#t,
v (local variable) ,
#<void> ,
"",
#"",
K (constants: numbers,
not empty strings and bytes)
Results of a-b-c
First, let's look at the results after changes only in the optimizations labeled
"a-b-c". We tried
four versions :
- just "a"
- with "a-b", this is code that uses the unmodified Racket
- with "b-a", use the reverse order
- "c-b-a", use the reverse order and add "c"
Examples |
just "a" |
"a-b" |
"b-a" |
"c-b-a" |
(if v ? ?) |
146592 |
146592 |
146592 |
146594 |
(if (let (v ?) ?) ? ?) |
10158 |
10158 |
10158 |
10158 |
(if #t ? ?) |
13 |
13 |
13 |
13 |
(if #f ? ?) |
8 |
8 |
8 |
8 |
(if (if ? ? #f) ? ?) |
24260 |
24261 |
24259 |
24275 |
(if (if ? ? #t) ? ?) |
11343 |
11343 |
11343 |
11343 |
(if (if ? v #f) ? ?) |
313 |
313 |
313 |
313 |
(if (if ? #t #f) ? ?) |
159 |
159 |
159 |
159 |
(if (if ? #f #t) ? ?) |
27 |
27 |
27 |
27 |
(if (null? ?) ? ?) |
110065 |
110068 |
110068 |
110062 |
(if (not ?) ? ?) |
1937 |
1937 |
1937 |
1937 |
(if ? ?) |
305625 |
305637 |
305633 |
305649 |
(if ? #f) |
114308 |
114299 |
114299 |
114556 |
(if ? v) |
75536 |
75537 |
75537 |
75268 |
(if v ?) |
63714 |
63533 |
63534 |
63534 |
(if #t ?) |
19502 |
19669 |
19672 |
19672 |
(if ? #t) |
12607 |
12607 |
12607 |
12607 |
(if #f ?) |
11389 |
11389 |
11389 |
11389 |
(if v #f) |
7494 |
7499 |
7499 |
7499 |
(if #t #f) |
1598 |
1604 |
1601 |
1601 |
(if #f #t) |
223 |
223 |
223 |
223 |
- The optimization "b" transforms 170 appearances
(if v v ?)
into (if v # t ?).
- Similarly, the optimization "c" transforms about 260 appearances
(if v ? v)
into (if v ? #f).
- I expected that swapping the optimizations "a" and
"b" they could compose, so that there is a transformation like
(if v v
#f) into (if v #t #f)
into v, but there are not too many of these cases.
- (If we eliminate the optimization "a", there appear approximately 5000
"new" instances of (if ? #t #f),
it's a very bad idea.)
Results of h-i-j-k
Now, let's test other three versions, with changes only in the optimizations
marked as "h-i-j-k".
- without "h"
- with "h" , this is the code that uses unmodified Racket
- with "h-i-j-k ", i.e. optimizing the result of "h"
Example |
no "h" |
only "h" |
"h-i-j-k" |
(if v ? ?) |
146030 |
146594 |
146618 |
(if (let (v ?) ?) ? ?) |
9981 |
10158 |
10166 |
(if #t ? ?) |
0 |
13 |
0 |
(if #f ? ?) |
0 |
8 |
0 |
(if (if ? ? #f) ? ?) |
30214 |
24260 |
24262 |
(if (if ? ? #t) ? ?) |
10965 |
11343 |
11343 |
(if (if ? v #f) ? ?) |
470 |
313 |
315 |
(if (if ? #t #f) ? ?) |
221 |
159 |
159 |
(if (if ? #f #t) ? ?) |
23 |
27 |
27 |
(if (null? ?) ? ?) |
110004 |
110065 |
110071 |
(if (not ?) ? ?) |
1526 |
1937 |
1820 |
(if ? ?) |
305830 |
305634 |
305643 |
(if ? #f) |
119928 |
114301 |
114295 |
(if ? v) |
72762 |
75537 |
75523 |
(if v ?) |
62971 |
63535 |
63537 |
(if #t ?) |
19654 |
19672 |
19675 |
(if ? #t) |
12483 |
12607 |
12602 |
(if #f ?) |
11380 |
11389 |
11391 |
(if v #f) |
7668 |
7499 |
7499 |
(if #t #f) |
1664 |
1601 |
1596 |
(if #f #t) |
227 |
223 |
219 |
- With regard to
(if #t ?)
and (if #f ?),
comparing different versions, we see that the optimization "h" is causing
them. The if
with #t
and #f
don't appear when "h" is not present and they disappear when the result of "h" is
further optimized.
- However, there are several instances of
(if (not ?) ? ?)
in all the versions. Moreover, after optimizing the result of "h" not all the
new instances disappear. It'd like investigate their origin and see how
they can be eliminated, because they appear more than 1500 times.
Ideas and remarks
- It would have been better to use the optimizer internal function
scheme_optimize_expr or
optimize_branch to optimize the second
if in the
result of the optimization "h". It would be more elegant because we wouldn't need to
repeat the code and this alternative method would apply all the other optimizations,
although it would be slightly slower. The problem is that to call these
functions we must use a Optimize_Info structure containing information about the
state of the program being optimized . I tried several times, but I could not
write it correctly.
- There are many
(if ? #f #t), maybe they can be replaced
by (not ?). I did some microbenchmarks
to measure the speed difference, but I saw almost no change. However, the
modification might allow the optimizer to apply other optimizations. Something similar happens
with (if ? #t #f),
but I don't know what would be the name of the associated function.
- There are 31 occurrences of
(if (if ? #<void> ?) ?)
and 11 occurrences of (if (if ? '() #f) ?). It's possible to add a optimization
for them, but they appear very seldom so probably it's not very useful.
- In each recompilation, the number of times that each
thing appears changes a
little. It's very strange. It doesn't affect the results too much, but we must be careful when
comparing very close values.
- There is exactly one
(if (if ? 1 0) ? ?). I did not seek the code that
produces it, but at first glance it looks like a
(if (bool->binary ?) ? ?), if
bool->binary
were a real function. Is this a bug or just a strange coincidence produced by all
the intermediate expansions?