Introduction
In the previous article we counted how many times each primitive function appears in the Racket bytecode . This time, the idea is to analyze the functions that appear as conditions in the if. Specifically, we will look what functions that are replacing what in expressions like (if (what ?) ? ?). It is important to remember that we are analyzing the bytecode, so we analyze the result of multiple macro expansions steps and multiple optimization steps, so the values we get are very different from those obtained by analyzing the source code. We will apply this analysis to all the .zo files in the Racket directory (version 5.3.6).At the same time, we will also analyze if there are other strange things are in the condition of the if.
Changes in Code
We modify slightly the code we used in the last article. The main changes are in the explore-exp function:{define (explore-expr expr globs stack closed)
(match expr
[...]
[(struct branch (test then else))
(begin
(match test
[(struct application (rator rands))
((current-explore-found) (list 'app (or (decompilex-var rator globs stack closed) '#%app)))]
[(struct apply-values (proc args-expr))
((current-explore-found) (list 'app (or (decompilex-var proc globs stack closed) '#%app)))]
[(struct branch (test then else))
(let ([then (filter-simple-values then)]
[else (filter-simple-values else)])
((current-explore-found) (list 'if then else)))]
[(? struct?)
((current-explore-found) (list 'expression (vector-ref (struct->vector test) 0)))]
[else ((current-explore-found) (list 'other test))])
(explore-expr test globs stack closed)
(explore-expr then globs stack closed)
(explore-expr else globs stack closed))]
[...])}
If there is an struct of the type
application or
apply-values inside of
if, it corresponds to a function call, so we store
it using the function that is in the
(current-explore-found) parameter (we should have considered
also
call-with-values, but they appear
very seldom).If inside the if there is a branch, then it means that there is another if, we store what is in then and else, using the filter-simple-values function that allows us us to keep the simple values that appear there and replace the other things '?.
{define (filter-simple-values x)
(if (or (member x (list #t #f (void) null)) (number? x)) x '?)}
If there is any other struct inside the if, then the condition is more complex, for example there is a let or begin.
Finally, if there is something else, it has to be a simple value , such as #t, #f a fixnum , ..., also the write them down.
We need to store all the founded items in a single hash, so we wrap them in a list in which the first element indicates where we found it. These list can't be compared with eq?, we need to compare them with equal?, so we change the main program to keep everything in a hash instead of hasheq.
Results
Functions that appear more often in the place of what in expressions of the form (if (what ?) ? ?).Pos. | Count | Name | |
---|---|---|---|
1 | 88255 | null? | |
2 | 57209 | eq? | |
3 | 38819 | pair? | |
4 | 17083 | list? | |
5 | 14264 | _stx-pair?:P@(lib "racket/private/stx.rkt") | |
6 | 12256 | = | |
7 | 10080 | < | |
8 | 7672 | syntax? | |
9 | 6353 | unsafe-fx< | |
10 | 5441 | _identifier?:P@(lib "racket/private/stx.rkt") | |
11 | 4932 | equal? | |
12 | 4486 | zero? | |
13 | 3978 | procedure? | |
14 | 3506 | _stx-list?:p@(lib "racket/private/stx.rkt") | |
15 | 3027 | <= | |
16 | 2954 | _stx-null/#f:P@(lib "racket/private/stx.rkt") | |
17 | 2947 | string? | |
18 | 2712 | >= | |
19 | 2371 | _spec?:?@(lib "scribble/private/manual-class.rkt") | |
20 | 2371 | _impl?:?@(lib "scribble/private/manual-class.rkt") | |
21 | 2326 | _cpointer?@(quote #%foreign) | |
22 | 2169 | > | |
23 | 2056 | free-identifier=? | |
24 | 2050 | symbol? | |
25 | 1836 | eof-object? |
Almost everything listed here are Boolean predicates, which by convention end in ?, there are also <, = and >, that are also Boolean predicates, but for historical reasons have not the ? . Nothing too surprising. The only exception in this list is stx-null, that is " almost" Boolean sometimes returns null and sometimes it returns #f. The next most common non Boolean functions are:
Pos. | Count | Name | |
---|---|---|---|
30 | 1436 | unbox | |
32 | 1396 | _alt-memq@(lib "racket/private/list.rkt") | |
33 | 1384 | hash-ref | |
40 | 847 | regexp-match | |
43 | 744 | _alt-member@(lib "racket/private/list.rkt") |
Regarding unbox and hash-ref, I hope someone had stored a Boolean value in a box or in a hash. Something similar happens with regexp-match . Regarding alt-memq and alt-member, they are "almost" Boolean , if the result is negative they return #f and if the result is positive they return a value that may be useful.
Counting what are the other things that may appear inside the if we have this table (with an example of how would the decompiled version be, where v is a local value).
Pos. | Count | Name | Example | |
---|---|---|---|---|
1 | 113550 | struct:localref | (if v ...) | |
2 | 41860 | struct:branch | (if (if ...) ...) | |
3 | 7700 | struct:let-one | (if (let ...) ...) | |
4 | 7243 | #%localref | (if (v ...) ...) | |
5 | 1287 | #%app | (if ((...) ...) ...) |
Original:
(let ([x (random 1)]
[y (random 2)])
(if (or x y) 3 4))
After expansion (cleaned version to improve clarity, for example the two let are actually let-values)
(let ([x (random 1)]
[y (random 2)])
(if (let ([or-part x])
(if or-part or-part y))
3 4))
After optimization (cleaned version to improve clarity, for example the names of the variables are lost)
(let ([x (random 1)])
(let ([y (random 2)])
(if (if x #t y) 3 4)))
To see the general behavior, we plot the number of times each type of thing appears in the test position of if. The Boolean predicates are marked with a cross and the other things with a blue circle. Most are Boolean predicates, but between the first items (v ...) and (if ...) appear interleaved with null?, eq? and pair?. The distribution can be approximated by y = 3.02E5 x-1.60.
Weird results
Reviewing the other things that may appear in the if we found :Pos. | Count | Name | Example | |
---|---|---|---|---|
35 | 1011 | not | (if (not ?) ? ?) | |
565 | 13 | #t | (if #t ? ?) | |
926 | 7 | #f | (if #f ? ?) | |
1476 | 3 | void | (if (void ?) ? ?) | |
1920 | 2 | set-box! | (if (set-box! ? ?) ? ?) |
For example, void and set-box! appear only 3 and 2 times (the position is not very accurate, because there are many draws). It could be possible to optimize the code, but I don't think it's worthwhile because they appear very seldom.
The other cases are stranger, because the optimizer specifically optimized such code... The answer will appear in the next article.