5 de febrero de 2014

If what??? in the Racket's bytecode


In the previous article we counted how many times each primitive function appears in the Racket bytecode . This time, the idea is to analyze the functions that appear as conditions in the if. Specifically, we will look what functions that are replacing what in expressions like (if (what ?) ? ?).  It is important to remember that we are analyzing the bytecode, so we analyze the result of multiple macro expansions steps and multiple optimization steps, so the values we get are very different from those obtained by analyzing the source code. We will apply this analysis to all the .zo files in the Racket directory (version 5.3.6).

At the same time, we will also analyze if there are other strange things are in the condition of the if.

Changes in Code 

We modify slightly the code we used in the last article. The main changes are in the explore-exp function:

{define (explore-expr expr globs stack closed)
  (match expr
    [(struct branch (test then else))
       (match test
         [(struct application (rator rands)) 
           ((current-explore-found) (list 'app (or (decompilex-var rator globs stack closed) '#%app)))]
         [(struct apply-values (proc args-expr))
          ((current-explore-found) (list 'app (or (decompilex-var proc globs stack closed) '#%app)))]
         [(struct branch (test then else))
          (let ([then (filter-simple-values then)]
                [else (filter-simple-values else)])
          ((current-explore-found) (list 'if then else)))]
         [(? struct?) 
          ((current-explore-found) (list 'expression (vector-ref (struct->vector test) 0)))]
         [else ((current-explore-found) (list 'other test))])
       (explore-expr test globs stack closed)
       (explore-expr then globs stack closed)
       (explore-expr else globs stack closed))]

If there is an struct of the type application or apply-values inside of if, it corresponds to a function call, so we store it using the function that is in the (current-explore-found) parameter (we should have considered also call-with-values, but they appear very seldom).

If inside the if there is a branch, then it means that there is another if, we store what is in then and else, using the filter-simple-values ​​function that allows us us to keep the simple values ​​that appear there and replace the other things '?.

{define (filter-simple-values x)
  (if (or (member x (list #t #f (void) null)) (number? x)) x '?)}

If there is any other struct inside the if, then the condition is more complex, for example there is a let or begin.

Finally, if there is something else, it has to be a simple value , such as #t, #f a fixnum , ..., also the write them down.
We need to store all the founded items in a single hash, so we wrap them in a list in which the first element indicates where we found it. These list can't be compared with eq?, we need to compare them with equal?, so we change the main program to keep everything in a hash instead of hasheq.


Functions that appear more often in the place of what in expressions of the form (if (what ?) ? ?).

Pos. Count Name
1 88255 null?
2 57209 eq?
3 38819 pair?
4 17083 list?
5 14264 _stx-pair?:P@(lib "racket/private/stx.rkt")
6 12256 =
7 10080 <
8 7672 syntax?
9 6353 unsafe-fx<
10 5441 _identifier?:P@(lib "racket/private/stx.rkt")
11 4932 equal?
12 4486 zero?
13 3978 procedure?
14 3506 _stx-list?:p@(lib "racket/private/stx.rkt")
15 3027 <=
16 2954 _stx-null/#f:P@(lib "racket/private/stx.rkt")
17 2947 string?
18 2712 >=
19 2371 _spec?:?@(lib "scribble/private/manual-class.rkt")
20 2371 _impl?:?@(lib "scribble/private/manual-class.rkt")
21 2326 _cpointer?@(quote #%foreign)
22 2169 >
23 2056 free-identifier=?
24 2050 symbol?
25 1836 eof-object?

Almost everything listed here are Boolean predicates, which by convention end in ?, there are also <, = and >, that are also Boolean predicates, but for historical reasons have not the ? . Nothing too surprising. The only exception in this list is stx-null, that is " almost" Boolean sometimes returns null and sometimes it returns #f. The next most common non Boolean functions are:

Pos. Count Name
30 1436 unbox
32 1396 _alt-memq@(lib "racket/private/list.rkt")
33 1384 hash-ref
40 847 regexp-match
43 744 _alt-member@(lib "racket/private/list.rkt")

Regarding unbox and hash-ref, I hope someone had stored a Boolean value in a box or in a hash. Something similar happens with regexp-match . Regarding alt-memq and alt-member, they are "almost" Boolean , if the result is negative they return #f and if the result is positive they return a value that may be useful.

Counting what are the other things that may appear inside the if we have this table (with an example of how would the decompiled version be, where v is a local value).

Pos. Count Name Example
1 113550 struct:localref (if v ...)
2 41860 struct:branch (if (if ...) ...)
3 7700 struct:let-one (if (let ...) ...)
4 7243 #%localref (if (v ...) ...)
5 1287 #%app (if ((...) ...) ...)
Sometimes this kind of complex conditions appear in the original code, but sometimes they appear in the expansion and optimization steps. For example something as innocent as an or is transformed into:

(let ([x (random 1)]
      [y (random 2)])
  (if (or x y) 3 4))

After expansion (cleaned version to improve clarity, for example the two let are actually let-values)
(let ([x (random 1)]
      [y (random 2)])
  (if (let ([or-part x])
        (if or-part or-part y))
    3 4))

After optimization (cleaned version to improve clarity, for example the names of the variables are lost)
(let ([x (random 1)])
  (let ([y (random 2)])
    (if (if x #t y) 3 4)))

To see the general behavior, we plot the number of times each type of thing appears in the test position of if. The Boolean predicates are marked with a cross and the other things with a blue circle. Most are Boolean predicates, but between the first items (v ...) and  (if ...) appear interleaved with null?, eq? and pair?. The distribution can be approximated by y = 3.02E5 x-1.60.

Weird results

Reviewing the other things that may appear in the if we found :

Pos. Count Name Example
35 1011 not (if (not ?) ? ?)
565 13 #t (if #t ? ?)
926 7 #f (if #f ? ?)
1476 3 void (if (void ?) ? ?)
1920 2 set-box! (if (set-box! ? ?) ? ?)

For example, void and set-box! appear only 3 and 2 times (the position is not very accurate, because there are many draws). It could be possible to optimize the code, but I don't think it's worthwhile because they appear very seldom.

The other cases are stranger, because the optimizer specifically optimized such code... The answer will appear in  the next article.