13 de enero de 2014

Most used functions in Racket bytecode

Introduction

The idea is to count how many times each function appears in the bytecode of compiled code of Racket (version 5.3.6).
When compiling a Racket program, several steps are performed automatically. First, all macros are expanded, then the code is compiled to a specific bytecode that is later optimized and the result is saved in a .zo file. The optimization includes constant propagation, functions inlining (including functions defined in other modules) and more. So between the original program and the compiled bytecode version the difference is very large. Functions can be primitive (defined in C) or be defined in other modules.
The functions can be assigned to variables or passed as parameters to other functions. We will count only how many times each function appears in the direct application position . For example, in (zero? (+ 1 2)) we will count zero? and +, but in (cons' sum +) we will only have cons, but not +.
Furthermore, we will ignore the application of functions that are arguments to other functions, for example in (define (apply-to-random f) (f (random))) will ignore f. Although it is likely that this code is inlined by the optimizer and if the original code is (apply-to-random add1) will get (add1 (random)) and in this case we will count add1.
This is a static count of how many times each function, the number of calls to each function when executed may be very different appears.

Results

In the directory Racket (version 5.3.6) there are 4575 .zo files and 3145659 function calls. Each file has an average of 688 function calls.
Calls
 
Unique Functions Calls per Function Calls per File
Prim 2042289 65% 1033 4% 1977 446
NoPrim 1103370 35% 23992 96% 46 241
Total 3145659 25025 126 688
Although only 4% of the found functions are primitive, but 65% of the calls to refers to these functions. This is because each primitive function appears 1977 times, while each non-primitive function appears in only 46 calls. All of this is not very surprising because the optimizer inlines many non-primitive functions.
Counting the functions we obtained the following results. The non-primitive functions are in italics.
Pos Count Name Prim
1 309802 datum->syntax #t
2 305884 list #t
3 118896 cons #t
4 96201 unsafe-cdr #t
5 91229 null? #t
6 81452 car #t
7 77059 eq? #t
8 71870 unsafe-car #t
9 62725 _*keep-s-expr:p@(lib "scribble/private/manual-scheme.rkt") #f
10 57761 _to-element15.25:p@(lib "scribble/racket.rkt") #f
11 45041 vector #t
12 42930 syntax-e #t
13 42924 values #t
14 42063 pair? #t
15 41244 _to-paragraph31.31:p@(lib "scribble/racket.rkt") #f
16 38956 cdr #t
17 36267 _check-pre-part:P@(lib "scribble/doclang.rkt") #f
18 30222 + #t
19 29284 _find-method/who:p@(lib "racket/private/class-internal.rkt") #f
20 26117 _*Value:f@(lib "typed-racket/rep/type-rep.rkt") #f
21 25417 _make-var-id:mk@(lib "scribble/racket.rkt") #f
22 22954 _flat-named-contract:p@(lib "racket/contract/private/misc.rkt") #f
23 19851 vector-ref #t
24 19697 srcloc #t
25 18502 list? #t
The most popular functions are datum->syntax and list. They appear almost three times more than the next functions. The following functions also handle lists. I think this is because the code of Racket many functions that are the definition macros. When you compile something simple like
{define-syntax-rule (repeat n
                       body ...)
  (for ([i (in-range n)])
     body ...)}
we get an expanded version that uses primitive functions like datum->syntax and much list management code. I guess this in because we are analyzing the code that defines Racket from a more simplified version of Racket and therefore uses many macros. Also, in the code there are also several aditonal languages built on top of Racket, adding more macros. I guess the normal code has usually less macros.
To see the counting of all the function, we can order them from the most common to the least common and plot the number of occurrences of each one. To see the general behavior clearly, we use logarithmic scales on both axes. On this scale the straight lines represent functions of the type xk.

We see that the only big jump is between the first two functions and the rest, the x3 corresponds to half scale mark. The primitive and non-primitive functions are fairly interleaved, although there are many non-primitive that appear only once.
We can divide it into two groups. The amount of occurrences of the first 100 follow a formula of the form x-0.997. Instead the others follow a formula of the form x-1.53. The coefficients are about -1.0 and -1.5. There seems to be a change in behavior at that point, but I don't know the reason. It may be only numerology and a direct application of the Mar's law: Everything is linear if plotted log-log with a fat magic marker. My corollary is:. It's easier with two markers.

Modifying decompile

The bytecode can translated to an almost human readable format using decopile. It's not really a direct translation, because some of the bytecode instructions do not have a direct representation in Racket. Do not take it literally, but it is generally understandable and it gives a good idea of what is is optimized and what remained.
This is much similar to what we want to do, so we just take the source code of decompile, change all decompile to explore and start making changes. For example:
  • No need to see what's in the provide and require, because they only serve to connect the functions of a module with functions in another module, not running code.
  • No need to see what's inside of the syntax, because they have data that is used to generate new code in macros.
  • Some functions have two versions, one for running and one for inlining. We ignore the inlining version.
  • We ignore direct values as numbers, #t and #f, strings, quote.
  • We don't need to return any results, so we remove the lists representing the decompiled code.
The following is one of the main functions. The arguments are:
  • expr is the expression to be analyzed.
  • glob have some global definitions, for example the variables (and function) and globally defined and defined in other modules.
  • stack has local variables, roughly corresponds to what is usually saved on the stack
  • closed is has the closed functions that were analyzed analized
The idea is simply to look at every possible structure and recursively analyze the parts that interest us.
{define (explore-expr expr globs stack closed); short version, with some omissions.
  (match expr
    [(struct assign (id rhs undef-ok?))
     (explore-expr rhs globs stack closed)]
    [(? lam?)
     (explore-lam expr globs stack closed)]
    [(struct case-lam (name lams))
     (for ([lam (in-list lams)])
       (explore-lam lam globs stack closed))]
    [(struct let-one (rhs body type unused?))
     (let ([id (or (extract-id rhs) (gensym (or type (if unused? 'unused 'local))))])
       (explore-expr rhs globs (cons id stack) closed)
       (explore-expr body globs (cons id stack) closed))]
    [(struct let-void (count boxes? body))
     (let ([ids (extract-ids body count)])
       (let ([vars (for/list ([id (in-list ids)])
                     (or id (gensym (if boxes? 'localvb 'localv))))])
         (explore-expr body globs (append vars stack) closed)))]
    [(struct let-rec (procs body))
     (begin
       (for ([proc (in-list procs)])
         (explore-expr proc globs stack closed))
       (explore-expr body globs stack closed))]
    [(struct install-value (count pos boxes? rhs body))
     (begin
       (explore-expr rhs globs stack closed)
       (explore-expr body globs stack closed))]
    [(struct boxenv (pos body))
     (explore-expr body globs stack closed)]
    [(struct branch (test then else))
     (begin
       (explore-expr test globs stack closed)
       (explore-expr then globs stack closed)
       (explore-expr else globs stack closed))]
    [(struct application (rator rands))
     (let ([vars (for/list ([i (in-list rands)]) (gensym 'rand))])
       (explore-var rator globs (append vars stack) closed)
       (explore-expr rator globs (append vars stack) closed)
       (for ([rand (in-list rands)])
         (explore-expr rand globs (append vars stack) closed)))]
    [(struct apply-values (proc args-expr))
     (begin
       (explore-var proc globs stack closed)
       (explore-expr proc globs stack closed)
       (explore-expr args-expr globs stack closed))]
    [(struct seq (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct beg0 (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct closure (lam gen-id))
     (unless (hash-ref closed gen-id #f)
       (hash-set! closed gen-id #t)
       (explore-expr lam globs stack closed))]
    [else (void)])}


We will recursively travel across the structure that represents the bytecode, reaching the applications of functions (the normal version and the apply-values version). In these two cases we call the new explore-var function that computes the function name and sends it to the main develop using a function stored in a parameter.
The parameters used to define almost-global variables. In general it is better to return the results as the result of the function. In this case I thought it was better to return the result using that parameter because the values were obtained very deeply  in recursive function calls and did not want to change to much the parameters of these functions. Another advantage is that you can show the variables as they found, instead of waiting until the end.
Also, you can change the function stored in the parameter, without having to modify the code of explore. The parameter code is much simpler and does not depend on the internal details of the bytecode, making it easy to separate them and only focus on the data visualization part.
(define current-explore-found (make-parameter void))

{define (explore-var expr globs stack closed)
  (let ([v (match expr
             [(struct toplevel (depth pos const? ready?))
              (list-ref/protect globs pos 'toplevel)]
             [(struct primval (id))
              (hash-ref primitive-table id {lambda () (error "unknown primitive")})]
             [(struct localref (unbox? offset clear? other-clears? type))
              (list-ref/protect stack offset 'localref)]
             [else #f])])
    ((current-explore-found) v))}


Looking for the .zo

The main program calls these functions to analyze each file zo..
First we define some auxiliary functions to find the Racket directory and filter the .zo files..
#lang racket
(require compiler/decompile
         compiler/zo-parse)
(require racket/match)
(require "explore.rkt")

{define (get-racket-directory)
  (let-values ([(dir file must-be-dir?)
                (split-path (find-system-path 'exec-file))])
    dir)}

{define (zo-file? f)
  (and (file-exists? f)
       (filename-extension f)
       (bytes=? (filename-extension f) #"zo"))}

We define a hash to store the results and a function that saves  there the symbols that appear, keeping only the interned to filter the arguments and local variables.
(define found (make-hasheq))

{define (count-founded v)
  (when (and v (symbol-interned? v))
    (hash-set! found v (add1 (hash-ref found v {lambda () #;(displayln v) 0})))
    #;(displayln (list v (hash-ref found v))))}

Now we combine them all, we get the files and we analyze them.
(parameterize ([current-explore-found count-founded])
  (for ([file (sequence-filter zo-file? (in-directory (get-racket-directory)))])
    #;(newline)
    (displayln file)
    {define program-zo (let ([port (open-input-file file)])
                        (begin0
                          (zo-parse port)
                          (close-input-port port)))}
    (explore program-zo)
    #;(pretty-display (decompile program-zo))
    #;(pretty-display program-zo)))

Finally, it sorts the data in the hash and displays it.
(newline)
(newline)
(for ([name/count (in-list (sort (hash->list found) > #:key cdr))])
  (display (cdr name/count)) (display " ") (displayln (car name/count)))
  

Ideas

  • I would like to analyze combinations of functions. For example, counting the expressions of the form (f (g ?)) ignoring the value of ?. Also combinations of special forms and functions. For example, in the if's
  • I deleted a lot of code of decompile, mostly to concentrate on understanding the part that I wanted to understand. It would be nice to have a version that keeps a bigger part of the initial information, so when we get to an interesting point we can see the decompiled version. (I should modify the decompile module to export the auxiliary functions and think what to do with closed functions (which are not closures).

Complete code

Complete code of the explore.rkt module that defines explore. It's based in the decompile module.
#lang racket/base
(require compiler/zo-parse
         syntax/modcollapse
         racket/port
         racket/match
         racket/list
         racket/path)

(provide explore
         current-explore-found)

;; ----------------------------------------
(define current-explore-found (make-parameter void))
;; ----------------------------------------

{define primitive-table
  ;; Figure out number-to-id mapping for kernel functions in `primitive'
  (let ([bindings
         (let ([ns (make-base-empty-namespace)])
           (parameterize ([current-namespace ns])
             (namespace-require ''#%kernel)
             (namespace-require ''#%unsafe)
             (namespace-require ''#%flfxnum)
             (namespace-require ''#%extfl)
             (namespace-require ''#%futures)
             (for/list ([l (namespace-mapped-symbols)])
               (cons l (with-handlers ([exn:fail? {lambda (x) #f}])
                         (compile l))))))]
        [table (make-hash)])
    (for ([b (in-list bindings)])
      (let ([v (and (cdr b)
                    (zo-parse
                     (open-input-bytes
                      (with-output-to-bytes
                          {lambda () (write (cdr b))}))))])
        (let ([n (match v
                   [(struct compilation-top (_ prefix (struct primval (n)))) n]
                   [else #f])])
          (hash-set! table n (car b)))))
    table)}

{define (list-ref/protect l pos who)
  (list-ref l pos)
  #;
  (if (pos . < . (length l))
      (list-ref l pos)
      `(OUT-OF-BOUNDS ,who ,pos ,(length l) ,l))}

;; ----------------------------------------

;; Main entry:
{define (explore top)
  (let ([stx-ht (make-hasheq)])
    (match top
      [(struct compilation-top (max-let-depth prefix form))
       (let ([globs (explore-prefix prefix stx-ht)])
         (explore-form form globs '(#%globals) (make-hasheq) stx-ht))]
      [else (error 'explore "unrecognized: ~e" top)]))}

{define (explore-prefix a-prefix stx-ht)
  (match a-prefix
    [(struct prefix (num-lifts toplevels stxs))
     (let ([lift-ids (for/list ([i (in-range num-lifts)])
                       (gensym 'lift))]
           [stx-ids (map (lambda (i) (gensym 'stx))
                         stxs)])
       (append
        (map {lambda (tl)
               (match tl
                 [#f '#%linkage]
                 [(? symbol?) (string->symbol (format "_~a" tl))]
                 [(struct global-bucket (name))
                  (string->symbol (format "_~a" name))]
                 [(struct module-variable (modidx sym pos phase constantness))
                  (if (and (module-path-index? modidx)
                           (let-values ([(n b) (module-path-index-split modidx)])
                             (and (not n) (not b))))
                      (string->symbol (format "_~a" sym))
                      (string->symbol (format "_~s~a@~s~a"
                                              sym
                                              (match constantness
                                                ['constant ":c"]
                                                ['fixed ":f"]
                                                [(function-shape a pm?)
                                                 (if pm? ":P" ":p")]
                                                [(struct-type-shape c) ":t"]
                                                [(constructor-shape a) ":mk"]
                                                [(predicate-shape) ":?"]
                                                [(accessor-shape c) ":ref"]
                                                [(mutator-shape c) ":set!"]
                                                [else ""])
                                              (mpi->string modidx)
                                              (if (zero? phase)
                                                  ""
                                                  (format "/~a" phase)))))]
                 [else (error 'explore-prefix "bad toplevel: ~e" tl)])}
             toplevels)
        stx-ids
        (if (null? stx-ids) null '(#%stx-array))
        lift-ids))]
    [else (error 'explore-prefix "huh?: ~e" a-prefix)])}



{define (mpi->string modidx)
  (cond
   [(symbol? modidx) modidx]
   [else
    (collapse-module-path-index modidx (build-path
                                        (or (current-load-relative-directory)
                                            (current-directory))
                                        "here.rkt"))])}

{define (explore-module mod-form orig-stack stx-ht mod-name)
  (match mod-form
    [(struct mod (name srcname self-modidx prefix provides requires body syntax-bodies unexported
                       max-let-depth dummy lang-info internal-context flags pre-submodules post-submodules))
     (let ([globs (explore-prefix prefix stx-ht)]
           [stack (append '(#%modvars) orig-stack)]
           [closed (make-hasheq)])
       `(,mod-name ,(if (symbol? name) name (last name)) ....
           ,@(if (null? flags) '() (list `(quote ,flags)))
           ,@(let ([l (apply
                       append
                       (for/list ([req (in-list requires)]
                                  #:when (pair? (cdr req)))
                         (define l (for/list ([mpi (in-list (cdr req))])
                                     (define p (mpi->string mpi))
                                     (if (path? p)
                                         (let ([d (current-load-relative-directory)])
                                           (path->string (if d
                                                             (find-relative-path (simplify-path d #t)
                                                                                 (simplify-path p #f)
                                                                                 #:more-than-root? #t)
                                                             p)))
                                         p)))
                         (if (eq? 0 (car req))
                             l
                             `((,@(case (car req)
                                    [(#f) `(for-label)]
                                    [(1) `(for-syntax)]
                                    [else `(for-meta ,(car req))])
                                ,@l)))))])
               (if (null? l)
                   null
                   `((require ,@l))))
          ,@(for/list ([submod (in-list pre-submodules)])
              (explore-module submod orig-stack stx-ht 'module))
          ,@(for/list ([b (in-list syntax-bodies)])
              (let loop ([n (sub1 (car b))])
                (if (zero? n)
                    (cons 'begin
                          (for/list ([form (in-list (cdr b))])
                            (explore-form form globs stack closed stx-ht)))
                    (list 'begin-for-syntax (loop (sub1 n))))))
          ,@(map {lambda (form)
                   (explore-form form globs stack closed stx-ht)}
                 body)
          ,@(for/list ([submod (in-list post-submodules)])
              (explore-module submod orig-stack stx-ht 'module*))))]
    [else (error 'explore-module "huh?: ~e" mod-form)])}

{define (explore-form form globs stack closed stx-ht)
  (match form
    [(? mod?)
     (explore-module form stack stx-ht 'module)]
    [(struct def-values (ids rhs))
     (if (inline-variant? rhs)
         (explore-expr (inline-variant-direct rhs) globs stack closed)
         (explore-expr rhs globs stack closed))]
    [(struct def-syntaxes (ids rhs prefix max-let-depth dummy))
     (let ([globs (explore-prefix prefix stx-ht)])
       (explore-form rhs globs '(#%globals) closed stx-ht))]
    [(struct seq-for-syntax (exprs prefix max-let-depth dummy))
     (let ([globs (explore-prefix prefix stx-ht)])
       (for/list ([rhs (in-list exprs)])
         (explore-form rhs globs '(#%globals) closed stx-ht)))]
    [(struct seq (forms))
     (map {lambda (form)
            (explore-form form globs stack closed stx-ht)}
          forms)]
    [(struct splice (forms))
     (map {lambda (form)
            (explore-form form globs stack closed stx-ht)}
          forms)]
    [(struct req (reqs dummy))
     (void)]
    [else
     (explore-expr form globs stack closed)])}

{define (extract-name name)
  (if (symbol? name)
      (gensym name)
      (if (vector? name)
          (gensym (vector-ref name 0))
          #f))}

{define (extract-id expr)
  (match expr
    [(struct lam (name flags num-params arg-types rest? closure-map closure-types tl-map max-let-depth body))
     (extract-name name)]
    [(struct case-lam (name lams))
     (extract-name name)]
    [(struct closure (lam gen-id))
     (extract-id lam)]
    [else #f])}

{define (extract-ids body count)
  (define (extract-ids! body ids)
    (match body
      [(struct let-rec (procs body))
       (for ([proc (in-list procs)]
             [delta (in-naturals)])
         (when (< -1 delta (vector-length ids))
           (vector-set! ids delta (extract-id proc))))
       (extract-ids! body ids)]
      [(struct install-value (val-count pos boxes? rhs body))
       (extract-ids! body ids)]
      [(struct boxenv (pos body))
       (extract-ids! body ids)]
      [else #f]))
  (define ids (make-vector count #f))
  (extract-ids! body ids)
  (vector->list ids)}

{define (explore-var expr globs stack closed)
  (let ([v (match expr
             [(struct toplevel (depth pos const? ready?))
              (list-ref/protect globs pos 'toplevel)]
             [(struct primval (id))
              (hash-ref primitive-table id {lambda () (error "unknown primitive")})]
             [(struct localref (unbox? offset clear? other-clears? type))
              (list-ref/protect stack offset 'localref)]
             [else #f])])
    ((current-explore-found) v))}

{define (explore-expr expr globs stack closed)
  (match expr
    [(struct assign (id rhs undef-ok?))
     (explore-expr rhs globs stack closed)]
    [(? lam?)
     (explore-lam expr globs stack closed)]
    [(struct case-lam (name lams))
     (for ([lam (in-list lams)])
       (explore-lam lam globs stack closed))]
    [(struct let-one (rhs body type unused?))
     (let ([id (or (extract-id rhs) (gensym (or type (if unused? 'unused 'local))))])
       (explore-expr rhs globs (cons id stack) closed)
       (explore-expr body globs (cons id stack) closed))]
    [(struct let-void (count boxes? body))
     (let ([ids (extract-ids body count)])
       (let ([vars (for/list ([id (in-list ids)])
                     (or id (gensym (if boxes? 'localvb 'localv))))])
         (explore-expr body globs (append vars stack) closed)))]
    [(struct let-rec (procs body))
     (begin
       (for ([proc (in-list procs)])
         (explore-expr proc globs stack closed))
       (explore-expr body globs stack closed))]
    [(struct install-value (count pos boxes? rhs body))
     (begin
       (explore-expr rhs globs stack closed)
       (explore-expr body globs stack closed))]
    [(struct boxenv (pos body))
     (explore-expr body globs stack closed)]
    [(struct branch (test then else))
     (begin
       (explore-expr test globs stack closed)
       (explore-expr then globs stack closed)
       (explore-expr else globs stack closed))]
    [(struct application (rator rands))
     (let ([vars (for/list ([i (in-list rands)]) (gensym 'rand))])
       (explore-var rator globs (append vars stack) closed)
       (explore-expr rator globs (append vars stack) closed)
       (for ([rand (in-list rands)])
         (explore-expr rand globs (append vars stack) closed)))]
    [(struct apply-values (proc args-expr))
     (begin
       (explore-var proc globs stack closed)
       (explore-expr proc globs stack closed)
       (explore-expr args-expr globs stack closed))]
    [(struct seq (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct beg0 (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct with-cont-mark (key val body))
      (begin
        (explore-expr key globs stack closed)
        (explore-expr val globs stack closed)
        (explore-expr body globs stack closed))]
    [(struct closure (lam gen-id))
     (unless (hash-ref closed gen-id #f)
       (hash-set! closed gen-id #t)
       (explore-expr lam globs stack closed))]
    [else (void)])}

{define (explore-lam expr globs stack closed)
  (match expr
    [(struct closure (lam gen-id)) (explore-lam lam globs stack closed)]
    [(struct lam (name flags num-params arg-types rest? closure-map closure-types tl-map max-let-depth body))
     (let ([vars (for/list ([i (in-range num-params)]
                            [type (in-list arg-types)])
                   (gensym (format "~a~a-"
                                   (case type
                                     [(ref) "argbox"]
                                     [(val) "arg"]
                                     [else (format "arg~a" type)])
                                   i)))]
           [rest-vars (if rest? (list (gensym 'rest)) null)]
           [captures (map {lambda (v)
                            (list-ref/protect stack v 'lam)}
                          (vector->list closure-map))])
       (explore-expr body globs (append captures vars rest-vars) closed))])}

Funciones más usadas en el bytecode de Racket

Introducción

La idea es contar cuántas veces aparece cada función en el bytecode de el código compilado Racket (versión 5.3.6) .
Al compilar un programa de Racket, se realizan automáticamente varios pasos. Primero se expanden todas las macros, después se compila a un bytecode específico que se optimiza y el resultado se guarda en un archivo .zo. La optimización incluye propagación de constantes, enlineado de funciones (incluso funciones definidas en otros módulos) y muchas cosas más. Así que entre el programa original y la versión compilada a bytecode la diferencia es muy grande. Las funciones pueden ser primitivas (definidas en C) ó estar definidas en otros módulos.
Las funciones se pueden asignar a variables o pasar como parámetros a otras funciones. Vamos a contar solamente cuántas veces aparece cada función en la posición de aplicación directa. Por ejemplo en (zero? (+ 1 2)) vamos a contar a zero? y a +, pero en (cons 'sum +) sólo vamos a contar cons, pero no +.
Además vamos a ignorar las aplicaciones de funciones que son argumentos de otras funciones, por ejemplo en (define (apply-to-random f) (f (random))) vamos a ignorar f. Aunque es probable que este código sea enlineado por el optimizador y si calculamos (apply-to-random add1) va a quedar (add1 (random)) y en ese caso vamos a contar add1.
Este es un recuento estático de cuántas veces aparece cada función, la cantidad de llamadas a cada función al ejecutarse puede ser muy distinta.

Resultados

En el directorio de Racket (versión 5.3.6) hay 4575 archivos .zo y 3145659 llamadas a funciones. Cada archivo tiene en promedio a 688 llamadas a funciones.
Llamadas
 
Funciones Distintas Llamadas por Función Llamadas por Archivo
Prim 2042289 65% 1033 4% 1977 446
NoPrim 1103370 35% 23992 96% 46 241
Total 3145659 25025 126 688
Aunque sólo un 4% de las funciones encontradas son primitivas, el 65% de las llamadas son a estas funciones. Esto se explica porque cada función primitiva aparece en promedio en 1977, mientras que cada función no primitiva aparece en promedio en sólo 46 llamadas. Todo esto no es muy sorprendente porque el optimizador enlinea muchas de las funciones no primitivas.
Al contar las funciones obtuvimos los siguientes resultados. Las funciones no-primitivas están en itálica.
Pos Cantidad   Nombre Prim
1 309802   datum->syntax #t
2 305884   list #t
3 118896   cons #t
4 96201   unsafe-cdr #t
5 91229   null? #t
6 81452   car #t
7 77059   eq? #t
8 71870   unsafe-car #t
9 62725   _*keep-s-expr:p@(lib "scribble/private/manual-scheme.rkt") #f
10 57761   _to-element15.25:p@(lib "scribble/racket.rkt") #f
11 45041   vector #t
12 42930   syntax-e #t
13 42924   values #t
14 42063   pair? #t
15 41244   _to-paragraph31.31:p@(lib "scribble/racket.rkt") #f
16 38956   cdr #t
17 36267   _check-pre-part:P@(lib "scribble/doclang.rkt") #f
18 30222   + #t
19 29284   _find-method/who:p@(lib "racket/private/class-internal.rkt") #f
20 26117   _*Value:f@(lib "typed-racket/rep/type-rep.rkt") #f
21 25417   _make-var-id:mk@(lib "scribble/racket.rkt") #f
22 22954 _flat-named-contract:p@(lib "racket/contract/private/misc.rkt") #f
23 19851 vector-ref #t
24 19697 srcloc #t
25 18502 list? #t
Las más populares son datum->syntax y list, aparecen casi tres veces mas que la siguiente. Siguen más funciones que manejan listas. Creo que esto se debe a que en el código de Racket muchas funciones son macros. Al compilar algo sencillo como
{define-syntax-rule (repeat n 
                       body ...) 
  (for ([i (in-range n)])
     body ...)}
se obtiene una versión expandida que usa funciones primitivas como datum->syntax y mucho manejo de la expresión como lista. Supongo que esto de debe a que estamos analizando el código que define a Racket a partir de una versión más simplificada de Racket y por ello usa muchas macros. En el código también hay varios lenguajes adicionales construidos a partir de Racket, lo que agrega más macros. Supongo que el código normal tiene usualmente menos macros.
Para ver el recuento de todas las funciones, podemos ordenarlas desde la más frecuente a la menos frecuente y graficar la cantidad de veces que aparece cada una. Para poder ver el comportamiento general con claridad, usamos ejes logarítmicos en ambas escalas. En esta escala las líneas rectas representan funciones del tipo xk.
Vemos que el único salto grande es entre las primeras dos funciones y el resto, el x3 corresponde a media graduación de la escala. Las funciones primitivas y las no primitivas están bastante intercaladas, aunque hay muchas no primitivas que aparecen una sola vez.
Podemos dividirlo en dos grupos. La cantidad de apariciones de las primera 100 sigue una formula de la forma x-0,997. En cambio las otras siguen una formula de la forma x-1,53. Los coeficientes son aproximadamente -1,0 y -1,5. Parece haber algún cambio en el comportamiento en ese punto, pero no se a que se debe. Es posible que sea sólo numerología y una aplicación directa de la ley de Mar: Todo es lineal si se grafica en coordenadas log-log con un marcador bien grueso. Mi corolario es: Es más fácil con dos marcadores.

Modificando decompile

El bytecode se puede ver en un formato casi leíble por humanos usando decopile. En realidad no es una traducción directa, porque algunas de las instrucciones del bytecode no tienen una representación directa en Racket. No hay que tomarlo como algo literal, pero en general es entendible y da una buena idea de qué es lo que se optimizó y qué es lo que quedó.
Esto es muy parecido a lo que queremos hacer, así que simplemente tomamos el código fuente de decompile, cambiamos todos los decompile por explore y empezamos a hacer cambios. Por ejemplo:
  • No necesitamos ver que hay en los require y provide, porque sólo sirven para conectar las funciones de un módulo con las de otro módulo, no ejecutan código.
  • No necesitamos ver que hay adentro de las syntax, porque tienen sólo datos que se usa para generar nuevo código en las macros.
  • Algunas funciones tienen dos versiones, una para ejecutar y otra para enlinear. Ignoramos la versión para enlinear.
  • Ignoramos los valores directos como números, #f y #t, cadenas de texto, quote.
  • No hace falta devolver ningún resultado, así que eliminamos las listas que representan el código decompilado.
La siguiente función es una de las principales. Los argumentos son:
  • expr es la expresión a analizar.
  • glob tiene alguna definiciones globales, por ejemplo las variables (y funciones) definidas globalmente y en otros módulos.
  • stack tiene las variables locales, corresponde más o menos a lo que se guarda usualmente en el stack
  • closed tiene las funciones cerradas que ya fueron analizadas
La idea es simplemente mirar cada estructura posible y analizar recursivamente las partes que nos interesan.
{define (explore-expr expr globs stack closed); versión corta, con algunas omisiones. 
  (match expr
    [(struct assign (id rhs undef-ok?))
     (explore-expr rhs globs stack closed)]
    [(? lam?)
     (explore-lam expr globs stack closed)]
    [(struct case-lam (name lams))
     (for ([lam (in-list lams)])
       (explore-lam lam globs stack closed))]
    [(struct let-one (rhs body type unused?))
     (let ([id (or (extract-id rhs) (gensym (or type (if unused? 'unused 'local))))])
       (explore-expr rhs globs (cons id stack) closed)
       (explore-expr body globs (cons id stack) closed))]
    [(struct let-void (count boxes? body))
     (let ([ids (extract-ids body count)])
       (let ([vars (for/list ([id (in-list ids)])
                     (or id (gensym (if boxes? 'localvb 'localv))))])
         (explore-expr body globs (append vars stack) closed)))]
    [(struct let-rec (procs body))
     (begin
       (for ([proc (in-list procs)])
         (explore-expr proc globs stack closed))
       (explore-expr body globs stack closed))]
    [(struct install-value (count pos boxes? rhs body))
     (begin
       (explore-expr rhs globs stack closed)
       (explore-expr body globs stack closed))]
    [(struct boxenv (pos body))
     (explore-expr body globs stack closed)]
    [(struct branch (test then else))
     (begin
       (explore-expr test globs stack closed)
       (explore-expr then globs stack closed)
       (explore-expr else globs stack closed))]
    [(struct application (rator rands))
     (let ([vars (for/list ([i (in-list rands)]) (gensym 'rand))])
       (explore-var rator globs (append vars stack) closed)
       (explore-expr rator globs (append vars stack) closed)
       (for ([rand (in-list rands)])
         (explore-expr rand globs (append vars stack) closed)))]
    [(struct apply-values (proc args-expr))
     (begin
       (explore-var proc globs stack closed)
       (explore-expr proc globs stack closed) 
       (explore-expr args-expr globs stack closed))]
    [(struct seq (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct beg0 (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct closure (lam gen-id))
     (unless (hash-ref closed gen-id #f)
       (hash-set! closed gen-id #t)
       (explore-expr lam globs stack closed))]
    [else (void)])}
Vamos así recorriendo recursivamente toda la estructura que representa el bytecode, hasta llegar a las aplicaciones de las funciones (la versión normal y la de apply-values). En esos dos casos llamamos a una nueva función explore-var que calcula el nombre de la función y lo manda al programo principal usando una función guardada en un parámetro.
Los parámetros sirven para definir variables casi globales. En general es mejor devolver los resultados como el resultado de la función. En este caso me pareció mejor devolver el resultado por un parámetro porque los valores se obtenían en una función muy metida adentro de llamadas recursivas y no quería cambiar mucho los parámetros de estas funciones. Otra ventaja es que así se puede ir visualizando las variables a medida que se encuentran en vez de esperar hasta el final.
También así se puede cambiar la función que está en el parámetro, sin tener que modificar el código de explore. El código del parámetro es mucho más sencillo y no depende de los detalles internos del bytecode, por lo que al separarlos es más fácil concentrarse en sólo la visualización de los datos. 
(define current-explore-found (make-parameter void))

{define (explore-var expr globs stack closed)
  (let ([v (match expr
             [(struct toplevel (depth pos const? ready?))
              (list-ref/protect globs pos 'toplevel)]
             [(struct primval (id))
              (hash-ref primitive-table id {lambda () (error "unknown primitive")})]
             [(struct localref (unbox? offset clear? other-clears? type))
              (list-ref/protect stack offset 'localref)]
             [else #f])])
    ((current-explore-found) v))}

Buscando los archivos .zo

El programa principal llama a estas funciones para que analicen cada archivo .zo.
Primero definimos funciones auxiliares para encontrar el directorio de Racket y filtrar los archivos .zo.
#lang racket
(require compiler/decompile
         compiler/zo-parse)
(require racket/match)
(require "explore.rkt")
 
{define (get-racket-directory)
  (let-values ([(dir file must-be-dir?) 
                (split-path (find-system-path 'exec-file))])
    dir)}

{define (zo-file? f)
  (and (file-exists? f)
       (filename-extension f)
       (bytes=? (filename-extension f) #"zo"))}
Definimos un hash para guardar los resultados y una función que los guarda allí los símbolos que van apareciendo, dejando sólo los interned para filtrar los argumentos y variables locales.

(define found (make-hasheq))

{define (count-founded v)
  (when (and v (symbol-interned? v))
    (hash-set! found v (add1 (hash-ref found v {lambda () #;(displayln v) 0})))
    #;(displayln (list v (hash-ref found v))))}

 

Ahora combinamos todo, buscamos cada archivo lo analizamos.
(parameterize ([current-explore-found count-founded])
  (for ([file (sequence-filter zo-file? (in-directory (get-racket-directory)))])
    #;(newline)
    (displayln file)
    {define program-zo (let ([port (open-input-file file)])
                        (begin0
                          (zo-parse port)
                          (close-input-port port)))}
    (explore program-zo)
    #;(pretty-display (decompile program-zo))
    #;(pretty-display program-zo)))

Al final ordena las los datos del hash y los muestra.
(newline)
(newline)
(for ([name/count (in-list (sort (hash->list found) > #:key cdr))])
  (display (cdr name/count)) (display " ") (displayln (car name/count)))
  

Ideas

  • Me gustaría analizar combinaciones de funciones. Por ejemplo, contar las expresiones de la forma (f (g ?)) ignorando el valor de ?. También combinaciones de formas especiales y funciones. Por ejemplo, en los if.
  • Borré mucho del código de decompile, más que nada para poder concentrarme en entender la parte que me interesaba. Estaría bueno tener una versión que mantenga más de la información inicial, así al llegar a algún punto interesante poder ver la versión decompilada. (Debería modificar el módulo de decompile para que exporte las funciones auxiliares y ver bien que hacer con las funciones cerradas (que no son closures).

Código completo

Código completo del módulo explore.rkt que define explore. Es un poco largo. Está basado en el módulo de decompile.
#lang racket/base
(require compiler/zo-parse
         syntax/modcollapse
         racket/port
         racket/match
         racket/list
         racket/path)

(provide explore 
         current-explore-found)

;; ----------------------------------------
(define current-explore-found (make-parameter void))
;; ----------------------------------------

{define primitive-table
  ;; Figure out number-to-id mapping for kernel functions in `primitive'
  (let ([bindings
         (let ([ns (make-base-empty-namespace)])
           (parameterize ([current-namespace ns])
             (namespace-require ''#%kernel)
             (namespace-require ''#%unsafe)
             (namespace-require ''#%flfxnum)
             (namespace-require ''#%extfl)
             (namespace-require ''#%futures)
             (for/list ([l (namespace-mapped-symbols)])
               (cons l (with-handlers ([exn:fail? {lambda (x) #f}])
                         (compile l))))))]
        [table (make-hash)])
    (for ([b (in-list bindings)])
      (let ([v (and (cdr b)
                    (zo-parse 
                     (open-input-bytes
                      (with-output-to-bytes
                          {lambda () (write (cdr b))}))))])
        (let ([n (match v
                   [(struct compilation-top (_ prefix (struct primval (n)))) n]
                   [else #f])])
          (hash-set! table n (car b)))))
    table)}

{define (list-ref/protect l pos who)
  (list-ref l pos)
  #;
  (if (pos . < . (length l))
      (list-ref l pos)
      `(OUT-OF-BOUNDS ,who ,pos ,(length l) ,l))}

;; ----------------------------------------

;; Main entry:
{define (explore top)
  (let ([stx-ht (make-hasheq)])
    (match top
      [(struct compilation-top (max-let-depth prefix form))
       (let ([globs (explore-prefix prefix stx-ht)])
         (explore-form form globs '(#%globals) (make-hasheq) stx-ht))]
      [else (error 'explore "unrecognized: ~e" top)]))}

{define (explore-prefix a-prefix stx-ht)
  (match a-prefix
    [(struct prefix (num-lifts toplevels stxs))
     (let ([lift-ids (for/list ([i (in-range num-lifts)])
                       (gensym 'lift))]
           [stx-ids (map (lambda (i) (gensym 'stx)) 
                         stxs)])
       (append 
        (map {lambda (tl)
               (match tl
                 [#f '#%linkage]
                 [(? symbol?) (string->symbol (format "_~a" tl))]
                 [(struct global-bucket (name)) 
                  (string->symbol (format "_~a" name))]
                 [(struct module-variable (modidx sym pos phase constantness))
                  (if (and (module-path-index? modidx)
                           (let-values ([(n b) (module-path-index-split modidx)])
                             (and (not n) (not b))))
                      (string->symbol (format "_~a" sym))
                      (string->symbol (format "_~s~a@~s~a" 
                                              sym 
                                              (match constantness
                                                ['constant ":c"]
                                                ['fixed ":f"]
                                                [(function-shape a pm?) 
                                                 (if pm? ":P" ":p")]
                                                [(struct-type-shape c) ":t"]
                                                [(constructor-shape a) ":mk"]
                                                [(predicate-shape) ":?"]
                                                [(accessor-shape c) ":ref"]
                                                [(mutator-shape c) ":set!"]
                                                [else ""])
                                              (mpi->string modidx) 
                                              (if (zero? phase)
                                                  ""
                                                  (format "/~a" phase)))))]
                 [else (error 'explore-prefix "bad toplevel: ~e" tl)])}
             toplevels)
        stx-ids
        (if (null? stx-ids) null '(#%stx-array))
        lift-ids))]
    [else (error 'explore-prefix "huh?: ~e" a-prefix)])}



{define (mpi->string modidx)
  (cond
   [(symbol? modidx) modidx]
   [else 
    (collapse-module-path-index modidx (build-path
                                        (or (current-load-relative-directory)
                                            (current-directory))
                                        "here.rkt"))])}

{define (explore-module mod-form orig-stack stx-ht mod-name)
  (match mod-form
    [(struct mod (name srcname self-modidx prefix provides requires body syntax-bodies unexported 
                       max-let-depth dummy lang-info internal-context flags pre-submodules post-submodules))
     (let ([globs (explore-prefix prefix stx-ht)]
           [stack (append '(#%modvars) orig-stack)]
           [closed (make-hasheq)])
       `(,mod-name ,(if (symbol? name) name (last name)) ....
           ,@(if (null? flags) '() (list `(quote ,flags)))
           ,@(let ([l (apply
                       append
                       (for/list ([req (in-list requires)]
                                  #:when (pair? (cdr req)))
                         (define l (for/list ([mpi (in-list (cdr req))])
                                     (define p (mpi->string mpi))
                                     (if (path? p)
                                         (let ([d (current-load-relative-directory)])
                                           (path->string (if d
                                                             (find-relative-path (simplify-path d #t)
                                                                                 (simplify-path p #f) 
                                                                                 #:more-than-root? #t)
                                                             p)))
                                         p)))
                         (if (eq? 0 (car req))
                             l
                             `((,@(case (car req)
                                    [(#f) `(for-label)]
                                    [(1) `(for-syntax)]
                                    [else `(for-meta ,(car req))])
                                ,@l)))))])
               (if (null? l)
                   null
                   `((require ,@l))))
          ,@(for/list ([submod (in-list pre-submodules)])
              (explore-module submod orig-stack stx-ht 'module))
          ,@(for/list ([b (in-list syntax-bodies)])
              (let loop ([n (sub1 (car b))])
                (if (zero? n)
                    (cons 'begin
                          (for/list ([form (in-list (cdr b))])
                            (explore-form form globs stack closed stx-ht)))
                    (list 'begin-for-syntax (loop (sub1 n))))))
          ,@(map {lambda (form)
                   (explore-form form globs stack closed stx-ht)}
                 body)
          ,@(for/list ([submod (in-list post-submodules)])
              (explore-module submod orig-stack stx-ht 'module*))))]
    [else (error 'explore-module "huh?: ~e" mod-form)])}

{define (explore-form form globs stack closed stx-ht)
  (match form
    [(? mod?)
     (explore-module form stack stx-ht 'module)]
    [(struct def-values (ids rhs))
     (if (inline-variant? rhs)
         (explore-expr (inline-variant-direct rhs) globs stack closed)
         (explore-expr rhs globs stack closed))]
    [(struct def-syntaxes (ids rhs prefix max-let-depth dummy))
     (let ([globs (explore-prefix prefix stx-ht)])
       (explore-form rhs globs '(#%globals) closed stx-ht))]
    [(struct seq-for-syntax (exprs prefix max-let-depth dummy))
     (let ([globs (explore-prefix prefix stx-ht)])
       (for/list ([rhs (in-list exprs)])
         (explore-form rhs globs '(#%globals) closed stx-ht)))]
    [(struct seq (forms))
     (map {lambda (form)
            (explore-form form globs stack closed stx-ht)}
          forms)]
    [(struct splice (forms))
     (map {lambda (form)
            (explore-form form globs stack closed stx-ht)}
          forms)]
    [(struct req (reqs dummy))
     (void)]
    [else
     (explore-expr form globs stack closed)])}

{define (extract-name name)
  (if (symbol? name)
      (gensym name)
      (if (vector? name)
          (gensym (vector-ref name 0))
          #f))}

{define (extract-id expr)
  (match expr
    [(struct lam (name flags num-params arg-types rest? closure-map closure-types tl-map max-let-depth body))
     (extract-name name)]
    [(struct case-lam (name lams))
     (extract-name name)]
    [(struct closure (lam gen-id))
     (extract-id lam)]
    [else #f])}

{define (extract-ids body count)
  (define (extract-ids! body ids)
    (match body
      [(struct let-rec (procs body))
       (for ([proc (in-list procs)]
             [delta (in-naturals)])
         (when (< -1 delta (vector-length ids))
           (vector-set! ids delta (extract-id proc))))
       (extract-ids! body ids)]
      [(struct install-value (val-count pos boxes? rhs body))
       (extract-ids! body ids)]
      [(struct boxenv (pos body))
       (extract-ids! body ids)]
      [else #f]))
  (define ids (make-vector count #f))
  (extract-ids! body ids)
  (vector->list ids)}

{define (explore-var expr globs stack closed)
  (let ([v (match expr
             [(struct toplevel (depth pos const? ready?))
              (list-ref/protect globs pos 'toplevel)]
             [(struct primval (id))
              (hash-ref primitive-table id {lambda () (error "unknown primitive")})]
             [(struct localref (unbox? offset clear? other-clears? type))
              (list-ref/protect stack offset 'localref)]
             [else #f])])
    ((current-explore-found) v))}
      
{define (explore-expr expr globs stack closed)
  (match expr
    [(struct assign (id rhs undef-ok?))
     (explore-expr rhs globs stack closed)]
    [(? lam?)
     (explore-lam expr globs stack closed)]
    [(struct case-lam (name lams))
     (for ([lam (in-list lams)])
       (explore-lam lam globs stack closed))]
    [(struct let-one (rhs body type unused?))
     (let ([id (or (extract-id rhs) (gensym (or type (if unused? 'unused 'local))))])
       (explore-expr rhs globs (cons id stack) closed)
       (explore-expr body globs (cons id stack) closed))]
    [(struct let-void (count boxes? body))
     (let ([ids (extract-ids body count)])
       (let ([vars (for/list ([id (in-list ids)])
                     (or id (gensym (if boxes? 'localvb 'localv))))])
         (explore-expr body globs (append vars stack) closed)))]
    [(struct let-rec (procs body))
     (begin
       (for ([proc (in-list procs)])
         (explore-expr proc globs stack closed))
       (explore-expr body globs stack closed))]
    [(struct install-value (count pos boxes? rhs body))
     (begin
       (explore-expr rhs globs stack closed)
       (explore-expr body globs stack closed))]
    [(struct boxenv (pos body))
     (explore-expr body globs stack closed)]
    [(struct branch (test then else))
     (begin
       (explore-expr test globs stack closed)
       (explore-expr then globs stack closed)
       (explore-expr else globs stack closed))]
    [(struct application (rator rands))
     (let ([vars (for/list ([i (in-list rands)]) (gensym 'rand))])
       (explore-var rator globs (append vars stack) closed)
       (explore-expr rator globs (append vars stack) closed)
       (for ([rand (in-list rands)])
         (explore-expr rand globs (append vars stack) closed)))]
    [(struct apply-values (proc args-expr))
     (begin
       (explore-var proc globs stack closed)
       (explore-expr proc globs stack closed) 
       (explore-expr args-expr globs stack closed))]
    [(struct seq (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct beg0 (exprs))
     (for ([expr (in-list exprs)])
       (explore-expr expr globs stack closed))]
    [(struct with-cont-mark (key val body))
      (begin
        (explore-expr key globs stack closed)
        (explore-expr val globs stack closed)
        (explore-expr body globs stack closed))]
    [(struct closure (lam gen-id))
     (unless (hash-ref closed gen-id #f)
       (hash-set! closed gen-id #t)
       (explore-expr lam globs stack closed))]
    [else (void)])}

{define (explore-lam expr globs stack closed)
  (match expr
    [(struct closure (lam gen-id)) (explore-lam lam globs stack closed)]
    [(struct lam (name flags num-params arg-types rest? closure-map closure-types tl-map max-let-depth body))
     (let ([vars (for/list ([i (in-range num-params)]
                            [type (in-list arg-types)])
                   (gensym (format "~a~a-" 
                                   (case type 
                                     [(ref) "argbox"] 
                                     [(val) "arg"]
                                     [else (format "arg~a" type)])
                                   i)))]
           [rest-vars (if rest? (list (gensym 'rest)) null)]
           [captures (map {lambda (v)
                            (list-ref/protect stack v 'lam)}
                          (vector->list closure-map))])
       (explore-expr body globs (append captures vars rest-vars) closed))])}