Restore form performance #89

countvajhula · 2023-01-12T22:03:35Z

Summary of Changes

Restore performance of Qi forms to pre-stratification levels.

- Restore performance of forms.
- Add some missing tests.
- Incorporate regression checking into the form performance report script qi-sdk/profile/report.rkt instead of keeping it as a separate script.
- Introduce the notion of the "loophole in Qi space": ways to add definitions in the Qi binding space that Racket bindings cannot shadow. Such definitions could be used to provide optimized implementations of built-in Qi macros (the "extended language") without those macros needing to expand to the core language or be part of it.

See "Return to baseline performance" for more information.

Public Domain Dedication

In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

For unadorned identifiers to be treated as function identifiers, ensure that Qi functions take precedence over Racket functions. This allows us to define functions that are treated as part of the language (and are not shadowed by identifiers in the calling scope) without being syntactically part of the language as core forms or macros.

countvajhula · 2023-01-26T21:18:30Z

Notes from today's meeting: "A Loophole in Qi Space". @benknoble, you may be especially interested in this "loophole" that Michael came up with as an alternative to some of the options we've talked about before ("restorative/peephole optimizations" vs. inflating the core language).

benknoble · 2023-02-27T15:50:07Z

I attempted some benchmarking of none?'s implementation and found that (a) for arities below 100k, every implementation I tried ran in under a millisecond; (b) in most cases, for/and over not (that is, applying not to each argument inside for/and) was faster than composing not with the for/or version of any?; and (c) the overhead of threads was too great to allow "racing" both versions to get the best of both worlds.

I tried different proportions of #f among the arguments (the "frequency"; the remaining arguments are random numbers in [0,1)) and different orders of thread creation. I was quite surprised that for/or didn't win out significantly at lower frequencies of #f, nor for/and at higher frequencies. for/or does clearly do far better than for/and at frequency 0, but only "dramatically" so for n = 100k. At higher orders of magnitude, they appear maybe roughly the same?

The bars show the average of 5 runs, with error bars the size of the standard deviation.

Note that the y-axis uses a logarithmic (base-10) transform; otherwise, you wouldn't be able to see all sizes on the same graph.

countvajhula · 2023-02-28T21:59:37Z

👁️ 🍭 That's great data! We can aim to discuss it at this week's Qi meetup (happening on Thursday just FYI). Those charts seriously look amazing, I'm curious how you generated those. It could be useful to add the recipe to the Developer's Guide.

benknoble · 2023-03-02T00:03:35Z

Hand-typed from the original; forgive typos 😅

#lang racket

(require plot file/convertible math/statistics pict)

(define (write-pict-to-svg p f)
  (with-output-to-file f
    (thunk
      (write-bytes (convert p 'svg-bytes)))))

(define (not-o-any . args)
  (not (for/or ([arg (in-list args)])
         arg)))

(define (for/and-over-not . args)
  (for/and ([arg (in-list args)])
    (not arg)))

(define (race-threads1 . args)
  (define me (current-thread))
  (define t1 (thread (thunk (thread-send me (apply not-o-any args)))))
  (define t2 (thread (thunk (thread-send me (apply for/and-over-not args)))))
  (begin0 (thread-receive)
    (kill-thread t1)
    (kill-thread t2)))

(define (race-threads2 . args)
  (define me (current-thread))
  (define t2 (thread (thunk (thread-send me (apply for/and-over-not args)))))
  (define t1 (thread (thunk (thread-send me (apply not-o-any args)))))
  (begin0 (thread-receive)
    (kill-thread t1)
    (kill-thread t2)))

(define (generate n freq)
  (build-list n (λ (_i)
                  (define n (random))
                  (and (>= n freq) n))))

(define ns (list #e1e5 #e1e6 #e1e7))
(define freqs (list 0.0 0.3 0.7 1.0))
(define fs (list not-o-any for/and-over-not race-threads1 race-threads2))

(struct experiment [f n freq stats] #:transparent)
(struct stats [min mean max stddev] #:transparent)

(define experiments
  (map (match-lambda [(list n freq f) (experiment f n freq #f)])
       (cartesian-product ns freqs fs)))

(define n-trials 5)

(define/match (run-experiment _e)
  [((experiment f n freq #f))
   (define times
     (for/list ([_i n-trials])
       (define-values (_res _cpu real _garbage) (time-apply f (generate n freq)))
       real))
   (experiment f n freq (make-stats times))])

(define (make-stats times)
  (define μ (mean times))
  (define σ (stddev/mean μ times))
  (stats (apply min times) μ (apply max times) σ))

(define (experiments->renderers es skip x-min)
  (define (x i) (+ 0.5 x-min (* i skip)))
  (define label (object-name (experiment-f (first es))))
  (define es-by-n (sort es < #:key experiment-n))
  (define histogram
    (discrete-histogram
      (for/list ([e (in-list es-by-n)])
        (match-define (experiment _ n _ (stats _ mean _ _ )) e)
        (vector (format "n = ~a" n) mean))
      #:skip skip #:x-min x-min #:label (~a label) #:color (add1 x-min)))
  (define annotations
    (list
      (error-bars
        (for/list ([(e i) (in-indexed (in-list es-by-n))])
          (match-define (stats _ mean _ stddev) (experiment-stats e))
          (vector (x i) mean stddev)))
      #;
      (for/list ([(e i) (in-indexed (in-list es-by-n))])
        (match-define (stats min _ max _) (experiment-stats e))
        (list (point-label (vector (x i) min) "min")
              (point-label (vector (x i) max) "max")))))
  (list histogram annotations))

(module+ main
  (random-seed 0)
  (displayln "running experiments")
  (define results (time (map run-experiment experiments)))
  (displayln "plotting")
  (define results-by-freq (group-by experiment-freq results))
  (define results-by-freq-by-f (map (curry group-by experiment-f) results-by-freq))
  (define results-by-freq-by-f-sorted-by-freq
    (sort results-by-freq-by-f < #:key (compose1 experiment-freq caar)))

  (define skip (length fs))
  (define p
    (apply vc-append
           (for/list ([group (in-list results-by-freq-by-f-sorted-by-freq)])
             (define freq (experiment-freq (caar group)))
             (parameterize ([plot-y-transform (axis-transform-bound log-transform 0.01 +inf.0)]
                            [plot-y-ticks (log-ticks)])
               (plot-pict
                 (for/list ([(exps i) (in-indexed group)])
                   (experiments->renderers exps skip i))
                 #:title (format "Frequency: ~a" freq)
                 #:width (* 4 (plot-width))
                 #:y-min 0.001
                 #:x-label "n"
                 #:y-label "t")))))
  (show-pict p)
  (write-pict-to-svg p "bench.svg"))

countvajhula · 2023-03-03T01:57:43Z

@benknoble We reviewed the data and after playing around with the expansions of the two alternatives we seemed to conclude that they should perform almost identically in practice since they both short-circuit and have very similar expansions. Lmk if you think we've misunderstood though!
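To make the short-circuiting claim concrete, here is a minimal sketch with instrumented copies of the two loop bodies from the benchmark script. The names none?/for-or and none?/for-and and the counter are hypothetical additions for illustration, not part of Qi. Each function reports how many arguments it looked at before it could decide.

```racket
#lang racket

;; Instrumented copies of the two candidate bodies for none?.
;; Each returns two values: the result and how many arguments it inspected.
(define (none?/for-or args)
  (define inspected 0)
  (define result
    (not (for/or ([arg (in-list args)])
           (set! inspected (add1 inspected))
           arg)))
  (values result inspected))

(define (none?/for-and args)
  (define inspected 0)
  (define result
    (for/and ([arg (in-list args)])
      (set! inspected (add1 inspected))
      (not arg)))
  (values result inspected))

;; The first truthy value is the third element, so both loops stop there.
(define early (list #f #f 42 #f #f))
;; With no truthy value, both loops have to walk the whole list.
(define all-false (make-list 5 #f))

(none?/for-or early)      ; two values: #f and 3
(none?/for-and early)     ; two values: #f and 3
(none?/for-or all-false)  ; two values: #t and 5
(none?/for-and all-false) ; two values: #t and 5
```

Both variants inspect the same number of arguments in every case, which is consistent with the observation that the measured differences are small.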

Notes from today

Also the code works perfectly and took almost ten minutes to run on my laptop. I'll add it to the wiki soon 👍

benknoble · 2023-03-03T15:06:28Z

> We reviewed the data and after playing around with the expansions of the two alternatives we seemed to conclude that they should perform almost identically in practice since they both short-circuit and have very similar expansions. Lmk if you think we've misunderstood though!

No, I think that's right.

> Notes from today

Thanks for the link. I think for partition, the easiest way to make it non-core would be to have a compiler optimization that recognizes nested sieves and converts them to partition-values (so that partition -> nested sieve -> partition-values); this also optimizes human-written nested sieves, if there are any.
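As a rough plain-Racket analogy for why the rewrite should be safe (this is not Qi's compiler and does not use Qi's sieve or partition-values), the sketch below groups a list either by splitting off one group at a time or in a single pass. The function names are hypothetical, and it assumes that each value goes to the first predicate it satisfies and that values matching no predicate are dropped.

```racket
#lang racket

;; Nested approach: split off one group at a time and recur on the rest,
;; loosely analogous to nesting one sieve inside another.
(define (group-nested preds xs)
  (cond
    [(null? preds) '()]
    [else
     (define-values (selected remaining) (partition (first preds) xs))
     (cons selected (group-nested (rest preds) remaining))]))

;; Single pass: send each value straight to the bucket of the first
;; predicate it satisfies. Values matching no predicate are dropped
;; (an assumption of this sketch).
(define (group-single-pass preds xs)
  (define buckets (make-vector (length preds) '()))
  (for ([x (in-list xs)])
    (define i (index-where preds (λ (p) (p x))))
    (when i
      (vector-set! buckets i (cons x (vector-ref buckets i)))))
  (for/list ([b (in-vector buckets)])
    (reverse b)))

(define preds (list negative? even?))
(define xs '(1 -2 4 3 -5 6))

(group-nested preds xs)      ; '((-2 -5) (4 6))
(group-single-pass preds xs) ; '((-2 -5) (4 6))
```

Both functions produce the same groups, so a compiler pass that recognizes the nested shape could substitute the single-pass form without changing results, under the assumptions stated above.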

> Also the code works perfectly and took almost ten minutes to run on my laptop. I'll add it to the wiki soon 👍

Great! Yeah, it isn't fast: it would be better to generate the data once, save it, and then chart whatever you want from there. I think it was roughly similar on my machine (I recall a run with only 2 frequencies taking 2–3 minutes).
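One possible way to separate measurement from charting is sketched below, assuming each result is stored as a plain list of label, n, frequency, and raw timings. The names save-results!, load-results, and time-trials and the row shape are hypothetical. Procedures cannot be written to a file, so the row stores the function's name as a symbol instead of the function itself.

```racket
#lang racket

;; Time f on args n-trials times and return the real (wall-clock)
;; milliseconds of each trial.
(define (time-trials f args n-trials)
  (for/list ([_i (in-range n-trials)])
    (define-values (_res _cpu real _gc) (time-apply f args))
    real))

;; Write all rows as one S-expression, replacing any earlier contents.
(define (save-results! rows path)
  (write-to-file rows path #:exists 'truncate))

;; Read the rows back; charting code can start from here.
(define (load-results path)
  (file->value path))

;; Example run with a tiny input so that it finishes instantly.
(define (none-of . args)
  (for/and ([arg (in-list args)]) (not arg)))

(define rows
  (list (list 'none-of 1000 1.0
              (time-trials none-of (make-list 1000 #f) 5))))

(define path (make-temporary-file "bench-~a.rktd"))
(save-results! rows path)
(equal? rows (load-results path)) ; #t: the data survives the round trip
```

With the raw timings on disk, the statistics and plots can be recomputed or restyled without rerunning the slow part.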

countvajhula · 2023-03-04T04:59:46Z

> Thanks for the link. I think for partition, the easiest way to make it non-core would be to have a compiler optimization that recognizes nested sieves and converts them to partition-values (so that partition -> nested sieve -> partition-values); this also optimizes human-written nested sieves, if there are any.

Great idea! We can try it in the first optimizations PR.

countvajhula · 2023-03-04T05:09:57Z

This PR is ready -- any input welcome! As it will be reviewed at the end anyway as part of merging the integration branch into main, there's no pressing need for review at this stage and we can just merge it in a few days in any case.

countvajhula · 2023-03-06T19:49:00Z

Benchmarking recipe added to the wiki 🙂

countvajhula added 2 commits January 12, 2023 13:14

improve amp performance

45464aa

try restoring original amp implementation

beb8223

countvajhula mentioned this pull request Jan 12, 2023

Let's Write a Qi Compiler! #74

Merged

29 tasks

countvajhula had a problem deploying to test-env January 12, 2023 22:11 — with GitHub Actions Failure

add a test for loop with multi-valued map flow

bda2dcd

countvajhula had a problem deploying to test-env January 12, 2023 22:26 — with GitHub Actions Failure

countvajhula added 2 commits January 13, 2023 12:57

restore not implementation

a079f14

remove extraneous threading forms in some tests

7972f1f

countvajhula had a problem deploying to test-env January 20, 2023 05:14 — with GitHub Actions Failure

countvajhula added 2 commits January 26, 2023 01:17

incorporate regression checking into form performance report

651e8c2

regression module doesn't need to be executable anymore

55aa3ff

countvajhula had a problem deploying to test-env January 26, 2023 16:24 — with GitHub Actions Failure

countvajhula added 5 commits January 26, 2023 10:26

macro to create value definitions in the qi binding space (pairing..)

02b8853

define count and live? as qi functions

df69eda

define all? and AND as qi functions

c2c42bf

remove unused import

7a9ea71

countvajhula had a problem deploying to test-env January 26, 2023 19:10 — with GitHub Actions Failure

put define-for-qi in a separate module for binding space provisions

125468b

countvajhula had a problem deploying to test-env January 26, 2023 19:21 — with GitHub Actions Failure

countvajhula added 8 commits January 27, 2023 02:38

restore OR and any?

0b1d946

restore none?

5b735dd

define qi functions in a uniform way

20ab7bf

add an explanatory comment re: bindings in qi space

3154175

reinstate all and any as core forms

d5ab909

restore original pass implementation

d771223

reinstate fanout as a core form, for now

780b602

reinstate partition as core for now

4a15e10

countvajhula added 2 commits February 28, 2023 17:05

remove extraneous wrapping thread in partition

d876bbb

make thresholds configurable in regression report

9c1f973

improve performance of feedback

93f2ff8

countvajhula marked this pull request as ready for review March 4, 2023 05:00

countvajhula merged commit da1437d into drym-org:lets-write-a-qi-compiler Mar 6, 2023

countvajhula deleted the restore-form-performance branch March 6, 2023 23:02

countvajhula restored the restore-form-performance branch March 7, 2023 00:16

countvajhula mentioned this pull request Jul 21, 2023

Charts to visualize performance data #108

Open
