Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore form performance #89

Conversation

countvajhula
Copy link
Collaborator

@countvajhula countvajhula commented Jan 12, 2023

Summary of Changes

Restore performance of Qi forms to pre-stratification levels.

  • restore performance of forms
  • add some missing tests
  • incorporate regression checking into the form performance report script qi-sdk/profile/report.rkt instead of having it be a separate script
  • introduce the notion of the "loophole in qi space," i.e. ways to add definitions in qi space that aren't shadowable by Racket bindings, and which could, as a result, be used to define optimized implementations for built-in Qi macros (i.e. the "extended language") without these macros needing to expand to the core language nor be part of the core language.

See Return to baseline performance for more information.

Public Domain Dedication

  • In contributing, I relinquish any copyright claims on my contribution and freely release it into the public domain in the simple hope that it will provide value.

(Why: The freely released, copyright-free work in this repository represents an investment in a better way of doing things called attribution-based economics. Attribution-based economics is based on the simple idea that we gain more by giving more, not by holding on to things that, truly, we could only create because we, in our turn, received from others. As it turns out, an economic system based on attribution -- where those who give more are more empowered -- is significantly more efficient than capitalism while also being stable and fair (unlike capitalism, on both counts), giving it transformative power to elevate the human condition and address the problems that face us today along with a host of others that have been intractable since the beginning. You can help make this a reality by releasing your work in the same way -- freely into the public domain in the simple hope of providing value. Learn more about attribution-based economics at drym.org, tell your friends, do your part.)

For unadorned identifiers to be treated as function identifiers,
ensure that qi functions take precedence over racket functions. This
allows us to define functions that may be treated as part of the
language (and not be shadowed by calling-scope identifiers) without
actually being syntactically part of the language as core forms or
macros.
@countvajhula
Copy link
Collaborator Author

Notes from today's meeting: A Loophole in Qi Space. Esp @benknoble you may be interested in this "loophole" that Michael came up with as an alternative to some of the options we've talked about before ("restorative/peephole optimizations" vs inflating the core language).

@benknoble
Copy link
Collaborator

benknoble commented Feb 27, 2023

I attempted to do some benchmarking of none?'s implementation, and found that (a) for arities of less than 100k, the timing of any implementation I used was sub-millisecond; (b) in most cases, using for/and over not was faster than composing not with the for/or version of any?; and (c) the overhead of threads was too great to allow "racing" both versions to get the best of both worlds.

I tried with different distributions of the value #f and random numbers in [0,1) (called the "frequency") and with different orders of thread creation. I was actually quite surprised that for/or didn't win out significantly for lower frequencies of #f and for/and in higher frequencies. The for/or does clearly do far better than for/and at frequency 0, but only "dramatically" so for n = 100k. At higher orders of magnitude, they appear maybe roughly the same?

The bars show the average of 5 runs, with error bars the size of the standard deviation.

Note that the y-axis is on a logarithmic (base-10) transform; otherwise, you wouldn't be able to see all sizes on the same graph.

bench

@countvajhula
Copy link
Collaborator Author

👁️ 🍭 That's great data! We can aim to discuss it at this week's Qi meetup (happening on Thursday just FYI). Those charts seriously look amazing, I'm curious how you generated those. It could be useful to add the recipe to the Developer's Guide.

@benknoble
Copy link
Collaborator

Hand-typed from the original; forgive typos 😅

#lang racket

(require plot file/convertible math/statistics pict)

(define (write-pict-to-svg p f)
  (with-output-to-file f
    (thunk
      (write-bytes (convert p 'svg-bytes)))))

(define (not-o-any . args)
  (not (for/or ([arg (in-list args)])
         arg)))

(define (for/and-over-not . args)
  (for/and ([arg (in-list args)])
    (not arg)))

(define (race-threads1 . args)
  (define me (current-thread))
  (define t1 (thread (thunk (thread-send me (apply not-o-any args)))))
  (define t2 (thread (thunk (thread-send me (apply for/and-over-not args)))))
  (begin0 (thread-receive)
    (kill-thread t1)
    (kill-thread t2)))

(define (race-threads2 . args)
  (define me (current-thread))
  (define t2 (thread (thunk (thread-send me (apply for/and-over-not args)))))
  (define t1 (thread (thunk (thread-send me (apply not-o-any args)))))
  (begin0 (thread-receive)
    (kill-thread t1)
    (kill-thread t2)))

(define (generate n freq)
  (build-list n (λ (_i)
                  (define n (random))
                  (and (>= n freq) n))))

(define ns (list #e1e5 #e1e6 #e1e7))
(define freqs (list 0.0 0.3 0.7 1.0))
(define fs (list not-o-any for/and-over-not race-threads1 race-threads2))

(struct experiment [f n freq stats] #:transparent)
(struct stats [min mean max stddev] #:transparent)

(define experiments
  (map (match-lambda [(list n freq f) (experiment f n freq #f)])
       (cartesian-product ns freqs fs)))

(define n-trials 5)

(define/match (run-experiment _e)
  [((experiment f n freq #f))
   (define times
     (for/list ([_i n-trials])
       (define-values (_res _cpu real _garbage) (time-apply f (generate n freq)))
       real))
   (experiment f n freq (make-stats times))])

(define (make-stats times)
  (define μ (mean times))
  (define σ (stddev/mean μ times))
  (stats (apply min times) μ (apply max times) σ))

(define (experiments->renderers es skip x-min)
  (define (x i) (+ 0.5 x-min (* i skip)))
  (define label (object-name (experiment-f (first es))))
  (define es-by-n (sort es < #:key experiment-n))
  (define histogram
    (discrete-histogram
      (for/list ([e (in-list es-by-n)])
        (match-define (experiment _ n _ (stats _ mean _ _ )) e)
        (vector (format "n = ~a" n) mean))
      #:skip skip #:x-min x-min #:label (~a label) #:color (add1 x-min)))
  (define annotations
    (list
      (error-bars
        (for/list ([(e i) (in-indexed (in-list es-by-n))])
          (match-define (stats _ mean _ stddev) (experiment-stats e))
          (vector (x i) mean stddev)))
      #;
      (for/list ([(e i) (in-indexed (in-list es-by-n))])
        (match-define (stats min _ max _) (experiment-stats e))
        (list (point-label (vector (x i) min) "min")
              (point-label (vector (x i) max) "max")))))
  (list histogram annotations))

(module+ main
  (random-seed 0)
  (displayln "running experiments")
  (define results (time (map run-experiment experiments)))
  (displayln "plotting")
  (define results-by-freq (group-by experiment-freq results))
  (define results-by-freq-by-f (map (curry group-by experiment-f) results-by-freq))
  (define results-by-freq-by-f-sorted-by-freq
    (sort results-by-freq-by-f < #:key (compose1 experiment-freq caar)))

  (define skip (length fs))
  (define p
    (apply vc-append
           (for/list ([group (in-list results-by-freq-by-f-sorted-by-freq)])
             (define freq (experiment-freq (caar group)))
             (parameterize ([plot-y-transform (axis-transform-bound log-transform 0.01 +inf.0)]
                            [plot-y-ticks (log-ticks)])
               (plot-pict
                 (for/list ([(exps i) (in-indexed group)])
                   (experiments->renderers exps skip i))
                 #:title (format "Frequency: ~a" freq)
                 #:width (* 4 (plot-width))
                 #:y-min 0.001
                 #:x-label "n"
                 #:y-label "t")))))
  (show-pict p)
  (write-pict-to-svg p "bench.svg"))

@countvajhula
Copy link
Collaborator Author

@benknoble We reviewed the data and after playing around with the expansions of the two alternatives we seemed to conclude that they should perform almost identically in practice since they both short-circuit and have very similar expansions. Lmk if you think we've misunderstood though!

Notes from today

Also the code works perfectly and took almost ten minutes to run on my laptop. I'll add it to the wiki soon 👍

@benknoble
Copy link
Collaborator

We reviewed the data and after playing around with the expansions of the two alternatives we seemed to conclude that they should perform almost identically in practice since they both short-circuit and have very similar expansions. Lmk if you think we've misunderstood though!

No, I think that's right.

Notes from today

Thanks for the link. I think for partition, the easiest way to to make it non-core would be to have a compiler optimization that recognizes nested sieves and converts it to partition-values (so that partition -> nested sieve -> partition-values); this also optimizes human-written nested sieves, if there are any.

Also the code works perfectly and took almost ten minutes to run on my laptop. I'll add it to the wiki soon 👍

Great! Yeah, it isn't fast: it would be better to generate the data once, save it, and then chart whatever you want from there. I think it was roughly similar on my machine (I recall using 2 frequencies being 2–3 minutes).

@countvajhula
Copy link
Collaborator Author

Thanks for the link. I think for partition, the easiest way to to make it non-core would be to have a compiler optimization that recognizes nested sieves and converts it to partition-values (so that partition -> nested sieve -> partition-values); this also optimizes human-written nested sieves, if there are any.

Great idea! We can try it in the first optimizations PR.

@countvajhula countvajhula marked this pull request as ready for review March 4, 2023 05:00
@countvajhula
Copy link
Collaborator Author

This PR is ready -- any input welcome! As it will be reviewed at the end anyway as part of merging the integration branch into main, there's no pressing need for review at this stage and we can just merge it in a few days in any case.

@countvajhula
Copy link
Collaborator Author

Benchmarking recipe added to the wiki 🙂

@countvajhula countvajhula merged commit da1437d into drym-org:lets-write-a-qi-compiler Mar 6, 2023
@countvajhula countvajhula deleted the restore-form-performance branch March 6, 2023 23:02
@countvajhula countvajhula restored the restore-form-performance branch March 7, 2023 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants