Skip to content

Qi Compiler Sync Oct 20 2023

Siddhartha Kasivajhula edited this page Nov 8, 2023 · 3 revisions

Challenges with Multiple Values in Streams

Qi Compiler Sync Oct 20 2023

Adjacent meetings: Previous | Up | Next

Summary

We looked at a proof of concept of using multiple values in a stream, which works, but so far, results in worse performance. We considered some options for how this could be improved. We also discussed other things like Actor Basic and Frosthaven Manager.

Background

Recently, we observed that Racket's map accepts multiple input lists (functioning as zip in that case) whereas our deforestation implementation assumes a single input list. We had agreed that it would be necessary to support multiple lists here to preserve parity with Racket.

A Proof of Concept

Dominik had modified the stream implementation as a proof of concept:

(define-inline (cstream->list next)
  (λ state
      (let loop ([state state])
        (apply
         (next (λ () null)
               (λ (state) (loop state))
               (λ (value state)
                 ;; Must be a list with single value
                 (cons (car value) (loop state))))
         state))))

(define-inline (foldr-cstream op init next)
  (λ state
    (let loop ([state state])
      (apply
       (next (λ () init)
             (λ (state) (loop state))
             (λ (vals state)
               ;; Vals must be a list with single value, the result
               ;; must be single value as it is technically being
               ;; merged into implicit accumulator (see foldl-cstream)
               (op (car vals) (loop state))))
       state))))

(define-inline (foldl-cstream op init next)
  (λ state
    (let loop ([acc init] [state state])
      (apply
       (next (λ () acc)
             (λ (state) (loop acc state))
             (λ (value state)
               ;; Value must be a list with single value and the value
               ;; stored in the accumulator must be a single value,
               ;; not a list of results
               (loop (op (car value) acc) state)))
       state))))

;; Proper name should probably be lists->cstream-next
(define-inline (list->cstream-next done skip yield)
  (lambda states
    (cond ((andmap null? states)
           (done))
          ;; yield is always called with a list of values taken from
          ;; car of all lists passed as arguments of this procedure
          (else (yield (map car states) (map cdr states))))))

(define-inline (map-cstream-next f next)
  (lambda (done skip yield)
    (next done
          skip
          (lambda (vals states)
            ;; The resulting value must be wrapped in a list as any
            ;; yield expects list of values as its first argument
            (yield (list (apply f vals)) states)))))

(define-inline (filter-cstream-next f next)
    (λ (done skip yield)
      (next done
            skip
            (λ (vals state)
              (if (f (car vals))
                  ;; vals is already a list of values
                  (yield vals state)
                  (skip state))))))

(Note: some variable names were not changed for clarity to keep the diff small, e.g. statestates)

This worked but resulted in dramatically worse performance (> 10x) from the case with single values:

 (filter-map 1.93)
 (filter-map (large list) 5.91)
 (filter-map-foldl 4.3)
 (filter-map-foldr 3.97)

(these are compared against Racket rather than the single-value stream implementation, which was about 3x faster than Racket)

Multiple Options

We considered a few options:

  1. Use case-lambda to provide independent implementations for a small number of arguments like 1,2,3, and a default implementation that packs "any number of arguments" into a list (based on a suggestion by @usao on Discord).
  2. Change the semantics of "next" functions to accept the input lists as a single argument -- a list of lists -- instead of multiple values packed into a list.
  3. Even if it is slower on the above benchmarks, if we are actually avoiding intermediate list construction (TBD), then it may still end up being faster on functional pipelines that are long enough.
  4. Apply the single-value implementation if the leading functional stage is not map (since map is the only one that supports multiple input lists), or if it is map but if we know there is only one input, and the multiple-value implementation otherwise.
  5. Since map with multiple values is the same as zip, we considered whether St-Amour's writeup on the stream implementation for zip would have any ideas that could help here.

For (1), we seemed to conclude that that works with built-in functions like map because in those cases, we typically syntactically indicate the number of arguments, like (map f as bs), and it would be possible to select the appropriate implementation at compile time based on this. But in our case here, we did not immediately see a way to avoid packing the arguments into a list, and consequently avoid the use of apply in cstream->list. We considered writing a simple implementation assuming specifically two input values (not an arbitrary number) instead of one, to see how that would look and if it would suggest ways that case-lambda could help us avoid list packing and apply, or any other ideas.

For (2), we agreed that it would be good to try it and see what the performance looks like.

For (3), we agreed that we should add another benchmark for "long functional pipeline," to be able to track the impact there.

For (4), this could be done partially at the syntax pattern level, by specifically looking for map in the leading position and using the single-value implementation if it is not map. In addition, even if it is map, we could use runtime dispatch to use the appropriate implementation conditioned on the number of arguments. We can do this the output of map is a single value, and so we would not need to worry about multiple input values in the middle of a functional pipeline. This is also viable since it would just be a single check for number of at the beginning and wouldn't be expensive.

Note that we can only make this assumption (in (4)) because we are assuming that we are using Racket's map function, which is constrained to produce a single output value from each value in the input list. In the past, we have considered something like this:

(map (☯ (-< _ _)) (list 1 2 3)) ;=> '(1 1 2 2 3 3)

If we did provide a map that supported this behavior, then the consideration for (4) above that we only need to worry about multiple values in the beginning and not in the middle would not be valid anymore.

For (5), on the face of it, the implementation of zip in St-Amour seemed to be the special case of two input lists, rather than an arbitrary number. Aside from that, there may yet be something there and we will aim to review it further.

Frosthaven Manager as Test Suite

Frosthaven Manager is a large codebase employing Qi, including what could be considered to be a small DSL gluing together Qi and Gui-easy. We've discussed using this codebase to validate the changes in the compiler branch as one of the pre-merge steps before merging into main. Recently we ran into some installation issues that we will look into further.

Actor Basic and Qi

We discussed whether it would make sense for Actor Basic to be implemented as macros compiling to Qi. Dominik felt that since the language is quite specialized, "only two people in the world" are likely to use it -- himself, and Hendrik. So it would be best to simply adopt features developed for Actor Basic (like continuations) into Qi, so that they would be available in a general-purpose language. From this perspective, Actor Basic provides a simpler environment in which to prototype features that could be added to Qi. The two main unique aspects in Actor Basic that likely would not be incorporated into Qi in this way are (1) asynchronous message passing, and (2) a cooperative scheduler. On the other hand, compiling to Qi could make the implementation of AB itself simpler, so that it could be implemented mostly as functions rather than as macros, and could benefit from future compiler optimizations in Qi. Yet another consideration is that as AB doesn't need multiple values, it could be faster to keep it as a separate language. On the other hand, for the same reason that the Qi compiler allows Qi to be faster than Racket in some cases, it may be possible to have an AB compiler that would allow AB to be faster than Qi in some cases, while still benefiting from general purpose optimizations.

Next Steps

(Some of these are carried over from last time)

  • Add a "long functional pipeline" benchmark.
  • Continue to integrate macroexpansion events into compiler rewrite rules to afford visibility in e.g. the Macro Stepper.
  • Provide good error messages in optimized (e.g. deforested) expressions in terms of source expressions.
  • Validate that optimization passes can prematurely terminate if they don't return false when they don't transform the input (and then fix it).
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr zip average upto). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules, as well as end-to-end tests for pairs of input expressions, and sanity tests for the expander.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Add additional nonlocal benchmarks to validate all cases optimized by stream fusion.
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
  • Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
  • Come up with a methodology to formalize and theoretically validate compiler behavior.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Sid

Clone this wiki locally