
(Question) Performance on SIMD #391

Open
hardBSDk opened this issue May 19, 2024 · 3 comments
Labels
question Further information is requested

Comments


hardBSDk commented May 19, 2024

How do you plan to optimize serial processing with SIMD?

Another thing: how does your project get around Amdahl's law?


chadbrewbaker commented May 19, 2024

  1. I reached out to Geoff Langdale to see if he would be interested in porting his SIMD superoptimizer to an interaction combinator implementation. I have one in mothballs that uses e-graphs: https://github.com/chadbrewbaker/egg/blob/finset/tests/finset.rs

Another, in Haskell: https://github.com/chadbrewbaker/endoscope/blob/master/src/Endoscope.hs - it's janky and shells out to z3py for some things, but it's generic over all semigroups.

My advice is to keep things 32-bit in the generic case for now, so you can bit-blast the full 2^32 input range of a reference implementation and a candidate implementation to prove correctness.
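A minimal sketch of that bit-blasting idea: exhaustively compare a reference and a candidate function over their entire input range. The function names here are hypothetical; a full 32-bit sweep is feasible but slow for a demo, so this uses `u16` (widen the loop to `u32` for the full check).

```rust
// Ground-truth implementation (hypothetical example function).
fn reference(x: u16) -> u16 {
    x.wrapping_mul(2)
}

// Optimized candidate: shift-left equals multiply-by-two mod 2^16.
fn candidate(x: u16) -> u16 {
    x << 1
}

// Exhaustively "bit-blast" every input; u32 here would cover the
// full 2^32 range the comment above describes.
fn equivalent_over_full_range() -> bool {
    (0..=u16::MAX).all(|x| reference(x) == candidate(x))
}

fn main() {
    println!("equivalent: {}", equivalent_over_full_range());
}
```

The appeal of staying 32-bit is exactly this: equivalence becomes a finite, brute-forceable check rather than a proof obligation.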

As I understand Mojo, you might be able to pick up some of Chris Lattner's MLIR tricks for free if you can target MLIR.

A lot of the Mojo tricks are just making the runtime ergonomic for benchmarking, such as tuning buffer sizes for a new arch.

  1. You can't bypass Amdahl's law, but you can do my grand-advisor Gustafson's trick of using smaller problem blocks in parallel so you are hitting fewer levels of cache latency. https://en.wikipedia.org/wiki/Gustafson%27s_law
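For reference, Gustafson's scaled-speedup formula is S(n) = n − s·(n − 1), where s is the serial fraction and n the processor count; unlike Amdahl's fixed-size view, the problem grows with n. A quick sketch:

```rust
// Gustafson's law: scaled speedup on n processors with serial fraction s.
// S(n) = n - s * (n - 1)
fn gustafson_speedup(n: f64, serial_fraction: f64) -> f64 {
    n - serial_fraction * (n - 1.0)
}

fn main() {
    // With a 10% serial fraction on 64 processors: 64 - 0.1 * 63 ≈ 57.7,
    // versus an Amdahl-style fixed-size speedup of under 10.
    println!("{}", gustafson_speedup(64.0, 0.1));
}
```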

For many problems you have to use cache-oblivious algorithms and succinct data structures. The compiler can only get you so far.

  • 32-bit operation bit-blaster to compare a candidate and a reference function over the full 2^32 input range.
  • IO buffer size auto-benchmarking.
  • Lifting of simple/small interaction graphs to combinatorial-species e-graphs.
  • Berger COZ-style solver: add intentional latency to measure how slowdown affects runtime. Slowdown-insensitive code isn't worth optimizing, since something else is the bottleneck.
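The COZ-style point in the last bullet can be sketched with a toy probe (real COZ uses "virtual speedups" via profiling; this naive version just injects a sleep). The pipeline here is hypothetical: a 50 ms bottleneck thread alongside a 5 ms fast thread.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Run two parallel phases; inject extra latency into the fast one.
fn run_parallel(extra_in_fast: Duration) -> Duration {
    let start = Instant::now();
    let slow = thread::spawn(|| thread::sleep(Duration::from_millis(50)));
    let fast = thread::spawn(move || {
        thread::sleep(Duration::from_millis(5) + extra_in_fast)
    });
    slow.join().unwrap();
    fast.join().unwrap();
    start.elapsed()
}

fn main() {
    let baseline = run_parallel(Duration::ZERO);
    let perturbed = run_parallel(Duration::from_millis(20));
    // Even with +20 ms injected into the fast thread, total runtime stays
    // near 50 ms: the fast path is slowdown-insensitive, so optimizing it
    // wouldn't help. The slow thread is the real bottleneck.
    println!("baseline {:?}, perturbed {:?}", baseline, perturbed);
}
```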

@developedby developedby added the question Further information is requested label May 19, 2024
@developedby
Member

About sequential SIMD operations: I haven't thought about it yet; maybe Victor has some ideas. But it will most likely be done at some point.
We plan on working very hard to optimize all these common programming practices and to have fast single-thread performance.
We're only on the first version; the only thing we've really tried to optimize so far is proper distribution of work across threads.

@hardBSDk
Author

@developedby Amazing to know that you aim to improve the single-thread performance!

I saw the benchmark on the website, and the performance is only good with many GPU cores.
