Speed up C++ logging for many individual log calls #4287

Wumpf · 2023-11-21T10:10:05Z

The recent performance improvement

Significantly improve C++ logging performance by using C FFI instead of arrow IPC #4273

made the C++ SDK a lot faster, but compared to Rust we're still behind for individual log calls (like time series scalars!).

An obvious candidate for improvement is not building & sending the schema every time: right now, on every log call we convert the schema to C FFI and then create a Rust/arrow2 representation from it. Add a simple lazy schema registry/handle system for this!
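A rough sketch of what such a registry/handle system could look like, written in C++ purely for illustration. Everything here is an assumption: `SchemaRegistry`, `SchemaHandle`, `register_schema` and `lookup` are hypothetical names, not part of the Rerun SDK, and a plain string stands in for the real arrow datatype. The point is only that the expensive schema is registered once and every later log call passes a small integer handle instead.

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handle type: an index into the registry.
using SchemaHandle = std::uint32_t;

// Hypothetical append-only registry; entries are never removed,
// so a handle stays valid for the lifetime of the registry.
class SchemaRegistry {
  public:
    // Registers a schema once and returns a cheap handle for later log calls.
    SchemaHandle register_schema(std::string schema) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        schemas_.push_back(std::move(schema));
        return static_cast<SchemaHandle>(schemas_.size() - 1);
    }

    // Looks up a schema by handle; multiple readers may do this concurrently.
    // Throws std::out_of_range for an unknown handle.
    std::string lookup(SchemaHandle handle) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return schemas_.at(handle);
    }

  private:
    mutable std::shared_mutex mutex_;
    std::vector<std::string> schemas_;
};
```

With something like this, a log call would only need to send the handle plus the array data, and the receiving side resolves the handle under a read lock.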

We should do a bit more profiling, though, to get an idea of where the time goes. E.g. there are likely many other needless allocations along the way.

…ing a component type registry (#4296)

### What

* Fixes #4287
* Follow-up to #4273

As expected, not doing the C++ datatype -> C FFI schema -> Rust datatype roundtrip for each log call helps perf quite a bit, especially when we do a lot of smaller log calls.

The registry is a single `RwLock`-protected `Vec` (we never deregister), which is exposed via a single C entry point. On the C++ side we use the local `static` variable mechanism for threadsafe lazy registration (slight codegen adjustment).

Indicator components had some special handling before and were refactored to fit into this system. In the process I made their arrow array shared across all instantiations, further cutting down on per-log work.

---

Benchmark results:

* large point cloud: `0.15s` -> `0.14s`
* many points: `7.52s` -> `4.52s`
* large images: `0.57s` -> `0.51s`

Old values are from the previous PR. New values are the median over three runs, single executable run (this makes more and more of a difference with all these registries!), timings without the prepare step, same M1 MacBook.

A quick look at the profiler for running `log_benchmark points3d_many_individual` in isolation tells us that, of the actual benchmark running time, we spend:

* 35% of the time in `rr_recording_stream_log` (of which in turn 20%, so 7% overall, is still arrow FFI translation of the array!!)
* 30% in the various `to_data_cell` methods
* 10% in exporting arrow arrays to C FFI
* 6% in setting the time
* the rest in various allocations along the way

(taken via `Instruments` on my Mac)

<img width="969" alt="image" src="https://github.com/rerun-io/rerun/assets/1220815/5632589f-52b1-4e92-b7a0-1482e69528ad">

---

### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/4296) (if applicable)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/4296)
- [Docs preview](https://rerun.io/preview/8bf1ee59d9a2bc5e192c1c8169c98dd40b621100/docs)
- [Examples preview](https://rerun.io/preview/8bf1ee59d9a2bc5e192c1c8169c98dd40b621100/examples)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)
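The local `static` mechanism mentioned in the PR description can be sketched as follows. This is an illustration under assumptions, not the generated SDK code: `register_component_type`, `Position3D` and `type_handle` are hypothetical names, and an atomic counter stands in for the real C entry point. It relies on the C++11 guarantee that a function-local static is initialized exactly once, even when several threads reach it concurrently.

```cpp
#include <atomic>
#include <cstdint>

// Counts how often the stand-in registration function has been called.
static std::atomic<std::uint32_t> g_registration_calls{0};

// Hypothetical stand-in for the C entry point that registers a component
// type and returns its handle (here simply the running call count).
std::uint32_t register_component_type() {
    return g_registration_calls.fetch_add(1);
}

// Hypothetical component with a lazily registered type handle.
struct Position3D {
    static std::uint32_t type_handle() {
        // Initialized on first use only; since C++11 the language guarantees
        // this initialization is thread-safe and happens exactly once.
        static const std::uint32_t handle = register_component_type();
        return handle;
    }
};
```

Every call to `Position3D::type_handle()` after the first one only reads the cached handle, so the registration cost is paid once per component type rather than once per log call.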

Wumpf added 🚀 performance Optimization, memory use, etc 🌊 C++ API C/C++ API specific labels Nov 21, 2023

Wumpf added this to the 0.11 milestone Nov 21, 2023

Wumpf self-assigned this Nov 21, 2023

Wumpf mentioned this issue Nov 21, 2023

Further improve C++ logging for many individual log calls by introducing a component type registry #4296

Merged

4 tasks

Wumpf closed this as completed in #4296 Nov 22, 2023
