
Chunk ingestion performance regression because of compaction logic #8101

Open
rgolovanov opened this issue Nov 12, 2024 · 11 comments · Fixed by #8103
Labels
🪳 bug (Something isn't working) · 🚀 performance (Optimization, memory use, etc) · project-many-entities-perf · 🦟 regression (A thing that used to work in an earlier release)

Comments

@rgolovanov
Contributor

rgolovanov commented Nov 12, 2024

Describe the bug
Latency in the Viewer grows rapidly when dozens of scalar values are logged. Performance has degraded between 0.17.0 and 0.19.0 (or possibly as early as 0.18.1).

To Reproduce
Steps to reproduce the behavior:

  1. pip install rerun-sdk==0.17.0
  2. python rerun_latency_test.py (see code below)
  3. Check latency in performance metrics in Viewer
  4. Repeat steps (1)-(3) for version 0.19.0 and observe how quickly the latency climbs past a few seconds.

Expected behavior
The latency stays within 10-15ms range as it was in earlier versions.

Screenshots
[screenshot]

Test script rerun_latency_test.py

The numbers will probably differ depending on the machine, but in my case the problem is visible when logging 10×5×3 = 150 scalar values.

import rerun as rr
import time
import random
import threading
import itertools

def log_scalar_values():
    rr.init(f"TestLatency", spawn=True, recording_id="default")
    while not stop_event.is_set():
        for i, j, k in itertools.product(range(0, 10), range(0, 5), range(0, 3)):
            scalar_value = random.uniform(0, 1)  # Generate a random scalar value
            rr.log(f"AAA/{i}/{j}/{k}/value", rr.Scalar(scalar_value))
            print(f"Log {scalar_value}")
        time.sleep(0.001)  # Sleep for 10ms to achieve approximately 100Hz


stop_event = threading.Event()

logging_thread = threading.Thread(target=log_scalar_values)
logging_thread.start()

input("Press Enter to stop logging...\n")
stop_event.set()
logging_thread.join()
print("Logging stopped.")

Desktop (please complete the following information):

  • OS: Windows 11

Rerun version
Detected in 0.19.0 but likely appeared in 0.18.0 or 0.18.1

@rgolovanov rgolovanov added the 👀 needs triage and 🪳 bug labels Nov 12, 2024
@teh-cmc teh-cmc self-assigned this Nov 12, 2024
@teh-cmc teh-cmc added the 🚀 performance and 🦟 regression labels and removed the 👀 needs triage label Nov 12, 2024
@teh-cmc teh-cmc added this to the 0.20 - Maps, H.264 milestone Nov 12, 2024
@teh-cmc
Member

teh-cmc commented Nov 12, 2024

Thanks for the neat little repro @rgolovanov, that made it a breeze 👍

@rgolovanov
Contributor Author

@teh-cmc Thanks for the extremely fast fix 🙏 We'll be looking forward to the new release 😊

@rgolovanov
Contributor Author

@teh-cmc I installed rerun_sdk-0.20.0a4+dev-cp38-abi3-win_amd64.whl from the latest development build, which includes your fix. For me, it still shows growing latency...
[screenshot]

@teh-cmc
Member

teh-cmc commented Nov 13, 2024

> @teh-cmc I installed rerun_sdk-0.20.0a4+dev-cp38-abi3-win_amd64.whl from the latest development build, which includes your fix. For me, it still shows growing latency...

Both 0.17 and the latest development wheel seem to have the same latencies for me @rgolovanov, i.e. they can both sustain the above script at 100Hz but fall behind at 1kHz.

Note that the script in the issue description has a typo in it -- it's running at 1kHz, not 100Hz:

- time.sleep(0.001)  # Sleep for 10ms to achieve approximately 100Hz
+ time.sleep(0.010)  # Sleep for 10ms to achieve approximately 100Hz

@rgolovanov
Contributor Author

Yes, frequency matters of course. I think I started local testing at 100Hz and then switched to 1kHz because it was an easier and faster way to reveal the issue. With time.sleep(0.010) intervals the latency also starts growing, it just takes longer. In my case it reached 40s of latency when the recorded data was ~130MiB...

@teh-cmc
Member

teh-cmc commented Nov 13, 2024

> In my case it reached 40s of latency when the recorded data was ~130MiB...

The amount of data already present shouldn't have any impact on the ingestion speed.

What I assume you're experiencing here is another issue: the fact that ingestion and rendering both happen on the same thread as of today:

By the time you've ingested 130MiB worth of scalars, and assuming your plot view is configured to display them all, rendering times will skyrocket and therefore ingestion will start lagging behind because the thread is being hogged by the renderer.
Try hiding the plot view, and you should see the issue go away (or use the selection panel to set its time bounds).

The reason this might appear to be faster on 0.17 is simply that scalars consume at least an order of magnitude more memory on 0.17 than on 0.18+, so 130MiB worth of scalars on 0.17 is far fewer scalars than on 0.20, and therefore takes far less time to render.
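For the time-bounds workaround mentioned above, here is a minimal sketch using the Python blueprint API. It assumes rerun.blueprint with TimeSeriesView, VisibleTimeRange and TimeRangeBoundary (available in recent SDK versions) and the default log_time timeline; the /AAA origin matches the repro script.

import rerun as rr
import rerun.blueprint as rrb

# Sketch: only plot the last 10 seconds around the time cursor, so the renderer
# does not have to process and tessellate the entire recording every frame.
blueprint = rrb.Blueprint(
    rrb.TimeSeriesView(
        origin="/AAA",  # entity subtree logged by the repro script
        time_ranges=[
            rrb.VisibleTimeRange(
                "log_time",  # assumes the default wall-clock timeline
                start=rrb.TimeRangeBoundary.cursor_relative(seconds=-10.0),
                end=rrb.TimeRangeBoundary.cursor_relative(),
            )
        ],
    )
)

rr.init("TestLatency", spawn=True, default_blueprint=blueprint)

The same effect can be achieved interactively by selecting the plot view and restricting its visible time range in the selection panel.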

@rgolovanov
Contributor Author

@teh-cmc I updated to the new official release 0.20.0 and can confirm that performance is still low. The latency grows rapidly even when only 10s of the recording (current time with an offset) is shown in the view. As a workaround we use the older version 0.16.1.

I know the description of the issue mentions 0.17.0, but could you profile the performance difference between 0.16.1 and 0.20.0? Perhaps there is something else that could be optimized?

@teh-cmc
Member

teh-cmc commented Nov 18, 2024

> @teh-cmc I updated to the new official release 0.20.0 and can confirm that performance is still low. The latency grows rapidly even when only 10s of the recording (current time with an offset) is shown in the view. As a workaround we use the older version 0.16.1.
>
> I know the description of the issue mentions 0.17.0, but could you profile the performance difference between 0.16.1 and 0.20.0? Perhaps there is something else that could be optimized?

😞 I'll have a look.

@emilk
Member

emilk commented Nov 20, 2024

Here is a profile dump:

[screenshot]

I see two separate problems

Slow ingestion

We spend 10ms ingesting input (which is our limit). This is slow because there are so many entities, i.e. it is part of the broader many-entities performance problem.

Zooming in on the ingestion of a single chunk:
[screenshot]

Slow plots

There are several reasons why the plots are slow.

Many plot lines

One is that there are so many lines. Each line needs processing:

[screenshot]

Slow egui rendering

The second problem is that they are rendered using egui, apparently with way too many line primitives, leading to slow tessellation:

[screenshot]

And also a lot of primitives to upload to the GPU:

[screenshot]

Conclusions

The slow ingestion is a real problem, and needs to be fixed.

The slow rendering is just an effect of rendering this craziness:

[screenshot]

This is not a realistic plot view of real data. This is madness :)

As soon as I zoom in, the performance is good again:
[screenshot]

A potential solution is to just aggregate the line plots a lot more aggressively, i.e. render them more coarsely.

@emilk emilk changed the title from "Performance degradation in Viewer when log multiple scalar values" to "Chunk ingestion performance regression because of compaction logic" Nov 20, 2024
@teh-cmc
Member

teh-cmc commented Nov 20, 2024

TL;DR:

  • The slow rendering of the plot itself is nothing new -- there is always room to improve plot rendering, and we should always strive to do so, but this is not needed for 0.20.1.
  • The slow ingestion is because the reproduction script in the top comment represents the worst possible case when logging (i.e. lots of different entities, with just enough time spent sleeping that the client-side micro-batcher never gets any opportunity to actually batch anything, at least with the default settings).
    This leads to a regression in ingestion time in that case, because all these tiny chunks that couldn't be compacted on the client side now have to be compacted on the server side.
    I'll try and see if there are some very low-hanging fruit to make that faster for 0.20.1, but more likely than not this will require some more serious work and will have to wait for 0.21.

@emilk
Member

emilk commented Nov 21, 2024

It is likely that some of this ingestion overhead can be mitigated with more aggressive SDK-side chunk compaction. This can be enabled by changing the environment variable RERUN_FLUSH_TICK_SECS, e.g. RERUN_FLUSH_TICK_SECS=0.2. This trades a little bit of latency (in this case 0.2 seconds) for larger chunks, which should ingest much faster.
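A minimal sketch of how this could be applied to the repro script, assuming the SDK picks up RERUN_FLUSH_TICK_SECS from the process environment when the recording stream is created (setting it in the shell before launching the script, e.g. RERUN_FLUSH_TICK_SECS=0.2 python rerun_latency_test.py, amounts to the same thing):

import os

# Must be set before rr.init() creates the recording stream / micro-batcher.
os.environ["RERUN_FLUSH_TICK_SECS"] = "0.2"  # flush every 200ms instead of the default

import rerun as rr

rr.init("TestLatency", spawn=True, recording_id="default")
# ... log scalars as in the script above; they should now arrive in larger,
# less frequent chunks, at the cost of up to ~0.2s of extra latency.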
