Primary caching 8: implement latest-at data-time cache entry deduplication #4712

Merged: teh-cmc merged 3 commits into main from cmc/primcache_8_dedupe on Jan 15, 2024


teh-cmc
Member

@teh-cmc teh-cmc commented Jan 5, 2024

Introduces the notion of cache deduplication: given a query at time 4 and a query at time 8 that both return data at time 2, they must share a single cache entry.

I.e. starting with this PR, scrubbing through the OPF example will not result in more cache memory being used.
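
As a rough illustration of the deduplication idea (not the actual `re_query_cache` code), here is a minimal sketch. `LatestAtCache`, `Entry`, `get_or_insert` and the `fetch` callback are hypothetical names, and the store lookup is simulated by a closure that returns `(data_time, data)`:

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

/// Hypothetical stand-in for cached, deserialized component data.
pub type Entry = Arc<Vec<f32>>;

#[derive(Default)]
pub struct LatestAtCache {
    /// One entry per data time: this is the storage that actually costs memory.
    per_data_time: BTreeMap<i64, Entry>,
    /// Cheap index from query time to the shared entry (refcount bump only).
    per_query_time: HashMap<i64, Entry>,
}

impl LatestAtCache {
    /// `fetch` simulates the uncached store query and returns `(data_time, data)`.
    pub fn get_or_insert(
        &mut self,
        query_time: i64,
        fetch: impl FnOnce(i64) -> Option<(i64, Vec<f32>)>,
    ) -> Option<Entry> {
        if let Some(entry) = self.per_query_time.get(&query_time) {
            return Some(Arc::clone(entry));
        }
        let (data_time, data) = fetch(query_time)?;
        // Deduplicate on data time: reuse the existing entry if there is one.
        let entry = self
            .per_data_time
            .entry(data_time)
            .or_insert_with(|| Arc::new(data));
        let entry = Arc::clone(entry);
        self.per_query_time.insert(query_time, Arc::clone(&entry));
        Some(entry)
    }

    /// Number of distinct data entries held by the cache.
    pub fn num_unique_entries(&self) -> usize {
        self.per_data_time.len()
    }
}
```

With this layout, a query at time 4 and a query at time 8 that both resolve to data time 2 end up holding the same `Arc`, so scrubbing adds index entries but no new data entries.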


Part of the primary caching series of PR (index search, joins, deserialization):


Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

@teh-cmc teh-cmc added the 🔍 re_query, 🚀 performance, do-not-merge, and exclude from changelog labels Jan 5, 2024
@teh-cmc teh-cmc changed the base branch from main to cmc/primcache_7_data_time January 5, 2024 16:30
@teh-cmc teh-cmc changed the title Primary caching 7: implement latest-at data-time cache entry deduplication Primary caching 8: implement latest-at data-time cache entry deduplication Jan 5, 2024
@teh-cmc teh-cmc force-pushed the cmc/primcache_7_data_time branch from bcc7ab9 to 035a111 Compare January 8, 2024 16:59
@teh-cmc teh-cmc force-pushed the cmc/primcache_8_dedupe branch from 4d4418c to 31e55d2 Compare January 8, 2024 17:26
@teh-cmc teh-cmc force-pushed the cmc/primcache_8_dedupe branch from e080a99 to 1517ec4 Compare January 9, 2024 07:58
@teh-cmc teh-cmc force-pushed the cmc/primcache_7_data_time branch from 64f9cf4 to 88989df Compare January 9, 2024 10:19
@teh-cmc teh-cmc force-pushed the cmc/primcache_8_dedupe branch from 1517ec4 to 317c0ec Compare January 9, 2024 10:25
@teh-cmc teh-cmc marked this pull request as ready for review January 9, 2024 10:32
teh-cmc added a commit that referenced this pull request Jan 10, 2024
This implements the most barebone latest-at caching support.

The goal is merely to introduce all the machinery and boilerplate
required to get the primary cache running; actual caching features will
be implemented on top of this foundation in follow-up PRs.

The [existing benchmark
suite](https://github.com/rerun-io/rerun/blob/790f391/crates/re_query/benches/query_benchmark.rs)
has been ported as-is to the cached APIs (5950X, Arch):
```
group                           primcache_3_vanilla                         primcache_3_cached                  
-----                           -------------------                         ------------------                  
arrow_batch_points2/insert      1.02  1015.0±11.07µs 939.6 MElem/sec      1.00   1000.0±7.37µs 953.7 MElem/sec
arrow_batch_points2/query       2.90      3.4±0.02µs 276.5 MElem/sec      1.00  1190.5±41.55ns 801.0 MElem/sec
arrow_batch_strings2/insert     1.00   1045.7±7.85µs 912.0 MElem/sec      1.00  1042.1±14.01µs 915.1 MElem/sec
arrow_batch_strings2/query      1.91     21.3±0.17µs  44.7 MElem/sec      1.00     11.2±0.04µs  85.2 MElem/sec 
arrow_mono_points2/insert       1.01   1789.2±3.40ms 545.8 KElem/sec      1.00  1773.6±23.00ms 550.6 KElem/sec
arrow_mono_points2/query        6.78  1102.4±18.79µs 885.9 KElem/sec      1.00    162.6±3.39µs   5.9 MElem/sec 
arrow_mono_strings2/insert      1.00   1777.3±5.89ms 549.5 KElem/sec      1.00   1777.3±7.53ms 549.5 KElem/sec
arrow_mono_strings2/query       6.30  1149.9±15.36µs 849.3 KElem/sec      1.00    182.5±0.41µs   5.2 MElem/sec 
```

---

Part of the primary caching series of PR (index search, joins,
deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726
teh-cmc added a commit that referenced this pull request Jan 10, 2024
Make it possible to toggle primary caching on and off at runtime, for
both latest-at and range queries.


![image](https://github.com/rerun-io/rerun/assets/2910679/46404d8d-ea27-441c-9bae-ba5e3476adef)


teh-cmc added a commit that referenced this pull request Jan 10, 2024
Integrates the cached APIs with the 2D & 3D spatial views, which is a
pretty tough thing to do because there's a lot of abstraction going on
in there.

`main` vs. cache disabled vs. cache enabled (5950X, Arch):
```
group                        main                                    primcache_5_uncached                        primcache_5_cached                  
-----                        ----                                    --------------------                        ------------------                  
Points3D/load_all            1.68     10.1±0.14ms  94.2 MElem/sec        1.00      6.0±0.07ms 157.9 MElem/sec    1.01      6.1±0.06ms 155.7 MElem/sec
Points3D/load_colors         1.44      3.8±0.02ms 252.6 MElem/sec        1.00      2.6±0.05ms 364.0 MElem/sec    1.07      2.8±0.06ms 339.5 MElem/sec
Points3D/load_picking_ids    15.16  1859.6±7.01µs 512.9 MElem/sec        1.01    124.3±3.92µs   7.5 GElem/sec    1.00    122.7±3.86µs   7.6 GElem/sec 
Points3D/load_positions      2.29    420.1±0.76µs   2.2 GElem/sec        1.03    189.3±7.44µs   4.9 GElem/sec    1.00    183.4±5.56µs   5.1 GElem/sec 
Points3D/load_radii          1.46      3.3±0.04ms 290.1 MElem/sec        1.05      2.4±0.03ms 404.8 MElem/sec    1.00      2.2±0.00ms 423.9 MElem/sec
Points3D/query_archetype     2.51    676.1±7.59ns        ? ?/sec     15859.98      4.3±0.06ms         ? ?/sec    1.00    268.9±3.39ns         ? ?/sec 
```

teh-cmc added a commit that referenced this pull request Jan 10, 2024
Integrates the cached APIs with the TextLog & TimeSeries views, which is
pretty trivial.

This of course does nothing, since the cache doesn't cache range queries
yet.

@teh-cmc teh-cmc force-pushed the cmc/primcache_7_data_time branch from 88989df to 636eb92 Compare January 10, 2024 16:15
@teh-cmc teh-cmc force-pushed the cmc/primcache_8_dedupe branch from 317c0ec to a7e6e8a Compare January 10, 2024 16:22
@teh-cmc teh-cmc force-pushed the cmc/primcache_7_data_time branch 2 times, most recently from b955acc to 6921781 Compare January 15, 2024 09:36
Base automatically changed from cmc/primcache_7_data_time to main January 15, 2024 09:39
teh-cmc added a commit that referenced this pull request Jan 15, 2024

_99% grunt work, the only somewhat interesting thing happens in
`query_archetype`_

Our query model always operates with two distinct timestamps: the
timestamp you're querying for (`query_time`) vs. the timestamp of the
data you get back (`data_time`).

This is the result of our latest-at semantics: a query for a point at
time `10` can return a point at time `2`.
This is important to know when caching the data: a query at time `4` and
a query at time `8` that both return the data at time `2` must share the
same single entry or the memory budget would explode.

This PR just updates all existing latest-at APIs so they return the data
time in their response.
This was already the case for range APIs.

Note that in the case of `query_archetype`, which is a compound API that
emits multiple queries, the data time of the final result is the most
recent data time among all of its components.

A follow-up PR will use the data time to deduplicate entries in the
latest-at cache.
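
To make the `query_time` vs. `data_time` distinction concrete, here is a minimal sketch under simplifying assumptions: each component is modeled as a `BTreeMap` from data time to value, and `Timeline`, `latest_at` and `query_point` are hypothetical names rather than the real `re_query` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical single-component timeline: data time -> value.
pub type Timeline<T> = BTreeMap<i64, T>;

/// Latest-at: the most recent row at or before `query_time`,
/// returned together with its data time.
pub fn latest_at<T: Clone>(timeline: &Timeline<T>, query_time: i64) -> Option<(i64, T)> {
    timeline
        .range(..=query_time)
        .next_back()
        .map(|(data_time, value)| (*data_time, value.clone()))
}

/// Compound query over a required and an optional component.
/// The data time of the result is the most recent data time among its components.
pub fn query_point(
    positions: &Timeline<[f32; 2]>,
    colors: &Timeline<u32>,
    query_time: i64,
) -> Option<(i64, [f32; 2], Option<u32>)> {
    let (pos_time, position) = latest_at(positions, query_time)?;
    match latest_at(colors, query_time) {
        Some((color_time, color)) => Some((pos_time.max(color_time), position, Some(color))),
        None => Some((pos_time, position, None)),
    }
}
```

In this sketch a query at time `10` against a position logged at time `2` returns data time `2`, and a compound result takes the larger of the position and color data times.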

---

Part of the primary caching series of PR (index search, joins,
deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
@teh-cmc teh-cmc force-pushed the cmc/primcache_8_dedupe branch from a7e6e8a to 3252a8c Compare January 15, 2024 09:44
@teh-cmc teh-cmc force-pushed the cmc/primcache_8_dedupe branch from 1a41374 to 95fdb2e Compare January 15, 2024 11:08
@teh-cmc teh-cmc merged commit ccfd21a into main Jan 15, 2024
22 of 31 checks passed
@teh-cmc teh-cmc deleted the cmc/primcache_8_dedupe branch January 15, 2024 11:10
@teh-cmc teh-cmc removed the do-not-merge label Jan 15, 2024
teh-cmc added a commit that referenced this pull request Jan 15, 2024
Introduces a dedicated cache bucket for timeless data and properly
forwards the information through all APIs downstream.

teh-cmc added a commit that referenced this pull request Jan 15, 2024
This implements cache invalidation via a `StoreSubscriber`.

We keep track of the timestamps to invalidate in the `StoreSubscriber`,
but we only do the actual removal of components at query time.
This is similar to how we handle bucket sorting in the main store: doing
it at query time has the benefit that the frame time effectively behaves
as a natural micro-batching mechanism that vastly improves performance.
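
The deferred-invalidation pattern can be sketched roughly as follows. This is not the real `StoreSubscriber` interface: `InvalidatingCache` and `on_store_write` are hypothetical names, and the invalidation rule (drop every entry at or after the oldest written timestamp) is a deliberately conservative simplification.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
pub struct InvalidatingCache {
    /// Cached data, keyed by data time.
    entries: BTreeMap<i64, Vec<f32>>,
    /// Timestamps written to the store since the last query.
    pending: BTreeSet<i64>,
}

impl InvalidatingCache {
    pub fn insert(&mut self, data_time: i64, data: Vec<f32>) {
        self.entries.insert(data_time, data);
    }

    /// Write path (stand-in for a store event callback): only record the timestamp.
    pub fn on_store_write(&mut self, data_time: i64) {
        self.pending.insert(data_time);
    }

    pub fn num_pending(&self) -> usize {
        self.pending.len()
    }

    /// Read path: apply all pending invalidations in one batch, then look up.
    pub fn get(&mut self, data_time: i64) -> Option<&Vec<f32>> {
        let pending = std::mem::take(&mut self.pending);
        // `BTreeSet` iterates in ascending order, so the first item is the oldest.
        if let Some(&oldest) = pending.iter().next() {
            self.entries.retain(|time, _| *time < oldest);
        }
        self.entries.get(&data_time)
    }
}
```

Any number of writes between two queries cost one set insertion each, and the actual removal happens once, on the next read.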

teh-cmc added a commit that referenced this pull request Jan 15, 2024

The primary cache now tracks memory statistics and displays them in the
memory panel.

This immediately highlights a very stupid thing that the cache does:
missing optional components that have been turned into streams of
default values by the `ArchetypeView` are materialized as such
:man_facepalming:
- #4779


https://github.com/rerun-io/rerun/assets/2910679/876b264a-3f77-4d91-934e-aa8897bb32fe



- Fixes #4730 


teh-cmc added a commit that referenced this pull request Jan 15, 2024
**Prefer reviewing on a per-commit basis, stuff has moved around**

Range queries are back!... in the most primitive form possible.

No invalidation, no bucketing, no optimization, no nothing. Just putting
everything in place.


https://github.com/rerun-io/rerun/assets/2910679/a65281e4-9843-4598-9547-ce7e45197995

teh-cmc added a commit that referenced this pull request Jan 15, 2024
teh-cmc added a commit that referenced this pull request Jan 15, 2024
… range queries (#4793)

Our low-level range APIs used to bake the latest-at results at
`range.min - 1` into the range results, which is a big problem in a
multi tenant setting because `range(1, 10)` vs. `latestat(1) + range(2,
10)` are two completely different things.
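
A small sketch of the difference, assuming a single component stored as a `BTreeMap` from time to value. `Timeline`, `range`, `latest_at` and `range_with_baked_latest_at` are hypothetical helpers written for this illustration, not the low-level store API:

```rust
use std::collections::BTreeMap;

pub type Timeline = BTreeMap<i64, f64>;

/// Latest-at: the most recent row at or before `query_time`.
pub fn latest_at(timeline: &Timeline, query_time: i64) -> Option<(i64, f64)> {
    timeline
        .range(..=query_time)
        .next_back()
        .map(|(t, v)| (*t, *v))
}

/// Plain range: only the rows whose time falls within `[min, max]`.
pub fn range(timeline: &Timeline, min: i64, max: i64) -> Vec<(i64, f64)> {
    if min > max {
        return Vec::new(); // `BTreeMap::range` panics on inverted bounds
    }
    timeline.range(min..=max).map(|(t, v)| (*t, *v)).collect()
}

/// The previous behavior, for contrast: the latest-at result at `min - 1`
/// is baked into the range results.
pub fn range_with_baked_latest_at(timeline: &Timeline, min: i64, max: i64) -> Vec<(i64, f64)> {
    let mut rows: Vec<(i64, f64)> = latest_at(timeline, min - 1).into_iter().collect();
    rows.extend(range(timeline, min, max));
    rows
}
```

With rows at times 0, 3 and 7, `range(1, 10)` yields the rows at 3 and 7, whereas the baked variant over `(2, 10)` additionally yields the row at time 0, so the two cannot share a cache entry.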

Side-effect: a plot with a window of len 1 now behaves as expected:



https://github.com/rerun-io/rerun/assets/2910679/957ac367-35a6-4bea-9f40-59d51c556639

teh-cmc added a commit that referenced this pull request Jan 15, 2024
The most obvious and most important performance optimization when doing
cached range queries: only upsert data at the edges of the bucket /
ring-buffer.

This works because our buckets (well, singular, at the moment) are
always dense.

- #4793  

![image](https://github.com/rerun-io/rerun/assets/2910679/7246827c-4977-4b3f-9ef9-f8e96b8a9bea)
- #4800:

![image](https://github.com/rerun-io/rerun/assets/2910679/ab78643b-a98b-4568-b510-2b8827467095)
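
The edge-only upsert can be sketched as below, assuming a single dense bucket and a store that does not change between calls (invalidation is out of scope here). `RangeBucket` and its methods are hypothetical names, and the store is simulated with a `BTreeMap`:

```rust
use std::collections::{BTreeMap, VecDeque};

#[derive(Default)]
pub struct RangeBucket {
    /// Cached rows, sorted by time.
    rows: VecDeque<(i64, f64)>,
    /// Inclusive time range over which `rows` is dense, i.e. every store row
    /// within that range is present in the bucket.
    covered: Option<(i64, i64)>,
}

impl RangeBucket {
    /// Makes sure `[min, max]` is cached; returns how many rows were fetched from `store`.
    pub fn upsert(&mut self, store: &BTreeMap<i64, f64>, min: i64, max: i64) -> usize {
        if min > max {
            return 0;
        }
        let (lo, hi) = match self.covered {
            Some(covered) => covered,
            None => {
                self.rows = store.range(min..=max).map(|(t, v)| (*t, *v)).collect();
                self.covered = Some((min, max));
                return self.rows.len();
            }
        };
        let mut fetched = 0;
        // Front edge: only the part of the query that precedes the covered range.
        if min < lo {
            for (t, v) in store.range(min..lo).rev() {
                self.rows.push_front((*t, *v));
                fetched += 1;
            }
        }
        // Back edge: only the part of the query that follows the covered range.
        if max > hi {
            for (t, v) in store.range(hi + 1..=max) {
                self.rows.push_back((*t, *v));
                fetched += 1;
            }
        }
        self.covered = Some((lo.min(min), hi.max(max)));
        fetched
    }

    pub fn times(&self) -> Vec<i64> {
        self.rows.iter().map(|(t, _)| *t).collect()
    }
}
```

Because the bucket is dense, anything inside the covered range never needs to be looked at again; a query that falls entirely inside it fetches nothing.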

teh-cmc added a commit that referenced this pull request Jan 23, 2024
Range queries used to A) return the frame at T-1, B) accumulate state
starting at T-1 and then C) yield frames starting at T.

A) was a huge issue for many reasons, which #4793 took care of by
eliminating both A) and B).

But we need B) for range queries to be context-free, i.e. to be
guaranteed that `Range(5, 10)` and `Range(4, 10)` will return the exact
same data for frame `5`.
This is crucial for multi-tenant settings where those 2 example queries
would share the same cache.

It is also the nicer-nicer version of the range semantics that we wanted
anyway, I just didn't realize back then that it would require so few
changes, or I would've gone straight for that.
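
A minimal sketch of B) without A), assuming a primary component (positions) joined with a secondary one (colors), both stored as `BTreeMap`s keyed by time. `Frame` and `range_frames` are hypothetical names used only for this illustration:

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Frame {
    pub time: i64,
    pub position: f32,
    pub color: Option<u32>,
}

/// Yields one frame per primary row in `[min, max]`, joined with the latest
/// secondary value known at that time.
pub fn range_frames(
    positions: &BTreeMap<i64, f32>,
    colors: &BTreeMap<i64, u32>,
    min: i64,
    max: i64,
) -> Vec<Frame> {
    if min > max {
        return Vec::new();
    }
    // B) Accumulate state starting before `min`, without yielding a frame for it.
    let mut color = colors.range(..min).next_back().map(|(_, c)| *c);
    let mut pending = colors.range(min..=max).peekable();
    let mut frames = Vec::new();
    // C) Yield frames starting at `min`.
    for (time, position) in positions.range(min..=max) {
        // Fold in every color update that happened at or before this frame.
        while let Some((_, c)) = pending.next_if(|(t, _)| *t <= time) {
            color = Some(*c);
        }
        frames.push(Frame { time: *time, position: *position, color });
    }
    frames
}
```

In this sketch the frame at time `5` carries the same color whether the query starts at `4` or at `5`, which is the property that lets both queries share cached data.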

---

Part of the primary caching series of PR (index search, joins,
deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
- #4851
- #4852
- #4853
- #4856
teh-cmc added a commit that referenced this pull request Jan 23, 2024
Simply add a timeless path for the range cache, and actually only
iterate over the range the user asked for (we were still blindly
iterating over everything until now).

Also some very minimal clean up related to #4832, but we have a long way
to go...
- #4832

---

- Fixes #4821 

teh-cmc added a commit that referenced this pull request Jan 23, 2024
Implement range invalidation and do a quality pass over all the size
tracking stuff in the cache.

**Range caching is now enabled by default!**

- Fixes #4809 
- Fixes #374

teh-cmc added a commit that referenced this pull request Jan 23, 2024
- Quick sanity pass over all the intermediary locks and refcounts to
make sure we don't hold anything for longer than we need.
- Get rid of all static globals and let the caches live with their
associated stores in `EntityDb`.
- `CacheKey` no longer requires a `StoreId`.

---

- Fixes #4815 

@teh-cmc teh-cmc added the include in changelog label and removed the exclude from changelog label Feb 6, 2024