feat: improve prom write requests decode performance #3478

v0y4g3r · 2024-03-11T07:48:54Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

N/A

What's changed and what's your intention?

This PR aims to improve the decode performance of PromWriteRequest, which has been addressed in #3425, but still we observe some room for optimization.

Decoding `WriteRequest`

First we can eliminate one copy_to_bytes invocation by inling prost::encoding::bytes::merge
Then we dive into copy_to_bytes, and we observe that copy_to_bytes is way slower than Golang's byte slice operation
- For a bytes = bytes[..idx] operation repeated for 120k times (which is the case when decoding lables from 10k timeseries in a WriteRequest), Rust's Bytes takes 1.2ms, while Golang's byte slice takes 30us
- The reason for this performance difference is that Bytes also handles reference counting when built from Vec<u8>, while Golang's byte slice only has ptr, len, cap fields and Garbage collector handles the reference counting, which bring little overhead when bytes are pooled.
- Considering the bytes fields of PromLabel are short-lived and are converted to string and added to TableBuilder soon after a PromTimeseries decoding is finished, so that the original decompressed Bytes always outlive PromLabel::name and PromLabel::value, we can introduce some unsafe operation, such as directly construct a new Bytes from raw pointer and len/cap fields, that's what servers::proto::copy_to_bytes does.
- For the same test mentioned above, servers::proto::copy_to_bytes takes 200us, still slower than Golang, but fairly acceptable.
- With this optimization, decoding a 10k timeseries WriteRequest cost time further improved from 3.0ms to 1.8ms. For refence, VictoriaMetrics' WriteRequets decoding takes 1.2ms.

Decode-only benchmark result:

Running benches/bench_prom.rs (target/release/deps/bench_prom-9768faca31abd429)
Gnuplot not found, using plotters backend 
decode/write_request  time:     [7.1515 ms 7.1673 ms 7.1832 ms]
                      change:   [-0.7329% -0.3899% -0.0453%]  (p=0.03<0.05)
                      Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

decode/prom_write_request
                        time:   [1.8225 ms 1.8308 ms 1.8379 ms]
                        change: [-1.2093% -0.5679% +0. 0813%] (p = 0.08 › 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  15 (15.00%) low mild
  1 (1.00%) high mild

Building `RowInsertRequests`

Aside from decoding, we also find out that PromWriteRequest::as_row_insert_requests takes more time than decoding.

During decoding PromTimeseries, we need to build the schemas for each metric (table), which involves two hashmap lookups per label, that sums to 120k hashmap lookup for a 10k timeseries request.
We can use ahasher hashmap instead of Rust's default siphasher hashmap, to get a roughly 40% improvement in terms of hashtable lookups (Rust's default hashmap is quite slow btw).
Compared to siphasher, ahasher is not a cryptographically secure hasher, but it doesn't matter in our case.
I also compared other hashmaps like fxhashmap which is used internally in rustc, but did not notice significant improvent.

Results

Still, decoding a WriteRequest with 10k timeseries and 60k labels and convert it to RowInsertRequests:

     Running benches/bench_prom.rs (target/release/deps/bench_prom-ac6fa2abc4d24957)
decode/write_request    time:   [13.816 ms 13.882 ms 13.954 ms]
                        change: [-0.9281% -0.3477% +0.2455%] (p = 0.23 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
decode/prom_write_request
                        time:   [5.7520 ms 5.7591 ms 5.7666 ms]
                        change: [+0.5254% +0.7116% +0.9096%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Checklist

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR does not require documentation updates.

codecov · 2024-03-11T08:34:31Z

Codecov Report

Attention: Patch coverage is 73.68421% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 84.93%. Comparing base (a9d42f7) to head (07710c6).
Report is 24 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3478      +/-   ##
==========================================
- Coverage   85.44%   84.93%   -0.51%     
==========================================
  Files         895      900       +5     
  Lines      147093   149647    +2554     
==========================================
+ Hits       125685   127105    +1420     
- Misses      21408    22542    +1134

killme2008 · 2024-03-11T10:03:21Z

Great job! Do you think we could submit these enhancements to the main prost project?

src/servers/src/prom_row_builder.rs

src/servers/src/proto.rs

v0y4g3r · 2024-03-11T11:22:40Z

Great job! Do you think we could submit these enhancements to the main prost project?

No, this optimization is safe if and only if:

Buf is backed by a continuous byte array
callers can ensure the decode result never outlive original Bytes.

src/servers/src/proto.rs

src/servers/src/prom_row_builder.rs

src/servers/src/proto.rs

tisonkun · 2024-03-13T07:56:56Z

src/servers/src/proto.rs

                    }
                    if buf.remaining() != limit {
                        return Err(DecodeError::new("delimited length exceeded"));
                    }
                    self.series.add_to_table_data(&mut self.table_data);
                }
                3u32 => {
-                    // we can ignore metadata for now.
-                    prost::encoding::skip_field(wire_type, tag, &mut buf, ctx.clone())?;
+                    // todo(hl): metadata are skipped.


What exactly the TODO is?

May we create an issue for it with a bit description?

tisonkun · 2024-03-14T03:24:05Z

src/servers/src/proto.rs

        }
    }
 }

+#[inline(always)]
+fn copy_to_bytes(data: &mut Bytes, len: usize) -> Bytes {


IIRC this is already the impl of bytes::Bytes?

We copy it here for inline?

https://github.com/tokio-rs/bytes/blob/ca004117f86afccd36148dee7c8413cfaf9de6a4/src/bytes.rs#L583-L591

feat: optimize decode performance

12dbc81

github-actions bot added the docs-not-required This change does not impact docs. label Mar 11, 2024

v0y4g3r marked this pull request as ready for review March 11, 2024 08:22

v0y4g3r requested review from evenyag and waynexia March 11, 2024 08:25

killme2008 reviewed Mar 11, 2024

View reviewed changes

src/servers/src/prom_row_builder.rs Show resolved Hide resolved

src/servers/src/proto.rs Outdated Show resolved Hide resolved

src/servers/src/proto.rs Outdated Show resolved Hide resolved

src/servers/src/proto.rs Outdated Show resolved Hide resolved

evenyag reviewed Mar 11, 2024

View reviewed changes

src/servers/src/proto.rs Outdated Show resolved Hide resolved

src/servers/src/proto.rs Show resolved Hide resolved

src/servers/src/proto.rs Show resolved Hide resolved

waynexia reviewed Mar 12, 2024

View reviewed changes

src/servers/src/prom_row_builder.rs Show resolved Hide resolved

src/servers/src/proto.rs Show resolved Hide resolved

fix: some cr comments

07710c6

v0y4g3r requested review from killme2008, evenyag and waynexia March 12, 2024 09:30

waynexia enabled auto-merge March 12, 2024 11:45

waynexia approved these changes Mar 12, 2024

View reviewed changes

evenyag approved these changes Mar 12, 2024

View reviewed changes

waynexia added this pull request to the merge queue Mar 12, 2024

Merged via the queue into GreptimeTeam:main with commit 9afe327 Mar 12, 2024
18 checks passed

tisonkun reviewed Mar 13, 2024

View reviewed changes

tisonkun reviewed Mar 14, 2024

View reviewed changes

tisonkun mentioned this pull request Mar 15, 2024

feat: decode prom requests to grpc #3425

Merged

3 tasks

tisonkun mentioned this pull request Mar 25, 2024

refactor: avoid the GitHelper and instead keep data on the stack korandoru/hawkeye#126

Merged

tisonkun mentioned this pull request Apr 9, 2024

feat(changelog): support count_tags option orhun/git-cliff#599

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve prom write requests decode performance #3478

feat: improve prom write requests decode performance #3478

v0y4g3r commented Mar 11, 2024 •

edited

Loading

codecov bot commented Mar 11, 2024 •

edited

Loading

killme2008 commented Mar 11, 2024

v0y4g3r commented Mar 11, 2024

tisonkun Mar 13, 2024

tisonkun Mar 14, 2024

feat: improve prom write requests decode performance #3478

feat: improve prom write requests decode performance #3478

Conversation

v0y4g3r commented Mar 11, 2024 • edited Loading

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Decoding WriteRequest

Building RowInsertRequests

Results

Checklist

codecov bot commented Mar 11, 2024 • edited Loading

Codecov Report

killme2008 commented Mar 11, 2024

v0y4g3r commented Mar 11, 2024

tisonkun Mar 13, 2024

Choose a reason for hiding this comment

tisonkun Mar 14, 2024

Choose a reason for hiding this comment

v0y4g3r commented Mar 11, 2024 •

edited

Loading

Decoding `WriteRequest`

Building `RowInsertRequests`

codecov bot commented Mar 11, 2024 •

edited

Loading