Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use inplace normalization and add benchmark #506

Merged
merged 15 commits into from
Jul 5, 2024

Conversation

VianneyRuhlmann
Copy link
Contributor

What does this PR do?

Work from @paullegranddc in #458

  • Use inplace normalization to improve performance
  • Handle chars as uint instead of utf8 chars to improve performance
  • Add benchmark for trace-normalization

Motivation

Improve trace normalization performance

Additional Notes

How to test the change?

@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2024

Codecov Report

Attention: Patch coverage is 93.96226% with 16 lines in your changes missing coverage. Please review.

Project coverage is 70.46%. Comparing base (ce2afd9) to head (d570cdf).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #506      +/-   ##
==========================================
+ Coverage   70.41%   70.46%   +0.05%     
==========================================
  Files         204      205       +1     
  Lines       27890    27939      +49     
==========================================
+ Hits        19639    19688      +49     
  Misses       8251     8251              
Components Coverage Δ
crashtracker 16.72% <ø> (ø)
datadog-alloc 98.76% <ø> (ø)
data-pipeline 51.15% <ø> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 86.58% <ø> (ø)
ddcommon-ffi 75.31% <ø> (ø)
ddtelemetry 59.33% <ø> (ø)
ipc 84.66% <ø> (ø)
profiling 78.63% <ø> (ø)
profiling-ffi 58.19% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 36.19% <ø> (ø)
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 54.98% <ø> (ø)
trace-mini-agent 69.70% <ø> (ø)
trace-normalization 98.24% <96.29%> (+0.44%) ⬆️
trace-obfuscation 95.75% <ø> (ø)
trace-protobuf 77.16% <ø> (ø)
trace-utils 90.76% <68.18%> (-0.09%) ⬇️

@VianneyRuhlmann VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch from d6a0f46 to 15186f3 Compare June 26, 2024 13:11
@VianneyRuhlmann VianneyRuhlmann marked this pull request as ready for review June 26, 2024 15:49
@VianneyRuhlmann VianneyRuhlmann requested review from a team as code owners June 26, 2024 15:49
@pr-commenter
Copy link

pr-commenter bot commented Jun 28, 2024

Benchmarks

Comparison

Parameters

Baseline Candidate
config baseline candidate
git_branch main vianney/inplace-trace-normalization
git_commit_date 1720105540 1720183895
git_commit_sha 3142b2d 899ff91
iterations 706.0 713.0
See matching parameters
Baseline Candidate
ci_job_date 1720184138 1720184138
ci_job_id 563452051 563452051
ci_pipeline_id 38437982 38437982
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

Candidate

Candidate benchmark details

Group 1

iterations config cpu_model ci_job_date ci_job_id ci_pipeline_id git_commit_sha git_commit_date git_branch
713.0 candidate Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 1720184138 563452051 38437982 899ff91 1720183895 vianney/inplace-trace-normalization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 70.083µs 70.143µs ± 0.030µs 70.139µs ± 0.020µs 70.162µs 70.194µs 70.212µs 70.219µs 0.11% 0.249 -0.426 0.04% 0.003µs 1 100
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [70.137µs; 70.149µs] or [-0.008%; +0.008%] None None None

Warnings

Scenario: sql/obfuscate_sql_string, Metric: execution_time

Measurements are autocorrelated.

Autocorrelation is present for lags 1..10.

The measurements are not independent, thus confidence intervals
may be less precise.
---------------------------------------------------------------------------
Scenario: sql/obfuscate_sql_string, Metric: execution_time

Sample size is 100, which is lower than 105.

The minimal sample size in case of normal distribution to achieve significance
level of 0.05 for difference of means with effect size Cohen's d = 0.5 must be at
least 105.

The conclusions from confidence intervals may be invalid.
---------------------------------------------------------------------------

Baseline

Baseline benchmark details

Group 1

iterations config cpu_model ci_job_date ci_job_id ci_pipeline_id git_commit_sha git_commit_date git_branch
706.0 baseline Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 1720184138 563452051 38437982 3142b2d 1720105540 main
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 70.503µs 70.595µs ± 0.028µs 70.594µs ± 0.016µs 70.611µs 70.637µs 70.661µs 70.671µs 0.11% -0.129 0.993 0.04% 0.003µs 1 100
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [70.590µs; 70.601µs] or [-0.008%; +0.008%] None None None

Warnings

Scenario: sql/obfuscate_sql_string, Metric: execution_time

Measurements are autocorrelated.

Autocorrelation is present for lags 1..10.

The measurements are not independent, thus confidence intervals
may be less precise.
---------------------------------------------------------------------------
Scenario: sql/obfuscate_sql_string, Metric: execution_time

Sample size is 100, which is lower than 105.

The minimal sample size in case of normal distribution to achieve significance
level of 0.05 for difference of means with effect size Cohen's d = 0.5 must be at
least 105.

The conclusions from confidence intervals may be invalid.
---------------------------------------------------------------------------

@VianneyRuhlmann
Copy link
Contributor Author

VianneyRuhlmann commented Jul 3, 2024

Benchmark summary for performance compared to main:

  • normalization/normalize_service/normalize_service/[empty string]
    time: [15.976 ns 16.035 ns 16.115 ns]
    change: [+45.893% +53.746% +61.239%]
    Performance has regressed.
  • normalization/normalize_service/normalize_service/test_ASCII
    time: [23.852 ns 23.901 ns 23.959 ns]
    change: [-55.481% -55.187% -54.886%]
    Performance has improved.
  • normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&() Characters**
    time: [78.656 ns 78.763 ns 78.899 ns]
    change: [-37.716% -37.450% -37.171%]
    Performance has improved.
  • normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて
    time: [232.03 ns 232.56 ns 233.08 ns]
    change: [+15.116% +15.588% +16.041%]
    Performance has regressed.
  • normalization/normalize_service/normalize_service/A000000000000000000000000000000000000000000...
    time: [192.13 ns 192.68 ns 193.33 ns]
    change: [-41.727% -41.477% -41.240%]
    Performance has improved.
  • normalization/normalize_name/normalize_name/good
    time: [12.262 ns 12.283 ns 12.305 ns]
    change: [-58.695% -58.573% -58.440%]
    Performance has improved.
  • normalization/normalize_name/normalize_name/bad-name
    time: [15.269 ns 15.332 ns 15.412 ns]
    change: [-77.235% -76.850% -76.358%]
    Performance has improved.
  • normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...
    time: [121.96 ns 122.10 ns 122.27 ns]
    change: [-77.022% -76.825% -76.491%]
    Performance has improved.
  • normalization/normalize_trace/test_trace
    time: [119.46 ns 120.80 ns 123.10 ns]
    change: [-66.853% -66.205% -65.721%]
    Performance has improved.

The regression for the empty service name case is probably due to the fact that in main the default value is written by normalize_span instead of normalize_service.
There is also a regression for string using non-ASCII characters.

paullegranddc and others added 4 commits July 4, 2024 11:07
* Fix early exit in ascci case
* Make unicode handling faster by checking lowercase class before computing the lowercasing
@VianneyRuhlmann VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch 2 times, most recently from d570cdf to 32e23d9 Compare July 4, 2024 13:37
@VianneyRuhlmann VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch from 32e23d9 to be613d2 Compare July 4, 2024 13:42
@VianneyRuhlmann VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch from 43f7d45 to 4ecddd0 Compare July 4, 2024 13:59
Copy link
Contributor

@bantonsson bantonsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Only some small questions.

trace-normalization/src/normalize_utils.rs Show resolved Hide resolved
trace-normalization/src/normalize_utils.rs Show resolved Hide resolved
@VianneyRuhlmann VianneyRuhlmann merged commit 6f91123 into main Jul 5, 2024
32 checks passed
@VianneyRuhlmann VianneyRuhlmann deleted the vianney/inplace-trace-normalization branch July 5, 2024 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants