Use inplace normalization and add benchmark #506

VianneyRuhlmann · 2024-06-26T11:36:17Z

What does this PR do?

Work from @paullegranddc in #458

Use inplace normalization to improve performance
Handle chars as uint instead of utf8 chars to improve performance
Add benchmark for trace-normalization

Motivation

Improve trace normalization performance

Additional Notes

How to test the change?

codecov-commenter · 2024-06-26T11:46:16Z

Codecov Report

Attention: Patch coverage is 93.96226% with 16 lines in your changes missing coverage. Please review.

Project coverage is 70.46%. Comparing base (ce2afd9) to head (d570cdf).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #506      +/-   ##
==========================================
+ Coverage   70.41%   70.46%   +0.05%     
==========================================
  Files         204      205       +1     
  Lines       27890    27939      +49     
==========================================
+ Hits        19639    19688      +49     
  Misses       8251     8251

Components	Coverage Δ
crashtracker	`16.72% <ø> (ø)`
datadog-alloc	`98.76% <ø> (ø)`
data-pipeline	`51.15% <ø> (ø)`
data-pipeline-ffi	`0.00% <ø> (ø)`
ddcommon	`86.58% <ø> (ø)`
ddcommon-ffi	`75.31% <ø> (ø)`
ddtelemetry	`59.33% <ø> (ø)`
ipc	`84.66% <ø> (ø)`
profiling	`78.63% <ø> (ø)`
profiling-ffi	`58.19% <ø> (ø)`
serverless	`0.00% <ø> (ø)`
sidecar	`36.19% <ø> (ø)`
sidecar-ffi	`0.00% <ø> (ø)`
spawn-worker	`54.98% <ø> (ø)`
trace-mini-agent	`69.70% <ø> (ø)`
trace-normalization	`98.24% <96.29%> (+0.44%)`	⬆️
trace-obfuscation	`95.75% <ø> (ø)`
trace-protobuf	`77.16% <ø> (ø)`
trace-utils	`90.76% <68.18%> (-0.09%)`	⬇️

pr-commenter · 2024-06-28T12:00:13Z

Benchmarks

Comparison

Parameters

	Baseline	Candidate
config	baseline	candidate
git_branch	main	vianney/inplace-trace-normalization
git_commit_date	1720105540	1720183895
git_commit_sha	`3142b2d`	`899ff91`
iterations	706.0	713.0

See matching parameters

	Baseline	Candidate
ci_job_date	1720184138	1720184138
ci_job_id	563452051	563452051
ci_pipeline_id	38437982	38437982
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

Candidate

Candidate benchmark details

Group 1

iterations	config	cpu_model	ci_job_date	ci_job_id	ci_pipeline_id	git_commit_sha	git_commit_date	git_branch
713.0	candidate	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	1720184138	563452051	38437982	`899ff91`	1720183895	vianney/inplace-trace-normalization

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
sql/obfuscate_sql_string	execution_time	70.083µs	70.143µs ± 0.030µs	70.139µs ± 0.020µs	70.162µs	70.194µs	70.212µs	70.219µs	0.11%	0.249	-0.426	0.04%	0.003µs	1	100

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
sql/obfuscate_sql_string	execution_time	[70.137µs; 70.149µs] or [-0.008%; +0.008%]	None	None	None

Warnings

Scenario: sql/obfuscate_sql_string, Metric: execution_time

Measurements are autocorrelated.

Autocorrelation is present for lags 1..10.

The measurements are not independent, thus confidence intervals
may be less precise.
---------------------------------------------------------------------------

Scenario: sql/obfuscate_sql_string, Metric: execution_time

Sample size is 100, which is lower than 105.

The minimal sample size in case of normal distribution to achieve significance
level of 0.05 for difference of means with effect size Cohen's d = 0.5 must be at
least 105.

The conclusions from confidence intervals may be invalid.
---------------------------------------------------------------------------

Baseline

Baseline benchmark details

Group 1

iterations	config	cpu_model	ci_job_date	ci_job_id	ci_pipeline_id	git_commit_sha	git_commit_date	git_branch
706.0	baseline	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	1720184138	563452051	38437982	`3142b2d`	1720105540	main

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
sql/obfuscate_sql_string	execution_time	70.503µs	70.595µs ± 0.028µs	70.594µs ± 0.016µs	70.611µs	70.637µs	70.661µs	70.671µs	0.11%	-0.129	0.993	0.04%	0.003µs	1	100

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
sql/obfuscate_sql_string	execution_time	[70.590µs; 70.601µs] or [-0.008%; +0.008%]	None	None	None

Warnings

Scenario: sql/obfuscate_sql_string, Metric: execution_time

Measurements are autocorrelated.

Autocorrelation is present for lags 1..10.

The measurements are not independent, thus confidence intervals
may be less precise.
---------------------------------------------------------------------------

Scenario: sql/obfuscate_sql_string, Metric: execution_time

Sample size is 100, which is lower than 105.

The minimal sample size in case of normal distribution to achieve significance
level of 0.05 for difference of means with effect size Cohen's d = 0.5 must be at
least 105.

The conclusions from confidence intervals may be invalid.
---------------------------------------------------------------------------

VianneyRuhlmann · 2024-07-03T14:00:08Z

Benchmark summary for performance compared to main:

normalization/normalize_service/normalize_service/[empty string]
time: [15.976 ns 16.035 ns 16.115 ns]
change: [+45.893% +53.746% +61.239%]
Performance has regressed.
normalization/normalize_service/normalize_service/test_ASCII
time: [23.852 ns 23.901 ns 23.959 ns]
change: [-55.481% -55.187% -54.886%]
Performance has improved.
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&() Characters**
time: [78.656 ns 78.763 ns 78.899 ns]
change: [-37.716% -37.450% -37.171%]
Performance has improved.
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて
time: [232.03 ns 232.56 ns 233.08 ns]
change: [+15.116% +15.588% +16.041%]
Performance has regressed.
normalization/normalize_service/normalize_service/A000000000000000000000000000000000000000000...
time: [192.13 ns 192.68 ns 193.33 ns]
change: [-41.727% -41.477% -41.240%]
Performance has improved.
normalization/normalize_name/normalize_name/good
time: [12.262 ns 12.283 ns 12.305 ns]
change: [-58.695% -58.573% -58.440%]
Performance has improved.
normalization/normalize_name/normalize_name/bad-name
time: [15.269 ns 15.332 ns 15.412 ns]
change: [-77.235% -76.850% -76.358%]
Performance has improved.
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...
time: [121.96 ns 122.10 ns 122.27 ns]
change: [-77.022% -76.825% -76.491%]
Performance has improved.
normalization/normalize_trace/test_trace
time: [119.46 ns 120.80 ns 123.10 ns]
change: [-66.853% -66.205% -65.721%]
Performance has improved.

The regression for the empty service name case is probably due to the fact that in main the default value is written by normalize_span instead of normalize_service.
There is also a regression for string using non-ASCII characters.

* Fix early exit in ascci case * Make unicode handling faster by checking lowercase class before computing the lowercasing

This reverts commit 46d2da2.

bantonsson

Looks good. Only some small questions.

trace-normalization/src/normalize_utils.rs

Use inplace normalization and add benchmark

f541477

github-actions bot added the mini-agent label Jun 26, 2024

VianneyRuhlmann added the data-pipeline label Jun 26, 2024

VianneyRuhlmann self-assigned this Jun 26, 2024

VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch from d951b20 to d6a0f46 Compare June 26, 2024 13:08

github-actions bot removed the data-pipeline label Jun 26, 2024

Fix clippy errors

15186f3

VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch from d6a0f46 to 15186f3 Compare June 26, 2024 13:11

VianneyRuhlmann added 2 commits June 26, 2024 16:57

Remove normalization result in benchmark

6a12d1c

Merge branch 'main' into vianney/inplace-trace-normalization

40b0f4a

VianneyRuhlmann marked this pull request as ready for review June 26, 2024 15:49

VianneyRuhlmann requested review from a team as code owners June 26, 2024 15:49

Merge branch 'main' into vianney/inplace-trace-normalization

2c3e69a

VianneyRuhlmann added 2 commits July 2, 2024 10:54

Merge branch 'main' into vianney/inplace-trace-normalization

1dd4304

unwrap loop in bench

e7822db

paullegranddc and others added 4 commits July 4, 2024 11:07

* Fix ascii range (#520)

21f938d

* Fix early exit in ascci case * Make unicode handling faster by checking lowercase class before computing the lowercasing

Add benchmark to ci

46d2da2

Revert "Add benchmark to ci"

a76f22f

This reverts commit 46d2da2.

Merge branch 'main' into vianney/inplace-trace-normalization

ab18944

VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch 2 times, most recently from d570cdf to 32e23d9 Compare July 4, 2024 13:37

Add macro benchmark

be613d2

VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch from 32e23d9 to be613d2 Compare July 4, 2024 13:42

rename benchmark

4f6aecf

rename benchmark

4ecddd0

VianneyRuhlmann force-pushed the vianney/inplace-trace-normalization branch from 43f7d45 to 4ecddd0 Compare July 4, 2024 13:59

bantonsson approved these changes Jul 5, 2024

View reviewed changes

trace-normalization/src/normalize_utils.rs Show resolved Hide resolved

trace-normalization/src/normalize_utils.rs Show resolved Hide resolved

Merge branch 'main' into vianney/inplace-trace-normalization

899ff91

VianneyRuhlmann merged commit 6f91123 into main Jul 5, 2024
32 checks passed

VianneyRuhlmann deleted the vianney/inplace-trace-normalization branch July 5, 2024 13:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use inplace normalization and add benchmark #506

Use inplace normalization and add benchmark #506

VianneyRuhlmann commented Jun 26, 2024

codecov-commenter commented Jun 26, 2024 •

edited

Loading

pr-commenter bot commented Jun 28, 2024 •

edited

Loading

Group 1

Warnings

Group 1

Warnings

VianneyRuhlmann commented Jul 3, 2024 •

edited

Loading

bantonsson left a comment

Use inplace normalization and add benchmark #506

Use inplace normalization and add benchmark #506

Conversation

VianneyRuhlmann commented Jun 26, 2024

What does this PR do?

Motivation

Additional Notes

How to test the change?

codecov-commenter commented Jun 26, 2024 • edited Loading

Codecov Report

pr-commenter bot commented Jun 28, 2024 • edited Loading

Benchmarks

Comparison

Parameters

Summary

Candidate

Group 1

Warnings

Baseline

Group 1

Warnings

VianneyRuhlmann commented Jul 3, 2024 • edited Loading

bantonsson left a comment

Choose a reason for hiding this comment

codecov-commenter commented Jun 26, 2024 •

edited

Loading

pr-commenter bot commented Jun 28, 2024 •

edited

Loading

VianneyRuhlmann commented Jul 3, 2024 •

edited

Loading