Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddtrace/tracer: optimize span tags storage in the hot path #2799

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

darccio
Copy link
Member

@darccio darccio commented Jul 26, 2024

What does this PR do?

Implements an automated optimizer based on selecting the best size for maps to store metas and metrics (also known as tags). It's going to be based on sketches-go and it'll be configurable, so costumers can choose the percentile used to initialize the Meta map size.

Also considering if there is benefit on pooling maps that are returned when the span is finished.

Motivation

Gathered data from our intake shows that each services has different usage patterns and requirements for tags. While some services may generate spans with a minimal number of tags (around 5 meta tags with no metrics), others may generate thousands of meta tags or thousands of metric tags.

In general, our internal data offers insights like an 80/20 proportion between meta tags (strings) and metric tags (float64).

This points that there may be wildly different needs in the same organization among services. So, this proposal tries to offer a solution that adapts better than just applying some sort of universal or average that doesn't apply to all services.

Pros

This approach has multiple advantages:

  • It reduces maps' regrowth, as maps will tend to have an average size.
  • It automatically adapts to each service's workload memory usage.

Cons

Some disadvantages are:

  • It'll consume more memory than required if newer spans use less tags.

Benchmarks

Original: map-based

The following results show the impact, and prove the idea, that map allocations have in performance:

BenchmarkSpanSetMeta/baseline-10                100000000               12.19 ns/op              1.000 tags/op         0 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10               447457              2372 ns/op                28.00 tags/op       3661 B/op          4 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10              748792              1545 ns/op                28.00 tags/op       1586 B/op          1 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10              780546              1455 ns/op                28.00 tags/op       1589 B/op          1 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10             1000000              1002 ns/op                28.00 tags/op        300 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10             1202672              1019 ns/op                28.00 tags/op        300 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10             1247433              1025 ns/op                28.00 tags/op        299 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10             1315690               893.9 ns/op              28.00 tags/op          0 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10             1407741               863.2 ns/op              28.00 tags/op          0 B/op          0 allocs/op

The average number of tags is based on the distribution obtained from our intake. Baseline is just setting a single tag over and over (b.N times) to map with size of 5. Allocations are caused by map's regrowth, and the average time corresponds to setting from 5 to 70 tags per span - selected randomly by the distribution-based number generator - on b.N span.

The upcoming Go 1.24 shows different results from previous versions due to swiss maps enabled by default.

BenchmarkSpanSetMeta/baseline-10                53694176                21.88 ns/op              1.000 tags/op         0 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10               534418              2026 ns/op                28.00 tags/op       3178 B/op          6 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10              740268              1626 ns/op                28.00 tags/op       2280 B/op          2 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10             1000000              1133 ns/op                28.00 tags/op       1214 B/op          1 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10             1614978               747.3 ns/op              28.00 tags/op         48 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10             1574341               731.1 ns/op              28.00 tags/op         48 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10             1489554               736.3 ns/op              28.00 tags/op         49 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10             1504778               736.9 ns/op              28.00 tags/op         48 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10             1441506               935.5 ns/op              28.00 tags/op          0 B/op          0 allocs/op

Other implementations considered

Other approaches have been explored:

  • A linked-list of tag holder structs with a tailor pool. It used unsafe. Reference implementation here.
  • Use channels to pool tag holder structs. Not pushed to the branch, it had poor concurrent performance.
  • puzpuzpuz/xsync-based implementation. Results below.
  • jongyunha/shrinkmap-based implementation. Results below.
# github.com/puzpuzpuz/xsync

BenchmarkSpanSetMeta/baseline-10         	12127081	        98.99 ns/op	         1.000 tags/op	      48 B/op	       3 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10         	  254242	      4554 ns/op	        28.00 tags/op	    5942 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10        	  288242	      4407 ns/op	        28.00 tags/op	    5942 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10        	  275898	      4385 ns/op	        28.00 tags/op	    5942 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10        	  277908	      4278 ns/op	        28.00 tags/op	    5941 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10        	  282535	      4247 ns/op	        28.00 tags/op	    5944 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10        	  285816	      4537 ns/op	        28.00 tags/op	    5941 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10        	  301698	      4333 ns/op	        28.00 tags/op	    5943 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10        	  299659	      4131 ns/op	        28.00 tags/op	    5943 B/op	      92 allocs/op

# github.com/jongyunha/shrinkmap

BenchmarkSpanSetMeta/baseline-10         	45800142	        26.68 ns/op	         1.000 tags/op	       0 B/op	       0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10         	  228518	      4714 ns/op	        28.00 tags/op	    4354 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10        	  310806	      4780 ns/op	        28.00 tags/op	    4346 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10        	  303002	      4720 ns/op	        28.00 tags/op	    4366 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10        	  310052	      4018 ns/op	        28.00 tags/op	    4390 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10        	  317164	      4015 ns/op	        28.00 tags/op	    4341 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10        	  286071	      4449 ns/op	        28.00 tags/op	    4363 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10        	  290245	      4154 ns/op	        28.00 tags/op	    4312 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10        	  286952	      4350 ns/op	        28.00 tags/op	    4365 B/op	      16 allocs/op

These approaches were discarded because of their performance, complexity or type safety.

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

Base automatically changed from dario.castane/apmlp-87/handle-ptrs-settag to main August 5, 2024 09:40
@pr-commenter
Copy link

pr-commenter bot commented Aug 6, 2024

Benchmarks

Benchmark execution time: 2024-12-02 16:08:30

Comparing candidate commit 51cf835 in PR branch dario.castane/exp-optimize-settags with baseline commit 697219e in branch main.

Found 0 performance improvements and 8 performance regressions! Performance is the same for 50 metrics, 1 unstable metrics.

scenario:BenchmarkPartialFlushing/Disabled-24

  • 🟥 execution_time [+7.092ms; +10.172ms] or [+2.638%; +3.784%]

scenario:BenchmarkPartialFlushing/Enabled-24

  • 🟥 execution_time [+6.160ms; +9.569ms] or [+2.237%; +3.475%]

scenario:BenchmarkSetTagMetric-24

  • 🟥 execution_time [+9.520ns; +12.040ns] or [+8.033%; +10.160%]

scenario:BenchmarkSetTagString-24

  • 🟥 execution_time [+10.311ns; +13.289ns] or [+9.113%; +11.746%]

scenario:BenchmarkSetTagStringer-24

  • 🟥 execution_time [+10.961ns; +16.159ns] or [+7.834%; +11.549%]

scenario:BenchmarkSingleSpanRetention/no-rules-24

  • 🟥 execution_time [+6.987µs; +10.504µs] or [+2.950%; +4.435%]

scenario:BenchmarkSingleSpanRetention/with-rules/match-all-24

  • 🟥 execution_time [+6.893µs; +10.079µs] or [+2.887%; +4.222%]

scenario:BenchmarkSingleSpanRetention/with-rules/match-half-24

  • 🟥 execution_time [+7.761µs; +11.563µs] or [+3.261%; +4.859%]

Copy link
Contributor

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale Stuck for more than 1 month label Aug 28, 2024
@darccio darccio added do-not-merge/WIP and removed stale Stuck for more than 1 month labels Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant