fix(datadog_traces sink): APM stats payloads are sent independent of trace payloads and at a set interval. #15084

Merged
18 commits merged on Nov 7, 2022

Conversation

@neuronull neuronull commented Nov 2, 2022

Reviewer notes:

Behavioral changes:

  • APM stats are now "flushed" in a separate thread, detached from the sink's stream loop.
  • The stats flushing thread flushes the oldest bucket every 10 seconds, and caches the last two 10 second buckets.
  • When the sink is shutting down, the APM stats thread flushes remaining buckets.
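
The flushing behavior listed above can be sketched roughly as follows. This is a minimal, std-only illustration and not the code from this PR: `Aggregator`, `record`, `flush`, `BUCKET_SECS`, and `CACHED_BUCKETS` are hypothetical names, a bucket holds only a hit count, and the timer thread that would call `flush` every 10 seconds is left out. It also assumes that "caches the last two buckets" means the current bucket and the one before it are held back, and it returns every due bucket in one call rather than only the oldest one.

```rust
use std::collections::BTreeMap;

/// Width of one stats bucket in seconds (the 10 second interval from the notes).
const BUCKET_SECS: u64 = 10;
/// Number of most recent buckets held back from flushing (hypothetical name).
const CACHED_BUCKETS: u64 = 2;

/// Hypothetical stand-in for the aggregated APM stats: one hit count per bucket.
#[derive(Default)]
struct Aggregator {
    // Key: bucket start time in seconds, aligned to BUCKET_SECS.
    buckets: BTreeMap<u64, u64>,
}

impl Aggregator {
    /// Count one span that ended at `end_secs` in the bucket covering that time.
    fn record(&mut self, end_secs: u64) {
        let start = end_secs - end_secs % BUCKET_SECS;
        *self.buckets.entry(start).or_insert(0) += 1;
    }

    /// Remove and return the buckets that are due at `now_secs`, oldest first.
    /// With `force` set (sink shutdown), every remaining bucket is returned.
    fn flush(&mut self, now_secs: u64, force: bool) -> Vec<(u64, u64)> {
        if force {
            return std::mem::take(&mut self.buckets).into_iter().collect();
        }
        // Buckets starting at or after `cutoff` are still inside the cache window.
        let current = now_secs - now_secs % BUCKET_SECS;
        let cutoff = current.saturating_sub(BUCKET_SECS * (CACHED_BUCKETS - 1));
        let kept = self.buckets.split_off(&cutoff);
        std::mem::replace(&mut self.buckets, kept).into_iter().collect()
    }
}
```

In this sketch, a regular flush at second 35 keeps the buckets starting at 20 and 30 and returns anything older, while a forced flush drains whatever is left, which mirrors the shutdown case.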

Refactor (re-organizing only) changes:

  • The single file stats.rs was up to 1.2k lines of code. That's too much for my sanity. So I reorganized the code to more closely match the Agent's Golang code organization.
  • Note that there wasn't any need to create new structures or anything like that as part of the re-org; I simply moved logic from one file to a handful of smaller files.
  • Took the opportunity to add some more documentation/comments to the code.

Manual testing results

The two plotted metrics should be "collapsed" into the same line.

// Agent only (base case)

[image: agent_only_latest]

// Agent -> Vector (before fix)

[image: vector_and_agent_before_fix]

// Agent -> Vector (after fix)

[image: vector_and_agent_with_fix]

@neuronull neuronull self-assigned this Nov 2, 2022
netlify bot commented Nov 2, 2022

Deploy Preview for vector-project ready!

Name Link
🔨 Latest commit 83c5c55
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/63692af223c81b00092d5c43
😎 Deploy Preview https://deploy-preview-15084--vector-project.netlify.app

@github-actions github-actions bot added the domain: ci and domain: sinks labels Nov 2, 2022
netlify bot commented Nov 2, 2022

Deploy Preview for vrl-playground canceled.

Name Link
🔨 Latest commit 83c5c55
🔍 Latest deploy log https://app.netlify.com/sites/vrl-playground/deploys/63692af014e54c000852d86b

@neuronull neuronull added the ci-condition: integration tests enable label Nov 2, 2022
@neuronull neuronull changed the title from "Neuronull/apm stats fix payload flushing" to "fix(datadog_traces sink): APM stats payloads are sent independent of trace payloads and at a set interval." Nov 2, 2022
@neuronull neuronull requested a review from a team November 2, 2022 21:57
@neuronull neuronull marked this pull request as ready for review November 2, 2022 22:25
@neuronull neuronull added the sink: datadog_traces and domain: enterprise labels and removed the domain: enterprise label Nov 2, 2022
@jszwedko jszwedko requested a review from tobz November 3, 2022 15:22
@spencergilbert spencergilbert left a comment

Left a few nitpicky comments, just around the inconsistent grammar/capitalization of code comments (I didn't note them all...), and then one question on an error type. Otherwise the logic seems sound, and the refactoring is appreciated.

src/sinks/datadog/traces/sink.rs (outdated, resolved)
src/sinks/datadog/traces/sink.rs (outdated, resolved)
src/sinks/datadog/traces/request_builder.rs (outdated, resolved)
src/sinks/datadog/traces/request_builder.rs (resolved)
src/sinks/datadog/traces/config.rs (outdated, resolved)
src/sinks/datadog/traces/apm_stats/flusher.rs (outdated, resolved)
github-actions bot commented Nov 3, 2022

Soak Test Results

Baseline: d620352
Comparison: d41efba
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for Vector in a repeatable rig, with varying configuration for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether Vector's performance has changed, and to what degree, as a result of a pull request. Where appropriate, units are scaled per core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:
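
The filtering rule described in the explanation above can be written out as a small check. This is only a sketch of the stated thresholds, not the soak tooling's actual code: the `Row` struct and all function names are hypothetical, and it assumes that "newly erratic" means erratic in the comparison run but not in the baseline run.

```rust
/// Thresholds quoted in the soak test explanation.
const MIN_CONFIDENCE_PCT: f64 = 90.0;
const MIN_ABS_DELTA_MEAN_PCT: f64 = 8.87;
const ERRATIC_COV: f64 = 0.3;

/// Hypothetical, reduced view of one row of the per-experiment table.
struct Row {
    delta_mean_pct: f64, // "Δ mean %" column
    confidence_pct: f64, // "confidence" column
    baseline_cov: f64,   // "baseline CoV" column
    comparison_cov: f64, // "comparison CoV" column
}

/// Coefficient of variation: standard deviation relative to the mean.
fn coefficient_of_variation(mean: f64, stdev: f64) -> f64 {
    stdev / mean
}

/// An experiment is erratic if its coefficient of variation exceeds 0.3.
fn is_erratic(cov: f64) -> bool {
    cov > ERRATIC_COV
}

/// Assumption: "newly erratic" means erratic in the comparison only.
fn is_newly_erratic(row: &Row) -> bool {
    is_erratic(row.comparison_cov) && !is_erratic(row.baseline_cov)
}

/// A row is reported if the change is both confident and large enough,
/// or if the experiment has become erratic.
fn is_interesting(row: &Row) -> bool {
    let significant = row.confidence_pct >= MIN_CONFIDENCE_PCT
        && row.delta_mean_pct.abs() > MIN_ABS_DELTA_MEAN_PCT;
    significant || is_newly_erratic(row)
}
```

Under these assumptions, the http_to_http_acks row in the table below (Δ mean % of -3.91, CoV above 0.3 in both runs) is erratic but not newly so, and its change is under the ±8.87% threshold, which is consistent with no interesting changes being reported.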

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
socket_to_socket_blackhole 338.69KiB 1.41 100.00% 23.41MiB 230.5KiB 4.71KiB 0 0.00961511 23.74MiB 368.57KiB 7.52KiB 0 0.0151601 False False
http_pipelines_blackhole_acks 13.18KiB 1.07 100.00% 1.2MiB 117.52KiB 2.39KiB 0 0.0953332 1.22MiB 71.42KiB 1.46KiB 0 0.0573226 False False
http_text_to_http_json 404.3KiB 1.02 100.00% 38.56MiB 806.79KiB 16.47KiB 0 0.0204266 38.96MiB 788.0KiB 16.09KiB 0 0.0197487 False False
datadog_agent_remap_blackhole_acks 505.11KiB 0.87 100.00% 56.74MiB 4.48MiB 93.31KiB 0 0.0789903 57.23MiB 3.22MiB 67.39KiB 0 0.0562781 False False
syslog_log2metric_humio_metrics 67.54KiB 0.7 100.00% 9.36MiB 188.18KiB 3.84KiB 0 0.0196202 9.43MiB 427.12KiB 8.7KiB 0 0.0442226 False False
datadog_agent_remap_blackhole 239.56KiB 0.42 98.31% 56.37MiB 3.62MiB 75.42KiB 0 0.064195 56.6MiB 3.17MiB 66.01KiB 0 0.0559203 False False
splunk_hec_to_splunk_hec_logs_noack 21.02KiB 0.09 91.09% 23.82MiB 503.11KiB 10.27KiB 0 0.0206243 23.84MiB 337.19KiB 6.88KiB 0 0.0138108 False False
splunk_hec_to_splunk_hec_logs_acks 12.0KiB 0.05 37.91% 23.75MiB 860.24KiB 17.5KiB 0 0.0353627 23.76MiB 825.73KiB 16.8KiB 0 0.0339272 False False
enterprise_http_to_http -679.96B -0 7.09% 23.85MiB 258.6KiB 5.28KiB 0 0.0105884 23.85MiB 257.61KiB 5.27KiB 0 0.010548 False False
file_to_blackhole -45.43KiB -0.05 38.14% 95.34MiB 2.97MiB 61.49KiB 0 0.0311048 95.3MiB 3.24MiB 67.41KiB 0 0.0340091 False False
splunk_hec_indexer_ack_blackhole -15.03KiB -0.06 43.47% 23.76MiB 888.89KiB 18.09KiB 0 0.0365303 23.74MiB 927.53KiB 18.86KiB 0 0.0381418 False False
splunk_hec_route_s3 -31.43KiB -0.14 39.29% 21.5MiB 2.17MiB 45.22KiB 0 0.101004 21.47MiB 1.96MiB 41.1KiB 0 0.0914647 False False
http_pipelines_blackhole -2.79KiB -0.17 73.08% 1.61MiB 48.21KiB 1008.58B 0 0.0293253 1.6MiB 114.22KiB 2.33KiB 0 0.0695965 False False
http_to_http_json -44.56KiB -0.18 99.84% 23.85MiB 374.36KiB 7.64KiB 0 0.0153276 23.8MiB 583.65KiB 11.91KiB 0 0.0239401 False False
syslog_regex_logs2metric_ddmetrics -20.03KiB -0.26 53.77% 7.57MiB 941.26KiB 19.17KiB 0 0.121481 7.55MiB 950.28KiB 19.37KiB 0 0.122962 False False
http_to_http_noack -86.59KiB -0.35 99.98% 23.84MiB 413.74KiB 8.46KiB 0 0.0169462 23.75MiB 1.05MiB 21.93KiB 0 0.0442703 False False
datadog_agent_remap_datadog_logs -295.28KiB -0.54 99.97% 53.47MiB 1.38MiB 28.83KiB 0 0.0257138 53.18MiB 3.68MiB 76.71KiB 0 0.0692506 False False
syslog_splunk_hec_logs -105.4KiB -0.66 100.00% 15.65MiB 847.08KiB 17.23KiB 0 0.0528586 15.54MiB 714.78KiB 14.58KiB 0 0.0448982 False False
datadog_agent_remap_datadog_logs_acks -435.15KiB -0.78 100.00% 54.58MiB 2.89MiB 60.47KiB 0 0.0529923 54.16MiB 3.92MiB 81.57KiB 0 0.0723421 False False
fluent_elasticsearch -756.47KiB -0.93 100.00% 79.47MiB 52.54KiB 1.06KiB 0 0.000645492 78.73MiB 6.46MiB 132.38KiB 0 0.0820314 False False
http_pipelines_no_grok_blackhole -107.6KiB -0.98 100.00% 10.72MiB 253.72KiB 5.18KiB 0 0.0231167 10.61MiB 1.05MiB 21.93KiB 0 0.099232 False False
syslog_humio_logs -186.09KiB -1.12 100.00% 16.27MiB 109.76KiB 2.24KiB 0 0.00658634 16.09MiB 113.9KiB 2.33KiB 0 0.00691225 False False
syslog_loki -351.79KiB -2.23 100.00% 15.39MiB 545.31KiB 11.15KiB 0 0.0346047 15.04MiB 842.1KiB 17.12KiB 0 0.0546594 False False
syslog_log2metric_splunk_hec_metrics -485.0KiB -2.98 100.00% 15.87MiB 686.41KiB 13.99KiB 0 0.0422294 15.4MiB 821.2KiB 16.72KiB 0 0.0520758 False False
http_to_http_acks -704.89KiB -3.91 99.59% 17.59MiB 8.24MiB 172.13KiB 0 0.468107 16.9MiB 8.36MiB 174.75KiB 0 0.494682 True True

tobz previously requested changes Nov 3, 2022
@tobz tobz left a comment

Most of my comments are sort of a "take it or leave it", and I would have otherwise approved... but the possible infinite loop bug on the error path in flush_apm_stats_thread actually stands out as a potentially big issue to me.

src/sinks/datadog/traces/apm_stats/flusher.rs (outdated, resolved)
src/sinks/datadog/traces/apm_stats/flusher.rs (outdated, resolved)
src/sinks/datadog/traces/apm_stats/flusher.rs (outdated, resolved)
src/sinks/datadog/traces/apm_stats/flusher.rs (resolved)
src/sinks/datadog/traces/sink.rs (outdated, resolved)
github-actions bot commented Nov 4, 2022

Regression Test Results

Baseline: cf73729
Comparison: b5f603f
Total vector CPUs: 4

Explanation

A regression test is an integrated performance test for Vector in a repeatable rig, with varying configuration for Vector. What follows is a statistical summary of a brief Vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether Vector's performance has changed, and to what degree, as a result of a pull request. Where appropriate, units are scaled per core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their bytes_written_per_cpu_second performance between the baseline and comparison SHAs, with 90.0% confidence, OR have been detected as newly erratic. Negative values mean that the baseline is faster; positive values mean that the comparison is faster. Results that do not exhibit more than a ±5% change in mean bytes_written_per_cpu_second are discarded. An experiment is erratic if its coefficient of variation is greater than 0.1. The abbreviated table will be omitted if no interesting changes are observed.

No interesting changes in bytes_written_per_cpu_second with confidence ≥ 90.00% and absolute Δ mean >= ±5%:

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
syslog_loki 445.3KiB/CPU-s 3.19 100.00% 13.65MiB/CPU-s 660.41KiB/CPU-s 8.53KiB/CPU-s 0.0 0.047235 14.09MiB/CPU-s 323.78KiB/CPU-s 4.18KiB/CPU-s 0.0 0.022443 False False
datadog_agent_remap_datadog_logs_acks 357.5KiB/CPU-s 1.07 100.00% 32.77MiB/CPU-s 1.3MiB/CPU-s 17.24KiB/CPU-s 0.0 0.039810 33.12MiB/CPU-s 802.97KiB/CPU-s 10.37KiB/CPU-s 0.0 0.023672 False False
http_to_http_acks 79.94KiB/CPU-s 0.78 52.14% 9.96MiB/CPU-s 6.09MiB/CPU-s 80.46KiB/CPU-s 0.0 0.611040 10.04MiB/CPU-s 5.98MiB/CPU-s 79.07KiB/CPU-s 0.0 0.595444 True False
socket_to_socket_blackhole 149.1KiB/CPU-s 0.64 100.00% 22.74MiB/CPU-s 544.93KiB/CPU-s 7.04KiB/CPU-s 0.0 0.023399 22.89MiB/CPU-s 354.62KiB/CPU-s 4.58KiB/CPU-s 0.0 0.015131 False False
datadog_agent_remap_datadog_logs 192.42KiB/CPU-s 0.64 100.00% 29.26MiB/CPU-s 850.87KiB/CPU-s 10.98KiB/CPU-s 0.0 0.028394 29.45MiB/CPU-s 1009.04KiB/CPU-s 13.02KiB/CPU-s 0.0 0.033458 False False
datadog_agent_remap_blackhole 240.19KiB/CPU-s 0.57 100.00% 40.81MiB/CPU-s 1.11MiB/CPU-s 14.64KiB/CPU-s 0.0 0.027125 41.05MiB/CPU-s 1.37MiB/CPU-s 18.13KiB/CPU-s 0.0 0.033438 False False
syslog_regex_logs2metric_ddmetrics 24.92KiB/CPU-s 0.41 99.60% 5.97MiB/CPU-s 456.66KiB/CPU-s 5.89KiB/CPU-s 0.0 0.074687 5.99MiB/CPU-s 491.45KiB/CPU-s 6.34KiB/CPU-s 0.0 0.080049 False False
http_pipelines_blackhole 3.81KiB/CPU-s 0.39 100.00% 966.78KiB/CPU-s 20.06KiB/CPU-s 265.41B/CPU-s 0.0 0.020751 970.59KiB/CPU-s 46.89KiB/CPU-s 619.62B/CPU-s 0.0 0.048312 False False
splunk_hec_route_s3 51.34KiB/CPU-s 0.38 99.77% 13.25MiB/CPU-s 914.11KiB/CPU-s 11.8KiB/CPU-s 0.0 0.067369 13.3MiB/CPU-s 931.52KiB/CPU-s 12.02KiB/CPU-s 0.0 0.068394 False False
syslog_humio_logs 1.66KiB/CPU-s 0.01 20.96% 13.95MiB/CPU-s 402.97KiB/CPU-s 5.2KiB/CPU-s 0.0 0.028200 13.96MiB/CPU-s 267.75KiB/CPU-s 3.46KiB/CPU-s 0.0 0.018735 False False
enterprise_http_to_http 152.29B/CPU-s 0.00 2.45% 23.84MiB/CPU-s 266.25KiB/CPU-s 3.44KiB/CPU-s 0.0 0.010905 23.84MiB/CPU-s 264.57KiB/CPU-s 3.42KiB/CPU-s 0.0 0.010836 False False
splunk_hec_to_splunk_hec_logs_acks -1.39KiB/CPU-s -0.01 6.92% 18.18MiB/CPU-s 894.99KiB/CPU-s 11.55KiB/CPU-s 0.0 0.048069 18.18MiB/CPU-s 853.48KiB/CPU-s 11.01KiB/CPU-s 0.0 0.045843 False False
splunk_hec_indexer_ack_blackhole -3.51KiB/CPU-s -0.01 35.21% 23.83MiB/CPU-s 411.01KiB/CPU-s 5.3KiB/CPU-s 0.0 0.016840 23.83MiB/CPU-s 431.67KiB/CPU-s 5.57KiB/CPU-s 0.0 0.017689 False False
fluent_elasticsearch -16.07KiB/CPU-s -0.02 82.38% 79.47MiB/CPU-s 52.78KiB/CPU-s 690.71B/CPU-s 0.0 0.000649 79.46MiB/CPU-s 928.8KiB/CPU-s 11.86KiB/CPU-s 0.0 0.011414 False False
file_to_blackhole -39.35KiB/CPU-s -0.04 12.33% 92.51MiB/CPU-s 13.54MiB/CPU-s 178.69KiB/CPU-s 0.0 0.146327 92.48MiB/CPU-s 13.65MiB/CPU-s 180.07KiB/CPU-s 0.0 0.147565 True False
http_to_http_noack -11.36KiB/CPU-s -0.05 91.49% 23.84MiB/CPU-s 264.34KiB/CPU-s 3.41KiB/CPU-s 0.0 0.010826 23.83MiB/CPU-s 437.43KiB/CPU-s 5.65KiB/CPU-s 0.0 0.017923 False False
otlp_http_to_blackhole -2.46KiB/CPU-s -0.10 40.99% 2.36MiB/CPU-s 248.33KiB/CPU-s 3.21KiB/CPU-s 0.0 0.102849 2.36MiB/CPU-s 252.13KiB/CPU-s 3.25KiB/CPU-s 0.0 0.104530 True False
http_pipelines_blackhole_acks -1.04KiB/CPU-s -0.14 96.42% 750.12KiB/CPU-s 22.17KiB/CPU-s 293.04B/CPU-s 0.0 0.029549 749.08KiB/CPU-s 31.53KiB/CPU-s 416.54B/CPU-s 0.0 0.042084 False False
splunk_hec_to_splunk_hec_logs_noack -38.77KiB/CPU-s -0.20 99.34% 18.74MiB/CPU-s 790.46KiB/CPU-s 10.2KiB/CPU-s 0.0 0.041195 18.7MiB/CPU-s 774.75KiB/CPU-s 10.0KiB/CPU-s 0.0 0.040458 False False
http_pipelines_no_grok_blackhole -23.6KiB/CPU-s -0.41 100.00% 5.64MiB/CPU-s 79.72KiB/CPU-s 1.03KiB/CPU-s 0.0 0.013813 5.61MiB/CPU-s 162.28KiB/CPU-s 2.09KiB/CPU-s 0.0 0.028231 False False
datadog_agent_remap_blackhole_acks -209.7KiB/CPU-s -0.49 100.00% 42.13MiB/CPU-s 899.2KiB/CPU-s 11.61KiB/CPU-s 0.0 0.020844 41.92MiB/CPU-s 842.32KiB/CPU-s 10.88KiB/CPU-s 0.0 0.019621 False False
http_to_http_json -141.91KiB/CPU-s -0.58 100.00% 23.84MiB/CPU-s 375.07KiB/CPU-s 4.84KiB/CPU-s 0.0 0.015362 23.7MiB/CPU-s 667.27KiB/CPU-s 8.61KiB/CPU-s 0.0 0.027491 False False
http_text_to_http_json -383.79KiB/CPU-s -1.00 100.00% 37.64MiB/CPU-s 704.81KiB/CPU-s 9.1KiB/CPU-s 0.0 0.018285 37.26MiB/CPU-s 881.95KiB/CPU-s 11.39KiB/CPU-s 0.0 0.023111 False False
syslog_log2metric_humio_metrics -81.29KiB/CPU-s -1.04 100.00% 7.66MiB/CPU-s 265.9KiB/CPU-s 3.43KiB/CPU-s 0.0 0.033904 7.58MiB/CPU-s 236.81KiB/CPU-s 3.06KiB/CPU-s 0.0 0.030511 False False
otlp_grpc_to_blackhole -17.37KiB/CPU-s -1.05 100.00% 1.62MiB/CPU-s 69.65KiB/CPU-s 920.73B/CPU-s 0.0 0.042004 1.6MiB/CPU-s 89.59KiB/CPU-s 1.16KiB/CPU-s 0.0 0.054598 False False
syslog_log2metric_splunk_hec_metrics -157.44KiB/CPU-s -1.12 100.00% 13.74MiB/CPU-s 266.26KiB/CPU-s 3.44KiB/CPU-s 0.0 0.018923 13.59MiB/CPU-s 427.69KiB/CPU-s 5.52KiB/CPU-s 0.0 0.030740 False False
syslog_splunk_hec_logs -184.65KiB/CPU-s -1.31 100.00% 13.8MiB/CPU-s 622.74KiB/CPU-s 8.04KiB/CPU-s 0.0 0.044075 13.62MiB/CPU-s 659.47KiB/CPU-s 8.52KiB/CPU-s 0.0 0.047292 False False

github-actions bot commented Nov 4, 2022

Soak Test Results

Baseline: cf73729
Comparison: b5f603f
Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
syslog_log2metric_splunk_hec_metrics 384.0KiB 2.33 100.00% 16.07MiB 610.14KiB 12.44KiB 0 0.0370749 16.44MiB 563.27KiB 11.48KiB 0 0.0334464 False False
syslog_humio_logs 293.01KiB 1.78 100.00% 16.09MiB 100.81KiB 2.06KiB 0 0.00611653 16.38MiB 109.05KiB 2.23KiB 0 0.00650107 False False
syslog_splunk_hec_logs 252.32KiB 1.55 100.00% 15.85MiB 722.98KiB 14.71KiB 0 0.0445225 16.1MiB 595.55KiB 12.14KiB 0 0.036114 False False
http_pipelines_blackhole_acks 14.2KiB 1.13 100.00% 1.23MiB 121.14KiB 2.46KiB 0 0.0960705 1.25MiB 62.64KiB 1.28KiB 0 0.0491195 False False
syslog_loki 135.71KiB 0.85 100.00% 15.57MiB 398.54KiB 8.15KiB 0 0.0249894 15.7MiB 834.93KiB 16.97KiB 0 0.0519101 False False
datadog_agent_remap_blackhole 288.92KiB 0.51 99.77% 55.1MiB 3.78MiB 78.64KiB 0 0.0685175 55.39MiB 2.54MiB 52.94KiB 0 0.0457995 False False
splunk_hec_route_s3 93.05KiB 0.44 83.41% 20.82MiB 2.31MiB 48.12KiB 0 0.110992 20.91MiB 2.24MiB 46.84KiB 0 0.107084 False False
datadog_agent_remap_blackhole_acks 235.1KiB 0.39 98.07% 58.98MiB 3.98MiB 82.91KiB 0 0.0674936 59.21MiB 2.71MiB 56.75KiB 0 0.0458274 False False
splunk_hec_to_splunk_hec_logs_noack 13.39KiB 0.05 55.52% 23.82MiB 514.66KiB 10.5KiB 0 0.0210993 23.83MiB 687.5KiB 14.04KiB 0 0.0281696 False False
splunk_hec_indexer_ack_blackhole 3.36KiB 0.01 9.82% 23.75MiB 925.34KiB 18.82KiB 0 0.0380464 23.75MiB 965.71KiB 19.65KiB 0 0.0397006 False False
enterprise_http_to_http 955.65B 0 10.02% 23.84MiB 255.93KiB 5.22KiB 0 0.0104793 23.85MiB 257.07KiB 5.26KiB 0 0.0105258 False False
splunk_hec_to_splunk_hec_logs_acks -1.45KiB -0.01 4.73% 23.75MiB 843.67KiB 17.16KiB 0 0.0346767 23.75MiB 850.65KiB 17.3KiB 0 0.0349656 False False
file_to_blackhole -83.23KiB -0.09 52.84% 95.34MiB 3.47MiB 71.96KiB 0 0.0364015 95.26MiB 4.35MiB 90.49KiB 0 0.0456654 False False
http_to_http_json -34.73KiB -0.14 99.06% 23.85MiB 381.83KiB 7.8KiB 0 0.0156331 23.81MiB 531.98KiB 10.87KiB 0 0.0218117 False False
syslog_regex_logs2metric_ddmetrics -15.23KiB -0.2 41.84% 7.41MiB 960.71KiB 19.57KiB 0 0.12654 7.4MiB 957.14KiB 19.51KiB 0 0.126322 False False
http_text_to_http_json -106.08KiB -0.26 100.00% 39.33MiB 1021.11KiB 20.84KiB 0 0.0253492 39.23MiB 764.12KiB 15.6KiB 0 0.0190194 False False
fluent_elasticsearch -235.35KiB -0.29 100.00% 79.47MiB 53.04KiB 1.07KiB 0 0.000651674 79.24MiB 1.95MiB 40.03KiB 0 0.0245571 False False
http_to_http_noack -95.04KiB -0.39 99.99% 23.84MiB 411.9KiB 8.43KiB 0 0.0168716 23.74MiB 1.11MiB 23.07KiB 0 0.046594 False False
socket_to_socket_blackhole -101.33KiB -0.42 100.00% 23.39MiB 148.42KiB 3.03KiB 0 0.0061962 23.29MiB 177.62KiB 3.63KiB 0 0.00744694 False False
datadog_agent_remap_datadog_logs_acks -271.13KiB -0.48 99.45% 55.06MiB 2.63MiB 54.99KiB 0 0.047756 54.79MiB 3.87MiB 80.65KiB 0 0.0707003 False False
http_pipelines_blackhole -14.21KiB -0.81 100.00% 1.71MiB 10.01KiB 209.51B 0 0.00570366 1.7MiB 123.06KiB 2.51KiB 0 0.0707137 False False
syslog_log2metric_humio_metrics -89.76KiB -0.9 100.00% 9.71MiB 252.21KiB 5.15KiB 0 0.02536 9.62MiB 417.68KiB 8.5KiB 0 0.042381 False False
datadog_agent_remap_datadog_logs -628.43KiB -1.1 100.00% 55.59MiB 234.6KiB 4.8KiB 0 0.00412073 54.97MiB 3.35MiB 69.81KiB 0 0.0609332 False False
http_pipelines_no_grok_blackhole -139.08KiB -1.28 100.00% 10.64MiB 42.77KiB 894.11B 0 0.00392276 10.51MiB 1.02MiB 21.17KiB 0 0.0967047 False False
http_to_http_acks -233.82KiB -1.3 66.26% 17.62MiB 8.26MiB 172.61KiB 0 0.468509 17.39MiB 8.24MiB 172.01KiB 0 0.473689 True True

@StephenWakely StephenWakely left a comment

Nice! This all looks really good to me. I really appreciate the extensive commenting here.

Just have a number of minor nits.

src/sinks/datadog/traces/apm_stats/flusher.rs (resolved)
src/sinks/datadog/traces/apm_stats/mod.rs (outdated, resolved)
src/sinks/datadog/traces/apm_stats/aggregation.rs (outdated, resolved)
src/sinks/datadog/traces/apm_stats/flusher.rs (resolved)
src/sinks/datadog/traces/sink.rs (outdated, resolved)
@neuronull neuronull enabled auto-merge (squash) November 7, 2022 16:46
github-actions bot commented Nov 7, 2022

Regression Test Results

Baseline: ba02a47
Comparison: 83c5c55
Total vector CPUs: 4

No interesting changes in bytes_written_per_cpu_second with confidence ≥ 90.00% and absolute Δ mean >= ±5%:

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
syslog_log2metric_splunk_hec_metrics 319.72KiB/CPU-s 2.31 100.00% 13.52MiB/CPU-s 761.91KiB/CPU-s 9.84KiB/CPU-s 0.0 0.055029 13.83MiB/CPU-s 273.79KiB/CPU-s 3.53KiB/CPU-s 0.0 0.019328 False False
syslog_log2metric_humio_metrics 100.26KiB/CPU-s 1.27 100.00% 7.72MiB/CPU-s 238.08KiB/CPU-s 3.07KiB/CPU-s 0.0 0.030124 7.82MiB/CPU-s 216.81KiB/CPU-s 2.8KiB/CPU-s 0.0 0.027089 False False
http_pipelines_blackhole_acks 8.06KiB/CPU-s 1.07 100.00% 753.03KiB/CPU-s 18.36KiB/CPU-s 242.68B/CPU-s 0.0 0.024376 761.09KiB/CPU-s 22.43KiB/CPU-s 296.46B/CPU-s 0.0 0.029462 False False
syslog_regex_logs2metric_ddmetrics 33.48KiB/CPU-s 0.53 100.00% 6.12MiB/CPU-s 417.61KiB/CPU-s 5.39KiB/CPU-s 0.0 0.066679 6.15MiB/CPU-s 422.89KiB/CPU-s 5.46KiB/CPU-s 0.0 0.067162 False False
splunk_hec_to_splunk_hec_logs_noack 87.83KiB/CPU-s 0.46 100.00% 18.64MiB/CPU-s 805.9KiB/CPU-s 10.4KiB/CPU-s 0.0 0.042211 18.73MiB/CPU-s 789.45KiB/CPU-s 10.18KiB/CPU-s 0.0 0.041161 False False
datadog_agent_remap_datadog_logs_acks 112.1KiB/CPU-s 0.33 100.00% 32.81MiB/CPU-s 1.14MiB/CPU-s 15.1KiB/CPU-s 0.0 0.034833 32.91MiB/CPU-s 679.67KiB/CPU-s 8.78KiB/CPU-s 0.0 0.020164 False False
datadog_agent_remap_blackhole_acks 133.83KiB/CPU-s 0.31 100.00% 42.01MiB/CPU-s 929.51KiB/CPU-s 12.0KiB/CPU-s 0.0 0.021606 42.14MiB/CPU-s 1.37MiB/CPU-s 18.05KiB/CPU-s 0.0 0.032405 False False
syslog_splunk_hec_logs 19.78KiB/CPU-s 0.14 99.97% 13.92MiB/CPU-s 272.95KiB/CPU-s 3.53KiB/CPU-s 0.0 0.019154 13.93MiB/CPU-s 326.72KiB/CPU-s 4.22KiB/CPU-s 0.0 0.022896 False False
datadog_agent_remap_datadog_logs 37.89KiB/CPU-s 0.13 97.58% 29.38MiB/CPU-s 872.92KiB/CPU-s 11.27KiB/CPU-s 0.0 0.029012 29.42MiB/CPU-s 967.47KiB/CPU-s 12.48KiB/CPU-s 0.0 0.032114 False False
http_pipelines_no_grok_blackhole 5.87KiB/CPU-s 0.10 97.70% 5.73MiB/CPU-s 48.74KiB/CPU-s 644.35B/CPU-s 0.0 0.008311 5.73MiB/CPU-s 193.88KiB/CPU-s 2.5KiB/CPU-s 0.0 0.033024 False False
http_to_http_noack -352.01B/CPU-s -0.00 3.33% 23.83MiB/CPU-s 449.43KiB/CPU-s 5.8KiB/CPU-s 0.0 0.018415 23.83MiB/CPU-s 451.9KiB/CPU-s 5.83KiB/CPU-s 0.0 0.018516 False False
splunk_hec_indexer_ack_blackhole -816.55B/CPU-s -0.00 8.25% 23.83MiB/CPU-s 418.43KiB/CPU-s 5.4KiB/CPU-s 0.0 0.017145 23.83MiB/CPU-s 424.98KiB/CPU-s 5.48KiB/CPU-s 0.0 0.017414 False False
enterprise_http_to_http 833.8B/CPU-s 0.00 13.33% 23.84MiB/CPU-s 266.17KiB/CPU-s 3.44KiB/CPU-s 0.0 0.010901 23.84MiB/CPU-s 264.86KiB/CPU-s 3.42KiB/CPU-s 0.0 0.010847 False False
fluent_elasticsearch -40.4KiB/CPU-s -0.05 97.69% 79.47MiB/CPU-s 51.97KiB/CPU-s 680.15B/CPU-s 0.0 0.000639 79.43MiB/CPU-s 1.36MiB/CPU-s 17.76KiB/CPU-s 0.0 0.017098 False False
file_to_blackhole -68.92KiB/CPU-s -0.07 23.85% 92.86MiB/CPU-s 12.17MiB/CPU-s 160.67KiB/CPU-s 0.0 0.131035 92.8MiB/CPU-s 12.16MiB/CPU-s 160.48KiB/CPU-s 0.0 0.131044 True False
http_pipelines_blackhole -759.78B/CPU-s -0.08 91.98% 971.12KiB/CPU-s 16.63KiB/CPU-s 220.03B/CPU-s 0.0 0.017123 970.38KiB/CPU-s 28.31KiB/CPU-s 374.34B/CPU-s 0.0 0.029176 False False
splunk_hec_to_splunk_hec_logs_acks -39.52KiB/CPU-s -0.21 98.97% 18.24MiB/CPU-s 860.48KiB/CPU-s 11.1KiB/CPU-s 0.0 0.046076 18.2MiB/CPU-s 827.81KiB/CPU-s 10.68KiB/CPU-s 0.0 0.044421 False False
syslog_humio_logs -29.97KiB/CPU-s -0.21 100.00% 13.91MiB/CPU-s 313.06KiB/CPU-s 4.04KiB/CPU-s 0.0 0.021971 13.88MiB/CPU-s 354.94KiB/CPU-s 4.59KiB/CPU-s 0.0 0.024963 False False
splunk_hec_route_s3 -37.33KiB/CPU-s -0.27 97.67% 13.4MiB/CPU-s 900.01KiB/CPU-s 11.62KiB/CPU-s 0.0 0.065603 13.36MiB/CPU-s 903.12KiB/CPU-s 11.66KiB/CPU-s 0.0 0.066009 False False
otlp_http_to_blackhole -7.21KiB/CPU-s -0.30 88.11% 2.34MiB/CPU-s 246.88KiB/CPU-s 3.19KiB/CPU-s 0.0 0.103166 2.33MiB/CPU-s 259.86KiB/CPU-s 3.35KiB/CPU-s 0.0 0.108916 True False
datadog_agent_remap_blackhole -130.22KiB/CPU-s -0.31 100.00% 41.39MiB/CPU-s 1.03MiB/CPU-s 13.63KiB/CPU-s 0.0 0.024910 41.26MiB/CPU-s 1.25MiB/CPU-s 16.51KiB/CPU-s 0.0 0.030283 False False
otlp_grpc_to_blackhole -7.26KiB/CPU-s -0.44 100.00% 1.62MiB/CPU-s 70.68KiB/CPU-s 934.28B/CPU-s 0.0 0.042593 1.61MiB/CPU-s 73.0KiB/CPU-s 964.78B/CPU-s 0.0 0.044184 False False
socket_to_socket_blackhole -104.41KiB/CPU-s -0.44 100.00% 23.11MiB/CPU-s 318.11KiB/CPU-s 4.11KiB/CPU-s 0.0 0.013442 23.01MiB/CPU-s 493.97KiB/CPU-s 6.38KiB/CPU-s 0.0 0.020966 False False
syslog_loki -83.7KiB/CPU-s -0.58 100.00% 14.09MiB/CPU-s 395.04KiB/CPU-s 5.1KiB/CPU-s 0.0 0.027381 14.01MiB/CPU-s 399.3KiB/CPU-s 5.15KiB/CPU-s 0.0 0.027838 False False
http_text_to_http_json -427.16KiB/CPU-s -1.10 100.00% 37.87MiB/CPU-s 791.06KiB/CPU-s 10.22KiB/CPU-s 0.0 0.020396 37.46MiB/CPU-s 997.94KiB/CPU-s 12.89KiB/CPU-s 0.0 0.026016 False False
http_to_http_json -439.57KiB/CPU-s -1.80 100.00% 23.84MiB/CPU-s 377.08KiB/CPU-s 4.87KiB/CPU-s 0.0 0.015445 23.41MiB/CPU-s 940.87KiB/CPU-s 12.15KiB/CPU-s 0.0 0.039243 False False
http_to_http_acks -228.96KiB/CPU-s -2.21 95.54% 10.11MiB/CPU-s 6.06MiB/CPU-s 80.06KiB/CPU-s 0.0 0.598813 9.89MiB/CPU-s 6.13MiB/CPU-s 81.11KiB/CPU-s 0.0 0.620233 True False

github-actions bot commented Nov 7, 2022

Soak Test Results

Baseline: ba02a47
Comparison: 83c5c55
Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
experiment Δ mean Δ mean % confidence baseline mean baseline stdev baseline stderr baseline outlier % baseline CoV comparison mean comparison stdev comparison stderr comparison outlier % comparison CoV erratic declared erratic
syslog_log2metric_humio_metrics 512.86KiB 5.44 100.00% 9.21MiB 816.83KiB 16.67KiB 0 0.0865601 9.71MiB 423.51KiB 8.62KiB 0 0.0425658 False False
http_to_http_acks 387.45KiB 2.23 89.73% 16.97MiB 7.87MiB 164.61KiB 0 0.463806 17.35MiB 8.19MiB 171.02KiB 0 0.471757 True True
syslog_regex_logs2metric_ddmetrics 160.26KiB 2.17 100.00% 7.21MiB 990.04KiB 20.17KiB 0 0.133985 7.37MiB 977.59KiB 19.92KiB 0 0.129491 False False
http_pipelines_blackhole_acks 25.07KiB 0.74 99.88% 3.31MiB 337.49KiB 6.86KiB 0 0.099496 3.34MiB 177.62KiB 3.62KiB 0 0.0519786 False False
datadog_agent_remap_blackhole_acks 315.64KiB 0.53 99.89% 57.94MiB 4.0MiB 83.32KiB 0 0.0690598 58.24MiB 2.31MiB 48.3KiB 0 0.0396197 False False
http_text_to_http_json 183.9KiB 0.45 100.00% 39.5MiB 714.83KiB 14.59KiB 0 0.0176688 39.68MiB 719.64KiB 14.69KiB 0 0.0177072 False False
socket_to_socket_blackhole 88.79KiB 0.38 100.00% 23.03MiB 533.27KiB 10.89KiB 0 0.0226087 23.12MiB 517.02KiB 10.56KiB 0 0.0218376 False False
splunk_hec_route_s3 60.75KiB 0.28 69.21% 21.33MiB 2.03MiB 42.31KiB 0 0.0951841 21.39MiB 2.01MiB 41.94KiB 0 0.0937643 False False
syslog_humio_logs 39.57KiB 0.24 99.99% 15.98MiB 404.43KiB 8.26KiB 0 0.0247061 16.02MiB 263.32KiB 5.39KiB 0 0.0160469 False False
splunk_hec_to_splunk_hec_logs_noack 18.23KiB 0.07 87.94% 23.82MiB 464.05KiB 9.47KiB 0 0.0190193 23.84MiB 339.88KiB 6.94KiB 0 0.0139196 False False
splunk_hec_indexer_ack_blackhole 2.14KiB 0.01 6.70% 23.75MiB 886.13KiB 18.02KiB 0 0.0364282 23.75MiB 884.11KiB 17.99KiB 0 0.0363419 False False
enterprise_http_to_http -1.45KiB -0.01 14.87% 23.85MiB 269.23KiB 5.5KiB 0 0.0110229 23.85MiB 265.08KiB 5.42KiB 0 0.0108535 False False
splunk_hec_to_splunk_hec_logs_acks -6.92KiB -0.03 22.46% 23.76MiB 826.38KiB 16.81KiB 0 0.033961 23.75MiB 858.15KiB 17.46KiB 0 0.0352766 False False
file_to_blackhole -49.37KiB -0.05 37.19% 95.34MiB 3.09MiB 63.98KiB 0 0.0323634 95.29MiB 3.82MiB 79.35KiB 0 0.0400426 False False
datadog_agent_remap_datadog_logs_acks -57.45KiB -0.11 43.09% 52.58MiB 2.68MiB 56.08KiB 0 0.0509586 52.52MiB 4.03MiB 83.85KiB 0 0.0766843 False False
syslog_log2metric_splunk_hec_metrics -17.88KiB -0.11 68.51% 16.16MiB 578.09KiB 11.78KiB 0 0.0349322 16.14MiB 654.07KiB 13.33KiB 0 0.0395667 False False
http_to_http_json -31.92KiB -0.13 98.28% 23.84MiB 385.17KiB 7.86KiB 0 0.015772 23.81MiB 531.22KiB 10.85KiB 0 0.021781 False False
datadog_agent_remap_blackhole -93.57KiB -0.17 61.06% 55.14MiB 3.79MiB 79.09KiB 0 0.0687177 55.05MiB 3.58MiB 74.58KiB 0 0.0649396 False False
syslog_loki -30.18KiB -0.19 89.58% 15.3MiB 402.52KiB 8.24KiB 0 0.0256816 15.27MiB 818.46KiB 16.64KiB 0 0.05232 False False
fluent_elasticsearch -195.46KiB -0.24 100.00% 79.47MiB 51.79KiB 1.05KiB 0 0.00063627 79.28MiB 1.7MiB 34.94KiB 0 0.0214202 False False
datadog_agent_remap_datadog_logs -194.02KiB -0.34 99.03% 54.93MiB 336.18KiB 6.88KiB 0 0.00597514 54.74MiB 3.58MiB 74.6KiB 0 0.0654153 False False
http_to_http_noack -110.64KiB -0.45 99.99% 23.83MiB 523.43KiB 10.69KiB 0 0.0214484 23.72MiB 1.21MiB 25.26KiB 0 0.0510963 False False
http_pipelines_blackhole -67.81KiB -1.51 100.00% 4.38MiB 277.05KiB 5.66KiB 0 0.0617521 4.31MiB 394.16KiB 8.03KiB 0 0.0892033 False False
http_pipelines_no_grok_blackhole -216.78KiB -2.02 100.00% 10.5MiB 127.8KiB 2.61KiB 0 0.0118814 10.29MiB 1.0MiB 20.84KiB 0 0.0972123 False False
syslog_splunk_hec_logs -502.95KiB -3.12 100.00% 15.72MiB 805.67KiB 16.39KiB 0 0.050035 15.23MiB 1010.66KiB 20.64KiB 0 0.0647894 False False

@neuronull neuronull merged commit f4a363f into master Nov 7, 2022
@neuronull neuronull deleted the neuronull/apm_stats_fix_payload_flushing branch November 7, 2022 18:38
davidhuie-dd pushed a commit that referenced this pull request Nov 7, 2022
…trace payloads and at a set interval. (#15084)

- APM stats are now "flushed" in a separate thread, detached from the sink's stream loop.
- The stats flushing thread flushes the oldest bucket every 10 seconds, and caches the last two 10 second buckets.
- When sink is shutting down, the APM stats thread flushes remaining buckets.
@spencergilbert spencergilbert added this to the Vector 0.25.2 milestone Nov 23, 2022
spencergilbert pushed a commit that referenced this pull request Nov 23, 2022
jszwedko pushed a commit that referenced this pull request Nov 28, 2022
Labels
ci-condition: integration tests enable (Run integration tests on this PR); domain: ci (Anything related to Vector's CI environment); domain: sinks (Anything related to Vector's sinks); sink: datadog_traces (Anything `datadog_traces` sink related)
5 participants