[statsd] Add caching to tag normalization for Python3.2+ #674

sgnn7 · 2021-06-30T21:31:53Z

What does this PR do?

On Python3.2+ we now use the built-in @lru_cache decorator to add a small (512-entry) cache on
normalize_tags, avoiding expensive re.sub calls when previously seen tags are used in metrics.
This significantly decreases statsd latency, CPU usage, and benchmark test duration (~10-30%) on
Python3.2+, with negligible impact on Python2.

Since this function is used in the submit() API too, it may offer
significant performance improvement there as well.

Description of the Change

Since tag normalization is still the largest bottleneck in metrics
submission latency, this change adds a small cache (512 entries) to
that method's calls via the built-in @lru_cache where available
(Python3.2+). On a cache hit we skip the expensive re.sub
operation entirely, which improves performance.

Fixes #673

Alternate Designs

We could either add a new dependency or roll our own lru_cache to support ancient Python versions, but that
seems like wasted effort and/or unnecessary bloat.
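
To give a sense of what the rejected "roll our own" option would involve, here is a minimal sketch of an LRU decorator built on `collections.OrderedDict`. The name `simple_lru_cache` is hypothetical, the sketch is not thread-safe, and it handles positional hashable arguments only.

```python
from collections import OrderedDict
from functools import wraps


def simple_lru_cache(maxsize=512):
    """Hypothetical hand-rolled LRU cache decorator (not thread-safe)."""
    def decorator(func):
        cache = OrderedDict()

        @wraps(func)
        def wrapper(*args):
            if args in cache:
                # Re-insert the entry to mark it as most recently used.
                value = cache.pop(args)
                cache[args] = value
                return value
            value = func(*args)
            cache[args] = value
            if len(cache) > maxsize:
                # Evict the least recently used entry (the oldest key).
                cache.popitem(last=False)
            return value

        return wrapper
    return decorator
```

A production version would also need locking for the multi-threaded case, which is part of why the built-in decorator is the simpler choice.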

Possible Drawbacks

Memory usage may increase in clients that use a lot of custom tags, but the cache size limit should keep
that in check.
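
The size bound can be checked with `cache_info()`, which `functools.lru_cache` exposes on the decorated function (Python3.2+ only). In this small sketch, `normalize_tag` is a stand-in function and not the library's own:

```python
from functools import lru_cache


@lru_cache(maxsize=512)
def normalize_tag(tag):
    # Stand-in for the real normalization, for illustration only.
    return tag.replace(" ", "_")


# Simulate a client that sends many distinct tags.
for i in range(10000):
    normalize_tag("user id:%d" % i)

info = normalize_tag.cache_info()
# currsize is capped at maxsize no matter how many distinct tags were seen.
print(info.currsize, info.maxsize)
```

With all-distinct tags the cache never hits, so such a client pays the cost of the cache bookkeeping without gaining from it, but memory stays bounded by the 512 entries.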

Verification Process

Run general tests on Python2 and Python3 (or do manual statsd metric sending)
Ensure that there are no failures

Additional Notes

Benchmark results:

Note: The benchmark code uses a limited number of mostly static global and metric tags.

Single-threaded:

Python2:
- Single-threaded UDP: +1% CPU/rss/test duration
- Single-threaded UDS: +3% CPU/rss/test duration
Python3:
- Single-threaded UDP: -22% CPU/rss/test duration
- Single-threaded UDS: -23% CPU/rss/test duration

Multi-threaded:

Python2:
- Multi-threaded UDP: +5.2% CPU/rss/test duration
- Multi-threaded UDS: +3.2% CPU/rss/test duration
Python3:
- Multi-threaded UDP: -10% CPU/rss/test duration
- Multi-threaded UDS: -29% CPU/rss/test duration

Memory overhead: Negligible (see note about tags)

Release Notes

Review checklist (to be filled by reviewers)

Feature or bug fix MUST have appropriate tests (unit, integration, etc...)
PR title must be written as a CHANGELOG entry
File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
PR must have one changelog/ label attached. If applicable it should have the backward-incompatible label attached.
PR should not have do-not-merge/ label attached.
If applicable, the issue must have at least kind/ and severity/ labels attached.

sgnn7 added the changelog/Changed, kind/feature-request, and severity/normal labels Jun 30, 2021

sgnn7 added this to the Next milestone Jun 30, 2021

sgnn7 requested review from a team as code owners June 30, 2021 21:31

sgnn7 force-pushed the sgnn7/cache-tag-normalization-results-2 branch from 3a054f2 to db5a014 Compare June 30, 2021 21:34

therve approved these changes Jul 1, 2021

ogaca-dd approved these changes Jul 1, 2021

sgnn7 merged commit 6b18d6c into master Jul 1, 2021

sgnn7 deleted the sgnn7/cache-tag-normalization-results-2 branch July 1, 2021 14:24
