This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Rewrite enricher to use map lookup #1523

Merged on Nov 15, 2019 (15 commits)

Conversation

replay commented Nov 7, 2019

This replaces the enricher implementation to make enrichment faster. There are two main changes:

  • When enriching a metric, the enricher now only needs to look up the metric key in a map, which is much cheaper than running the metric through a filter for each existing meta record. This map needs to be kept up to date, which means the enrichment work has to happen whenever metrics or meta records get added.
  • All events that change the state of the enricher (add metric, del metric, add meta record, del meta record) get processed asynchronously via a queue. Furthermore, all add-metric events get buffered and executed in batches, because this further decreases the time it takes to enrich each of them. This means that when a new metric gets added to the index, it can take a few seconds until its meta tags show up as well. It is necessary to do it this way because otherwise the enrichment would slow down the ingest speed too much.
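The queueing and batching described in the second point could be sketched roughly as follows. This is only an illustration and not the code from this PR: the names `event`, `runQueue`, `applyBatch`, and `applyOne` are hypothetical, and flushing the buffer before any non-add event (to preserve ordering) is an assumption of this sketch.

```go
package main

import "time"

// eventKind enumerates the state-changing events named in the description.
// All identifiers in this sketch are hypothetical.
type eventKind int

const (
	addMetric eventKind = iota
	delMetric
	addMetaRecord
	delMetaRecord
)

type event struct {
	kind eventKind
	key  string
}

// runQueue consumes events until the channel is closed. addMetric events are
// buffered and handed to applyBatch once maxBatch keys are buffered or
// flushEvery has elapsed. Every other event first flushes the buffer
// (an assumption made here to keep ordering simple) and then goes to applyOne.
func runQueue(events <-chan event, maxBatch int, flushEvery time.Duration,
	applyBatch func(keys []string), applyOne func(ev event)) {

	buf := make([]string, 0, maxBatch)
	flush := func() {
		if len(buf) == 0 {
			return
		}
		applyBatch(buf)
		// Start a fresh buffer so the batch handed out is never overwritten.
		buf = make([]string, 0, maxBatch)
	}

	ticker := time.NewTicker(flushEvery)
	defer ticker.Stop()

	for {
		select {
		case ev, ok := <-events:
			if !ok {
				flush() // channel closed: process what is left and stop
				return
			}
			if ev.kind == addMetric {
				buf = append(buf, ev.key)
				if len(buf) >= maxBatch {
					flush()
				}
				continue
			}
			flush()
			applyOne(ev)
		case <-ticker.C:
			flush() // time-based flush bounds how long new metrics wait
		}
	}
}
```

In a setup like this, the ingest path would only push an event onto the channel and return, while a single goroutine running `runQueue` does the expensive work. The flush interval is what bounds the delay before the meta tags of a new metric become visible.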

I will create a diagram to illustrate how it all works and post it here.

This replaces the enricher implementation to improve its enrichment
performance. It drops the enrichment cache, and it no longer filters
metrics based on the meta records at the enrichment stage.
Instead, it now builds a map in which it can look up the metric
keys and resolve them into meta records, from which it then gets the
meta tags.
Especially in scenarios where there is a large number of meta records in
the index, this performs much better than the old implementation.
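As a rough illustration of the lookup path described in this commit message, the sketch below resolves a metric key to meta record IDs and then to meta tags using only map lookups. The type and method names (`enricher`, `addMetaRecord`, `enrich`, and so on) and the use of a `sync.RWMutex` are assumptions made for the example. They are not taken from the actual implementation.

```go
package main

import "sync"

// metaRecord holds the meta tags that a record contributes.
// All names in this sketch are hypothetical.
type metaRecord struct {
	metaTags []string
}

// enricher keeps two maps: record ID -> record, and metric key -> record IDs.
type enricher struct {
	mu       sync.RWMutex
	records  map[uint32]metaRecord
	byMetric map[string][]uint32
}

func newEnricher() *enricher {
	return &enricher{
		records:  make(map[uint32]metaRecord),
		byMetric: make(map[string][]uint32),
	}
}

// addMetaRecord stores a record and associates it with the metric keys that
// were found to match it. The matching itself happens elsewhere, at add time.
func (e *enricher) addMetaRecord(id uint32, rec metaRecord, matching []string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.records[id] = rec
	for _, key := range matching {
		e.byMetric[key] = append(e.byMetric[key], id)
	}
}

// delMetric drops all associations of a metric key.
func (e *enricher) delMetric(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	delete(e.byMetric, key)
}

// enrich resolves a metric key into meta tags with map lookups only;
// no filter is evaluated at this stage.
func (e *enricher) enrich(key string) []string {
	e.mu.RLock()
	defer e.mu.RUnlock()
	var tags []string
	for _, id := range e.byMetric[key] {
		tags = append(tags, e.records[id].metaTags...)
	}
	return tags
}
```

With this layout, the cost of `enrich` depends on the number of records associated with the given metric rather than on the total number of meta records in the index.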

The purpose of this change is to improve the addMetric performance when a large
number of metrics gets added to the index concurrently. Previously, each
of them would have been checked against the filter requirements of each
existing meta record. Because we now process them in batches, this
process is more efficient: instead of checking each new metric one by
one against the criteria of each meta record, we now build a small
temporary index out of all the added metrics in the buffer, then we run
each meta record as a query on that small index. This change improves
the addMetric event processing performance by a large factor in
situations where a lot of metrics get added at once.
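The batch approach from this commit message can be illustrated with a heavily simplified sketch. It assumes that a meta record's query consists only of exact `tag=value` expressions, whereas real tag queries are richer. All names (`metric`, `recordQuery`, `buildTempIndex`, `matchBatch`) are hypothetical.

```go
package main

// metric is a buffered metric with its tags. Hypothetical type.
type metric struct {
	key  string
	tags map[string]string
}

// recordQuery is a meta record reduced to its query: every tag listed in
// exprs must be present with exactly the given value (a simplification).
type recordQuery struct {
	id    int
	exprs map[string]string
}

// buildTempIndex builds a small inverted index over the batch:
// "tag=value" -> positions of the metrics in the batch that carry it.
func buildTempIndex(batch []metric) map[string][]int {
	idx := make(map[string][]int)
	for pos, m := range batch {
		for tag, value := range m.tags {
			term := tag + "=" + value
			idx[term] = append(idx[term], pos)
		}
	}
	return idx
}

// matchBatch runs each meta record as a query on the temporary index and
// returns, per metric key, the IDs of the records that match it.
func matchBatch(batch []metric, records []recordQuery) map[string][]int {
	idx := buildTempIndex(batch)
	out := make(map[string][]int)
	for _, rec := range records {
		if len(rec.exprs) == 0 {
			continue // an empty query matches nothing in this sketch
		}
		// Count how many expressions each metric satisfies. A metric has at
		// most one value per tag, so each expression counts at most once.
		hits := make(map[int]int)
		for tag, value := range rec.exprs {
			for _, pos := range idx[tag+"="+value] {
				hits[pos]++
			}
		}
		for pos, n := range hits {
			if n == len(rec.exprs) {
				key := batch[pos].key
				out[key] = append(out[key], rec.id)
			}
		}
	}
	return out
}
```

The index is built once per batch, and each record then only touches the posting lists of its own expressions. It no longer gets evaluated against every new metric individually.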

replay commented Nov 7, 2019

FYI, I'm planning to create a follow-up PR that will not change any logic. It will only move code around and rename things, in order to create a separate package for everything related to tags and meta tags and to make names more explanatory. This is mostly housekeeping to avoid namespace pollution of the memory index package. I'll probably also reorganize some of the tests and benchmarks into separate files to make their distribution more logical.


replay commented Nov 7, 2019

This shows how the enrichment works now. At the time a new metric gets added to the index we asynchronously look up all meta records that need to be associated with it, using a tag query on a temporarily instantiated index. At the time the metric gets queried we then only need to do a few key lookups.
[Diagram: meta tag enricher]

replay merged commit f833d45 into master on Nov 15, 2019
replay deleted the rewrite_enricher branch on November 15, 2019 15:14