Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor metric samples to use Atlas for tags and have time series #2594

Merged
merged 15 commits into from
Oct 4, 2022

Conversation

codebien
Copy link
Contributor

@codebien codebien commented Jul 11, 2022

Integrated https://github.com/mstoykov/atlas as the internal data structure for SampleTags. Atlas is a Btree implementation based on [2]string array for the nodes (the stored tags pairs). It uses pointers most of the time for comparisons so it's very fast and read locks can be avoided.

It allows allocating a specific Tag pair just once. The reduction in terms of memory from my basic benchmarks is very important. For example, the benchmark on a similar function reduces the overall allocated memory from 600MB to 87MB (note: this is a very extreme and limit case, the average is not so high).

Also, the time series concept has been introduced as defined from the Data model and Storage parts in #2580.

Closes #1831 #2580

@codebien codebien self-assigned this Jul 11, 2022
@github-actions github-actions bot requested review from na-- and oleiade July 11, 2022 12:59
metrics/sample.go Outdated Show resolved Hide resolved
@na--
Copy link
Member

na-- commented Jul 11, 2022

Hmm I'd definitely want to see more benchmarks and tests before we merge something like this 🤔 Considering we'll have just a single copy of the tags per TimeSeries, I don't think we need to replace a slice with such a more complicated struct, the memory savings won't be as impressive as what you claim. It can probably reduce the allocations if you have gradual construction of the final tag set, at the cost of lock contention, but not as it's currently used in this PR (since you already have constructed the map you pass to NewSampleTags()).

Besides, I have just skimmed the atlas code, but I am not sure it works like how we'd need it to work. A tag set with keys {a: b, c: d} should be equivalent to a tag set with the same keys in a different order, {c: d, a: b}, which I don't think is the case now.

@codebien codebien requested a review from mstoykov July 11, 2022 13:56
Copy link
Member

@oleiade oleiade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aligned with @na-- remarks. Also feeling somewhat uncomfortable with merging code using a "POC" library in master (but not a blocker, happy to be reassured 🤞🏻 )

metrics/sample.go Outdated Show resolved Hide resolved
@mstoykov
Copy link
Contributor

Hmm I'd definitely want to see more benchmarks and tests before we merge something like this

We are definittely not just merging this. I would even expect that we would rewrite most of the code to actually not have a global root, but have it in the metrics.Registry before we even think of merging.

I would also expect at least some of the code to actually use the root node directly (the http path definitely) instead of NewSampleTags as that will also be a lot more performant then creating a map and then making it into an atlas node.

Considering we'll have just a single copy of the tags per TimeSeries

How do we build and have only one copy and keep it only one copy? From the code that I have seen so far - we always build a new slice of tags and then we hash it and go get it from the registry. But we still made a new copy - we just threw it away in the mean time.

Besides, I have just skimmed the atlas code, but I am not sure it works like how we'd need it to work. A tag set with keys {a: b, c: d} should be equivalent to a tag set with the same keys in a different order, {c: d, a: b}, which I don't think is the case now.

It should, but you can test it and open an issue if it doesn't ;)

To be honest I would expect the reduction in memory to be negligable for most real uses case, but the reduction in GC time in those same cases should be significant once we are using this directly building from lib.State#Tags instead of going from the root node up for each set.

One of the points of this structure is that it "memorizes" how it was constructed and optimizes the same route for the next time. So as long as the code that generates it takes the same route it will be faster the following times. But building it from a map has more chances of it being randomized.

I still would like to see if even like that it makes a difference and in which direction if you just have a fairly basic test. I would expect thsi will be slightly ... worse, before we stop generating the whole map and instead use the atlas.Node for that as well

@mstoykov mstoykov added this to the v0.40.0 milestone Jul 21, 2022
@codebien codebien force-pushed the atlas branch 3 times, most recently from 41823e4 to d7225a4 Compare July 26, 2022 10:38
@codebien
Copy link
Contributor Author

@na-- @mstoykov @oleiade I pushed a new iteration with a basic Atlas integration. Please, take a look and let me know your opinions. I expect several points of discussion so I didn't complete the entire migration, in any case, the js/http package should work as expected.

metrics/tagindex.go Outdated Show resolved Hide resolved
js/modules/k6/metrics/metrics.go Outdated Show resolved Hide resolved
js/modules/k6/ws/ws.go Show resolved Hide resolved
lib/netext/grpcext/conn.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Outdated Show resolved Hide resolved
metrics/tagindex.go Outdated Show resolved Hide resolved
vendor/github.com/mstoykov/atlas/atlas.go Outdated Show resolved Hide resolved
@codebien codebien force-pushed the atlas branch 2 times, most recently from a31b1b6 to d55c5d3 Compare July 27, 2022 13:15
cmd/options.go Outdated Show resolved Hide resolved
@codebien
Copy link
Contributor Author

codebien commented Aug 3, 2022

Updates:

  • Restored lib.State.Tags and renamed metrics.TagMap to metrics.TagSet
  • metrics.TagSet is now always lock free
  • lib.RunTags from SampleTags to map[string]string
  • Root set in PreInitState
  • Resolved RunTags set in TestState

Note: as you can see from the failing xk6 test, we are breaking the extensions removing the ability to create a SampleTags from a map.

lib/test_state.go Outdated Show resolved Hide resolved
metrics/tags.go Outdated Show resolved Hide resolved
codebien and others added 11 commits October 4, 2022 12:05
This implements the consensus from the PR comment (#issuecomment-1205273413), with some minor modifications.
This will allow metric Samples to be easily grouped with other Samples with the same metric and tags. It implements the consensus described in my second PR comment (#issuecomment-1205359198)
After the introduction of Atlas, we decided to keep `url` as a normal tag
but it always set to have the same value of the `name` tag.
In this way, if the `url` contains high-cardinality values it wouldn't
affect the system.
This is the same thing that currently the cloud output does.
Remove for a temporary commit the experimental module to allow bumping
the k6 dependency in the extension's repository.
@codebien
Copy link
Contributor Author

codebien commented Oct 4, 2022

As a reminder, otherwise, the xk6-websockets extension will be broken:

  • from now rebasing is denied for this PR
  • for merging this PR use the classic Merge feature for keep the commit history

go.mod Outdated Show resolved Hide resolved
Uses the xk6-websockets main version that is now migrated to the time
series version.
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 🤞 🙏 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change for PRs that need to be mentioned in the breaking changes section of the release notes documentation-needed A PR which will need a separate PR for documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Put metric samples in time series (immutable tag sets) on creation
5 participants