Refactor metric samples to use Atlas for tags and have time series #2594

codebien · 2022-07-11T12:59:07Z

Integrated https://github.com/mstoykov/atlas as the internal data structure for SampleTags. Atlas is a B-tree implementation whose nodes are [2]string arrays (the stored tag pairs). It compares pointers most of the time, so it's very fast and read locks can be avoided.

It allows allocating a specific tag pair just once. In my basic benchmarks the memory reduction is significant. For example, a benchmark on a similar function reduces the overall allocated memory from 600MB to 87MB (note: this is an extreme edge case; the average reduction is not as high).
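
To illustrate the "allocate a tag pair just once" idea, here is a minimal sketch. It is not the Atlas API: the `node` type and the `newRoot` and `add` names are hypothetical, the sketch ignores key ordering, and it is not safe for concurrent use. It only shows how memoizing child links lets two identical tag sets share one pointer.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a tag-tree node: it stores
// one tag pair and remembers the nodes reachable by adding one more pair.
type node struct {
	parent *node
	pair   [2]string
	links  map[[2]string]*node
}

// newRoot returns an empty root node representing the empty tag set.
func newRoot() *node {
	return &node{links: map[[2]string]*node{}}
}

// add returns the child for (key, value), allocating it only on first use.
// Later calls with the same pair return the same pointer.
func (n *node) add(key, value string) *node {
	p := [2]string{key, value}
	if child, ok := n.links[p]; ok {
		return child
	}
	child := &node{parent: n, pair: p, links: map[[2]string]*node{}}
	n.links[p] = child
	return child
}

func main() {
	root := newRoot()
	a := root.add("method", "GET").add("status", "200")
	b := root.add("method", "GET").add("status", "200")
	fmt.Println(a == b) // true: the second build reuses the existing nodes
}
```

Because equal tag sets end up as the same pointer, comparing two sets is a pointer comparison rather than a walk over all pairs.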

Also, the time series concept has been introduced as defined from the Data model and Storage parts in #2580.

Closes #1831 #2580

metrics/sample.go

na-- · 2022-07-11T13:40:20Z

Hmm I'd definitely want to see more benchmarks and tests before we merge something like this 🤔 Considering we'll have just a single copy of the tags per TimeSeries, I don't think we need to replace a slice with a much more complicated struct; the memory savings won't be as impressive as what you claim. It can probably reduce the allocations if you construct the final tag set gradually, at the cost of lock contention, but not as it's currently used in this PR (since you have already constructed the map you pass to NewSampleTags()).

Besides, I have just skimmed the atlas code, but I am not sure it works like how we'd need it to work. A tag set with keys {a: b, c: d} should be equivalent to a tag set with the same keys in a different order, {c: d, a: b}, which I don't think is the case now.
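
The order-independence requirement can be sketched as follows. This is an assumption-laden toy, not the Atlas code: `tagNode`, `newTagRoot` and `with` are hypothetical names, and the sketch is not safe for concurrent use. It keeps keys sorted along the path from the root, so {a: b, c: d} and {c: d, a: b} resolve to the same node.

```go
package main

import "fmt"

// tagNode is a hypothetical tag-tree node holding one key/value pair.
type tagNode struct {
	parent     *tagNode
	key, value string
	links      map[[2]string]*tagNode
}

// newTagRoot returns the root node, which represents the empty tag set.
func newTagRoot() *tagNode {
	return &tagNode{links: map[[2]string]*tagNode{}}
}

// with returns the node for the current set plus (key, value). Keys stay
// sorted along the path from the root, so insertion order does not matter.
func (n *tagNode) with(key, value string) *tagNode {
	if n.parent != nil {
		switch {
		case key == n.key:
			// Same key: replace the value by re-adding from the parent.
			return n.parent.with(key, value)
		case key < n.key:
			// Smaller key: insert it further up, then re-apply this pair.
			return n.parent.with(key, value).with(n.key, n.value)
		}
	}
	pair := [2]string{key, value}
	if child, ok := n.links[pair]; ok {
		return child
	}
	child := &tagNode{parent: n, key: key, value: value, links: map[[2]string]*tagNode{}}
	n.links[pair] = child
	return child
}

func main() {
	root := newTagRoot()
	x := root.with("a", "b").with("c", "d")
	y := root.with("c", "d").with("a", "b")
	fmt.Println(x == y) // true
}
```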

oleiade

I'm aligned with @na--'s remarks. I'm also somewhat uncomfortable with merging code that uses a "POC" library into master (but not a blocker, happy to be reassured 🤞🏻 )

metrics/sample.go

mstoykov · 2022-07-11T15:31:21Z

> Hmm I'd definitely want to see more benchmarks and tests before we merge something like this

We are definitely not just merging this. I would even expect that we would rewrite most of the code to not have a global root, but have it in the metrics.Registry, before we even think of merging.

I would also expect at least some of the code to use the root node directly (the http path definitely) instead of NewSampleTags, as that will also be a lot more performant than creating a map and then turning it into an atlas node.

> Considering we'll have just a single copy of the tags per TimeSeries

How do we build just one copy and keep it as the only copy? From the code I have seen so far, we always build a new slice of tags, hash it, and then get the existing one from the registry. But we still made a new copy; we just threw it away in the meantime.

> Besides, I have just skimmed the atlas code, but I am not sure it works like how we'd need it to work. A tag set with keys {a: b, c: d} should be equivalent to a tag set with the same keys in a different order, {c: d, a: b}, which I don't think is the case now.

It should, but you can test it and open an issue if it doesn't ;)

To be honest, I would expect the reduction in memory to be negligible for most real use cases, but the reduction in GC time in those same cases should be significant once we build this directly from lib.State#Tags instead of going from the root node up for each set.

One of the points of this structure is that it "memorizes" how it was constructed and optimizes the same route for the next time. So as long as the code that generates it takes the same route it will be faster the following times. But building it from a map has more chances of it being randomized.

I would still like to see whether it makes a difference even like that, and in which direction, with just a fairly basic test. I would expect this to be slightly ... worse, until we stop generating the whole map and use the atlas.Node for that as well.

codebien · 2022-07-26T10:51:35Z

@na-- @mstoykov @oleiade I pushed a new iteration with a basic Atlas integration. Please take a look and let me know your opinions. I expect several points of discussion, so I didn't complete the entire migration. In any case, the js/http package should work as expected.

metrics/tagindex.go

js/modules/k6/metrics/metrics.go

js/modules/k6/ws/ws.go

lib/netext/grpcext/conn.go

lib/netext/httpext/request.go

metrics/tagindex.go

vendor/github.com/mstoykov/atlas/atlas.go

cmd/options.go

codebien · 2022-08-03T06:47:32Z

Updates:

- Restored lib.State.Tags and renamed metrics.TagMap to metrics.TagSet
- metrics.TagSet is now always lock-free
- Migrated lib.RunTags from SampleTags to map[string]string
- The root is set in PreInitState
- The resolved RunTags are set in TestState

Note: as you can see from the failing xk6 test, we are breaking extensions by removing the ability to create a SampleTags from a map.

lib/test_state.go

metrics/tags.go

After the introduction of Atlas, we decided to keep `url` as a normal tag, but it is always set to the same value as the `name` tag. This way, a `url` containing high-cardinality values won't affect the system. This is what the cloud output currently does as well.
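
A rough sketch of the `url`/`name` rule described above. `alignURLWithName` is a hypothetical helper, not a k6 function, and it assumes the rule only applies when both tags are present; the example URLs are made up.

```go
package main

import "fmt"

// alignURLWithName is a hypothetical helper: it returns a copy of tags in
// which "url" carries the same value as "name", so a high-cardinality URL
// never produces extra distinct tag sets.
// Assumption: the rule applies only when both tags are present.
func alignURLWithName(tags map[string]string) map[string]string {
	out := make(map[string]string, len(tags))
	for k, v := range tags {
		out[k] = v
	}
	if name, ok := out["name"]; ok {
		if _, hasURL := out["url"]; hasURL {
			out["url"] = name
		}
	}
	return out
}

func main() {
	tags := alignURLWithName(map[string]string{
		"name": "https://example.com/items/{id}",
		"url":  "https://example.com/items/42",
	})
	fmt.Println(tags["url"]) // https://example.com/items/{id}
}
```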

codebien · 2022-10-04T13:02:58Z

As a reminder (otherwise the xk6-websockets extension will break):

- from now on, do not rebase this PR
- to merge this PR, use the classic Merge option so the commit history is kept

go.mod

na--

🎉 🤞 🙏 😅

codebien self-assigned this Jul 11, 2022

github-actions bot requested review from na-- and oleiade July 11, 2022 12:59

codebien commented Jul 11, 2022

metrics/sample.go (outdated)

codebien requested a review from mstoykov July 11, 2022 13:56

oleiade reviewed Jul 11, 2022

metrics/sample.go (outdated)

mstoykov added this to the v0.40.0 milestone Jul 21, 2022

codebien force-pushed the atlas branch 3 times, most recently from 41823e4 to d7225a4 Compare July 26, 2022 10:38

mstoykov reviewed Jul 26, 2022

codebien force-pushed the atlas branch 2 times, most recently from a31b1b6 to d55c5d3 Compare July 27, 2022 13:15

na-- reviewed Aug 1, 2022

cmd/options.go (outdated)

This was referenced Aug 1, 2022

Fix broken tests and add an extra one #2625

Merged

Create distinct test state objects for the pre-init and run phases and thread them everywhere #2627

Merged

codebien force-pushed the atlas branch 4 times, most recently from 1255ebb to 1589807 Compare August 2, 2022 21:50

codebien requested review from mstoykov, oleiade and na-- August 3, 2022 06:47

na-- reviewed Aug 3, 2022

lib/test_state.go (outdated)

na-- reviewed Aug 3, 2022

metrics/tags.go (outdated)

codebien mentioned this pull request Aug 3, 2022

lib/options: Migrated RunTags to map[string]string type #2631

Merged

codebien and others added 11 commits October 4, 2022 12:05

Migrate the outputs to use the metrics' atlas-based types

68a6a33

js: Use the new Atlas-based state.Tags

ad6ba6f

core,cmd: Operations for init the new Atlas-based types

70f3877

Remove NewTagSet and init all tag sets from a common Registry root atlas Node

0191e74

Simplify the atlas integration to just metrics.TagSet and helpers

10a0f19

This implements the consensus from the PR comment (#issuecomment-1205273413), with some minor modifications.

Refactor metrics.Sample to have a comparable TimeSeries

7f929cc

This will allow metric Samples to be easily grouped with other Samples with the same metric and tags. It implements the consensus described in my second PR comment (#issuecomment-1205359198)
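
As a sketch of why a comparable TimeSeries makes grouping easy: when the series is a struct of two pointers, it can be used directly as a map key. The types below are simplified, hypothetical stand-ins rather than the actual k6 definitions, and `groupByTimeSeries` is an illustrative helper.

```go
package main

import "fmt"

// Metric and TagSet are simplified, hypothetical stand-ins.
type Metric struct{ Name string }

// TagSet stands in for an interned tag set: equal sets share one pointer.
type TagSet struct{ desc string }

// TimeSeries is comparable because it only contains pointers, so two
// series are equal when they point at the same metric and the same tag set.
type TimeSeries struct {
	Metric *Metric
	Tags   *TagSet
}

// Sample is one measured value that belongs to a time series.
type Sample struct {
	TimeSeries
	Value float64
}

// groupByTimeSeries buckets sample values by their (metric, tags) pair.
func groupByTimeSeries(samples []Sample) map[TimeSeries][]float64 {
	groups := make(map[TimeSeries][]float64)
	for _, s := range samples {
		groups[s.TimeSeries] = append(groups[s.TimeSeries], s.Value)
	}
	return groups
}

func main() {
	reqs := &Metric{Name: "http_reqs"}
	ok := &TagSet{desc: "status=200"}
	failed := &TagSet{desc: "status=500"}

	groups := groupByTimeSeries([]Sample{
		{TimeSeries{reqs, ok}, 1},
		{TimeSeries{reqs, failed}, 1},
		{TimeSeries{reqs, ok}, 2},
	})
	fmt.Println(len(groups)) // 2
}
```

This relies on tag sets being interned, so that equal sets are always the same pointer; with separately allocated but equal tag sets the grouping would split them.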

Rename TagSet.SortAndAddTags() to WithTagsFromMap() and improve comments

721b2f4

Convert xk6-websockets to atlas-based metric tags

cf16837

Reformat go.mod

a03ac0b

output/cloud: Pass error as a field for logging

f31f716

codebien force-pushed the atlas branch from 95c7086 to 62fd14a Compare October 4, 2022 10:06

Remove experimental xk6-websockets

50bc9af

Temporarily remove the experimental module, in a dedicated commit, to allow bumping the k6 dependency in the extension's repository.

codebien force-pushed the atlas branch from 017fd58 to 50bc9af Compare October 4, 2022 10:25

codebien mentioned this pull request Oct 4, 2022

Bump the go version to 1.19 grafana/xk6-websockets#19

Merged

mstoykov reviewed Oct 4, 2022

go.mod (outdated)

codebien force-pushed the atlas branch from 281cfc6 to ac0b2b1 Compare October 4, 2022 13:21

Bump xk6-websockets version

f5f05c9

Uses the xk6-websockets main version that is now migrated to the time series version.

codebien force-pushed the atlas branch from ac0b2b1 to f5f05c9 Compare October 4, 2022 13:23

oleiade self-requested a review October 4, 2022 13:32

codebien requested a review from mstoykov October 4, 2022 13:50

mstoykov approved these changes Oct 4, 2022

na-- approved these changes Oct 4, 2022

codebien merged commit 46b4847 into master Oct 4, 2022

codebien deleted the atlas branch October 4, 2022 14:16

na-- mentioned this pull request Oct 6, 2022

Support non-indexable high-cardinality metric tags / metadata #2584

Closed

This was referenced Oct 13, 2022

metrics.Sample.Metadata: Data structure for high cardinality tags #2726

Merged

metrics.Ingester: Sink by time series #2735

Open

na-- mentioned this pull request Dec 5, 2022

TestVUIntegrationMetrics is broken, make it into an integration test #2799

Closed
