[large-tiles] Additional metrics added #2720

gediminasgu · 2020-10-12T14:56:29Z

What this PR does / why we need it:
This PR adds useful metrics for large-tiles process monitoring.

Special notes for your reviewer:

Does this PR introduce a user-facing and/or backwards incompatible change?:

NONE

Does this PR require updating code package or user-facing documentation?:

NONE

arnikola · 2020-10-12T14:59:20Z

src/dbnode/storage/types.go

 	// HandleCounterResets is temporarily used to force counter reset handling logics on the processed series.
 	// TODO: remove once we have metrics type stored in the metadata.
 	HandleCounterResets bool
+	MetricsScope        tally.Scope


nit: would this make more sense as an instrument.Options?

The advantage of tally.Scope is that I can set tags once and have them apply to every method below it. I couldn't find a way to do that with instrument.Options.

    iOpts := instrument.NewOptions()
    // To get the scope:
    scope := iOpts.MetricsScope()
    // To set the scope on the options:
    iOpts = iOpts.SetMetricsScope(scope.With("tags...."))

It just seems more conventional to put instrument options into options structs rather than the raw scopes
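
One way to reconcile the two positions is to tag the scope once and carry it inside the options value. The sketch below is only an illustration: `scope` and `options` are hypothetical stand-ins, not the real `tally.Scope` or `instrument.Options`. Only the method names `MetricsScope` and `SetMetricsScope` mirror the snippet above; `Tagged` is assumed to behave like the tagging method on a tally scope, and the `process` tag name is made up.

```go
package main

import "fmt"

// scope is a hypothetical stand-in for tally.Scope: it only remembers tags.
type scope struct {
	tags map[string]string
}

// Tagged returns a child scope carrying the parent's tags plus the new ones.
func (s scope) Tagged(extra map[string]string) scope {
	merged := make(map[string]string, len(s.tags)+len(extra))
	for k, v := range s.tags {
		merged[k] = v
	}
	for k, v := range extra {
		merged[k] = v
	}
	return scope{tags: merged}
}

// options is a hypothetical stand-in for instrument.Options.
type options struct {
	metricsScope scope
}

// MetricsScope returns the scope stored in the options.
func (o options) MetricsScope() scope { return o.metricsScope }

// SetMetricsScope returns a copy of the options with the scope replaced.
func (o options) SetMetricsScope(s scope) options {
	o.metricsScope = s
	return o
}

// withProcessTag tags the scope once, so every metric created from the
// returned options inherits the tag. The tag name "process" is an assumption.
func withProcessTag(o options, process string) options {
	tagged := o.MetricsScope().Tagged(map[string]string{"process": process})
	return o.SetMetricsScope(tagged)
}

func main() {
	iOpts := withProcessTag(options{}, "large-tiles")
	fmt.Println(iOpts.MetricsScope().tags["process"])
}
```

With this shape the options struct holds an options value rather than a raw scope, while the tags are still set in one place.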

arnikola · 2020-10-12T15:03:11Z

src/dbnode/storage/namespace.go

@@ -1693,6 +1694,7 @@ func (n *dbNamespace) aggregateTiles(
 			ctx, sourceNs.ID(), sourceShard.ID(), blockReaders, sourceBlockVolumes, opts, nsCtx.Schema)

 		processedTileCount += shardProcessedTileCount
+		processedShards.Inc(1)


Should this always increment? It would probably be better to break this out into success/error metrics. It might also be useful to add an attemptedShards metric that increments at the start of this loop, to catch scenarios where we can't read the shard for whatever reason.

Actually, if a shard fails, the whole job fails, so the failed-job count effectively equals the failed-shard count and carries no additional information. Does it really make sense to duplicate it?

arnikola · 2020-10-12T15:03:56Z

src/dbnode/storage/database.go

+		opts.MetricsScope.Counter("aggregation.errors").Inc(1)
+	} else {
+		opts.MetricsScope.Counter("aggregation.success").Inc(1)


Instead of having two different metrics here, should it be one metric with a success tag?

I tried to follow the convention I saw elsewhere: https://github.com/m3db/m3/blob/gg/large-tiles-metrics/src/x/instrument/methods.go#L298
So which one should we follow? :)


gediminasgu added 2 commits October 12, 2020 17:55

[large-tiles] Additional metrics added

25b6ed4

Merge branch 'master' into gg/large-tiles-metrics

14fa793

arnikola reviewed Oct 12, 2020

src/dbnode/storage/database.go (outdated, resolved)

arnikola reviewed Oct 12, 2020

src/dbnode/storage/database.go (outdated, resolved)

soundvibe reviewed Oct 13, 2020

src/dbnode/storage/database.go (outdated, resolved)

gediminasgu added 5 commits October 14, 2020 13:24

Merge remote-tracking branch 'origin/master' into gg/large-tiles-metrics

8ffcde4

# Conflicts: # src/dbnode/storage/namespace.go # src/dbnode/storage/types.go

changes according to the comments

352d56f

test fix

4da1550

changes according to comments

54a9c2e

fix

9c2fedc

arnikola approved these changes Oct 28, 2020

gediminasgu added 5 commits October 29, 2020 10:40

Merge remote-tracking branch 'origin/master' into gg/large-tiles-metrics

c7e3d02

# Conflicts: # src/dbnode/storage/types.go

Merge branch 'master' into gg/large-tiles-metrics

18a1f62

test fix

84568e8

Merge branch 'master' into gg/large-tiles-metrics

ab505a7

Merge branch 'master' into gg/large-tiles-metrics

d5bb69d

gediminasgu merged commit 83fb11f into master Nov 3, 2020

gediminasgu deleted the gg/large-tiles-metrics branch November 3, 2020 09:06
