-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add optional anonymous Tempo usage reporting #1481
Conversation
03da46a
to
5d634ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some thoughts
modules/distributor/receiver/shim.go
Outdated
case "opencensus": | ||
receiverOpencensusStats.Set(1) | ||
case "kafka": | ||
receiverKafkaStats.Set(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default, log an error?
is there a way to associate a string label with this "stat"? so instead of a bunch of individual stats we could says receiverStats.Type(<name>).Set(1)
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a way to associate a string label with this "stat"? so instead of a bunch of individual stats we could says receiverStats.Type().Set(1) or something?
Discussed offline a bit, and the idea here is to centralize the stats more concretely in the usagestats package. Instead of calling something like usagestats.NewInt("feature_enabled_search")
, int.Set(1)
throughout the code base, this could be usagestats.SetFeatureEnabledSearch(1)
. The SetFeatureEnabledSearch
method would call NewInt/Set.
Couple different ways to do this, but that's the idea.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current design of the usagesats
package is such that it doesn't care at all about what stats are being set throughout the code base. To create some helper methods in the package might help readability on the implementation side, but I'm thinking the separation of concerns here is somewhat nice. If the variables names with a Set()
method are ugly to look at, we could also make package local fucntions like setFeatureEnabledSearch()
that would make the necessary calls in.
With a package variable like featureEnabledSearch
, calling featureEnabledSearch.Set(1)
feels almost as readable as usagestats.SetFeatureEnabledSearch()
, the difference being that now the usagestats needs modification for each stat we want to include. Is there another advantage of moving the stats to the usagestats package here I'm not thinking of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just playing with it a little, to move this noise out of the New function could just be something like recordConfigBoolStat(cfg.AuthEnabled, statAuthEnabled)
. Additionally we could move all the stats into a stats file within the packages that implement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with either way. I like @mdisibio's suggestion to consolidate the stats b/c it will help users of the software quickly see what we are reporting. I don't consider this a blocker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I see that it also might prevent sharing the package in the future, but I don't know how much to hold on to that idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could move all the variables to modules.go
if we wanted one place to look. We mostly need access to the config, which could be done in a helper on the Server. Hows that sound? I think that might help smooth out a dependency loop for the backend creation also when trying to implement backend.New()
for shared use.
That's great, thanks for the feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some concerns as discussed offline:
- I'm a little skeptical to adding a dependency on the backend storage to the usage reporting module - not only because of the complexity of running the module but also an unexpected failure scenario with the backend that might result in a component crash. I believe we could hold the token in the memberlistkv for components, in which case it will only ever be recreated if all components are wiped and restarted - at which point we might as well call it a new cluster. (Maybe we can go ahead for now but ease the dependency at some point).
- With the additional dependency on the backend, components like the distributor which initially had no dependency on the backend will have to be configured with the read token from GCS/S3/Azure. This is a breaking change and needs to be communicated clearly and updated in all docs.
- Schema changes - I'm not sure what happens if we change the usage report schema, does an old payload get rejected with a 400? Or do we accept with 200 ok but drop it internally? How hard is it to change the associated dashboards?
- The backoff to send data / wait for the token to be created should definitely be made configurable. In a large cluster with a 100 components, it can result in pretty spikey memberlist traffic if all components query the kv store every second
Thanks for the review @annanay25.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a minor suggestion to the text.
9d61c9c
to
2a51948
Compare
@@ -17,6 +17,7 @@ This document explains the configuration options for Tempo as well as the detail | |||
- [storage](#storage) | |||
- [memberlist](#memberlist) | |||
- [overrides](#overrides) | |||
- [usage-report](#usage-report) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this is placed below search in the actual description
pkg/usagestats/config.go
Outdated
f.DurationVar(&cfg.Backoff.MaxBackoff, util.PrefixConfig(prefix, "backoff.max_backoff"), time.Minute, "maximum time to back off retry") | ||
f.DurationVar(&cfg.Backoff.MinBackoff, util.PrefixConfig(prefix, "backoff.min_backoff"), time.Second, "minimum time to back off retry") | ||
f.IntVar(&cfg.Backoff.MaxRetries, util.PrefixConfig(prefix, "backoff.max_retries"), 0, "maximum number of times to retry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should replace this with cfg.Backoff.RegisterFlagsWithPrefix("usage-report", f)
tempo/vendor/github.com/grafana/dskit/backoff/backoff.go
Lines 18 to 23 in 150eeea
// RegisterFlagsWithPrefix for Config. | |
func (cfg *Config) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) { | |
f.DurationVar(&cfg.MinBackoff, prefix+".backoff-min-period", 100*time.Millisecond, "Minimum delay when backing off.") | |
f.DurationVar(&cfg.MaxBackoff, prefix+".backoff-max-period", 10*time.Second, "Maximum delay when backing off.") | |
f.IntVar(&cfg.MaxRetries, prefix+".backoff-retries", 10, "Number of times to backoff and retry before failing.") | |
} |
And then we can override the default if needed. But the flag names should be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been updated. Hows that?
r, err := NewReporter(Config{Leader: true, Enabled: true}, kv.Config{ | ||
Store: "", | ||
}, objectClient, objectClient, log.NewLogfmtLogger(os.Stdout), prometheus.NewPedanticRegistry()) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we error on a wrong k/v store value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, because only the ingesters will require the k/v store. This will get checked when the ingester is started during the call to running()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a final few comments but approving to unblock once done!
What this PR does:
Here we implement an approach from the Loki squad for sending anonymous usage information to Grafana Labs to better understand the uses in the wild.
Fixes https://github.com/grafana/tempo-squad/issues/81
Related grafana/loki#5062
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]