Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for environment and dataset rules with same names #438

Merged
merged 6 commits into from
Apr 26, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,36 @@ By default, a Refinery process will register itself in Redis using its local hos
In environments where domain name resolution is slow or unreliable, override the reliance on name lookups by specifying the name of the peering network interface with the `IdentifierInterfaceName` configuration option.
See the [Refinery documentation](https://docs.honeycomb.io/manage-data-volume/refinery/) for more details on tuning a cluster.


### Mixing Legacy and Environment & Services Rule Definitions

With the change to support environemt and services in Honeycomb, some users will want to support both sending telemetry to a legacy dataset and a new environment called the same thing (eg `production`).

This can be accomplished by leveraging the new `DatasetPrefix` configuration property and then using that prefix in the rules definitions for the legacy datasets.

When Refinery receives telemetry using an API key associated to a legacy dataset, it will then use the prefix in the form `{prefix}.{dataset}` when trying to resolve the rules definition.

For example
config.toml
```toml
DatasetPrefix = "legacy"
```

rules.toml
```toml
# default rules
Sampler = "DeterministicSampler"
SampleRate = 1

[production] # environment called "production"
Sampler = "DeterministicSampler"
SampleRate = 5

[legacy.production] # dataset called "production"
Sampler = "DeterministicSampler"
SampleRate = 10
```

## How sampling decisions are made

In the configuration file, you can choose from a few sampling methods and specify options for each. The `DynamicSampler` is the most interesting and most commonly used. It uses the `AvgSampleRate` algorithm from the [`dynsampler-go`](https://github.com/honeycombio/dynsampler-go) package. Briefly described, you configure Refinery to examine the trace for a set of fields (for example, `request.status_code` and `request.method`). It collects all the values found in those fields anywhere in the trace (eg "200" and "GET") together into a key it hands to the dynsampler. The dynsampler code will look at the frequency that key appears during the previous 30 seconds (or other value set by the `ClearFrequencySec` setting) and use that to hand back a desired sample rate. More frequent keys are sampled more heavily, so that an even distribution of traffic across the keyspace is represented in Honeycomb.
Expand Down
4 changes: 2 additions & 2 deletions collect/collect.go
Original file line number Diff line number Diff line change
Expand Up @@ -449,9 +449,9 @@ func (i *InMemCollector) send(trace *types.Trace) {
logFields["environment"] = samplerKey
}

// use sampler key to find sampler, crete and cache if not found
// use sampler key to find sampler; create and cache if not found
if sampler, found = i.datasetSamplers[samplerKey]; !found {
sampler = i.SamplerFactory.GetSamplerImplementationForDataset(samplerKey)
sampler = i.SamplerFactory.GetSamplerImplementationForKey(samplerKey, isLegacyKey)
i.datasetSamplers[samplerKey] = sampler
}

Expand Down
2 changes: 1 addition & 1 deletion collect/collect_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -186,7 +186,7 @@ func TestDryRunMode(t *testing.T) {
Config: conf,
Logger: &logger.NullLogger{},
}
sampler := samplerFactory.GetSamplerImplementationForDataset("test")
sampler := samplerFactory.GetSamplerImplementationForKey("test", true)
coll := &InMemCollector{
Config: conf,
Logger: &logger.NullLogger{},
Expand Down
2 changes: 2 additions & 0 deletions config/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -139,4 +139,6 @@ type Config interface {
GetAddHostMetadataToTrace() bool

GetEnvironmentCacheTTL() time.Duration

GetDatasetPrefix() string
}
37 changes: 37 additions & 0 deletions config/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -652,3 +652,40 @@ func TestHoneycombLoggerConfigDefaults(t *testing.T) {
assert.Equal(t, false, loggerConfig.LoggerSamplerEnabled)
assert.Equal(t, 5, loggerConfig.LoggerSamplerThroughput)
}

func TestDatasetPrefix(t *testing.T) {
tmpDir, err := ioutil.TempDir("", "")
assert.NoError(t, err)
defer os.RemoveAll(tmpDir)

configFile, err := ioutil.TempFile(tmpDir, "*.toml")
assert.NoError(t, err)

_, err = configFile.Write([]byte(`
DatasetPrefix = "dataset"

[InMemCollector]
CacheCapacity=1000

[HoneycombMetrics]
MetricsHoneycombAPI="http://honeycomb.io"
MetricsAPIKey="1234"
MetricsDataset="testDatasetName"
MetricsReportingInterval=3

[HoneycombLogger]
LoggerHoneycombAPI="http://honeycomb.io"
LoggerAPIKey="1234"
LoggerDataset="loggerDataset"
`))
assert.NoError(t, err)
configFile.Close()

rulesFile, err := ioutil.TempFile(tmpDir, "*.toml")
assert.NoError(t, err)

c, err := NewConfig(configFile.Name(), rulesFile.Name(), func(err error) {})
assert.NoError(t, err)

assert.Equal(t, "dataset", c.GetDatasetPrefix())
}
8 changes: 8 additions & 0 deletions config/file_config.go
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ type configContents struct {
InMemCollector InMemoryCollectorCacheCapacity `validate:"required"`
AddHostMetadataToTrace bool
EnvironmentCacheTTL time.Duration
DatasetPrefix string
}

type InMemoryCollectorCacheCapacity struct {
Expand Down Expand Up @@ -736,3 +737,10 @@ func (f *fileConfig) GetEnvironmentCacheTTL() time.Duration {

return f.conf.EnvironmentCacheTTL
}

func (f *fileConfig) GetDatasetPrefix() string {
f.mux.RLock()
defer f.mux.RUnlock()

return f.conf.DatasetPrefix
}
8 changes: 8 additions & 0 deletions config/mock.go
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ type MockConfig struct {
DryRunFieldName string
AddHostMetadataToTrace bool
EnvironmentCacheTTL time.Duration
DatasetPrefix string

Mux sync.RWMutex
}
Expand Down Expand Up @@ -328,3 +329,10 @@ func (f *MockConfig) GetEnvironmentCacheTTL() time.Duration {

return f.EnvironmentCacheTTL
}

func (f *MockConfig) GetDatasetPrefix() string {
f.Mux.RLock()
defer f.Mux.RUnlock()

return f.DatasetPrefix
}
19 changes: 13 additions & 6 deletions sample/sample.go
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package sample

import (
"fmt"
"os"

"github.com/honeycombio/refinery/config"
Expand All @@ -21,10 +22,16 @@ type SamplerFactory struct {
Metrics metrics.Metrics `inject:"metrics"`
}

// GetSamplerImplementationForDataset returns the sampler implementation for the dataset,
// or nil if it is not defined
func (s *SamplerFactory) GetSamplerImplementationForDataset(dataset string) Sampler {
c, err := s.Config.GetSamplerConfigForDataset(dataset)
// GetSamplerImplementationForKey returns the sampler implementation for the given
// samplerKey (dataset for legacy keys, environment otherwise), or nil if it is not defined
func (s *SamplerFactory) GetSamplerImplementationForKey(samplerKey string, isLegacyKey bool) Sampler {
if isLegacyKey {
if prefix := s.Config.GetDatasetPrefix(); prefix != "" {
samplerKey = fmt.Sprintf("%s.%s", prefix, samplerKey)
Comment on lines +29 to +30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could mandate that the intervening period be part of the specified prefix. If I set the DatasetPrefix field to "x," the prefix really becomes "x."—even though I didn't mention the period when specifying my preferred prefix. I might prefer a prefix like "dataset-" (with a separating hyphen instead of a period).

Had you considered including this separator as part of the prefix? Were you attempting to avoid false aliasing if there was no obvious separator character included in the prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree additional docs on the prefix format where the . separator is added for you would be useful 👍🏻

The separator is not necessary but I think helps create a mental divide. The intent of this change is to support users migrate from legacy datasets to newer environment and services rule sets and should not be a permanent solution. I think a simple solution of always using a single . that's well documented should be good enough.

}
}

c, err := s.Config.GetSamplerConfigForDataset(samplerKey)
if err != nil {
return nil
}
Expand All @@ -49,11 +56,11 @@ func (s *SamplerFactory) GetSamplerImplementationForDataset(dataset string) Samp

err = sampler.Start()
if err != nil {
s.Logger.Debug().WithField("dataset", dataset).Logf("failed to start sampler")
s.Logger.Debug().WithField("dataset", samplerKey).Logf("failed to start sampler")
return nil
}

s.Logger.Debug().WithField("dataset", dataset).Logf("created implementation for sampler type %T", c)
s.Logger.Debug().WithField("dataset", samplerKey).Logf("created implementation for sampler type %T", c)

return sampler
}
80 changes: 80 additions & 0 deletions sample/sample_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,15 @@
package sample

import (
"io/ioutil"
"os"
"testing"

"github.com/facebookgo/inject"
"github.com/honeycombio/refinery/config"
"github.com/honeycombio/refinery/logger"
"github.com/honeycombio/refinery/metrics"
"github.com/stretchr/testify/assert"
)

func TestDependencyInjection(t *testing.T) {
Expand All @@ -27,3 +30,80 @@ func TestDependencyInjection(t *testing.T) {
t.Error(err)
}
}

func TestDatasetPrefix(t *testing.T) {
tmpDir, err := ioutil.TempDir("", "")
assert.NoError(t, err)
defer os.RemoveAll(tmpDir)

configFile, err := ioutil.TempFile(tmpDir, "*.toml")
assert.NoError(t, err)

_, err = configFile.Write([]byte(`
DatasetPrefix = "dataset"

[InMemCollector]
CacheCapacity=1000

[HoneycombMetrics]
MetricsHoneycombAPI="http://honeycomb.io"
MetricsAPIKey="1234"
MetricsDataset="testDatasetName"
MetricsReportingInterval=3

[HoneycombLogger]
LoggerHoneycombAPI="http://honeycomb.io"
LoggerAPIKey="1234"
LoggerDataset="loggerDataset"
`))
assert.NoError(t, err)
configFile.Close()

rulesFile, err := ioutil.TempFile(tmpDir, "*.toml")
assert.NoError(t, err)

_, err = rulesFile.Write([]byte(`
Sampler = "DeterministicSampler"
SampleRate = 1

[production]
Sampler = "DeterministicSampler"
SampleRate = 10

[dataset.production]
Sampler = "DeterministicSampler"
SampleRate = 20
`))
assert.NoError(t, err)
rulesFile.Close()

c, err := config.NewConfig(configFile.Name(), rulesFile.Name(), func(err error) {})
assert.NoError(t, err)

assert.Equal(t, "dataset", c.GetDatasetPrefix())

factory := SamplerFactory{Config: c, Logger: &logger.NullLogger{}, Metrics: &metrics.NullMetrics{}}

defaultSampler := &DeterministicSampler{
Config: &config.DeterministicSamplerConfig{SampleRate: 1},
Logger: &logger.NullLogger{},
}
defaultSampler.Start()

envSampler := &DeterministicSampler{
Config: &config.DeterministicSamplerConfig{SampleRate: 10},
Logger: &logger.NullLogger{},
}
envSampler.Start()

datasetSampler := &DeterministicSampler{
Config: &config.DeterministicSamplerConfig{SampleRate: 20},
Logger: &logger.NullLogger{},
}
datasetSampler.Start()

assert.Equal(t, defaultSampler, factory.GetSamplerImplementationForKey("unknown", false))
assert.Equal(t, defaultSampler, factory.GetSamplerImplementationForKey("unknown", true))
assert.Equal(t, envSampler, factory.GetSamplerImplementationForKey("production", false))
assert.Equal(t, datasetSampler, factory.GetSamplerImplementationForKey("production", true))
}