feat: enable sampling for stdout logger #892

Merged: 6 commits, merged on Nov 3, 2023.
24 changes: 22 additions & 2 deletions config.md
@@ -1,7 +1,7 @@
# Honeycomb Refinery Configuration Documentation

This is the documentation for the configuration file for Honeycomb's Refinery.
It was automatically generated on 2023-10-31 at 20:19:42 UTC.
It was automatically generated on 2023-11-02 at 19:50:49 UTC.

## The Config file

@@ -403,7 +403,7 @@ The sampling algorithm attempts to make sure that the average throughput approxi

- Not eligible for live reload.
- Type: `float`
- Default: `10`
- Default: `5`
- Example: `10`

## Stdout Logger
@@ -420,6 +420,26 @@ Structured controls whether to use structured logging.
- Not eligible for live reload.
- Type: `bool`

### `SamplerEnabled`

SamplerEnabled controls whether logs are sampled before sending to stdout.

The sample rate is controlled by the `SamplerThroughput` setting.

- Not eligible for live reload.
- Type: `bool`

### `SamplerThroughput`

SamplerThroughput is the sampling throughput for logs in events per second.

The sampling algorithm attempts to make sure that the average throughput approximates this value, while also ensuring that all unique logs arrive at stdout at least once per sampling period.

- Not eligible for live reload.
- Type: `float`
- Default: `5`
- Example: `10`

## Prometheus Metrics

`PrometheusMetrics` contains configuration for Refinery's internally-generated metrics as made available through Prometheus.
6 changes: 6 additions & 0 deletions config/config_test.go
@@ -595,6 +595,8 @@ func TestStdoutLoggerConfig(t *testing.T) {
		"General.ConfigurationVersion", 2,
		"Logger.Type", "stdout",
		"StdoutLogger.Structured", true,
		"StdoutLogger.SamplerThroughput", 10,
		"StdoutLogger.SamplerEnabled", true,
	)
	rm := makeYAML("ConfigVersion", 2)
	config, rules := createTempConfigs(t, cm, rm)
@@ -609,6 +611,8 @@ func TestStdoutLoggerConfig(t *testing.T) {
	assert.NoError(t, err)

	assert.True(t, loggerConfig.Structured)
	assert.True(t, loggerConfig.SamplerEnabled)
	assert.Equal(t, 10, loggerConfig.SamplerThroughput)
}

func TestStdoutLoggerConfigDefaults(t *testing.T) {
@@ -627,6 +631,8 @@ func TestStdoutLoggerConfigDefaults(t *testing.T) {
	assert.NoError(t, err)

	assert.False(t, loggerConfig.Structured)
	assert.False(t, loggerConfig.SamplerEnabled)
	assert.Equal(t, 5, loggerConfig.SamplerThroughput)
}
func TestDatasetPrefix(t *testing.T) {
	cm := makeYAML(
2 changes: 2 additions & 0 deletions config/file_config.go
@@ -276,6 +276,8 @@ type HoneycombLoggerConfig struct {

type StdoutLoggerConfig struct {
	Structured        bool `yaml:"Structured" default:"false"`
	SamplerEnabled    bool `yaml:"SamplerEnabled"`
	SamplerThroughput int  `yaml:"SamplerThroughput" default:"5"`
Contributor:

This SamplerThroughput default value conflicts with the default set in configMeta.yaml.

Contributor:

It also conflicts with the stated default value in the generated documentation.

Contributor Author:

Good find! I honestly don't understand the relationship here. I got the default from line 274, which is the config for HoneycombLogger, and it looks like that one has the same issue.
@kentquirk, would you mind explaining the relationship between the different default values?

Contributor Author:

OK, I took a look at the config logic, and here's my understanding; please correct me if I'm wrong, @kentquirk. I believe the defaults in the metadata file are used only for documentation and validation, and the default defined in the struct's `default` tag is the one that is actually loaded and used.
I modified the metadata file to match this value.

Contributor:

Awesome! Then everything looks good to me.

Contributor:

Please see #894; it turns out there were a few of these.

}

type PrometheusMetricsConfig struct {
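The review thread above turns on two sources of defaults drifting apart: the `default` struct tag, which is what gets loaded, and the value documented in `configMeta.yaml`. A small reflection-based check can catch that kind of drift. This is a hypothetical sketch: `mismatchedDefaults` and the `documented` map are illustrative stand-ins (a real check would parse the metadata file), and the struct copies only the fields from this PR.

```go
package main

import (
	"fmt"
	"reflect"
)

// StdoutLoggerConfig copies the fields added or touched in this PR.
type StdoutLoggerConfig struct {
	Structured        bool `yaml:"Structured" default:"false"`
	SamplerEnabled    bool `yaml:"SamplerEnabled"`
	SamplerThroughput int  `yaml:"SamplerThroughput" default:"5"`
}

// mismatchedDefaults (hypothetical) compares each field's `default` struct
// tag with a documented default and reports fields where the two disagree.
// cfg must be a struct value, not a pointer.
func mismatchedDefaults(cfg any, documented map[string]string) []string {
	var out []string
	t := reflect.TypeOf(cfg)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		want, ok := documented[f.Name]
		if !ok {
			continue // field not documented; nothing to compare
		}
		got := f.Tag.Get("default") // "" when the field has no default tag
		if got != want {
			out = append(out, fmt.Sprintf("%s: struct tag %q, docs %q", f.Name, got, want))
		}
	}
	return out
}

func main() {
	// Hypothetical documented defaults with SamplerThroughput out of sync.
	docs := map[string]string{"Structured": "false", "SamplerThroughput": "10"}
	for _, m := range mismatchedDefaults(StdoutLoggerConfig{}, docs) {
		fmt.Println(m) // prints: SamplerThroughput: struct tag "5", docs "10"
	}
}
```

Run as a test over every config struct, a check like this would flag each mismatch in one pass instead of relying on reviewers to spot them.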
23 changes: 22 additions & 1 deletion config/metadata/configMeta.yaml
@@ -460,7 +460,7 @@ groups:
- name: SamplerThroughput
valuetype: showexample
type: float
default: 10
default: 5
example: 10
reload: false
summary: is the sampling throughput for logs in events per second.
@@ -484,6 +484,27 @@ groups:
`true` generates structured logs (JSON).
`false` generates plain text logs.

- name: SamplerEnabled
type: bool
valuetype: nondefault
default: false
reload: false
summary: controls whether logs are sampled before sending to stdout.
description: >
The sample rate is controlled by the `SamplerThroughput` setting.

- name: SamplerThroughput
valuetype: showexample
type: float
default: 5
example: 10
reload: false
summary: is the sampling throughput for logs in events per second.
description: >
The sampling algorithm attempts to make sure that the average
throughput approximates this value, while also ensuring that all
unique logs arrive at stdout at least once per sampling
period.
- name: PrometheusMetrics
title: "Prometheus Metrics"
description: contains configuration for Refinery's internally-generated metrics as made available through Prometheus.
48 changes: 48 additions & 0 deletions logger/logrus.go
@@ -1,8 +1,13 @@
package logger

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/sirupsen/logrus"

	"github.com/honeycombio/dynsampler-go"
	"github.com/honeycombio/refinery/config"
)

@@ -13,13 +18,16 @@ type StdoutLogger struct {

	logger *logrus.Logger
	level  logrus.Level

	sampler dynsampler.Sampler
}

var _ = Logger((*StdoutLogger)(nil))

type LogrusEntry struct {
	entry   *logrus.Entry
	level   logrus.Level
	sampler dynsampler.Sampler
}
func (l *StdoutLogger) Start() error {
@@ -34,6 +42,18 @@ func (l *StdoutLogger) Start() error {
		l.logger.SetFormatter(&logrus.JSONFormatter{})
	}

	if cfg.SamplerEnabled {
		l.sampler = &dynsampler.PerKeyThroughput{
			ClearFrequencyDuration: 10 * time.Second,
			PerKeyThroughputPerSec: cfg.SamplerThroughput,
			MaxKeys:                1000,
		}
		err = l.sampler.Start()
		if err != nil {
			return err
		}
	}

	return nil
}

@@ -45,6 +65,7 @@ func (l *StdoutLogger) Debug() Entry {
	return &LogrusEntry{
		entry:   logrus.NewEntry(l.logger),
		level:   logrus.DebugLevel,
		sampler: l.sampler,
	}
}

@@ -56,6 +77,7 @@ func (l *StdoutLogger) Info() Entry {
	return &LogrusEntry{
		entry:   logrus.NewEntry(l.logger),
		level:   logrus.InfoLevel,
		sampler: l.sampler,
	}
}

@@ -67,6 +89,7 @@ func (l *StdoutLogger) Warn() Entry {
	return &LogrusEntry{
		entry:   logrus.NewEntry(l.logger),
		level:   logrus.WarnLevel,
		sampler: l.sampler,
	}
}

@@ -78,6 +101,7 @@ func (l *StdoutLogger) Error() Entry {
	return &LogrusEntry{
		entry:   logrus.NewEntry(l.logger),
		level:   logrus.ErrorLevel,
		sampler: l.sampler,
	}
}

@@ -98,30 +122,54 @@ func (l *LogrusEntry) WithField(key string, value interface{}) Entry {
	return &LogrusEntry{
		entry:   l.entry.WithField(key, value),
		level:   l.level,
		sampler: l.sampler,
	}
}

func (l *LogrusEntry) WithString(key string, value string) Entry {
	return &LogrusEntry{
		entry:   l.entry.WithField(key, value),
		level:   l.level,
		sampler: l.sampler,
	}
}

func (l *LogrusEntry) WithFields(fields map[string]interface{}) Entry {
	return &LogrusEntry{
		entry:   l.entry.WithFields(fields),
		level:   l.level,
		sampler: l.sampler,
	}
}

func (l *LogrusEntry) Logf(f string, args ...interface{}) {
	entry := l.entry
	if l.sampler != nil {
		// Use the level and format string as the key to sample on.
		// This gives each level/format pair its own sample rate and keeps
		// high-cardinality args from making the throughput sampler less effective.
		rate := l.sampler.GetSampleRate(fmt.Sprintf("%s:%s", l.level, f))
		if shouldDrop(uint(rate)) {
			return
		}
		// WithField returns a new entry rather than modifying the receiver,
		// so keep the result to actually record the sample rate.
		entry = entry.WithField("SampleRate", rate)
	}

	switch l.level {
	case logrus.DebugLevel:
		entry.Debugf(f, args...)
	case logrus.InfoLevel:
		entry.Infof(f, args...)
	case logrus.WarnLevel:
		entry.Warnf(f, args...)
	default:
		entry.Errorf(f, args...)
	}
}

// shouldDrop keeps, on average, one event out of every rate events.
func shouldDrop(rate uint) bool {
	if rate <= 1 {
		return false
	}

	return rand.Intn(int(rate)) != 0
}
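The sampling path in `Logf` can be exercised in isolation. The sketch below is a minimal model under stated assumptions: `rateSource` is a hypothetical interface exposing the same `GetSampleRate(string) int` method the code calls on `dynsampler.Sampler`, and `fixedRates` is a hypothetical stand-in for the real throughput sampler. It reuses this PR's `shouldDrop` logic and `level:format` key, so arguments such as counts or IDs never create new keys.

```go
package main

import (
	"fmt"
	"math/rand"
)

// rateSource is a hypothetical stand-in for dynsampler.Sampler;
// only GetSampleRate is modeled.
type rateSource interface {
	GetSampleRate(key string) int
}

// fixedRates is a hypothetical sampler returning a preset rate per key,
// and 1 (keep everything) for unknown keys.
type fixedRates map[string]int

func (f fixedRates) GetSampleRate(key string) int {
	if r, ok := f[key]; ok {
		return r
	}
	return 1
}

// shouldDrop keeps, on average, one event out of every rate events.
func shouldDrop(rate uint) bool {
	if rate <= 1 {
		return false
	}
	return rand.Intn(int(rate)) != 0
}

// sampleKey builds the sampling key from the level and the format string
// only, so high-cardinality arguments do not create a new key per call.
func sampleKey(level, format string) string {
	return fmt.Sprintf("%s:%s", level, format)
}

func main() {
	key := sampleKey("info", "sent %d events")
	var s rateSource = fixedRates{key: 10}
	kept := 0
	for i := 0; i < 10000; i++ {
		rate := s.GetSampleRate(key)
		if shouldDrop(uint(rate)) {
			continue
		}
		// A real logger would emit this entry with SampleRate=rate attached.
		kept++
	}
	fmt.Printf("kept %d of 10000 calls at rate 10\n", kept)
}
```

With a rate of 10, roughly one call in ten survives, and each surviving entry records `SampleRate=10` so that downstream counts can be scaled back up.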
22 changes: 21 additions & 1 deletion refinery_config.md
@@ -385,7 +385,7 @@ The sampling algorithm attempts to make sure that the average throughput approxi

- Not eligible for live reload.
- Type: `float`
- Default: `10`
- Default: `5`
- Example: `10`

## Stdout Logger
@@ -403,6 +403,26 @@ Only used if `Logger.Type` is "stdout".
- Not eligible for live reload.
- Type: `bool`

### `SamplerEnabled`

`SamplerEnabled` controls whether logs are sampled before sending to stdout.

The sample rate is controlled by the `SamplerThroughput` setting.

- Not eligible for live reload.
- Type: `bool`

### `SamplerThroughput`

`SamplerThroughput` is the sampling throughput for logs in events per second.

The sampling algorithm attempts to make sure that the average throughput approximates this value, while also ensuring that all unique logs arrive at stdout at least once per sampling period.

- Not eligible for live reload.
- Type: `float`
- Default: `5`
- Example: `10`

## Prometheus Metrics

`PrometheusMetrics` contains configuration for Refinery's internally-generated metrics as made available through Prometheus.
2 changes: 1 addition & 1 deletion rules.md
@@ -1,7 +1,7 @@
# Honeycomb Refinery Rules Documentation

This is the documentation for the rules configuration for Honeycomb's Refinery.
It was automatically generated on 2023-10-31 at 20:19:43 UTC.
It was automatically generated on 2023-11-02 at 19:50:50 UTC.

## The Rules file

23 changes: 21 additions & 2 deletions tools/convert/templates/configV2.tmpl
@@ -2,7 +2,7 @@
## Honeycomb Refinery Configuration ##
######################################
#
# created {{ now }} from {{ .Input }} using a template generated on 2023-10-31 at 20:19:40 UTC
# created {{ now }} from {{ .Input }} using a template generated on 2023-11-02 at 19:50:45 UTC

# This file contains a configuration for the Honeycomb Refinery. It is in YAML
# format, organized into named groups, each of which contains a set of
@@ -404,7 +404,7 @@ HoneycombLogger:
## throughput approximates this value, while also ensuring that all
## unique logs arrive at Honeycomb at least once per sampling period.
##
## default: 10
## default: 5
## Not eligible for live reload.
# SamplerThroughput: 10

@@ -423,6 +423,25 @@ StdoutLogger:
## Not eligible for live reload.
{{ nonDefaultOnly .Data "Structured" "Structured" false }}

## SamplerEnabled controls whether logs are sampled before sending to
## stdout.
##
## The sample rate is controlled by the `SamplerThroughput` setting.
##
## Not eligible for live reload.
{{ nonDefaultOnly .Data "SamplerEnabled" "SamplerEnabled" false }}

## SamplerThroughput is the sampling throughput for logs in events per
## second.
##
## The sampling algorithm attempts to make sure that the average
## throughput approximates this value, while also ensuring that all
## unique logs arrive at stdout at least once per sampling period.
##
## default: 5
## Not eligible for live reload.
# SamplerThroughput: 10

########################
## Prometheus Metrics ##
########################