Add support Grafana Agent v0.29.0 #184

Merged: 2 commits, Apr 1, 2023

patmaddox
Contributor

Change description

This adds support for Grafana Agent v0.29.0

Per the Grafana Agent CHANGELOG, the pertinent breaking change is that server.http_listen_port and server.grpc_listen_port have been removed from the config file, and moved to command line args -server.http.address and -server.grpc.address respectively.

This change supports both cases: versions before v0.26.0 behave exactly as before, while v0.26.0 and above omit the removed keys from the default config and pass the equivalent values as flags.

The default version remains v0.23.0, so existing installations are not affected.
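To make the version gate concrete, here is a minimal sketch in Python of the behavior described above. It is not the code from this PR: the function names, the returned shape, and the `0.0.0.0` bind host are assumptions made for illustration. Only the config keys, the two flag names, the v0.26.0 cutoff, and the v0.23.0 default come from the description.

```python
# Hypothetical helper names; this is an illustration, not the PR's implementation.

# First release in which the listen-port keys are gone from the config file,
# per the change description above.
FLAGS_ONLY_SINCE = (0, 26, 0)


def parse_version(version):
    """Turn "v0.29.0" or "0.29.0" into (0, 29, 0) so versions compare as tuples."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def build_agent_invocation(http_port, grpc_port, version="0.23.0", bind_host="0.0.0.0"):
    """Return (server_config, cli_args) for the given agent version.

    bind_host is an assumption; the flags take a host:port address.
    """
    if parse_version(version) < FLAGS_ONLY_SINCE:
        # Older agents: ports live in the `server` block of the config file.
        server_config = {
            "http_listen_port": http_port,
            "grpc_listen_port": grpc_port,
        }
        cli_args = []
    else:
        # Newer agents: the keys are omitted and passed as flags instead.
        server_config = {}
        cli_args = [
            f"-server.http.address={bind_host}:{http_port}",
            f"-server.grpc.address={bind_host}:{grpc_port}",
        ]
    return server_config, cli_args
```

Called with the default version, the sketch keeps both ports in the config and adds no flags. Called with `version="0.29.0"`, it returns an empty server block and the two address flags. Comparing parsed tuples rather than raw strings avoids the pitfall that `"0.9.0" > "0.26.0"` when compared as text.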

We are running this commit in prod.

What problem does this solve?

We ran into a problem where after rebooting a machine, Grafana Agent got stuck in a crash loop trying to replay the WAL.

After some research, it seemed like various issues related to WAL replaying were resolved in newer versions.

I started using v0.29.0 and it cleared up the problem.

./bin/agent-freebsd-amd64 -config.file config/agent.yml
ts=2022-11-30T03:58:23.338778202Z caller=node.go:85 level=info agent=prometheus component=cluster msg="applying config"
ts=2022-11-30T03:58:23.339038349Z caller=remote.go:180 level=info agent=prometheus component=cluster msg="not watching the KV, none set"
ts=2022-11-30T03:58:23Z level=info caller=traces/traces.go:135 msg="Traces Logger Initialized" component=traces
ts=2022-11-30T03:58:23.345287748Z caller=server.go:77 level=info msg="server configuration changed, restarting server"
ts=2022-11-30T03:58:23.346898915Z caller=gokit.go:72 level=info http=[::]:4141 grpc=[::]:9041 msg="server listening on addresses"
ts=2022-11-30T03:58:23.35337069Z caller=wal.go:182 level=info agent=prometheus instance=28d1c3ee27ede0f514e148ae1918b3e8 msg="replaying WAL, this may take a while" dir=/var/run/sprlcl/grafana-biz/prom_wal/28d1c3ee27ede0f514e148ae1918b3e8/wal
panic: runtime error: index out of range [0] with length 0

goroutine 117 [running]:
github.com/prometheus/prometheus/tsdb/wal.NewSegmentBufReader(...)
	/go/pkg/mod/github.com/grafana/prometheus@v1.8.2-0.20220112164627-aae84190631a/tsdb/wal/wal.go:881
github.com/prometheus/prometheus/tsdb/wal.NewSegmentsRangeReader({0xc00004b840, 0x1, 0x1})
	/go/pkg/mod/github.com/grafana/prometheus@v1.8.2-0.20220112164627-aae84190631a/tsdb/wal/wal.go:863 +0x27a
github.com/prometheus/prometheus/tsdb/wal.NewSegmentsReader(...)
	/go/pkg/mod/github.com/grafana/prometheus@v1.8.2-0.20220112164627-aae84190631a/tsdb/wal/wal.go:835
github.com/grafana/agent/pkg/metrics/wal.(*Storage).replayWAL(0xc0009af0a0)
	/drone/src/pkg/metrics/wal/wal.go:189 +0x3f6
github.com/grafana/agent/pkg/metrics/wal.NewStorage({0x47dab80, 0xc0007a44b0}, {0x48162b8, 0xc000d757d0}, {0xc00014ca00, 0x45})
	/drone/src/pkg/metrics/wal/wal.go:164 +0x326
github.com/grafana/agent/pkg/metrics/instance.New.func1({0x48162b8, 0xc000d757d0})
	/drone/src/pkg/metrics/instance/instance.go:261 +0x5d
github.com/grafana/agent/pkg/metrics/instance.(*Instance).initialize(0xc000a3af20, {0x484e528, 0xc000bbe980}, {0x48162b8, 0xc000d757d0}, 0xc0009aeee0)
	/drone/src/pkg/metrics/instance/instance.go:410 +0xfe
github.com/grafana/agent/pkg/metrics/instance.(*Instance).Run(0xc000a3af20, {0x484e528, 0xc000bbe980})
	/drone/src/pkg/metrics/instance/instance.go:313 +0x38a
github.com/grafana/agent/pkg/metrics/instance.(*BasicManager).runProcess(0xc000d0cc00, {0x484e528, 0xc000bbe980}, {0xc000053c60, 0x20}, {0x4871418, 0xc000a3af20})
	/drone/src/pkg/metrics/instance/manager.go:262 +0x9e
github.com/grafana/agent/pkg/metrics/instance.(*BasicManager).spawnProcess.func1(0xc000d0cc00, {0x484e528, 0xc000bbe980}, 0xc0009aed20, {0x4871418, 0xc000a3af20}, 0xc0000c2240)
	/drone/src/pkg/metrics/instance/manager.go:232 +0x6d
created by github.com/grafana/agent/pkg/metrics/instance.(*BasicManager).spawnProcess
	/drone/src/pkg/metrics/instance/manager.go:231 +0x358

Checklist

  • I have added unit tests to cover my changes.
  • I have added documentation to cover my changes.
  • My changes have passed unit tests and have been tested E2E in an example project.

@akoutmos
Owner

Thanks for updating this! I'll give this a test drive in my test environment over the weekend and get back to you with feedback!

@coveralls

coveralls commented Dec 14, 2022

Coverage Status

Coverage decreased (-0.4%) to 78.578% when pulling 11426c4 on patmaddox:grafana-agent-29 into 22664c1 on akoutmos:master.

@akoutmos akoutmos mentioned this pull request Mar 31, 2023
3 tasks
Collaborator

@tmartin8080 tmartin8080 left a comment

Thanks for adding this and the tests!

Everything seemed to work out well using different versions of the Grafana Agent:

  • Tested version: "0.23.0"
  • Tested version: "0.29.0"

Both versions generated the expected config files and sent metrics to Grafana Cloud.

@tmartin8080 tmartin8080 added the "enhancement" label (New feature or request) Apr 1, 2023
@akoutmos akoutmos merged commit 65a8188 into akoutmos:master Apr 1, 2023