services/horizon/ingest: added 'horizon_ingest_errors_total' metric key #5302
We log ingestion errors here: https://github.com/stellar/go/blob/master/services/horizon/internal/ingest/main.go#L646 If you search on Kibana you can see that these logs occur very rarely. I think it would make sense to have an ingestion error counter metric that is incremented every time we emit that log. The metric definition should resemble the log message: a counter with 2 labels, current_state and next_state. We can then create the alert using a very simple heuristic: if the ingestion error counter is incremented at least …
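To make the proposed shape concrete, here is a minimal sketch of a counter keyed by the two labels. It uses only the standard library as a stand-in; in the service itself this would presumably be a labelled counter from the Prometheus Go client. The type and method names, and the state names used in `main`, are hypothetical and not taken from the Horizon code.

```go
package main

import (
	"fmt"
	"sync"
)

// stateLabels is a hypothetical label pair mirroring the two labels
// proposed in the issue: current_state and next_state.
type stateLabels struct {
	CurrentState string
	NextState    string
}

// errorCounter is a stdlib-only stand-in for a labelled counter metric.
// It only ever increases, like a Prometheus counter.
type errorCounter struct {
	mu     sync.Mutex
	counts map[stateLabels]uint64
}

func newErrorCounter() *errorCounter {
	return &errorCounter{counts: make(map[stateLabels]uint64)}
}

// Inc adds one to the series identified by the two label values.
func (c *errorCounter) Inc(current, next string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[stateLabels{CurrentState: current, NextState: next}]++
}

// Value returns the count for a single label pair (0 if never incremented).
func (c *errorCounter) Value(current, next string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[stateLabels{CurrentState: current, NextState: next}]
}

// Total sums every series, which is what a simple alert heuristic
// over the whole metric would look at.
func (c *errorCounter) Total() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	var sum uint64
	for _, n := range c.counts {
		sum += n
	}
	return sum
}

func main() {
	errs := newErrorCounter()
	// Placeholder state names, for illustration only.
	errs.Inc("resume", "resume")
	errs.Inc("resume", "resume")
	errs.Inc("build", "start")
	fmt.Println(errs.Value("resume", "resume"), errs.Total())
}
```

Keeping the labels identical to the log fields means a spike in the metric can be matched directly to the corresponding log lines.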
… new error counting metrics, per review feedback
New and removed dependencies detected.
🚮 Removed packages: golang/github.com/stretchr/testify@v1.8.4
ah, thanks for pointer to that, updated - 3319801 |
I left a few minor comments but overall looks good!
Co-authored-by: tamirms <tamirms@gmail.com>
PR Checklist
PR Structure
otherwise).
services/friendbot, or all or doc if the changes are broad or impact many packages.
Thoroughness
.md files, etc... affected by this change). Take a look in the docs folder for a given service, like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next release branch if it's not a patch change.
What
Added a new counter metric key, 'horizon_ingest_errors_total', which the ingest FSM increments whenever a run of an ingestion state returns an error.
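As a rough illustration of where such an increment sits, the sketch below drives a toy state machine and invokes a callback each time a state run returns an error. This is not the Horizon implementation: the `state` type, `runFSM`, the `onError` hook, the "stop" placeholder, and the state name are all hypothetical, and a plain map stands in for the real metric.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a made-up failure used only for this illustration.
var errSimulated = errors.New("simulated state failure")

// state is a hypothetical ingestion state. run performs one step and
// returns the state to move to next, plus an error if the step failed.
type state struct {
	name string
	run  func() (next *state, err error)
}

// runFSM executes states until one returns a nil next state or maxSteps
// is reached. Whenever a run returns an error, onError is called with the
// current and next state names, which is the point where a counter
// labelled by those two names would be incremented.
func runFSM(start *state, maxSteps int, onError func(current, next string)) {
	cur := start
	for i := 0; i < maxSteps && cur != nil; i++ {
		next, err := cur.run()
		if err != nil {
			nextName := "stop" // placeholder label when there is no next state
			if next != nil {
				nextName = next.name
			}
			onError(cur.name, nextName)
		}
		cur = next
	}
}

func main() {
	// A map keyed by (current, next) stands in for the labelled metric.
	counts := map[[2]string]int{}

	attempts := 0
	resume := &state{name: "resume"}
	resume.run = func() (*state, error) {
		attempts++
		if attempts <= 2 {
			// Fail twice and retry the same state.
			return resume, errSimulated
		}
		return nil, nil // done
	}

	runFSM(resume, 10, func(current, next string) {
		counts[[2]string{current, next}]++
	})
	fmt.Println(counts[[2]string{"resume", "resume"}])
}
```

Passing the increment in as a callback keeps the loop testable without a metrics registry; the real code may well call the metric directly.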
Why
To allow a Prometheus alert to be configured against the new metric, giving early visibility when an ingestion halt starts to form: https://github.com/stellar/prometheus/pull/243
Closes: #5256
Known limitations