feat(repo): Added prometheus and grafana & other implementations #175
0xterminator commented Aug 26, 2024 (edited)
- added nats-prometheus exporter to infra
- extended local docker-compose file with more services
- added graceful shutdown to publisher + tests
- added healthcheck handler to publisher + tests
- added exposed metrics handler to publisher + tests
- Set up Prometheus server to collect metrics from our NATS servers and applications
- Configure NATS exporters to expose relevant metrics to Prometheus
- Set up Grafana server and connect it to the Prometheus data source
- Create Grafana dashboards for key metrics:
  - NATS server health and performance
  - Message throughput and latency
  - Subscription counts and patterns
  - Error rates and types
  - Resource utilization (CPU, memory, network)
- Implement custom metrics in our application code where necessary
- Set up alerting rules in Prometheus and Grafana for critical thresholds
- Create documentation for interpreting dashboards and responding to alerts
- Implement log aggregation and visualization in Grafana (consider using Loki)
- Set up proper authentication and authorization for Grafana access
- Create a runbook for common operational tasks and troubleshooting scenarios
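The task list mentions an exposed metrics handler and custom metrics in the application code. As a hedged illustration of what Prometheus actually scrapes, the sketch below hand-renders counters in the Prometheus text exposition format using only the Rust standard library. The PR may well use a metrics crate instead, and the metric names (publisher_messages_published_total, publisher_errors_total) are hypothetical.

```rust
use std::fmt::Write;
use std::sync::atomic::{AtomicU64, Ordering};

/// A monotonically increasing counter, e.g. messages published to NATS (hypothetical).
pub struct Counter {
    name: &'static str,
    help: &'static str,
    value: AtomicU64,
}

impl Counter {
    /// `const` so a counter can also live in a `static` shared by the whole process.
    pub const fn new(name: &'static str, help: &'static str) -> Self {
        Self { name, help, value: AtomicU64::new(0) }
    }

    /// Lock-free increment; safe to call from any thread.
    pub fn inc_by(&self, n: u64) {
        self.value.fetch_add(n, Ordering::Relaxed);
    }

    /// Appends the HELP, TYPE and sample lines for this counter.
    pub fn render(&self, out: &mut String) {
        // Writing into a String cannot fail, so the fmt::Result is discarded.
        let _ = writeln!(out, "# HELP {} {}", self.name, self.help);
        let _ = writeln!(out, "# TYPE {} counter", self.name);
        let _ = writeln!(out, "{} {}", self.name, self.value.load(Ordering::Relaxed));
    }
}

/// Builds the plain-text body a `/metrics` endpoint would return.
pub fn render_all(metrics: &[&Counter]) -> String {
    let mut out = String::new();
    for metric in metrics {
        metric.render(&mut out);
    }
    out
}
```

Counters only ever go up; Prometheus derives rates (for example message throughput) from them with rate() at query time, which is why the sketch stores raw totals rather than per-second values.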
Quite a large PR. Well done 👍🏾.
I didn't go through it thoroughly, but I've left some comments on things I caught at a glance. Please let me know what you think.
These surveyor files look like they should be autogenerated into a folder listed in .gitignore. Maybe I am missing something? 🤔
There are files that need to go in these folders, but I will add them to .gitignore. Thanks.
We should avoid merging this large and complex PR. It would be better to split it into smaller, more manageable PRs.
Instead of implementing our own shutdown logic, we could consider using tokio-graceful-shutdown. There's nothing wrong with hand-rolling it, though; I've done that before in the indexer, and fuel-core has similar logic.
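For comparison with tokio-graceful-shutdown, a hand-rolled version boils down to a shared "stop" flag that every worker polls, plus a coordinator that sets the flag and joins the workers. The sketch below shows that pattern with std threads and an AtomicBool so it stays dependency-free; it is an assumption about the general shape, not the publisher's actual code. In the async publisher the same idea would typically be a tokio task that awaits tokio::signal::ctrl_c() and then flips the flag. Shutdown, spawn_worker and shutdown_all are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Shared shutdown flag; cloning shares the same underlying AtomicBool.
#[derive(Clone, Default)]
pub struct Shutdown(Arc<AtomicBool>);

impl Shutdown {
    pub fn trigger(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    pub fn is_triggered(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Spawns a worker that performs one unit of work per tick until shutdown is
/// requested, then finishes its current tick and returns how many ticks it ran.
pub fn spawn_worker(shutdown: Shutdown, tick: Duration) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut ticks = 0;
        while !shutdown.is_triggered() {
            // Placeholder for real work, e.g. publishing one block to NATS.
            thread::sleep(tick);
            ticks += 1;
        }
        // This is where buffers would be flushed and connections drained.
        ticks
    })
}

/// Requests shutdown, then blocks until every worker has exited.
pub fn shutdown_all(shutdown: &Shutdown, workers: Vec<JoinHandle<u64>>) -> Vec<u64> {
    shutdown.trigger();
    workers
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect()
}
```

The key property is that no worker is killed mid-task: each one observes the flag only between units of work, so in-flight work completes before the process exits.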
Looks okay to merge. We probably just need to revert the Box<dyn FuelCoreLike> change from your conflict resolution.
fuel_service: Arc<FuelService>,
fuel_core: Box<FuelCore>,
fuel_service already resides in fuel_core. I am also unsure why we decided to remove the dynamic trait here 🤔
I brought it back; it was needed when refactoring the tests. fuel_service is not inside fuel_core: fuel_core holds the importer, the chain ID, and the DB. We need an Arc of the service for the shutdown handlers and the shared state.
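The two points in this thread (a trait object for fuel_core so tests can swap in a double, and an Arc of the service so several handlers can hold it) can be sketched as follows. Everything here is a hypothetical stand-in: the chain_id method on FuelCoreLike, the Service type, and the handler structs are assumptions for illustration, not the crate's real definitions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical subset of what the publisher needs from fuel-core.
pub trait FuelCoreLike: Send + Sync {
    fn chain_id(&self) -> u64;
}

/// Hypothetical stand-in for the running FuelService.
#[derive(Default)]
pub struct Service {
    stopped: AtomicBool,
}

impl Service {
    pub fn stop(&self) {
        self.stopped.store(true, Ordering::SeqCst);
    }

    pub fn is_running(&self) -> bool {
        !self.stopped.load(Ordering::SeqCst)
    }
}

/// The dyn trait lets tests pass a mock instead of a real fuel-core.
pub struct Publisher {
    pub fuel_service: Arc<Service>,
    pub fuel_core: Box<dyn FuelCoreLike>,
}

/// Owns its own Arc clone, so it does not borrow from Publisher.
pub struct ShutdownHandler {
    service: Arc<Service>,
}

impl ShutdownHandler {
    pub fn run(&self) {
        self.service.stop();
    }
}

/// Reads the same service through another Arc clone (e.g. for a health check).
pub struct StateHandler {
    service: Arc<Service>,
}

impl StateHandler {
    pub fn healthy(&self) -> bool {
        self.service.is_running()
    }
}

/// Test double that the trait object makes possible.
pub struct MockCore;

impl FuelCoreLike for MockCore {
    fn chain_id(&self) -> u64 {
        0
    }
}
```

Because each handler owns an Arc clone rather than a reference, the handlers can be moved into separate tasks or threads while still observing the same service state.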
.gitignore (outdated)
docker/monitoring/surveyor/jetstream
docker/monitoring/surveyor/observations
Are these ignored only after being committed? They still seem to be part of your changes. It's fine if we want to hardcode them locally, but I want to make sure we avoid tracking them in git if they are meant to be auto-generated. What do you think?
They are not autogenerated. They need to be there for the surveyor to work.
Then we shouldn't have them in the .gitignore, in case we update their configurations. Sorry, I know I initially asked for it, but it's now clear that they are regular Grafana dashboards.
I have now removed these two lines from .gitignore. These folders contain configs that we should keep. Sorry about the confusion. All good now.
I re-introduced it already. Thanks.
@@ -48,4 +48,5 @@ cargo run -p fuel-streams-publisher -- \
 --relayer-v2-listening-contracts $RELAYER_V2_LISTENING_CONTRACTS \
 --relayer-da-deploy-height $RELAYER_DA_DEPLOY_HEIGHT \
 --relayer-log-page-size $RELAYER_LOG_PAGE_SIZE \
---reserved-nodes /dns4/p2p-testnet.fuel.network/tcp/30333/p2p/16Uiu2HAmDxoChB7AheKNvCVpD4PHJwuDGn8rifMBEHmEynGHvHrf
+--reserved-nodes /dns4/p2p-testnet.fuel.network/tcp/30333/p2p/16Uiu2HAmDxoChB7AheKNvCVpD4PHJwuDGn8rifMBEHmEynGHvHrf \
+--server-addr 0.0.0.0:9000
Do we want to update our Dockerfile as well?
We have to, yes. We are now running an HTTP server with metrics and health checks.
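To make "an HTTP server with metrics and health checks" concrete, here is a minimal, std-only sketch of the routing such a server needs: /health answers 200 or 503 depending on shared state, and /metrics returns Prometheus text. It is an assumption-laden illustration, not the publisher's server (which presumably uses an async web framework): AppState, its fields, and the metric name are hypothetical, and the bind address simply mirrors the --server-addr 0.0.0.0:9000 flag from the diff. The server port is also what the Dockerfile would need to expose.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shared state the HTTP handlers read from.
#[derive(Default)]
pub struct AppState {
    pub nats_connected: AtomicBool,
    pub messages_published: AtomicU64,
}

/// Maps a method and path to (status code, body).
pub fn route(method: &str, path: &str, state: &AppState) -> (u16, String) {
    match (method, path) {
        ("GET", "/health") => {
            if state.nats_connected.load(Ordering::Relaxed) {
                (200, "OK".to_string())
            } else {
                (503, "NATS disconnected".to_string())
            }
        }
        ("GET", "/metrics") => (
            200,
            format!(
                "# TYPE publisher_messages_published_total counter\npublisher_messages_published_total {}\n",
                state.messages_published.load(Ordering::Relaxed)
            ),
        ),
        _ => (404, "Not Found".to_string()),
    }
}

/// Serves one connection: parses the request line, drains headers, writes a response.
pub fn handle(stream: TcpStream, state: &AppState) -> std::io::Result<()> {
    let mut reader = BufReader::new(&stream);
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    // Drain headers up to the blank line so the socket is not closed with unread data.
    let mut header = String::new();
    while reader.read_line(&mut header)? > 2 {
        header.clear();
    }
    let mut parts = request_line.split_whitespace();
    let method = parts.next().unwrap_or("");
    let path = parts.next().unwrap_or("");
    let (status, body) = route(method, path, state);
    let reason = match status {
        200 => "OK",
        503 => "Service Unavailable",
        _ => "Not Found",
    };
    let mut writer = &stream;
    write!(
        writer,
        "HTTP/1.1 {status} {reason}\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )?;
    Ok(())
}

/// Blocking accept loop, e.g. serve("0.0.0.0:9000", state).
pub fn serve(addr: &str, state: Arc<AppState>) -> std::io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    for stream in listener.incoming() {
        handle(stream?, &state)?;
    }
    Ok(())
}
```

Returning 503 rather than 200 when NATS is unreachable is what lets Docker or Kubernetes health checks (and Prometheus alerting) treat the publisher as unhealthy even while its process is still running.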