Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(repo): Added prometheus and grafana & other implementations #175

Merged
merged 11 commits into from
Sep 17, 2024

Conversation

0xterminator
Copy link
Contributor

@0xterminator 0xterminator commented Aug 26, 2024

  • added nats-prometheus explorer to infra
  • extended local docker-compose file with more services
  • added graceful shutdown to publisher + tests
  • added healtcheck handler to publisher + tests
  • added exposed metrics handler to publisher + tests
  • Set up Prometheus server to collect metrics from our NATS servers and applications
  • Configure NATS exporters to expose relevant metrics to Prometheus
  • Set up Grafana server and connect it to the Prometheus data source
  • Create Grafana dashboards for key metrics:
  • NATS server health and performance
  • Message throughput and latency
  • Subscription counts and patterns
  • Error rates and types
  • Resource utilization (CPU, memory, network)
  • Implement custom metrics in our application code where necessary
  • Set up alerting rules in Prometheus and Grafana for critical thresholds
  • Create documentation for interpreting dashboards and responding to alerts
  • Implement log aggregation and visualization in Grafana (consider using Loki)
  • Set up proper authentication and authorization for Grafana access
  • Create a runbook for common operational tasks and troubleshooting scenarios

@0xterminator 0xterminator self-assigned this Aug 26, 2024
@0xterminator 0xterminator changed the title feat(repo): Added prometheus and grafana locally feat(repo): Added prometheus and grafana Aug 26, 2024
@0xterminator 0xterminator force-pushed the ep/feat/prometheus branch 3 times, most recently from 4ca33a5 to 1104307 Compare August 28, 2024 12:14
@pedronauck pedronauck force-pushed the main branch 2 times, most recently from cd5a915 to 7e8cb1a Compare September 6, 2024 23:36
@0xterminator 0xterminator force-pushed the ep/feat/prometheus branch 2 times, most recently from 357f3f3 to 8b1d818 Compare September 10, 2024 07:15
@0xterminator 0xterminator force-pushed the ep/feat/prometheus branch 3 times, most recently from 8045ead to aea8429 Compare September 10, 2024 11:58
Copy link
Member

@Jurshsmith Jurshsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a large PR. Well done 👍🏾.

I didn't go through it thoroughly, but I've left some comments on things I caught at a glance. Please let me know what you think.

crates/fuel-streams-publisher/src/main.rs Outdated Show resolved Hide resolved
crates/fuel-streams-core/src/stream/stream_impl.rs Outdated Show resolved Hide resolved
crates/fuel-streams-core/src/stream/stream_impl.rs Outdated Show resolved Hide resolved
crates/fuel-streams-publisher/src/blocks.rs Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These surveyor files look like they should be autogenerated in a specified .gitignore folder. Maybe I am missing something? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are files that go in these folders. I will add them to .gitignore however. thx

@pedronauck
Copy link
Collaborator

Quite a large PR. Well done 👍🏾.

I didn't go through it thoroughly, but I've left some comments on things I caught at a glance. Please let me know what you think.

We should avoid merging this large and complex PR. It would be better to split it into smaller, more manageable PRs.

@0xterminator 0xterminator force-pushed the ep/feat/prometheus branch 4 times, most recently from 5bf99f2 to 656df70 Compare September 12, 2024 13:51
@0xterminator 0xterminator changed the title feat(repo): Added prometheus and grafana feat(repo): Added prometheus and grafana & other implementations Sep 12, 2024
lostman
lostman previously approved these changes Sep 17, 2024
Copy link
Contributor

@lostman lostman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of implementing our own shutdown logic we could consider using tokio-graceful-shutdown.

There's nothing wrong with hand-rolling it. I've done that before in the indexer and fuel-core has similar logic.

lostman
lostman previously approved these changes Sep 17, 2024
Copy link
Member

@Jurshsmith Jurshsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay to merge. We just need to probably revert the Box<dyn FuelCoreLike change from your conflict resolution.

Comment on lines 75 to 76
fuel_service: Arc<FuelService>,
fuel_core: Box<FuelCore>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuel_service already resides in fuel_core. I am also unsure why we've decided to remove the dynamic trait here 🤔?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought it back, was needed when refactoring the tests. fuel service is not inside fuel_core, fuel_core has the importer, chain id and hte db. We need an Arc of the service for the shutdown handlers and the share state.

.gitignore Outdated
Comment on lines 18 to 19
docker/monitoring/surveyor/jetstream
docker/monitoring/surveyor/observations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these post-ignored? (it seems they are still part of your changes). It's fine if we want to hardcode them locally but I'm just making sure that we avoid git tracking them if they are meant to be auto-generated. what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not autogenerated. They need to be there for the surveryor to work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we shouldn't have them in the .gitignore, should incase we update their configurations (?) Sorry, I know I initially asked for it but it's now clear that they are regular Grafana dashboards

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed these 2 lines from gitignore now. In these folders we have configs that we should keep. Sorry about the confusion. All good now.

@0xterminator
Copy link
Contributor Author

Looks okay to merge. We just need to probably revert the Box<dyn FuelCoreLike change from your conflict resolution.

I re-introduced it already. Thanks.

.gitignore Outdated
Comment on lines 18 to 19
docker/monitoring/surveyor/jetstream
docker/monitoring/surveyor/observations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we shouldn't have them in the .gitignore, should incase we update their configurations (?) Sorry, I know I initially asked for it but it's now clear that they are regular Grafana dashboards

@@ -48,4 +48,5 @@ cargo run -p fuel-streams-publisher -- \
--relayer-v2-listening-contracts $RELAYER_V2_LISTENING_CONTRACTS \
--relayer-da-deploy-height $RELAYER_DA_DEPLOY_HEIGHT \
--relayer-log-page-size $RELAYER_LOG_PAGE_SIZE \
--reserved-nodes /dns4/p2p-testnet.fuel.network/tcp/30333/p2p/16Uiu2HAmDxoChB7AheKNvCVpD4PHJwuDGn8rifMBEHmEynGHvHrf
--reserved-nodes /dns4/p2p-testnet.fuel.network/tcp/30333/p2p/16Uiu2HAmDxoChB7AheKNvCVpD4PHJwuDGn8rifMBEHmEynGHvHrf \
--server-addr 0.0.0.0:9000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to update our Dockerfile as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to , yes. Now we are running an http server with metircs and healthchecks.

@pedronauck pedronauck merged commit b24825a into main Sep 17, 2024
15 checks passed
@pedronauck pedronauck deleted the ep/feat/prometheus branch September 17, 2024 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants