Add machine info to telemetry. Closes #404 #411
Conversation
@juliangruber could you please coordinate with @PatrickNercessian? I think he will want to use this data for the public dashboard too.
Good idea! @PatrickNercessian what do you think about these machine metrics and how they are stored?
The original plan was to use Rust's built-in libraries to collect device info like this in Zinnia and pass it to the modules. Do we prefer to collect this inside Core and pass it to Zinnia, which then passes it to the modules, similar to station_id? If so, should we instead generate this info outside of the telemetry context, so we can pass it to Zinnia easily, and then just pass it to the telemetry functions?
I am strongly against passing machine info from Station Core to Zinnia and Station Modules. As I understand the proposal in this pull request, it records machine telemetry directly from Station Core to our InfluxDB, as an alternative design/solution to what we envisioned with Zinnia & Station Modules. If we land this PR, we will have telemetry points about Station machines written to InfluxDB and stored in a bucket with a 30-day retention (see #404).

How much work is it to ingest this telemetry from InfluxDB to the Station public dashboard you are building, @PatrickNercessian? I think this will require a new component that is different from what we have discussed so far. I think it will need to periodically read data in InfluxDB and write aggregated metrics to a place where they can be persisted forever and exposed for the Dashboard UI.
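For context, a minimal sketch of what recording a machine-info point directly from Station Core to InfluxDB could look like, assuming the `@influxdata/influxdb-client` package and Node's built-in `os` module. The org, bucket, measurement, and field names below are illustrative placeholders, not necessarily what this PR uses:

```js
// Sketch only: collect basic machine info and record it as one telemetry point.
// Assumes @influxdata/influxdb-client; all names below are illustrative placeholders.
import os from 'node:os'
import { InfluxDB, Point } from '@influxdata/influxdb-client'

const influx = new InfluxDB({
  url: process.env.INFLUXDB_URL,
  token: process.env.INFLUXDB_TOKEN
})
// org + bucket; the bucket would have the 30-day retention discussed in #404
const writeApi = influx.getWriteApi('my-org', 'station-machines')

// In practice the point would also carry a station/process identifier tag
const point = new Point('machine')
  .tag('platform', os.platform())                 // e.g. 'linux', 'darwin', 'win32'
  .tag('arch', os.arch())                         // e.g. 'x64', 'arm64'
  .intField('cpus', os.cpus().length)
  .intField('total_memory_bytes', os.totalmem())

writeApi.writePoint(point)
await writeApi.close() // flushes pending writes
```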
I agree with @bajtos; I don't see the benefit in passing this machine data to the module source. Every Station should submit this, and modules don't need to be aware. Also, this is not per-module information. Or am I missing why this is important for the module to read?
I think your plan might have been to submit this machine info with every measurement that the module submits, right? Then indeed it would need to be passed to the module. |
Co-authored-by: Miroslav Bajtoš <oss@bajtos.net>
BTW, in the longer term, I'd like to replace the process-uuid value with […]. By including […] I am just sharing my thoughts and visions. This is not blocking progress on this pull request; we can land it without […].
Ah, I didn't realize this was writing to our centralized InfluxDB; I assumed it was for something local in each Core deployment... I haven't worked with InfluxDB before, so the reliability of my effort estimate is somewhat limited. That being said, I can think of a few routes we could go.

A) In spark-evaluate, for every stationId within every round (we don't need to do this for every measurement), we check InfluxDB for the latest machine info and store that in the (stationId, day) row of […].

There are probably some other ways we could do it too; these are my first thoughts.
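As a rough illustration of option A, fetching the latest machine-info point for a single station could look something like the sketch below. It assumes the `@influxdata/influxdb-client` query API and that machine points carry a `station_id` tag; the bucket and measurement names are placeholders:

```js
// Sketch only: read the most recent machine-info point for one station.
// Assumes machine points are tagged with station_id; names are placeholders.
import { InfluxDB } from '@influxdata/influxdb-client'

const queryApi = new InfluxDB({
  url: process.env.INFLUXDB_URL,
  token: process.env.INFLUXDB_TOKEN
}).getQueryApi('my-org')

const getLatestMachineInfo = async (stationId) => {
  // Note: a real implementation should escape/parameterize stationId.
  const flux = `
    from(bucket: "station-machines")
      |> range(start: -30d)
      |> filter(fn: (r) => r._measurement == "machine" and r.station_id == "${stationId}")
      |> last()
  `
  return await queryApi.collectRows(flux)
}
```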
@PatrickNercessian we discussed the requirements for this and realized that storing the data in Influx is fine: we are mostly interested in the current state of the network, not historic data, and that is easy and cheap to query from Influx.
Sounds good. Do our centralized services have permission to read from InfluxDB? I don't see any existing code about it. Also, any thoughts on the above two ideas about high-level implementation for this querying? |
LGTM
Yes! With an access token, it's readable from the public internet. I believe there were some snippets reading from it; currently our Grafana instance is the main consumer.
I think it would be nicer for spark-stats to periodically query the information. It can also listen to the smart contract for round start events. Otherwise we add a task to spark-evaluate that has nothing to do with evaluating the work done.
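A sketch of the spark-stats variant might be a simple periodic job that aggregates the current state from InfluxDB and persists it for the Dashboard UI. Again, the bucket, measurement, grouping, and schedule below are assumptions, not an agreed schema:

```js
// Sketch only: periodically aggregate the current machine-info distribution.
// Assumes each station's points form their own series (e.g. via an identifier tag).
import { InfluxDB } from '@influxdata/influxdb-client'

const queryApi = new InfluxDB({
  url: process.env.INFLUXDB_URL,
  token: process.env.INFLUXDB_TOKEN
}).getQueryApi('my-org')

const updateMachineStats = async () => {
  // Latest point per series, grouped by platform/arch, then counted.
  // Filtering on a single field avoids double counting per point.
  const rows = await queryApi.collectRows(`
    from(bucket: "station-machines")
      |> range(start: -24h)
      |> filter(fn: (r) => r._measurement == "machine" and r._field == "cpus")
      |> last()
      |> group(columns: ["platform", "arch"])
      |> count()
  `)
  // ...persist the aggregate wherever the Dashboard UI reads from
  console.log(rows)
}

// Run once an hour; listening for round-start events would work too
setInterval(() => updateMachineStats().catch(console.error), 60 * 60 * 1000)
```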
Closes #404
I didn't see a good way to test free disk space (depends on mount points etc), but that's not relevant for Station modules atm anyway.