Per topic stats of Whisper nodes #1642
Comments
Some of these should be readily available from mailservers, for example:
@cammellos what's the best way of accessing those? Has anything materially changed since https://discuss.status.im/t/performance-of-mailservers/1215?
I haven't checked recently, but it should be the same as before. @jakubgs should be able to point you in the right direction (you need to SSH into the machine and connect to the DB running in Docker).
If you want SSH access to the cluster hosts you'll need to make a PR for If you just want temporary access to one of the hosts to fetch some data just send me your public key on Discord. |
How difficult would it be to get these up in a dashboard? Ideally we'd see not just a unique count of envelopes but the actual number of envelopes sent, split into duplicates and non-duplicates.
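As a rough illustration of that split, here is a minimal Go sketch that counts every envelope observed and separates first sightings from repeats by envelope hash. The `EnvelopeStats` type and its methods are hypothetical names made up for this example; they are not part of status-go or the Whisper library, and a real node would also need to expire old hashes.

```go
package main

import "fmt"

// EnvelopeStats is a hypothetical counter (not a status-go type) that
// tracks every envelope observed, split into unique and duplicate hashes.
type EnvelopeStats struct {
	seen       map[string]struct{} // envelope hashes observed so far
	Total      int                 // all envelopes, duplicates included
	Duplicates int                 // envelopes whose hash was seen before
}

// NewEnvelopeStats returns an empty counter.
func NewEnvelopeStats() *EnvelopeStats {
	return &EnvelopeStats{seen: make(map[string]struct{})}
}

// Observe records one received envelope and reports whether it was a duplicate.
func (s *EnvelopeStats) Observe(hash string) bool {
	s.Total++
	if _, ok := s.seen[hash]; ok {
		s.Duplicates++
		return true
	}
	s.seen[hash] = struct{}{}
	return false
}

// Unique returns the number of distinct envelope hashes observed.
func (s *EnvelopeStats) Unique() int { return s.Total - s.Duplicates }

func main() {
	stats := NewEnvelopeStats()
	for _, h := range []string{"0xaa", "0xbb", "0xaa", "0xaa"} {
		stats.Observe(h)
	}
	fmt.Printf("total=%d unique=%d duplicates=%d\n",
		stats.Total, stats.Unique(), stats.Duplicates)
}
```

Keeping `Total` separate from the set of seen hashes is what lets a dashboard show actual traffic rather than only the unique count.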
Note to self from Oskar:
Some metrics already exist:
One thing that probably can't be done via Prometheus is the per-topic metrics themselves.
The topic hash is available in the |
It might be possible to get this data without modifying |
There's already a bot that could be modified to do this but it doesn't use |
Regarding the metrics themselves. I discussed them with Adam and came up with these comments:
Which is why we want to check them. We want them to be zero, and if they aren't, that indicates an issue. The bloom filter one is also fairly likely to change, plus the instrumentation can be used for light nodes as well as for testing Waku mode. Bloom filter false positives are likely the biggest blocker bandwidth-wise, so we want stats there.
But how can an envelope not match a full bloom filter? That makes no sense. |
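The question above can be made concrete with a small Go sketch of a bloom filter subset check: an envelope's bloom matches a node's filter when every bit set in the envelope's bloom is also set in the filter. The function names and the 64-byte size are assumptions for illustration, and the sample bit positions are arbitrary rather than derived from a real topic. Under this rule a filter with all bits set matches every sample.

```go
package main

import "fmt"

// bloomSize is the filter length in bytes. 64 is an assumption for this sketch.
const bloomSize = 64

// bloomMatch reports whether every bit set in sample is also set in filter.
// Hypothetical helper, written for illustration only.
func bloomMatch(filter, sample []byte) bool {
	if len(filter) != len(sample) {
		return false
	}
	for i := range filter {
		if filter[i]|sample[i] != filter[i] {
			return false
		}
	}
	return true
}

// fullBloom returns a filter with every bit set.
func fullBloom() []byte {
	b := make([]byte, bloomSize)
	for i := range b {
		b[i] = 0xFF
	}
	return b
}

func main() {
	// Arbitrary bits standing in for an envelope's topic bloom.
	sample := make([]byte, bloomSize)
	sample[3] = 0x10
	sample[40] = 0x01

	// A partial filter that covers only one of the sample's bits.
	partial := make([]byte, bloomSize)
	partial[3] = 0x10

	fmt.Println(bloomMatch(fullBloom(), sample)) // full filter
	fmt.Println(bloomMatch(partial, sample))     // partial filter
}
```

So a non-zero "did not match bloom filter" count on a full node would point to a bug, which is consistent with wanting the metric to be zero.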
Here is my understanding of the metrics:
I'm currently in the process of rewriting existing metrics and adding new ones using the new Prometheus library we used in #1648. I should have a PR tomorrow.
First version of PR for new Whisper metrics: status-im/whisper#38
This should be allowed. Consider a scenario with two nodes A and B:
So, it's a completely valid case. We can't immediately consider it an error and disconnect. Alternatively, there could be a threshold, depending on the message TTL and the number of duplicates, beyond which such a peer is finally disconnected.
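A minimal Go sketch of such a threshold, under stated assumptions: duplicates are counted per peer, an envelope hash is remembered only until its TTL runs out, and the peer is flagged once the duplicate count exceeds a configured limit. `DuplicateTracker` and its methods are hypothetical and do not exist in status-go, the Whisper library, or nim-eth; times are plain Unix seconds to keep the example self-contained.

```go
package main

import "fmt"

// DuplicateTracker is a hypothetical per-peer tracker (not an existing API).
// It tolerates up to `threshold` duplicate envelopes from one peer.
type DuplicateTracker struct {
	threshold  int
	expiry     map[string]int64 // envelope hash -> expiry time (Unix seconds)
	duplicates int
}

// NewDuplicateTracker creates a tracker that tolerates `threshold` duplicates.
func NewDuplicateTracker(threshold int) *DuplicateTracker {
	return &DuplicateTracker{threshold: threshold, expiry: make(map[string]int64)}
}

// Observe records an envelope received from the peer at time `now` with the
// given TTL in seconds. It returns true once the peer has sent more
// duplicates than the threshold allows.
func (t *DuplicateTracker) Observe(hash string, ttl, now int64) bool {
	// Forget envelopes whose TTL has elapsed, so a resend after expiry
	// is treated as a new envelope rather than a duplicate.
	for h, exp := range t.expiry {
		if exp <= now {
			delete(t.expiry, h)
		}
	}
	if _, ok := t.expiry[hash]; ok {
		t.duplicates++
	} else {
		t.expiry[hash] = now + ttl
	}
	return t.duplicates > t.threshold
}

func main() {
	tracker := NewDuplicateTracker(2)
	for i := int64(0); i < 4; i++ {
		disconnect := tracker.Observe("0xaa", 60, i)
		fmt.Printf("t=%d disconnect=%v\n", i, disconnect)
	}
}
```

In the loop above only the fourth delivery (the third duplicate) crosses a threshold of 2. Whether duplicates should be weighted by remaining TTL is left open here, as it is in the discussion.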
I wouldn't go so far as to call it a completely valid case. We might choose to introduce some slack to make it easier for implementations if it is 100% necessary, but in principle it is bad behaviour. Node A should keep track of the messages it has sent to B already, and I suppose this should/could be persisted across restarts? Probably a spec issue. cc @kdeme

EDIT: In any case, for this specific issue, we should still track it!
This is just at-least-once vs at-most-once. Which semantics do we want to go for? At-least-once seems best suited, provided there's some sort of acknowledgement between nodes, i.e. "I have received this RLP packet". If it's at-least-once, then the occasional re-sending should not be considered bad behaviour and it is indeed a valid case.
I think it's a huge requirement for a Whisper node. Currently, it can be totally stateless. If we require it to track what was sent to whom across restarts, it can't be stateless anymore. Also, this requirement only makes sense with high PoW. High PoW + at-most-once gives you protection against flooding. Low PoW + at-most-once gives nothing (or very little) because creating messages is cheap, so it is almost the same as at-least-once.
While this is a useful discussion, I think it is off-topic for the OP. We still want to track it in terms of metrics. For specs, I suggest we open an issue and continue the discussion there. I believe it is currently not specified, and no implementation is disconnecting peers due to it (though it is suggested in the code https://github.com/status-im/nim-eth/blob/master/eth/p2p/rlpx_protocols/whisper_protocol.nim#L179-L191).
The same point made by @adambabik applies to nim-eth. After a restart or disconnect/reconnect, the list of already sent/received messages for that peer is gone. In nim-eth the nodes don't disconnect peers for this "bad" behaviour, or for any other type in fact (for now). I'm not even sure this would create huge issues in practice if we did disconnect (on the first duplicate, that is; a threshold of duplicates would be better, I guess). One peer less to be connected with, in this possibly(?) rare case? The network should be resilient to this, I'd hope. But yes, the best way to find out is to measure it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This has largely been addressed, and more work on this is happening with nim-waku benchmarking. cc @jm-clius fyi
Problem
I want to see envelope statistics for Whisper topics so I can reason about Whisper and scalability and address issues such as status-im/status-mobile#9081
Implementation
(Run as a full bloom filter node to pick up all topics).
Desired statistics. For a node I want to see aggregate stats, i.e. over time, for:
Acceptance Criteria
A text dump or some URL where I can go to a node and see numbers above over some time period, e.g. minute/hour/day.
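One way such a text dump could look, as a hedged Go sketch: envelopes are counted per topic inside fixed-size time buckets (60 seconds for per-minute stats, 3600 for per-hour, and so on) and printed one line per bucket and topic. `TopicStats`, its methods, and the output format are all hypothetical; timestamps are assumed to be non-negative Unix seconds, and topics are represented as hex strings purely for readability.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// TopicStats is a hypothetical aggregator (not a status-go type) that counts
// envelopes per topic within fixed-size time buckets.
type TopicStats struct {
	bucketSeconds int64
	counts        map[int64]map[string]int // bucket start -> topic -> count
}

// NewTopicStats creates an aggregator with the given bucket width in seconds.
func NewTopicStats(bucketSeconds int64) *TopicStats {
	return &TopicStats{bucketSeconds: bucketSeconds, counts: make(map[int64]map[string]int)}
}

// bucketStart rounds a non-negative Unix timestamp down to its bucket start.
func (s *TopicStats) bucketStart(unixSeconds int64) int64 {
	return unixSeconds - unixSeconds%s.bucketSeconds
}

// Record counts one envelope for the topic at the given Unix time.
func (s *TopicStats) Record(topic string, unixSeconds int64) {
	b := s.bucketStart(unixSeconds)
	if s.counts[b] == nil {
		s.counts[b] = make(map[string]int)
	}
	s.counts[b][topic]++
}

// Count returns the envelopes seen for the topic in the bucket containing the given time.
func (s *TopicStats) Count(topic string, unixSeconds int64) int {
	return s.counts[s.bucketStart(unixSeconds)][topic]
}

// Dump renders "bucketStart topic count" lines, sorted by bucket and then topic.
func (s *TopicStats) Dump() string {
	buckets := make([]int64, 0, len(s.counts))
	for b := range s.counts {
		buckets = append(buckets, b)
	}
	sort.Slice(buckets, func(i, j int) bool { return buckets[i] < buckets[j] })

	var sb strings.Builder
	for _, b := range buckets {
		topics := make([]string, 0, len(s.counts[b]))
		for t := range s.counts[b] {
			topics = append(topics, t)
		}
		sort.Strings(topics)
		for _, t := range topics {
			fmt.Fprintf(&sb, "%d %s %d\n", b, t, s.counts[b][t])
		}
	}
	return sb.String()
}

func main() {
	stats := NewTopicStats(60) // per-minute buckets
	stats.Record("0xaabbccdd", 5)
	stats.Record("0xaabbccdd", 42)
	stats.Record("0x11223344", 75)
	fmt.Print(stats.Dump())
}
```

The same structure could be served over HTTP or exported as metrics; the bucket width is the only thing that changes between minute, hour, and day views.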
Notes
Here's an example of what some of this might look like, just from my own hacky testing:
Hash duplicates (with 24 whisper nodes connected, ideally this would have topic in it too):
Envelopes ingress stats:
Topic stats:
^ @jakubgs @adambabik