Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autonat: add metrics #2086

Merged
merged 7 commits into from
Feb 16, 2023
Merged

autonat: add metrics #2086

merged 7 commits into from
Feb 16, 2023

Conversation

sukunrt
Copy link
Member

@sukunrt sukunrt commented Feb 11, 2023

fixes: #2017

@sukunrt
Copy link
Member Author

sukunrt commented Feb 11, 2023

@marten-seemann I've added.

Current Reachability Status
Current Reachability Confidence
Client: DialResponses by type
Server: DialResponses by type
Server: DialRejections by type(currently the only case handled is rate limited)

@sukunrt sukunrt changed the title add autonat metrics autonat: add metrics Feb 11, 2023
Copy link
Contributor

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ipfs/kubo#9647 was merged, so I could run this on my Kubo node. I had to rebase this branch on top of the current go-libp2p master first though.

This looks mostly good to me. I'm wondering if we should export the timestamp of the next probe (as calculated by scheduleProbe as well), might be nice to display that in Grafana as well.

"refId": "A"
}
],
"title": "Reachability status confidence",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you set the min and max values in the dashboard settings? Then this will be correctly scaled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

p2p/host/autonat/metrics.go Outdated Show resolved Hide resolved
"text": "Public"
},
"2": {
"color": "red",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to use a different color here. Red could be interpreted as "something is going wrong", which is not necessarily the case if you're actually behind a NAT. Maybe orange or violet would be better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fair. violet seems good to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

p2p/host/autonat/grafana-dashboards/autonat.json Outdated Show resolved Hide resolved
"refId": "A"
}
],
"title": "Server: Dial requests rate limited",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to work?

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's because no request is being rate limited. On my kubo node I was receiving very few requests. But I'll test this again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked my dashboard, and now it's showing one event (after a couple of hours), so it seems to work.

Why is this not part of the dashboard to the right though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dashboard on the right has segregation by Message_ResponseStatus. So this point would be counted under dial refused. It does seem fine to separate rate limit and dial refused and display on the same dashboard. Should I do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added other Refusal reasons too.

"refId": "A"
}
],
"title": "server: DialResponse By Type",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit, I'm always confused who's the client and server when thinking about AutoNAT. Maybe "Outgoing Dials" would be harder to mix up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outgoing does seem better. will fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fixed.

Copy link
Contributor

@marten-seemann marten-seemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great work @sukunrt!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

autonat: expose metrics
2 participants