Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: topic health tracking #3212

Merged
merged 14 commits into from
Dec 24, 2024
Merged

feat: topic health tracking #3212

merged 14 commits into from
Dec 24, 2024

Conversation

gabrielmer
Copy link
Contributor

@gabrielmer gabrielmer commented Dec 17, 2024

Description

Tracking topic health in the Peer Manager in the same way as in go-waku, and integrating the feature into libwaku

Changes

  • defining a topicHealthChangeHandler app callback and setting it in libwaku
  • adding a topic health change event in libwaku
  • computing topic health in the peer manager and defining a callback to run when a topic changes its health state
  • creating topic_health.nim file for topic health definitions

Issue

closes #3196

Copy link

github-actions bot commented Dec 17, 2024

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3212

Built from fe2a18c

Comment on lines 92 to 93
topicsHealth*: Table[string, TopicHealth]
onTopicHealthChange*: TopicHealthChangeHandler
Copy link
Contributor Author

@gabrielmer gabrielmer Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added these two parameters + wakuRelay for feature-parity with go-waku's peer manager

https://github.com/waku-org/go-waku/blob/78b522db5063c6bb178dc5c16c41e1c308c4d7c0/waku/v2/peermanager/peer_manager.go#L70-L91

relay -> wakuRelay
subRelayTopics -> topicsHealth
TopicHealthNotifCh -> onTopicHealthChange

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think onTopicHealthChange should also belong only to WakuRelay

Comment on lines 943 to 960
if futs.len() > 0:
proc waiter(): Future[void] {.async.} =
# slow path - we have to wait for the handlers to complete
try:
futs = await allFinished(futs)
except CancelledError:
# propagate cancellation
for fut in futs:
if not (fut.finished):
await fut.cancelAndWait()

# check for errors in futures
for fut in futs:
if fut.failed:
let err = fut.readError()
warn "Error in health change handler", description = err.msg

return waiter()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks horrible honestly - copied it from nim-libp2p's handling of Relay callback events

https://github.com/vacp2p/nim-libp2p/blob/fd26f93b80cfeecd4e6f9c84eb7a15a3be26c1ee/libp2p/protocols/pubsub/pubsub.nim#L379-L408

It's in part necessary because the callbacks configured can be async

@@ -914,6 +967,7 @@ proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
await pm.manageRelayPeers()
else:
await pm.connectToRelayPeers()
discard pm.updateTopicsHealth() # we don't want to await for the callbacks to finish
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discarding because the completion of the app callbacks shouldn't stop the peer manager from establishing new connections

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think we can have a separate async loop but running within the topic_healt.nim module

Comment on lines 10 to 15
proc `$`*(t: TopicHealth): string =
result =
case t
of UNHEALTHY: "UnHealthy"
of MINIMALLY_HEALTHY: "MinimallyHealthy"
of SUFFICIENTLY_HEALTHY: "SufficientlyHealthy"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@richard-ramos richard-ramos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, sir!

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
I've added some comments that I hope you find useful. IMO, we can have a simpler solution without even touching PeerManager :)
Great work so far 🙌

library/libwaku.nim Outdated Show resolved Hide resolved
library/libwaku.nim Outdated Show resolved Hide resolved
library/libwaku.nim Outdated Show resolved Hide resolved
library/libwaku.nim Show resolved Hide resolved
library/libwaku.nim Outdated Show resolved Hide resolved
Comment on lines 92 to 93
topicsHealth*: Table[string, TopicHealth]
onTopicHealthChange*: TopicHealthChangeHandler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think onTopicHealthChange should also belong only to WakuRelay

waku/node/peer_manager/peer_manager.nim Outdated Show resolved Hide resolved
Comment on lines 939 to 960
let fut = pm.onTopicHealthChange(topic, currentHealth)
if not fut.completed(): # Fast path for successful sync handlers
futs.add(fut)

if futs.len() > 0:
proc waiter(): Future[void] {.async.} =
# slow path - we have to wait for the handlers to complete
try:
futs = await allFinished(futs)
except CancelledError:
# propagate cancellation
for fut in futs:
if not (fut.finished):
await fut.cancelAndWait()

# check for errors in futures
for fut in futs:
if fut.failed:
let err = fut.readError()
warn "Error in health change handler", description = err.msg

return waiter()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't just invoke the callbacks in this point? Sorry but I can't quite see the need of this part

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I updated it, took it from the callback handling of nim-libp2p but I agree we can just await for the callbacks to be finished and not define this other waiter proc

@@ -914,6 +967,7 @@ proc relayConnectivityLoop*(pm: PeerManager) {.async.} =
await pm.manageRelayPeers()
else:
await pm.connectToRelayPeers()
discard pm.updateTopicsHealth() # we don't want to await for the callbacks to finish
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I think we can have a separate async loop but running within the topic_healt.nim module

waku/node/peer_manager/topic_health.nim Outdated Show resolved Hide resolved
Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for it! 🙌

waku/factory/waku.nim Outdated Show resolved Hide resolved
waku/waku_relay/protocol.nim Show resolved Hide resolved
@gabrielmer gabrielmer force-pushed the feat-topic-health-tracking branch from c9977e4 to 55e4019 Compare December 23, 2024 18:35
@gabrielmer gabrielmer merged commit 6020a67 into master Dec 24, 2024
9 of 11 checks passed
@gabrielmer gabrielmer deleted the feat-topic-health-tracking branch December 24, 2024 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: expose a topic health status channel in nwaku
3 participants