-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(lightpush): introduce ReliabilityMonitor and allow send
retries
#2130
Conversation
9bde3f7
to
73e350a
Compare
size-limit report 📦
|
5ae69f0
to
2eddaf5
Compare
3c82393
to
e51e369
Compare
thanks for the great reviews @weboko ! helpful! |
0bd895b
to
58b2c13
Compare
381a84e
to
89c7bae
Compare
@@ -89,16 +97,23 @@ class LightPushSDK extends BaseProtocolSDK implements ILightPushSDK { | |||
successes.push(success); | |||
} | |||
if (failure) { | |||
failures.push(failure); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving for some context, will make an issue for it later so that we can discuss
in status-go where retry is already implemented they do things differently and we will probably do something similar too
when message is sent it is queued, then it will be retried and once it is definitely fails - it will be communicated to consumer through something similar to event API
cc @waku-org/js-waku
const peer = this.connectedPeers.find((connectedPeer) => | ||
connectedPeer.id.equals(failure.peerId) | ||
); | ||
if (peer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here we should retry to any peer available - otherwise we loose message if peer got dropped (renewed, went offline or just networking issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, this block is executed WHEN a particular peer fails to send a lightpush request.
example: we have 3 connected peers, and one fails to send it, this is the block that will reattempt delivery for that peer and renewing (instead of just renewiing)
we aren't losing the message in any case
reliability monitor later (even after if reattempts fail), resends the lightpush request after renewal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I think we should align here on what is desired behavior.
considering what we care is successfully pushing a message at this stage (i.e no errors while sending)
then it is enough for us to have at least 1 successful push - in that case no retries needed
if all failures - then just retry but not necessarily to the same peer, just any peer
and if during all of this any peer is failing 3 times - we renew
I probably summarized it for myself only, but just clarifying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering what we care is successfully pushing a message at this stage (i.e no errors while sending)
then it is enough for us to have at least 1 successful push - in that case no retries needed
Well, technically, yes. I agree. I can't really think of a case where we would indeed need redundancy if one of the peers can assure us that they indeed relayed the message further into GossipSub. However, for now, without LightPush V2, it's not trivial to get that. Thus, having redundancy + retries is good for now and we can revisit later if our apps perform well. (ref: https://discord.com/channels/1110799176264056863/1284121724433993758/1285014955023798342 as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decoupled into follow up #2140
`); | ||
void this.reliabilityMonitor.attemptRetriesOrRenew( | ||
failure.peerId, | ||
() => this.protocol.send(encoder, message, peer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't we risking getting into recursion here? apologies if I missed that code part
lightPush fails -> retry initiated -> lightPush fails -> ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm that's a good point.
technically, recursion would've been a neater solution here but this is not it
here: we detect (one peer) to have failed to send the lightpush request, and reattempt a few times. if it keeps failing, we renew the peer, and attempt (only once) to send the lightpush request through that peer.
here it would be neater to introduce recursion maybe, but seems like overkill for now TBH.
we can find peace in the fact that:
- we already use multiple peers to send it first time
- even if one of the peers fails, we will re attempt
- if even that fails, we will do renewal and use the new peer to send it
for it to fail, ALL peers would have to literally just not entertain our requests
in a case where we disregard that, introducing recursion here would be a neat solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
this.attempts.delete(peerIdStr); | ||
this.attempts.set(newPeer.id.toString(), 0); | ||
await protocolSend(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prev - https://github.com/waku-org/js-waku/pull/2130/files#r1758702483
here we call .send
in both branches making no exit of the recursion (i.e it will be called over and over it seems to me)
so I think here we should do .send
to newPeer
instead because next time from here attemptRetriesOrRenew
will be called for peerIdStr
and at that point this.attempts.
will not have it so it will just continue
I believe this is the reason why you needed to implement .stop
operation on the manager entity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.protocol.send()
is different from SDK's lightpush.send()
SDK.send()
calls this.protocol.send()
internally
o I think here we should do .send to newPeer instead because next time from here attemptRetriesOrRenew
We are indeed doing this.protocol.send(peer)
by binding the protocolSend()
function call:
void this.reliabilityMonitor.attemptRetriesOrRenew(
failure.peerId,
() => this.protocol.send(encoder, message, peer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right, thanks for clarifying
then it is not an issue - we can have a recursion here
This is the last discussion I want to align on - #2130 (comment) |
Problem
Based on #2075 and #2069,
lightpush.send()
request fails, we renew the peer directlySolution
Based on the above problem statement, we want to do two things:
lightpush.send()
before we actually renew the peerNotes
Contribution checklist:
!
in title if breaks public API;