Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix_: no peers available supporting LightPush protocol after network restored from disabled state #6153

Merged
merged 2 commits into from
Dec 6, 2024

Conversation

qfrank
Copy link
Contributor

@qfrank qfrank commented Dec 2, 2024

fix relate mobile issues: 21452 / 21394, and might also fix part(missing messages) of 21172

relate mobile PR

@qfrank qfrank self-assigned this Dec 2, 2024
@status-im-auto
Copy link
Member

status-im-auto commented Dec 2, 2024

Jenkins Builds

Click to see older builds (16)
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 77bce1b #1 2024-12-02 11:15:47 ~4 min windows 📦zip
✔️ 77bce1b #1 2024-12-02 11:15:59 ~4 min linux 📦zip
✔️ 77bce1b #1 2024-12-02 11:16:46 ~5 min macos 📦zip
✔️ 77bce1b #1 2024-12-02 11:17:07 ~5 min ios 📦zip
✔️ 77bce1b #1 2024-12-02 11:17:30 ~6 min tests-rpc 📄log
✔️ 77bce1b #1 2024-12-02 11:17:34 ~6 min android 📦aar
✔️ 77bce1b #1 2024-12-02 11:19:22 ~8 min macos 📦zip
✖️ 77bce1b #1 2024-12-02 11:41:49 ~30 min tests 📄log
✔️ 4361cb2 #2 2024-12-05 06:31:18 ~4 min macos 📦zip
✔️ 4361cb2 #2 2024-12-05 06:32:06 ~5 min windows 📦zip
✔️ 4361cb2 #2 2024-12-05 06:32:12 ~5 min macos 📦zip
✔️ 4361cb2 #2 2024-12-05 06:32:24 ~5 min linux 📦zip
✔️ 4361cb2 #2 2024-12-05 06:32:25 ~5 min ios 📦zip
✔️ 4361cb2 #2 2024-12-05 06:33:03 ~6 min tests-rpc 📄log
✔️ 4361cb2 #2 2024-12-05 06:33:17 ~6 min android 📦aar
✔️ 4361cb2 #2 2024-12-05 06:57:53 ~31 min tests 📄log
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 7902aab #3 2024-12-06 03:36:24 ~3 min windows 📦zip
✔️ 7902aab #3 2024-12-06 03:37:37 ~4 min macos 📦zip
✔️ 7902aab #3 2024-12-06 03:37:56 ~5 min macos 📦zip
✔️ 7902aab #3 2024-12-06 03:38:02 ~5 min ios 📦zip
✔️ 7902aab #3 2024-12-06 03:38:30 ~5 min android 📦aar
✔️ 7902aab #3 2024-12-06 03:38:51 ~6 min linux 📦zip
✔️ 7902aab #3 2024-12-06 03:39:18 ~6 min tests-rpc 📄log
✖️ 7902aab #3 2024-12-06 04:03:28 ~30 min tests 📄log
✔️ 7902aab #4 2024-12-06 08:08:14 ~31 min tests 📄log
✔️ e623029 #4 2024-12-06 12:45:32 ~3 min windows 📦zip
✔️ e623029 #4 2024-12-06 12:46:49 ~4 min macos 📦zip
✔️ e623029 #4 2024-12-06 12:47:42 ~5 min ios 📦zip
✔️ e623029 #4 2024-12-06 12:47:43 ~5 min macos 📦zip
✔️ e623029 #4 2024-12-06 12:48:22 ~6 min linux 📦zip
✔️ e623029 #4 2024-12-06 12:48:26 ~5 min tests-rpc 📄log
✔️ e623029 #4 2024-12-06 12:48:47 ~6 min android 📦aar
✔️ e623029 #5 2024-12-06 13:12:23 ~29 min tests 📄log

wakuv2/waku.go Outdated Show resolved Hide resolved
wakuv2/waku.go Show resolved Hide resolved
@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Dec 3, 2024

The root-cause for issues status-im/status-mobile#21452 / status-im/status-mobile#21394 could not be due to peers not being connected after coming online. In light-clients peers can be connected as required i.e when we want to send messages via lightpush or subscribe to a filter to receive messages.

And store nodes are pre-configured and they get connected to as and when we want to do store queries automatically. Hence the discovery of peers and connecting to them need not be done explicitly after coming back online. So far one thing that i had noticed is that the peers that we discover may not be reachable and hence it takes time to make successful connections for filter and lightpush. In order to fix this i had worked on waku-org/go-waku#1244 which would remove peers from local store if they are not reachable. This doesn't solve the problem completely but atleast prevents the node from trying to reuse peers to which connection has failed recently.

@qfrank Can you give me the reasoning as to how you identified this as the root-cause for the issues?

@qfrank
Copy link
Contributor Author

qfrank commented Dec 3, 2024

Can you give me the reasoning as to how you identified this as the root-cause for the issues?

I identified it by debug status backend server ... what I found was: when user back to online from offline(keep offline before login), there's no peers available supporting LightPush protocol (PeerManager.FilterPeersByProto), with this fix, it works like a charm. You should use status backend server, which could help you a lot @chaitanyaprem

@qfrank
Copy link
Contributor Author

qfrank commented Dec 3, 2024

BTW, I found checkForConnectionChanges will invoke ConnectionChanged , and it checks if online by isOnline := len(w.node.Host().Network().Peers()) > 0 , I think this is not good. Thinking about mobile told backend network is available in 100%, but we also check if online by another condition. so online have different meanings at the same time, maybe we should handle these onlines separately? @chaitanyaprem

@chaitanyaprem
Copy link
Contributor

I identified it by debug status backend server ... what I found was: when user back to online from offline(keep offline before login), there's no peers available supporting LightPush protocol (PeerManager.FilterPeersByProto), with this fix, it works like a charm. You should use status backend server, which could help you a lot @chaitanyaprem

ah...interesting. got it since bootstrap nodes were not reachable during login and then they were post network is back online. Got it, this fix would definitely address the issue.

Thanks for this info, i will start using status backend server to simulate these scenarios. Any instructions on how to use and run it?

@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Dec 3, 2024

BTW, I found checkForConnectionChanges will invoke ConnectionChanged , and it checks if online by isOnline := len(w.node.Host().Network().Peers()) > 0 , I think this is not good. Thinking about mobile told backend network is available in 100%, but we also check if online by another condition. so online have different meanings at the same time, maybe we should handle these onlines separately? @chaitanyaprem

This is also a good point, in case of mobile we can use network connectivity available from device to determine network status and not just peer connectivity alone. Is there a way to access this info from this layer? That way we can use either peers==0 or network-state-identified-by-app = offline to determine network status.

The peers == 0 was initially used for relay which makes more sense there because in desktop or other devices it is hard to detect network connectivity otherwise unless you ping to some standard server. Also note that this method of detecting offline is more decentralized and permissionless since we are not depending on any centralized server. But for mobile since we get info from the device that can also be used to drive online/offline status.

Copy link
Contributor

@chaitanyaprem chaitanyaprem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ilmotta
Copy link
Contributor

ilmotta commented Dec 3, 2024

The peers == 0 was initially used for relay which makes more sense there because in desktop or other devices it is hard to detect network connectivity otherwise unless you ping to some standard server.

@chaitanyaprem, do you know if we considered other ways to reliably detect network connectivity in desktop environments? I know there's no multiplatform way to solve this problem, but for such an important problem I would lean on platform specific solutions, going from most reliable to least reliable or a combo of 2 solutions (where checking the number of peers would be somewhere in the middle I guess). Probably such solutions would be better implemented outside the Waku's umbrella (?)

@osmaczko @igor-sirotin do you remember past discussions about network connectivity for the desktop app?

@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Dec 4, 2024

@chaitanyaprem, do you know if we considered other ways to reliably detect network connectivity in desktop environments? I know there's no multiplatform way to solve this problem, but for such an important problem I would lean on platform specific solutions, going from most reliable to least reliable or a combo of 2 solutions (where checking the number of peers would be somewhere in the middle I guess). Probably such solutions would be better implemented outside the Waku's umbrella (?)

The current approach is more detailed than just simply peerCount = 0. We currently ping randomly few connected peers in a fixed interval(5 secs) and ping all connected peers at a separate interval(5 mins). This ping will confirm if we are still connected to the peers or drop the connection. This combined with regular check of peers = 0 seems like a pretty decent and decentralized(not depending on any centralized server) way to ensure network connectivity. But this is not enough to ensure that you have healthy connectivity for each shard which is more sophisticated and is documented here. This way to detect connectivity is not dependent on platform and indicates waku's connectivity status.
Do note that may not necessarily indicate node's network connectivity status if for some reason node is not able to connect to any waku nodes. But since waku is the lower layer communication protocol used by status, it would be safe to assume that if waku is not reachable, network is not reachable. Also, if the network online/offline status used by something else within the app context then maybe we should include some other indication apart from waku's connectivity status to identify as overall app-status as online/offline.

I am assuming when you say most reliable of solution it would involve some sort of a ping check with a centralized server. if you mean something else, do indicate the same.

@ilmotta
Copy link
Contributor

ilmotta commented Dec 4, 2024

@chaitanyaprem thank you for the detailed explanation.

I am assuming when you say most reliable of solution it would involve some sort of a ping check with a centralized server. if you mean something else, do indicate the same.

@chaitanyaprem My question/comment was due to a previous comment of yours, quoted here:

This is also a good point, in case of mobile we can use network connectivity available from device to determine network status and not just peer connectivity alone.

The peers == 0 was initially used for relay which makes more sense there because in desktop or other devices it is hard to detect network connectivity

-- #6153 (comment)

We can also leverage the host machine running the desktop app to determine the network connectivity, just as in mobile devices. Or is there something specific about a mobile device you are thinking about? What is hard(er) in devices running the desktop app about detecting network connectivity that's different from a mobile device?

My usage of the word reliable wasn't the clearest. By reliability I meant that detecting network connectivity in a device running the desktop app should be reliable and fast, similar to a mobile device. As you outlined yourself, this solution could be complementary to the Waku-only solution using peer connectivity, not a replacement.

@chaitanyaprem
Copy link
Contributor

@chaitanyaprem My question/comment was due to a previous comment of yours, quoted here:

ah, ok.

We can also leverage the host machine running the desktop app to determine the network connectivity, just as in mobile devices. Or is there something specific about a mobile device you are thinking about? What is hard(er) in devices running the desktop app about detecting network connectivity that's different from a mobile device?

i don't know if there is any platform agnostic way of doing this. But it may help if we can identify network connectivity outside of waku's purview maybe using some sort of an OS level indication of network not connected. Mobile devices especially running on android/IOS has some phone-home concept which i think is used by status app as well to identify if the device is offline or not apart from waku's own connectivity determination.

My usage of the word reliable wasn't the clearest. By reliability I meant that detecting network connectivity in a device running the desktop app should be reliable and fast, similar to a mobile device. As you outlined yourself, this solution could be complementary to the Waku-only solution using peer connectivity, not a replacement.

got it, and agree that if we can find another alternative way to identify device connectivity in desktop as well it will be helpful. Maybe this can be posted to desktop team to add this functionality into the app.

@qfrank qfrank force-pushed the fix/message_not_sent3 branch from 77bce1b to 4361cb2 Compare December 5, 2024 06:26
Copy link

codecov bot commented Dec 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.92%. Comparing base (5d75731) to head (e623029).
Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #6153       +/-   ##
============================================
+ Coverage    13.89%   60.92%   +47.02%     
============================================
  Files          817      832       +15     
  Lines       108284   109858     +1574     
============================================
+ Hits         15048    66929    +51881     
+ Misses       91315    35081    -56234     
- Partials      1921     7848     +5927     
Flag Coverage Δ
functional 13.90% <100.00%> (+0.01%) ⬆️
unit 60.04% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
wakuv2/waku.go 68.86% <100.00%> (+22.67%) ⬆️

... and 653 files with indirect coverage changes

@qfrank qfrank force-pushed the fix/message_not_sent3 branch from 4361cb2 to 7902aab Compare December 6, 2024 03:32
@ilmotta ilmotta force-pushed the fix/message_not_sent3 branch from 7902aab to e623029 Compare December 6, 2024 12:42
@ilmotta ilmotta merged commit 55befd8 into develop Dec 6, 2024
18 checks passed
@ilmotta ilmotta deleted the fix/message_not_sent3 branch December 6, 2024 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants