Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant delay or missing messages on 4G in community (without re-login or app closure) #21172

Open
Horupa-Olena opened this issue Sep 4, 2024 · 59 comments

Comments

@Horupa-Olena
Copy link

Horupa-Olena commented Sep 4, 2024

Follow up: #20730

Problem

Users experience issues with delayed or missing messages in community on both iOS and Android devices when reconnecting from offline to online (4G) .

On iOS: There is a noticeable delay in receiving messages when switching from offline to online (4G).
On Android: Messages either do not arrive at all (waited up to 10 minutes) or only a portion of the messages is received immediately. In 6 tests conducted: 4 cases of missing messages and 2 message partly deliveries.

Reproduction

  1. Connect device to a 4G network (setting: WI-FI only for sync/backup)
  2. Put the app in the background.
  3. Go offline for a period of time.
  4. User 1 goes back online and brings the app to the foreground.
  5. User 2 sends messages to User 1 in real time (both is online)
  6. Observe the delay in receiving the messages.

Expected behavior

Messages should be received promptly after the user goes back online, even if the app was in the background.

Actual behavior

iOS: Messages are delayed upon reconnecting to the 4G network after being offline.
Android: Either no messages are received or only a subset of messages arrives instantly. The issue was observed in 4 out of 6 tests.

Additional Information

  • Status version: nightly 05.09.2024 or app version: 2.29.0 (20240905020836)
  • Operating System: Android, IOS

Comments:

Examples of tests on Android (4G enabled):

13:53 - device is offline
13:55 - device returns online
13:55 - 10 messages are sent from the desktop app
14:05 - no messages have arrived in the mobile app
111
photo_2024-09-06_15-13-44

14:06 - device is offline
14:07 - device returns online
14:09 - 10 messages are sent from the desktop app
14:09 - 5 out of 10 messages are received
14:21 - the remaining 5 messages still haven't arrived
2222
photo_2024-09-06_15-13-41
photo_2024-09-06_15-13-43

Android:

Mobile log:
logcat.zip

Desktop log:
geth.zip

IOS:

screencast.2024-09-04.12-51-23.mp4

Mobile log:
logs.zip

Desktop log:
geth.log

@Horupa-Olena Horupa-Olena changed the title Delay in receiving messages on 4G on Android (app in background without re-login/close app) Significant delay in receiving messages on 4G on Android (app in background without re-login/close app) Sep 6, 2024
@Horupa-Olena Horupa-Olena changed the title Significant delay in receiving messages on 4G on Android (app in background without re-login/close app) Significant delay or missing messages on 4G in community (without re-login or app closure) Sep 6, 2024
@seanstrom seanstrom self-assigned this Oct 10, 2024
@seanstrom
Copy link
Member

seanstrom commented Oct 11, 2024

I've looking into this issue and I thought I should share some notes on what I've been seeing.

Summary

  • I was able to reproduce the glitch when running the app on a iOS device (not simulator)
    • My setup involved me using two devices: one desktop client and one mobile client
      • I joined a new open community with both accounts and started sending messages
      • I used the Xcode network utilities app to artificially constrain the iOS device to use a 4G connection
      • I then put the mobile app into the background by navigating to the iOS home screen, and then waited (roughly 5-10 minutes)
      • I then returned to the mobile app and sent a message from the desktop client into the community chat
    • I was also able to reproduce this issue when disabling my network connection, backgrounding the app for a 5-10 minutes, then returning to the app, and sending a message from the desktop client.
  • It seems that the app does wait a long time for new messages to appear
    • When examining the logs, I can see that the app attempts to re-subscribe with a filter, but it fails to resubscribe, and there seems to trying to re-connect to a peer and subscribe but it possibly cannot reach the peer (?)

Logs

Notable logs that happen in sequence

  1. Fail to subscribe
  2. failed to resubscribe for filter
  3. subscribing again
  4. handleFilterSubscribeOptions
  5. handleFilterSubscribeOptions exit
  6. peer selection
  7. dialing peer
  8. Failed to subscribe
Here's a snippet of the kind of logs that I'm seeing in the app geth.log

Log Snippet

  ERROR[10-11|10:12:28.020|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0xc3384567/rfc26 /waku/1/0x365d0749/rfc26 /waku/1/0xad134f3a/rfc26]" error="failed to dial: failed to dial 16Uiu2HAm3BU3SiRt2Evt96KnbCKW9WRYpoGyqYJDMex3e3mFXTTv: all dials failed\n  * [/ip4/151.64.178.110/tcp/56006] dial backoff"
  DEBUG[10-11|10:12:28.020|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:177]                       failed to resubscribe for filter         apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" error="subscriptions failed for contentTopics: /waku/1/0xc3384567/rfc26,/waku/1/0x365d0749/rfc26,/waku/1/0xad134f3a/rfc26"
  DEBUG[10-11|10:12:28.020|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:143]                       subscription status                      apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" sub-count=1 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }"
  DEBUG[10-11|10:12:28.020|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:167]                       subscribing again                        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" num-peers=1
  DEBUG[10-11|10:12:28.020|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:192]                       subscribing with peers to exclude        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" excluded-peers=[16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv]
  DEBUG[10-11|10:12:28.020|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:337]                  handleFilterSubscribeOptions             peerCount=1 excludePeersLen=1
  DEBUG[10-11|10:12:28.020|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_selection.go:193]              Select Peers                             selectionCriteria="{\"selectionType\":0,\"protocolId\":\"/vac/waku/filter-subscribe/2.0.0-beta1\",\"pubsubTopics\":[\"/waku/2/rs/16/32\"],\"specificPeers\":null,\"maxPeerCount\":1,\"excludePeers\":{\"\\u0000%\\u0008\\u0002\\u0012!\\u0003\\ufffd \\ufffdڿ\\ufffd\\u001f\\ufffd8\\ufffdJ\\ufffd\\ufffd\\ufffd \\ufffdO\\ufffd\\ufffd^ui\\u000e̓'\\ufffdV\\ufffd\\ufffdp\\ufffd\":{}}}" excludedPeers=16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv
  DEBUG[10-11|10:12:28.021|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:354]                  handleFilterSubscribeOptions exit        selectedPeerCount=1
  DEBUG[10-11|10:12:28.021|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:379]                  peer selection                           params.maxPeers=1
  DEBUG[10-11|10:12:28.021|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       host 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV dialing 16Uiu2HAmCMLcxjaurcqfbsaGWM7fn8mvr8jFJigy9qJirP9u4usC 
  DEBUG[10-11|10:12:28.021|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:251]                                                       dialing peer                             from=16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV to=16Uiu2HAmCMLcxjaurcqfbsaGWM7fn8mvr8jFJigy9qJirP9u4usC
  DEBUG[10-11|10:12:28.022|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       network for 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV finished dialing 16Uiu2HAmCMLcxjaurcqfbsaGWM7fn8mvr8jFJigy9qJirP9u4usC 
  ERROR[10-11|10:12:28.022|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0xc3384567/rfc26 /waku/1/0x365d0749/rfc26 /waku/1/0xad134f3a/rfc26]" error="failed to dial: failed to dial 16Uiu2HAmCMLcxjaurcqfbsaGWM7fn8mvr8jFJigy9qJirP9u4usC: no addresses"
  DEBUG[10-11|10:12:28.022|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:177]                       failed to resubscribe for filter         apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" error="subscriptions failed for contentTopics: /waku/1/0xc3384567/rfc26,/waku/1/0x365d0749/rfc26,/waku/1/0xad134f3a/rfc26"
  DEBUG[10-11|10:12:28.022|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:143]                       subscription status                      apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xad134f3a/rfc26/waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26 ] }" sub-count=1 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }"
  DEBUG[10-11|10:12:28.022|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:167]                       subscribing again                        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xad134f3a/rfc26/waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26 ] }" num-peers=1
  DEBUG[10-11|10:12:28.022|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:192]                       subscribing with peers to exclude        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" excluded-peers=[16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv]
  DEBUG[10-11|10:12:28.022|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:337]                  handleFilterSubscribeOptions             peerCount=1 excludePeersLen=1
  DEBUG[10-11|10:12:28.022|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_selection.go:193]              Select Peers                             selectionCriteria="{\"selectionType\":0,\"protocolId\":\"/vac/waku/filter-subscribe/2.0.0-beta1\",\"pubsubTopics\":[\"/waku/2/rs/16/32\"],\"specificPeers\":null,\"maxPeerCount\":1,\"excludePeers\":{\"\\u0000%\\u0008\\u0002\\u0012!\\u0003\\ufffd \\ufffdڿ\\ufffd\\u001f\\ufffd8\\ufffdJ\\ufffd\\ufffd\\ufffd \\ufffdO\\ufffd\\ufffd^ui\\u000e̓'\\ufffdV\\ufffd\\ufffdp\\ufffd\":{}}}" excludedPeers=16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv
  DEBUG[10-11|10:12:28.023|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:354]                  handleFilterSubscribeOptions exit        selectedPeerCount=1
  DEBUG[10-11|10:12:28.023|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:379]                  peer selection                           params.maxPeers=1
  DEBUG[10-11|10:12:28.023|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       host 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV dialing 16Uiu2HAmE8FeXNJnqppRq6Rt3PTfGLWUHtxjBUmHzrDFeFPoPkLM 
  DEBUG[10-11|10:12:28.023|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:251]                                                       dialing peer                             from=16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV to=16Uiu2HAmE8FeXNJnqppRq6Rt3PTfGLWUHtxjBUmHzrDFeFPoPkLM
  DEBUG[10-11|10:12:28.023|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       network for 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV finished dialing 16Uiu2HAkypMfCMupzqGz1c56goDxTwMpQ3nyz7sxJU4PhAT6JNSL 
  ERROR[10-11|10:12:28.023|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/64 contentTopics="[/waku/1/0x28259c00/rfc26 /waku/1/0x397f6567/rfc26]" error="failed to dial: failed to dial 16Uiu2HAkypMfCMupzqGz1c56goDxTwMpQ3nyz7sxJU4PhAT6JNSL: all dials failed\n  * [/ip4/212.41.7.173/tcp/51312] dial backoff"
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:177]                       failed to resubscribe for filter         apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" error="subscriptions failed for contentTopics: /waku/1/0x28259c00/rfc26,/waku/1/0x397f6567/rfc26"
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:143]                       subscription status                      apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" sub-count=1 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }"
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:167]                       subscribing again                        apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" num-peers=1
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:192]                       subscribing with peers to exclude        apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" excluded-peers=[16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv]
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:337]                  handleFilterSubscribeOptions             peerCount=1 excludePeersLen=1
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_selection.go:193]              Select Peers                             selectionCriteria="{\"selectionType\":0,\"protocolId\":\"/vac/waku/filter-subscribe/2.0.0-beta1\",\"pubsubTopics\":[\"/waku/2/rs/16/64\"],\"specificPeers\":null,\"maxPeerCount\":1,\"excludePeers\":{\"\\u0000%\\u0008\\u0002\\u0012!\\u0003\\ufffd \\ufffdڿ\\ufffd\\u001f\\ufffd8\\ufffdJ\\ufffd\\ufffd\\ufffd \\ufffdO\\ufffd\\ufffd^ui\\u000e̓'\\ufffdV\\ufffd\\ufffdp\\ufffd\":{}}}" excludedPeers=16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:354]                  handleFilterSubscribeOptions exit        selectedPeerCount=1
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:379]                  peer selection                           params.maxPeers=1
  DEBUG[10-11|10:12:28.024|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       host 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV dialing 16Uiu2HAm463fLnSYYQjCtk5p1kAh1SeTT8BHH2ia3sFAVdqkcD9k 
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:251]                                                       dialing peer                             from=16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV to=16Uiu2HAm463fLnSYYQjCtk5p1kAh1SeTT8BHH2ia3sFAVdqkcD9k
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       [limiter] clearing all peer dials: 16Uiu2HAm3BU3SiRt2Evt96KnbCKW9WRYpoGyqYJDMex3e3mFXTTv 
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       [limiter] clearing all peer dials: 16Uiu2HAmCMLcxjaurcqfbsaGWM7fn8mvr8jFJigy9qJirP9u4usC 
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       network for 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV finished dialing 16Uiu2HAmE8FeXNJnqppRq6Rt3PTfGLWUHtxjBUmHzrDFeFPoPkLM 
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       [limiter] clearing all peer dials: 16Uiu2HAkypMfCMupzqGz1c56goDxTwMpQ3nyz7sxJU4PhAT6JNSL 
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       [limiter] clearing all peer dials: 16Uiu2HAmE8FeXNJnqppRq6Rt3PTfGLWUHtxjBUmHzrDFeFPoPkLM 
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       [limiter] clearing all peer dials: 16Uiu2HAm463fLnSYYQjCtk5p1kAh1SeTT8BHH2ia3sFAVdqkcD9k 
  DEBUG[10-11|10:12:28.025|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       network for 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV finished dialing 16Uiu2HAm463fLnSYYQjCtk5p1kAh1SeTT8BHH2ia3sFAVdqkcD9k 
  ERROR[10-11|10:12:28.025|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/64 contentTopics="[/waku/1/0x28259c00/rfc26 /waku/1/0x397f6567/rfc26]" error="failed to dial: failed to dial 16Uiu2HAm463fLnSYYQjCtk5p1kAh1SeTT8BHH2ia3sFAVdqkcD9k: all dials failed\n  * [/ip4/138.0.200.202/tcp/62515] dial backoff"
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:177]                       failed to resubscribe for filter         apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" error="subscriptions failed for contentTopics: /waku/1/0x28259c00/rfc26,/waku/1/0x397f6567/rfc26"
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:143]                       subscription status                      apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" sub-count=1 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }"
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:167]                       subscribing again                        apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" num-peers=1
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:192]                       subscribing with peers to exclude        apisub-id=c8df0caa-2b49-42dd-be89-c1b046bb22f4 content-filter="{ pubsubTopic: /waku/2/rs/16/64, contentTopics: [ /waku/1/0x28259c00/rfc26/waku/1/0x397f6567/rfc26 ] }" excluded-peers=[16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv]
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:337]                  handleFilterSubscribeOptions             peerCount=1 excludePeersLen=1
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_selection.go:193]              Select Peers                             selectionCriteria="{\"selectionType\":0,\"protocolId\":\"/vac/waku/filter-subscribe/2.0.0-beta1\",\"pubsubTopics\":[\"/waku/2/rs/16/64\"],\"specificPeers\":null,\"maxPeerCount\":1,\"excludePeers\":{\"\\u0000%\\u0008\\u0002\\u0012!\\u0003\\ufffd \\ufffdڿ\\ufffd\\u001f\\ufffd8\\ufffdJ\\ufffd\\ufffd\\ufffd \\ufffdO\\ufffd\\ufffd^ui\\u000e̓'\\ufffdV\\ufffd\\ufffdp\\ufffd\":{}}}" excludedPeers=16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv
  ERROR[10-11|10:12:28.025|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0xc3384567/rfc26 /waku/1/0x365d0749/rfc26 /waku/1/0xad134f3a/rfc26]" error="failed to dial: failed to dial 16Uiu2HAmE8FeXNJnqppRq6Rt3PTfGLWUHtxjBUmHzrDFeFPoPkLM: all dials failed\n  * [/ip4/103.227.96.201/tcp/40345] dial backoff"
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:354]                  handleFilterSubscribeOptions exit        selectedPeerCount=1
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:177]                       failed to resubscribe for filter         apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26/waku/1/0xc3384567/rfc26 ] }" error="subscriptions failed for contentTopics: /waku/1/0xc3384567/rfc26,/waku/1/0x365d0749/rfc26,/waku/1/0xad134f3a/rfc26"
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:143]                       subscription status                      apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" sub-count=1 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }"
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:167]                       subscribing again                        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" num-peers=1
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:192]                       subscribing with peers to exclude        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" excluded-peers=[16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv]
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:337]                  handleFilterSubscribeOptions             peerCount=1 excludePeersLen=1
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_selection.go:193]              Select Peers                             selectionCriteria="{\"selectionType\":0,\"protocolId\":\"/vac/waku/filter-subscribe/2.0.0-beta1\",\"pubsubTopics\":[\"/waku/2/rs/16/32\"],\"specificPeers\":null,\"maxPeerCount\":1,\"excludePeers\":{\"\\u0000%\\u0008\\u0002\\u0012!\\u0003\\ufffd \\ufffdڿ\\ufffd\\u001f\\ufffd8\\ufffdJ\\ufffd\\ufffd\\ufffd \\ufffdO\\ufffd\\ufffd^ui\\u000e̓'\\ufffdV\\ufffd\\ufffdp\\ufffd\":{}}}" excludedPeers=16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:354]                  handleFilterSubscribeOptions exit        selectedPeerCount=1
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:379]                  peer selection                           params.maxPeers=1
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       host 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV dialing 16Uiu2HAmCb3p2uuxh2ncooR1qxNheLNmatj7ePAZQEKsd3cTRTnd 
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:251]                                                       dialing peer                             from=16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV to=16Uiu2HAmCb3p2uuxh2ncooR1qxNheLNmatj7ePAZQEKsd3cTRTnd
  DEBUG[10-11|10:12:28.026|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       network for 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV finished dialing 16Uiu2HAmCb3p2uuxh2ncooR1qxNheLNmatj7ePAZQEKsd3cTRTnd 
  ERROR[10-11|10:12:28.026|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0xc3384567/rfc26 /waku/1/0x365d0749/rfc26 /waku/1/0xad134f3a/rfc26]" error="failed to dial: failed to dial 16Uiu2HAmCb3p2uuxh2ncooR1qxNheLNmatj7ePAZQEKsd3cTRTnd: no addresses"
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:177]                       failed to resubscribe for filter         apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" error="subscriptions failed for contentTopics: /waku/1/0xc3384567/rfc26,/waku/1/0x365d0749/rfc26,/waku/1/0xad134f3a/rfc26"
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:143]                       subscription status                      apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" sub-count=1 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }"
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:167]                       subscribing again                        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" num-peers=1
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/filter/filter.go:192]                       subscribing with peers to exclude        apisub-id=154a6d67-b3e5-41bd-9c58-50edb950f248 content-filter="{ pubsubTopic: /waku/2/rs/16/32, contentTopics: [ /waku/1/0xc3384567/rfc26/waku/1/0x365d0749/rfc26/waku/1/0xad134f3a/rfc26 ] }" excluded-peers=[16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv]
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:337]                  handleFilterSubscribeOptions             peerCount=1 excludePeersLen=1
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_selection.go:193]              Select Peers                             selectionCriteria="{\"selectionType\":0,\"protocolId\":\"/vac/waku/filter-subscribe/2.0.0-beta1\",\"pubsubTopics\":[\"/waku/2/rs/16/32\"],\"specificPeers\":null,\"maxPeerCount\":1,\"excludePeers\":{\"\\u0000%\\u0008\\u0002\\u0012!\\u0003\\ufffd \\ufffdڿ\\ufffd\\u001f\\ufffd8\\ufffdJ\\ufffd\\ufffd\\ufffd \\ufffdO\\ufffd\\ufffd^ui\\u000e̓'\\ufffdV\\ufffd\\ufffdp\\ufffd\":{}}}" excludedPeers=16Uiu2HAmPqUttg6kDmmk1nyqzq2En3i5BTiye1rTYHooHAeE3RNv
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:354]                  handleFilterSubscribeOptions exit        selectedPeerCount=1
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:379]                  peer selection                           params.maxPeers=1
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       host 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV dialing 16Uiu2HAmULXfYgvnAhAzBwFGSJoJ2roPudAsAJ3LnYLNtwfR7HGJ 
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:251]                                                       dialing peer                             from=16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV to=16Uiu2HAmULXfYgvnAhAzBwFGSJoJ2roPudAsAJ3LnYLNtwfR7HGJ
  DEBUG[10-11|10:12:28.027|github.com/status-im/status-go/vendor/go.uber.org/zap/sugar.go:198]                                                       network for 16Uiu2HAm7GY3UyS7vCfBiCqAzVhJG4j1nGys4mfQDzJ8rwPgHkQV finished dialing 16Uiu2HAmULXfYgvnAhAzBwFGSJoJ2roPudAsAJ3LnYLNtwfR7HGJ 

logs-fail-to-resubscribe.zip

@ilmotta ilmotta removed the bug label Oct 14, 2024
@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Oct 15, 2024

updating after some analysis:

  1. I have noticed that some of the filters have been uninstalled through the logs which should not be the case even if app goes into background. Or maybe there is a flow which just uninstalls filters which is normal (would be good to know what is the cause of filter removal).
  2. Note that failed to subscribe and trying to resubscribe is expected because desktop peers also support filter and mobile may be trying to subscribe to a desktop peer which maybe offline and hence you can see dial errors. The idea is in case of failure, it should recover as subscriptions are retried.
  3. i have noticed a sudden drop in filter subscriptions because the peers are returning 404 for filter ping. need to analyze why that could be happening. this could lead to delay in messages being received. Wondering if app going into background is causing peers to be disconnected and pings to not happen? any idea of the behaviour cc @seanstrom

i see drop in peer-counts at 10-10|16:22:56.820 , 10-10|16:24:51.155 , 10-10|16:28:30.978, 10-10|16:42:45.776 10-11|08:34:21.105 times. it would help if you can give some timeline of events so that i can understand logs better.

  1. Even if filter fails, there should be periodic store queries happening every 1 minute to check for missing messages and query from store node. is this disabled when user is on 4G? cc @seanstrom Because right now we depend on store for reliability and if this is disabled, might be causing issues.
    Will analyze further as i also see below. Will update further after analysis.
  • some weird logs which should not be there , probably due to hitting some sort of race condition between resubcribe and unsubscribe. ERROR[10-10|15:40:18.671|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:425] Failed to subscribe pubSubTopic=/waku/2/rs/16/32 contentTopics=[] error="at least one contenttopic should be specified" this is not a concern as i see this resolved automatically.
  • there was a perioid where resubscribe did not happen eventhough there are subscriptions. Looks like this is not the case, just the logs were misleading due to the way filter stats are displayed. Will make a small fix for this.

it would help to know which messages were missed, if you have any messageID, contentTopic , time and date to look in more details what exactly is happening.

@seanstrom
Copy link
Member

seanstrom commented Oct 17, 2024

Hey @chaitanyaprem thanks for analysis, great stuff 🙌

I've attempted to reproduce this glitch again, but this time I reproduced the .glitch using a combination of an iOS simulator for sending messages to another iOS device. This was done because it was easier for me to access the logs of the simulator and I was having some issues running the dev build of the desktop app. If it's important to reproduce this glitch lmk and I can try again.

That being said, it did take me a while to finally capture the same glitch, but I was able to get a copy of the logs from the sending device and the receiver device so we can compare them.

More details here: #21172 (comment)


To respond to your question about the disconnecting peers when backgrounding the app:

i have noticed a sudden drop in filter subscriptions because the peers are returning 404 for filter ping. need to analyze why that could be happening. this could lead to delay in messages being received. Wondering if app going into background is causing peers to be disconnected and pings to not happen? any idea of the behaviour

Supposedly when we background the application there could be some behaviour where the operating system will "idle" the application if it's waiting in the background. This could have an affect on network activity or network resources, but I cannot say for certain. Perhaps @ilmotta or @igor-sirotin may have some more context about this behaviour.

Though what I've been able to notice is that this issue seems to mainly occur when the app returns to the foreground (the app was reopened), and then the network reconnects. So potentially on a low or missing data connection (mobile lte) the app may go through a connection state change. When that happens perhaps there's some logic that handling resources based on whether the connection is offline or online, but I'm not sure so I'd need to do some more digging through the code to find logic related to connection state.

I do know that there is logic for checking the connection state when that is going from the background to the foreground, but I'm not sure if there's any logic for re-examining the connection state when the app is already running. This seems to be an important detail since I'm mainly able to reproduce this issue when reconnecting to the network after opening the app.


To respond to your question about missing messages:

Even if filter fails, there should be periodic store queries happening every 1 minute to check for missing messages and query from store node. is this disabled when user is on 4G? cc @seanstrom Because right now we depend on store for reliability and if this is disabled, might be causing issues.

I do notice from the logs that we are attempting to retrieve missing messages, and it seems we've configured status-go to retrieve these missing message, but I haven't found any conditions for not doing retrieving the messages based on the connection state atm. I'll need to look deeper into the codebase to verify what's happening, but I think we're setup to to retrieve missing messages.

@seanstrom
Copy link
Member

Here's some updated logs and context from reproducing this glitch:

  • It was difficult to reproduce this glitch, so it took me multiple attempts of disconnecting the iOS device from the network and and reconnecting the network after backgrounding the mobile and reopening the app to the foreground. I also needed to wait roughly 5 minutes after backgrounding the app before bringing it to the foreground again.

    • This glitch is a little peculiar since I was able to receive a message after reconnecting, but I was not able to receive an additional message after reconnecting.
  • For context, both devices are on the same WiFi network and the iOS simulator remains online while the other device goes offline.

  • From the logs of the iOS Simulator (the device sending a message)

    INFO [10-17|21:10:54.912|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/peer_exchange/client.go:156]      connecting to newly discovered peers     count=20
    INFO [10-17|21:10:56.232|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/publish/message_sender.go:115]         publishing message via lightpush         envelopeHash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
    INFO [10-17|21:10:56.525|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:214]             received message push                    peerID=16Uiu2HAm8mUZ18tBWPXDQsaF7PbCKYA35z7WB2xNZH2EVq1qS8LJ hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
    INFO [10-17|21:10:56.812|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:214]             received message push                    peerID=16Uiu2HAmAR24Mbb6VuzoyUiGx42UenDkshENVDj4qnmmbabLvo31 hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
    INFO [10-17|21:10:56.932|github.com/status-im/status-go/protocol/messenger.go:3585]                                                           processing message                       site=RetrieveAll hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 message-id=0x7e48aaa26c5b8c3247479417b133ad55b321821de66e81e937642386a0ebc16f type=CHAT_MESSAGE senderID=0x046bf754f6eecd4a6d73557b4795eef4e824f13c63d4343356f6ec1d6597c9ba02f09dc75809f208283f370d60f0a76555dc132fdc17970460a1f93c87d7d01490
    WARN [10-17|21:10:56.932|github.com/status-im/status-go/protocol/messenger_maps.go:65]                                                        contacts map: loading own identity       contactID=0x046bf754f6eecd4a6d73557b4795eef4e824f13c63d4343356f6ec1d6597c9ba02f09dc75809f208283f370d60f0a76555dc132fdc17970460a1f93c87d7d01490
    
    • contentTopic=/waku/1/0xe88187f2/rfc26
    • message-id=0x7e48aaa26c5b8c3247479417b133ad55b321821de66e81e937642386a0ebc16f
    • timestamp=1,729,199,456,232,070,000
    • 10-17|21:10:56.232
  • From the logs of the iOS Device (the device receiving a message)

    WARN [10-17|21:10:56.699|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:207]                  received messagepush with invalid subscription parameters peerID=16Uiu2HAmNTpGnyZ8W1BK2sXEmgSCNWiyDKgRU3NBR2DXST2HzxRU hash=0xd91e8399546bfecf15c1ff688d5a0856fad7c6bef920eb4a4fda4af33c6d8595 pubsubTopic=/waku/2/rs/16/32 contentTopic=/waku/1/0xe88187f2/rfc26 timestamp=1,729,199,456,232,070,000
    DEBUG[10-17|21:10:57.568|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185]            retrieving history to check for missing messages peerID=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P contentTopics="[/waku/1/0x7e6732ec/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xa35b9bb3/rfc26 /waku/1/0xb69e0a57/rfc26 /waku/1/0x0b65ce26/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0xa9a5e604/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729192525821337000 to=1729192778950497000 attempt=1
    ERROR[10-17|21:10:57.600|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:428]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0x6c561c0d/rfc26 /waku/1/0x8396b7a1/rfc26 /waku/1/0xe88187f2/rfc26]" error="failed to dial: failed to dial 16Uiu2HAmBPzatVXrBnv6bJVdUNDXTUJ5y9JRGAbtcHdhzhjm6Ssk: all dials failed\n  * [/ip4/152.58.243.206/tcp/37331] dial tcp4 0.0.0.0:49443->152.58.243.206:37331: i/o timeout"
    
    • Here I noticed that we do receive a message, but it has a warning of: received messagepush with invalid subscription parameters
    • Missing messages do seem to be attempted to be retrieved but it doesn’t seem to be happening for the identified contentTopic, but in this case there weren’t any missing messages to be retrieved because we had received all the existing messages and were only missing the latest message
    • There are more logs for failing to resubscribe to certain filters too, but those may be unrelated to the specific community channel that was expecting a message

Logs from the device: ios-device-logs.zip
Logs from the simulator: ios-simulator-logs.log.zip

Unfortunately I was only able to extract the geth.log from the iOS simulator, but hopefully that's enough for now.

@seanstrom
Copy link
Member

Some follow-up notes:

  • What’s interesting is that the message does make it through to the physical iOS device, but it seems to be invalid because of received messagepush with invalid subscription parameters

    • I’m not sure how subscriptions are coordinated between the two devices, but perhaps when the physical device goes offline and reconnects, maybe the subscriptions are cleared in some way which could cause the message to be seen as an undesired message (?)
    • from previous experiments, it seems that the issue is self-corrected when the app switches to background and returns to the foreground. So perhaps there’s some other logic that will re-register subscriptions?
  • I do notice that the missing message verification does happen after the reconnection happens and the message is not delivered, but the contentTopic /waku/1/0xe88187f2/rfc26 seems to be missing from the list of topics that would be retrieved.

    DEBUG[10-17|21:11:51.613|[github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185](http://github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185)]            retrieving history to check for missing messages peerID=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P contentTopics="[/waku/1/0x7e6732ec/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xa35b9bb3/rfc26 /waku/1/0xb69e0a57/rfc26 /waku/1/0x0b65ce26/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0xa9a5e604/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729192525821337000 to=1729192778950497000 attempt=1
    DEBUG[10-17|21:11:51.614|[github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268](http://github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268)]                   sending store request                    peer=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P requestId=37316536333764356134346664326263646135303038373463656566373765376165353464323833313932306163306430383564303462646634316563633136
    ERROR[10-17|21:11:51.614|[github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:190](http://github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:190)]            could not query storenode                peerID=16Uiu2HAm9CQhsuwPR54q27kNj9iaQVfyRzTGKrhFmr94oD8ujU6P contentTopics="[/waku/1/0x7e6732ec/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xa35b9bb3/rfc26 /waku/1/0xb69e0a57/rfc26 /waku/1/0x0b65ce26/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0xa9a5e604/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729192525821337000 to=1729192778950497000 error="context canceled" attempt=1
    
    • In this case, it seems we’re unable to contact the store node for the missing messages, but it still shows that the contentTopic was not included in the request.
  • For comparison, we can see that the topic is included when the messages are retrieved. Note that these logs are from a different attempt to reproduce the glitch.

    DEBUG[10-17|23:55:24.468|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185]            retrieving missing messages              peerID=16Uiu2HAmMELCo218hncCtTvC2Dwbej3rbyHQcR8erXNnKGei7WPZ contentTopics="[/waku/1/0xa9a5e604/rfc26 /waku/1/0x5759bbff/rfc26 /waku/1/0x696bdb86/rfc26 /waku/1/0xacfae801/rfc26 /waku/1/0x7e6732ec/rfc26 /waku/1/0xe5eb41b8/rfc26 /waku/1/0x8396b7a1/rfc26 /waku/1/0x6c561c0d/rfc26 /waku/1/0xf473d4e0/rfc26 /waku/1/0xe88187f2/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729209264462294000 to=1729209324215529000 attempt=1
    DEBUG[10-17|23:55:24.468|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268]                   sending store request                    peer=16Uiu2HAmMELCo218hncCtTvC2Dwbej3rbyHQcR8erXNnKGei7WPZ requestId=30393030326432323130633531306439333939663733326233373833656462323631646336396438376361333064383537323135343331666265383430663834
    

@igor-sirotin
Copy link
Contributor

igor-sirotin commented Oct 18, 2024

@seanstrom FYI, could be related: status-im/status-go#5659

@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Oct 18, 2024

Thanks a lot for detailed tests and analysis, it helped me look at specific areas in the logs.

I’m not sure how subscriptions are coordinated between the two devices, but perhaps when the physical device goes offline and reconnects, maybe the subscriptions are cleared in some way which could cause the message to be seen as an undesired message (?)

this could be a reason why invalid subscription params is noticed. while going through logs i had noticed that app lost network connectivity at 21:10:34 and immediately it got connectivity back at 21:10:50.
when connectivity is lost client removes all subscriptions and once it is back it will try to subscribe again. but server is not aware client has removed subscription and is sending for a filter it registered before network is lost. this may not happen if app is in background for a longer period of time (i.e time between disconnection and reconnection is more than 5/10 minutes). Note that this is fine and expected to happen.

INFO [10-17|21:10:34.328|github.com/status-im/status-go/api/geth_backend.go:2439] Network state change package=status-go/api.GethStatusBackend old=wifi new=offline .... INFO [10-17|21:10:50.168|github.com/status-im/status-go/api/geth_backend.go:2439] Network state change package=status-go/api.GethStatusBackend old=offline new=wifi

  • Here I noticed that we do receive a message, but it has a warning of: received messagepush with invalid subscription parameters

while going through logs, noticed that this could happen because filter subscriptions are not successful due to peers that are not reachable and taking time to stabilize. will analyze further to see if there is some other issue.

but in the meantime i have made a simple logic to remove bad peers if dial fails twice to them and included it #21458. can you try to simulate the issue with this version.

  • Missing messages do seem to be attempted to be retrieved but it doesn’t seem to be happening for the identified contentTopic, but in this case there weren’t any missing messages to be retrieved because we had received all the existing messages and were only missing the latest message

this is odd, wondering what might be the reason for this content-topic not being chosen for missing messages. maybe @richard-ramos has an idea.

@richard-ramos
Copy link
Member

@chaitanyaprem is the filter for this content topic is ephemeral? those are ignored for retrieving missing messages: https://github.com/status-im/status-go/blob/e611b1e5131fff706151661d15cc2904c20a7f71/protocol/transport/transport.go#L699C19-L699C24

@chaitanyaprem
Copy link
Contributor

@chaitanyaprem is the filter for this content topic is ephemeral? those are ignored for retrieving missing messages: https://github.com/status-im/status-go/blob/e611b1e5131fff706151661d15cc2904c20a7f71/protocol/transport/transport.go#L699C19-L699C24

ah, that might make sense...i have no idea though.
maybe @seanstrom can help

@seanstrom
Copy link
Member

It seems that a contentTopic filter is marked as Ephemeral here: https://github.com/status-im/status-go/blob/3179532b645549c103266e007694d2c81a7091b4/protocol/messenger_store_node_request_manager.go#L284

I've checked around and added some logs debug logs for a contentTopic being Ephemeral, and it doesn't seem in this case the contentTopic for the community channel is considered ephemeral.

cc @richard-ramos @chaitanyaprem

@seanstrom
Copy link
Member

I've reproduced the issue again and this the logs will focus on:

  • contentTopic is 0x479e6e55
  • missing messageID 0x850e6ac49c6f715d17fab609cbcf4faf813f51f601321256184caecade741a00

So far it seems like there are some issues with reconnecting to the peer after the network connection is changed from offline to online.

  • I’m noticing peers are not available either by dialing and receiving a backoff status or timeout status
  • This seems to affect the message verification retrieval for the contentTopic, because the tasks for retrieving the messages are being cancelled.

From the ios-device logs (the receiver device)

  • The logs show a timeout when attempting to connect to peer and subscribe

    WARN [10-21|18:14:21.601|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_manager.go:740]                connecting to peer                       peerID=16Uiu2HAmSijA4xVLfXuEhvU126psZbVzPRUhZT4BDtiophCsrcQz error="failed to dial: failed to dial 16Uiu2HAmSijA4xVLfXuEhvU126psZbVzPRUhZT4BDtiophCsrcQz: all dials failed\n  * [/ip4/156.213.241.94/tcp/46379] dial tcp4 0.0.0.0:54600->156.213.241.94:46379: i/o timeout"
    ERROR[10-21|18:14:21.601|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:428]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0x6c042192/rfc26 /waku/1/0x479e6e55/rfc26 /waku/1/0x9e634129/rfc26]" error="failed to dial: failed to dial 16Uiu2HAmSijA4xVLfXuEhvU126psZbVzPRUhZT4BDtiophCsrcQz: all dials failed\n  * [/ip4/156.213.241.94/tcp/46379] dial tcp4 0.0.0.0:54600->156.213.241.94:46379: i/o timeout"
    
  • The logs show a backoff when attempting to connect to peer and subscribe

    WARN [10-21|18:15:57.508|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/peermanager/peer_manager.go:740]                connecting to peer                       peerID=16Uiu2HAmKXqamc4aSTkUEgFU3coCydiVCaKoDyHtsYdCUaxqKhUv error="failed to dial: failed to dial 16Uiu2HAmKXqamc4aSTkUEgFU3coCydiVCaKoDyHtsYdCUaxqKhUv: all dials failed\n  * [/ip4/79.110.55.4/tcp/53585] dial backoff"
    ERROR[10-21|18:15:57.508|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/filter/client.go:428]                  Failed to subscribe                      pubSubTopic=/waku/2/rs/16/32 contentTopics="[/waku/1/0xd1b725db/rfc26 /waku/1/0x1ef3b8eb/rfc26]" error="failed to dial: failed to dial 16Uiu2HAmKXqamc4aSTkUEgFU3coCydiVCaKoDyHtsYdCUaxqKhUv: all dials failed\n  * [/ip4/79.110.55.4/tcp/53585] dial backoff"
    
  • The logs show the message-verification store-node requests being cancelled

    DEBUG[10-21|18:17:23.501|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185]            retrieving history to check for missing messages peerID=16Uiu2HAmJnVR7ZzFaYvciPVafUXuYGLHPzSUigqAmeNw9nJUVGeM contentTopics="[/waku/1/0x0b10486b/rfc26 /waku/1/0x1ca1ac8f/rfc26]" pubsubTopic=/waku/2/rs/16/64 from=1729534036339416000 to=1729534096331553000 attempt=1
    DEBUG[10-21|18:17:23.501|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:185]            retrieving history to check for missing messages peerID=16Uiu2HAmJnVR7ZzFaYvciPVafUXuYGLHPzSUigqAmeNw9nJUVGeM contentTopics="[/waku/1/0xae688b3b/rfc26 /waku/1/0x479e6e55/rfc26 /waku/1/0xb5753763/rfc26 /waku/1/0x92e8c9ee/rfc26 /waku/1/0x55cd0a44/rfc26 /waku/1/0x277bd875/rfc26 /waku/1/0x48b1bff9/rfc26 /waku/1/0x44204c9b/rfc26 /waku/1/0x17c556f6/rfc26 /waku/1/0x1ef3b8eb/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729534036577581000 to=1729534096331662000 attempt=1
    DEBUG[10-21|18:17:23.502|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268]                   sending store request                    peer=16Uiu2HAmJnVR7ZzFaYvciPVafUXuYGLHPzSUigqAmeNw9nJUVGeM requestId=65393335653535353465653761306336646635303835393032626466353138393331616634646333323633353765343430306236376161666433323565306538
    DEBUG[10-21|18:17:23.502|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/protocol/store/client.go:268]                   sending store request                    peer=16Uiu2HAmJnVR7ZzFaYvciPVafUXuYGLHPzSUigqAmeNw9nJUVGeM requestId=66363732363262383966663539346663623666366632623738333561653937616430633639333363383338633735353131366239643566316633653030386665
    ERROR[10-21|18:17:23.502|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:190]            could not query storenode                peerID=16Uiu2HAmJnVR7ZzFaYvciPVafUXuYGLHPzSUigqAmeNw9nJUVGeM contentTopics="[/waku/1/0x0b10486b/rfc26 /waku/1/0x1ca1ac8f/rfc26]" pubsubTopic=/waku/2/rs/16/64 from=1729534036339416000 to=1729534096331553000 error="context canceled" attempt=1
    ERROR[10-21|18:17:23.502|github.com/status-im/status-go/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go:190]            could not query storenode                peerID=16Uiu2HAmJnVR7ZzFaYvciPVafUXuYGLHPzSUigqAmeNw9nJUVGeM contentTopics="[/waku/1/0xae688b3b/rfc26 /waku/1/0x479e6e55/rfc26 /waku/1/0xb5753763/rfc26 /waku/1/0x92e8c9ee/rfc26 /waku/1/0x55cd0a44/rfc26 /waku/1/0x277bd875/rfc26 /waku/1/0x48b1bff9/rfc26 /waku/1/0x44204c9b/rfc26 /waku/1/0x17c556f6/rfc26 /waku/1/0x1ef3b8eb/rfc26]" pubsubTopic=/waku/2/rs/16/32 from=1729534036577581000 to=1729534096331662000 error="context canceled" attempt=1
    

ios-device-backoff.zip
ios-simulator-backoff.log.zip

@ilmotta ilmotta moved this from Backlog to In Progress in Status Desktop/Mobile Board Oct 22, 2024
@ilmotta ilmotta added this to the 2.32.0 Beta milestone Oct 22, 2024
@ilmotta ilmotta removed the high-prio label Oct 22, 2024
@chaitanyaprem
Copy link
Contributor

It seems that a contentTopic filter is marked as Ephemeral here: https://github.com/status-im/status-go/blob/3179532b645549c103266e007694d2c81a7091b4/protocol/messenger_store_node_request_manager.go#L284

I've checked around and added some logs debug logs for a contentTopic being Ephemeral, and it doesn't seem in this case the contentTopic for the community channel is considered ephemeral.

cc @richard-ramos @chaitanyaprem

oh, that is interesting.
@richard-ramos do you think setting this default to true could be a cause that some content-topics are not considered for store-node query. wondering if there is some sort of race condition during initialization that can cause the filter to be created/overridden for storeNodeRequest?

@qfrank
Copy link
Contributor

qfrank commented Dec 2, 2024

I think I found the cause although I'm working on 21452 / 21394, but they might be related, I'll create the status-go PR later @seanstrom @chaitanyaprem

@chaitanyaprem
Copy link
Contributor

I think I found the cause although I'm working on 21452 / 21394, but they might be related, I'll create the status-go PR later @seanstrom @chaitanyaprem

I don't think this fixes the missing messages as that is caused due to an issue explained here.

Also the root-cause for the issue is not peer connectivity, because store nodes are pre-configured and missing messages check will try to connect to them if we are not connected to the nodes.

@qfrank
Copy link
Contributor

qfrank commented Dec 3, 2024

what I found is not the root cause, but might be related. I tried your draft PR, it does reduced the delay for re-sending message once user is back online. but it won't fix 21452 / 21394

because store nodes are pre-configured and missing messages check will try to connect to them if we are not connected to the nodes.

what if user login without enable network and enable after login? from my debug result, it seems didn't work the way you mentioned. @chaitanyaprem

@chaitanyaprem
Copy link
Contributor

what I found is not the root cause, but might be related. I tried your draft PR, it does reduced the delay for re-sending message once user is back online. but it won't fix 21452 / 21394

hmm, interesting..do you have logs where you tested this in scenarios 21452 and 21394? would like to take a look to see what is actually happening.

what if user login without enable network and enable after login? from my debug result, it seems didn't work the way you mentioned. @chaitanyaprem

this is indeed an interesting scenario, but i am surprised that store nodes were not connected automatically.

@qfrank
Copy link
Contributor

qfrank commented Dec 3, 2024

stop only depend on logs, start to debug 🚀 status-go

@qfrank
Copy link
Contributor

qfrank commented Dec 3, 2024

stop only depend on logs, start to debug 🚀 status-go

Sorry, the link I provided is outdated, this should be the latest. @chaitanyaprem

@chaitanyaprem
Copy link
Contributor

@seanstrom could you validate my changes and see if the issue still occurs?

I would like these improvements to go in as well as they help in various other scenarios.

@seanstrom
Copy link
Member

seanstrom commented Dec 3, 2024

@qfrank thank you for these status-go changes: status-im/status-go#6153 🙏

I had some time today to test those changes, and I was still able to reproduce the issue with the missing messages not being retrieved when receiving a message after coming back online (while having been offline for over a few minutes).

Your changes could definitely solve some issues with re-connecting to peers after coming back online, though from my testing peers are re-connecting (on @chaitanyaprem branch) after a small window of time (roughly less than a minute). But based on the logs, there are some issues with store node requests failing because deadlines have exceeded.

Atm, I'm looking at this code: https://github.com/status-im/status-go/blob/18469e98e67fc82a3f5da078f6aae9e35d0d9f9e/vendor/github.com/waku-org/go-waku/waku/v2/api/missing/missing_messages.go#L190-L211

I think this might be where I would attempt to debug next, but I haven't started that process yet. I'm wondering if there's some weird errors there with how we're passing context into this function, and how we're handling retry counts and limits. Atm, it seems we associate a context to some criteriaInterest and that has a context stored with it, so I'm wondering if somehow that context is being cancelled and then we carry the cancelled context around afterwards (?).


I had collected the logs from @qfrank branch, but it seems my logs had rolled over and became inconsistent, so I'll re-collect them soon. But for now, here's my recent logs from reproducing the issue on @chaitanyaprem branch.

In @chaitanyaprem branch I was still able reproduce the issue by sending a message to device a device that has very recently come back online after being offline for over a few minutes.

In this case the missing message-id: 0x08794cfef53c97e068e3879c4d26d09d9c35a63640ab5e5446022ee507590de4
And the content topic: /waku/1/0x14667d94/rfc26

receiver-device-dec-2-2024.zip
sender-device-dec-2-2024.log.zip

@chaitanyaprem
Copy link
Contributor

In @chaitanyaprem branch I was still able reproduce the issue by sending a message to device a device that has very recently come back online after being offline for over a few minutes.

Could you probably get me logs that are not rolling over? Would like to see what is happening since you start the test.

@seanstrom
Copy link
Member

seanstrom commented Dec 3, 2024

@chaitanyaprem ah sorry about that, I re-ran the tests on my side and extracted some fresh logs. Let me know if these are better 🙏

In this case, missing-message id: 0x03723fa3a21847b04149d89e83ac719d0dbe8ae18d60940259e0e7d0455f6d85
and content topic: /waku/1/0x7141e4b8/rfc26

receiver-device-logs-v3-dec-2-2024.zip
sender-device-v3-dec-2-2024-geth.log.zip

@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Dec 3, 2024

@chaitanyaprem ah sorry about that, I re-ran the tests on my side and extracted some fresh logs. Let me know if these are better 🙏

In this case, missing-message id: 0x03723fa3a21847b04149d89e83ac719d0dbe8ae18d60940259e0e7d0455f6d85 and content topic: /waku/1/0x7141e4b8/rfc26

receiver-device-logs-v3-dec-2-2024.zip sender-device-v3-dec-2-2024-geth.log.zip

I ran the code locally with status-desktop and realized that i needed to add more logic to missing-message-verifier code for restart to make it work properly. I did that and pushed few fixes into the same PR now.

Do note that some more go-waku changes had got pulled in now but those should not affect these flows.

@seanstrom Can you please retest? I am hoping it will work now 🤞

@qfrank can you test your scenario with this new build especially missing_messages related issue? I think i had figured out the cause of why it did not work.

@qfrank
Copy link
Contributor

qfrank commented Dec 5, 2024

@qfrank can you test your scenario with this new build especially missing_messages related issue? I think i had figured out the cause of why it did not work.

If the scenario you meant was keep offline before login, and enable network after login, I just tried with your PR, it didn't fix the issues 21452 / 21394 @chaitanyaprem

you can see that messages 88-92 are gray, and the right device never receive the messages
Image

but PR fixed this kind of issues

@chaitanyaprem
Copy link
Contributor

@qfrank can you test your scenario with this new build especially missing_messages related issue? I think i had figured out the cause of why it did not work.

If the scenario you meant was keep offline before login, and enable network after login, I just tried with your PR, it didn't fix the issues 21452 / 21394 @chaitanyaprem

Can you share the logs? Odd that missing messages after coming back from offline did not work. Also hope you have tested with my latest pushed version.

@qfrank
Copy link
Contributor

qfrank commented Dec 5, 2024

logs:
device1-log.zip
device2-log.zip

@qfrank
Copy link
Contributor

qfrank commented Dec 5, 2024

I think we can use PR as a quick fix if we get good news from mobile QA? @chaitanyaprem

@qfrank
Copy link
Contributor

qfrank commented Dec 5, 2024

Also hope you have tested with my latest pushed version.

Yup, I checked out your branch test/filter-bad-peers and latest commit should be 93da02b02ec6bbbfafb68c8b35ad1d4feab9edfa, correct me if I was wrong

@qfrank
Copy link
Contributor

qfrank commented Dec 5, 2024

I had some time today to test those changes, and I was still able to reproduce the issue with the missing messages not being retrieved > when receiving a message after coming back online (while having been offline for over a few minutes).

Your changes could definitely solve some issues with re-connecting to peers after coming back online, though from my testing peers > are re-connecting (on @chaitanyaprem branch) after a small window of time (roughly less than a minute). But based on the logs, there > are some issues with store node requests failing because deadlines have exceeded.

Emm.. I guess maybe it's because checkForConnectionChanges changed network status to offline later? 🤔

@chaitanyaprem
Copy link
Contributor

Yup, I checked out your branch test/filter-bad-peers and latest commit should be 93da02b02ec6bbbfafb68c8b35ad1d4feab9edfa, correct me if I was wrong

if you are talking about status-go latest commit in my branch is status-im/status-go@9d4d3d2. So maybe you used an older commit?

@qfrank
Copy link
Contributor

qfrank commented Dec 5, 2024

Yup, I checked out your branch test/filter-bad-peers and latest commit should be 93da02b02ec6bbbfafb68c8b35ad1d4feab9edfa, correct me if I was wrong

if you are talking about status-go latest commit in my branch is status-im/status-go@9d4d3d2. So maybe you used an older commit?

Oh, sorry, I checked out your branch and then did a rebase to include one of my commit which fixed one issue of backend server, so the latest commit was not the same with 9d4d3d2f6985a6b306afed94af0e04db904861d7 , but I'm sure I tested with your latest pushed version

@chaitanyaprem
Copy link
Contributor

logs: device1-log.zip device2-log.zip

will take a look at them..can you also help by indicating the messageID of the message that was not received?

@qfrank
Copy link
Contributor

qfrank commented Dec 5, 2024

logs: device1-log.zip device2-log.zip

will take a look at them..can you also help by indicating the messageID of the message that was not received?

for message `90': 0x0a1b88182c29157a4b56c1f2e7d4b40daf1eafc07e11da55b2cab3220005cf8f
FYI, now(after sharing the logs) it's received because I tried re-login with network enabled later, I hope this won't affect you investigate the issue

ilmotta pushed a commit to status-im/status-go that referenced this issue Dec 6, 2024
@seanstrom
Copy link
Member

Okie dokie, I've re-ran the tests using @chaitanyaprem latest changes (pulled the branch today), and those changes were also rebased on top of the latest develop since we've recently merged status-im/status-go#6153

During my testing, I noticed that missing-messages retrieval would usually work, and instant messaging between the clients would become re-connected in less than a minute, and missing messages would be fetched in roughly a minute. I would say that the system seems much more robust than where we started.

However, I was still able to reproduce the issue with missing messages not being retrieved after toggling the network connection multiple times on the receiver device. I had to try multiple times to get this result, so it feels harder to reproduce but still possible.


That being said here's the logs from my tests and some useful bits of info for debugging:

contentTopic: /waku/1/0xd1a374d3/rfc26
missing message ID: 0x0231e800a83d585b97f13cf3c242588c25621ee8df8bc08c91fdea949c96a843

Snippet from the receiver logs:

2024-12-06T18:40:29.765Z	DEBUG	missing-msg-verifier	missing/missing_messages.go:213	retrieving history to check for missing messages	{"peerID": "16Uiu2HAm2M7xs7cLPc3jamawkEqbr7cUJX11uvY7LxQ6WFUdUKUT", "contentTopics": ["/waku/1/0xd1a374d3/rfc26", "/waku/1/0x420f406d/rfc26", "/waku/1/0xb6eaa496/rfc26", "/waku/1/0x31a95896/rfc26", "/waku/1/0xc89a98dc/rfc26", "/waku/1/0xd798f186/rfc26", "/waku/1/0x187929cc/rfc26", "/waku/1/0xadf10e4a/rfc26", "/waku/1/0xf171f48a/rfc26", "/waku/1/0xf187ec55/rfc26"], "pubsubTopic": "/waku/2/rs/16/32", "from": "1733509695059170000", "to": "1733509742504014000", "attempt": 1}
2024-12-06T18:40:29.766Z	DEBUG	node2.store-client	store/client.go:297	sending store request	{"peer": "16Uiu2HAm2M7xs7cLPc3jamawkEqbr7cUJX11uvY7LxQ6WFUdUKUT", "requestId": "845d549a5d6bf7ceb588e71baa9d74626b0ce0fb9e50f2bb701d3ab753424655"}
2024-12-06T18:40:29.766Z	ERROR	missing-msg-verifier	missing/missing_messages.go:218	could not query storenode	{"peerID": "16Uiu2HAm2M7xs7cLPc3jamawkEqbr7cUJX11uvY7LxQ6WFUdUKUT", "contentTopics": ["/waku/1/0xd1a374d3/rfc26", "/waku/1/0x420f406d/rfc26", "/waku/1/0xb6eaa496/rfc26", "/waku/1/0x31a95896/rfc26", "/waku/1/0xc89a98dc/rfc26", "/waku/1/0xd798f186/rfc26", "/waku/1/0x187929cc/rfc26", "/waku/1/0xadf10e4a/rfc26", "/waku/1/0xf171f48a/rfc26", "/waku/1/0xf187ec55/rfc26"], "pubsubTopic": "/waku/2/rs/16/32", "from": "1733509695059170000", "to": "1733509742504014000", "error": "context canceled", "attempt": 1}

receiver-logs-dec-6-2024.zip
sender-dec-6-2024-geth.log.zip

@seanstrom
Copy link
Member

On thing I noticed in the sender logs is that it's having issues with make store node requests too:

2024-12-06T18:36:14.940Z	DEBUG	missing-msg-verifier	missing/missing_messages.go:213	retrieving history to check for missing messages	{"peerID": "16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT", "contentTopics": ["/waku/1/0x0345a65d/rfc26", "/waku/1/0xd798f186/rfc26", "/waku/1/0x6fc8d937/rfc26"], "pubsubTopic": "/waku/2/rs/16/32", "from": "1733508444803296000", "to": "1733508602370858000", "attempt": 1}
2024-12-06T18:36:14.940Z	DEBUG	node2.store-client	store/client.go:297	sending store request	{"peer": "16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT", "requestId": "f9aa9d069b1208e153fdf72675949de7428d17c23577e6cb3ce76e581dce90a9"}
2024-12-06T18:36:14.940Z	ERROR	missing-msg-verifier	missing/missing_messages.go:218	could not query storenode	{"peerID": "16Uiu2HAmAUdrQ3uwzuE4Gy4D56hX6uLKEeerJAnhKEHZ3DxF1EfT", "contentTopics": ["/waku/1/0x0345a65d/rfc26", "/waku/1/0xd798f186/rfc26", "/waku/1/0x6fc8d937/rfc26"], "pubsubTopic": "/waku/2/rs/16/32", "from": "1733508444803296000", "to": "1733508602370858000", "error": "context canceled", "attempt": 1}

@chaitanyaprem
Copy link
Contributor

During my testing, I noticed that missing-messages retrieval would usually work, and instant messaging between the clients would become re-connected in less than a minute, and missing messages would be fetched in roughly a minute. I would say that the system seems much more robust than where we started.

That is really good to hear.

However, I was still able to reproduce the issue with missing messages not being retrieved after toggling the network connection multiple times on the receiver device. I had to try multiple times to get this result, so it feels harder to reproduce but still possible.

Went through the logs and noticed that missing_msg_verifier is falling behind in query due to context is cancelled.
Can you share the code which you had run the test with (as you said you had to rebase)? Looks like something is messed up and causing context to get cancelled frequently. Would like to review the code once after rebase(as both PRs work on online/offline status, something seems wrong).

@chaitanyaprem
Copy link
Contributor

chaitanyaprem commented Dec 9, 2024

On thing I noticed in the sender logs is that it's having issues with make store node requests too:

@seanstrom i added some logs to help identify why store query context is getting cancelled. I had also rebased with latest develop branch in status-go.

can you rerun the test with updated PR and share the logs?

@chaitanyaprem
Copy link
Contributor

@seanstrom i found another issue and pushed a fix in status-go which was causing some of the logs which i had noticed before.

Can you please retest with latest pushes and share the results?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In Progress
Development

No branches or pull requests

9 participants