Configure auto-healing mechanism for the rethinkdb connection #726

Closed
rkatuzhanets opened this issue Apr 9, 2021 · 10 comments

Comments

@rkatuzhanets

This feature/functionality could be useful when we have network issues. For instance, when the network goes down, rethinkdb becomes unavailable, and the connection can only be restored by manually interacting with the dependent services (e.g. STF).

@vdelendik vdelendik added the bug label Nov 19, 2021
@vdelendik vdelendik modified the milestones: 2.0, 2.1 Dec 22, 2021
@vdelendik
Contributor

Related tickets: zebrunner/mcloud-ios#54 and zebrunner/mcloud-ios#78.

@vdelendik vdelendik modified the milestones: 2.1, 2.2 Mar 7, 2022
@vdelendik vdelendik modified the milestones: 2.2, 2.3 Apr 3, 2022
@vdelendik vdelendik modified the milestones: 2.3, 2.4 Aug 19, 2022
@vdelendik vdelendik modified the milestones: 2.4, 2.5 Dec 12, 2022
@vdelendik
Contributor

Let's try to integrate the recovery kickstart approach for when the connection to rethinkdb is broken: zebrunner/mcloud-ios#151

Open question: should there be any limit on the number of recovery attempts?

@vdelendik vdelendik modified the milestones: 2.5, 2.4.4 Mar 6, 2023
@vdelendik
Contributor

@dhreben, please retest. I hope our existing wda/stf healthcheck solves it. If not, feel free to reopen so we can review in detail exactly where stf crashes and where the recovery kickstart should be added.

@dhreben
Contributor

dhreben commented Mar 13, 2023

Still reproducible.
502 error after docker stop rethinkdb.

Logs from the iOS device:

33755875,"x":355.77527832984924,"y":400.8180618286133},{"type":"pointerUp"}]}]},"json":true}
2023-03-15T11:35:49.875Z WRN/db 26 [d6af......9b9b6ebb11] Connection closed
2023-03-15T11:35:49.877Z INF/db 26 [d6a........b6ebb11] Connecting to rethinkdb:28015
2023-03-15T11:35:49.904Z INF/db 26 [d6af................9b9b6ebb11] Unable to connect to rethinkdb:28015
2023-03-15T11:35:49.936Z FTL/db 26 [d6afc6.................479b9b6ebb11] No hosts left to try
2023-03-15T11:35:49.936Z FTL/util:lifecycle 26 [d6afc6...............9b6ebb11] Shutting down due to fatal error with optional error :  undefined

@dhreben dhreben reopened this Mar 13, 2023
@vdelendik vdelendik modified the milestones: 2.4.4, 2.4.5 Mar 17, 2023
@vdelendik vdelendik modified the milestones: 2.4.5, 2.4.6 Apr 14, 2023
@vdelendik
Contributor

Moved to 2.4.6, as the default recovery didn't resolve the problem.

@vdelendik vdelendik transferred this issue from zebrunner/mcloud-ios Apr 14, 2023
@vdelendik
Contributor

vdelendik commented Apr 14, 2023

@ignacionar, please take a look at this one as well.

2023-04-14T15:58:15.429Z WRN/db 85985 [D77F3CD0-A614-4699-940A-2A7D00AFF164] Connection closed
2023-04-14T15:58:15.433Z INF/db 85985 [D77F3CD0-A614-4699-940A-2A7D00AFF164] Connecting to demo.zebrunner.farm:28015
2023-04-14T15:58:15.440Z INF/db 85985 [D77F3CD0-A614-4699-940A-2A7D00AFF164] Unable to connect to demo.zebrunner.farm:28015
2023-04-14T15:58:15.441Z FTL/db 85985 [D77F3CD0-A614-4699-940A-2A7D00AFF164] No hosts left to try
2023-04-14T15:58:15.442Z FTL/util:lifecycle 85985 [D77F3CD0-A614-4699-940A-2A7D00AFF164] Shutting down due to fatal error with optional error :  undefined
2023-04-14 18:58:15.453 WebDriverAgentRunner-Runner[83623:13538483] Disconnected a client from screenshots broadcast

We have to change stf so that it no longer does an explicit exit on this failure; instead it should activate recovery, on both Linux and Mac.
I suppose we need a recovery function like this one: https://github.com/zebrunner/stf/blob/develop/lib/units/ios-device/plugins/wda/WdaClient.js#L585

@ignacionar
Collaborator

ignacionar commented Apr 18, 2023

Done: #727. When disconnected from the db, it should try to reconnect every 5 seconds.
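A minimal sketch of the retry behaviour described above, for illustration only; it is not the code from #727 or from /opt/lib/db/index.js. It assumes a hypothetical `connectOnce()` that returns a Promise resolving to a connection or rejecting when no host is reachable. The 5-second delay comes from the comment; `connectWithRetry`, `maxAttempts` and the `log` option are hypothetical names. The key point is that, once the attempts are exhausted, the caller gets a rejected Promise rather than an undefined connection.

```javascript
'use strict'

// Resolve after `ms` milliseconds.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms))
}

// Hypothetical helper: call `connectOnce` until it succeeds or `maxAttempts`
// attempts have failed, waiting `retryDelayMs` between attempts.
// Resolves with the first successful connection; otherwise rejects with the
// last error, so callers never receive an undefined connection object.
async function connectWithRetry(connectOnce, {maxAttempts = 5, retryDelayMs = 5000, log = console} = {}) {
  if (maxAttempts < 1) {
    throw new RangeError('maxAttempts must be at least 1')
  }
  let lastError
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connectOnce()
    } catch (err) {
      lastError = err
      if (attempt < maxAttempts) {
        log.info(`Retrying connection in ${retryDelayMs / 1000} seconds...`)
        await delay(retryDelayMs)
      }
    }
  }
  // All attempts failed: surface the real error instead of returning undefined.
  throw lastError
}
```

A caller would `await connectWithRetry(...)` and attach event handlers only to the resolved connection, letting a final rejection reach the lifecycle's fatal-error handling.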

@vdelendik vdelendik removed this from the 2.4.6 milestone Apr 24, 2023
@vdelendik vdelendik added this to the 2.5 milestone Apr 24, 2023
vdelendik added a commit that referenced this issue Aug 4, 2023
@vdelendik vdelendik modified the milestones: 2.6, 2.5 Aug 9, 2023
@dhreben
Contributor

dhreben commented Aug 11, 2023

Reopened, still reproducible.
Steps:
docker stop rethinkdb

Log:

2023-08-11T12:03:05.016Z INF/db 25 [00008101-000848222187001E] Retrying connection in 5 seconds...
2023-08-11T12:03:10.017Z INF/db 25 [00008101-000848222187001E] Connecting to rethinkdb:28015
2023-08-11T12:03:10.021Z INF/db 25 [00008101-000848222187001E] Unable to connect to rethinkdb:28015
2023-08-11T12:03:10.022Z ERR/db 25 [00008101-000848222187001E] Error: No hosts left to try
    at next (/opt/lib/db/index.js:29:15)
    at /opt/lib/db/index.js:45:18
From previous event:
    at next (/opt/lib/db/index.js:40:15)
    at /opt/lib/db/index.js:50:12
From previous event:
    at connect (/opt/lib/db/index.js:24:4)
    at processImmediate (node:internal/timers:466:21)
2023-08-11T12:03:10.023Z INF/db 25 [00008101-000848222187001E] Retrying connection in 5 seconds...
2023-08-11T12:03:15.029Z INF/db 25 [00008101-000848222187001E] Connecting to rethinkdb:28015
2023-08-11T12:03:15.037Z INF/db 25 [00008101-000848222187001E] Unable to connect to rethinkdb:28015, Exiting...
2023-08-11T12:03:15.039Z FTL/db 25 [00008101-000848222187001E] Cannot read properties of undefined (reading 'on')
2023-08-11T12:03:15.039Z FTL/util:lifecycle 25 [00008101-000848222187001E] Shutting down due to fatal error with optional error :  undefined
Exit status: 1

@dhreben dhreben reopened this Aug 11, 2023
@vdelendik
Contributor

@dhreben, please test with a rethinkdb restart. There is a limit on retries, so a long outage will crash as expected.
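A small worked example of why a restart recovers while a long stop crashes, assuming a fixed delay between a capped number of attempts. The actual attempt cap is not stated in this thread, so `maxAttempts = 5` below is a hypothetical value, and the 3 s and 60 s outages are illustrative. If attempts run at t = 0, d, 2d, ..., (n - 1)d after the disconnect, rethinkdb must be reachable again by t = (n - 1)d. The sketch ignores the few milliseconds each connection attempt itself takes.

```javascript
'use strict'

// Longest outage (ms) that a capped, fixed-delay retry loop can ride out:
// attempts happen at 0, d, 2d, ..., (n - 1)d after the disconnect.
function toleratedOutageMs(maxAttempts, retryDelayMs) {
  if (maxAttempts < 1) return 0
  return (maxAttempts - 1) * retryDelayMs
}

// True if the database is back before the last attempt is made.
function survivesOutage(outageMs, maxAttempts, retryDelayMs) {
  return outageMs <= toleratedOutageMs(maxAttempts, retryDelayMs)
}

// Hypothetical cap of 5 attempts, 5 s apart.
console.log(toleratedOutageMs(5, 5000)) // 20000
console.log(survivesOutage(3000, 5, 5000)) // true: a short restart
console.log(survivesOutage(60000, 5, 5000)) // false: a long stop
```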

@dhreben
Contributor

dhreben commented Aug 11, 2023

Verified.
Steps:
docker restart rethinkdb
Result: rethinkdb restarted and is available.


4 participants