Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] Membership service enable retry db connection #37

Merged
merged 1 commit into from
Mar 15, 2023

Conversation

MfCrizz
Copy link
Contributor

@MfCrizz MfCrizz commented Mar 10, 2023

When using ArgoCD for deployment, "wait:true" to wait for the database
in the skaffold.yaml, is not applied, instead all services are created
in parallel. The membership-service tries to connect once to the not yet ready db,
and then waits, doing nothing. To resolve this the option EnableRetryOnFailure
was added. We try to connect 10 times, with increasing delay each time. (e.g. first 10ms, second 150ms... up to 60s). If the connection cant be established at the 10th try, we get to the try-catch. Then we stop the program, and the pod restarts.

@MfCrizz MfCrizz requested a review from a team as a code owner March 10, 2023 10:03
@MfCrizz MfCrizz changed the title Bugfix/membership service retry db connection [Bugfix] Membership service enable retry db connection Mar 10, 2023
@MfCrizz MfCrizz force-pushed the bugfix/membership-service-retry-db-connection branch from 753a269 to 40d4f25 Compare March 10, 2023 10:09
@AlejandroCamba
Copy link
Contributor

If you have the time would you update your commits to use the propper tag? in this case it would be 🐛, we internally use Gitmoji sorry that this is not documented, still early days of unguard opensource 😄

Copy link
Contributor

@W3D3 W3D3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proxy service Dockerfile changes should not be part of this PR.

Comment on lines +52 to +57
connectString,
new MySqlServerVersion(new Version(10, 5, 8)),
options => options.EnableRetryOnFailure(
maxRetryCount: 10,
maxRetryDelay: System.TimeSpan.FromSeconds(60),
errorNumbersToAdd: null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[QUESTION] Don't fully understand, we don't execute this code, do we? or does this throws only after 10 minutes (after retrying once every minute 10 times) and only THEN we get into the catch block and restart the pod by exiting the process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to connect 10 times, with increasing delay each time. (e.g. first 10ms, second 150ms... up to 60s). If the connection cant be established at the 10th try, we get to the try-catch. Then we stop the program, and the pod restarts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally with ArgoCD the connection was established at around the 5th try. I am using such a high retryCount and retryDelay to prevent restarting the pod to frequently.

@@ -49,6 +49,8 @@ private static void CreateDbIfNotExists(IHost host)
{
var logger = services.GetRequiredService<ILogger<Program>>();
logger.LogError(ex, "An error occurred creating the DB.");
// Restart the pod until success
System.Environment.Exit(1);
Copy link
Contributor

@AlejandroCamba AlejandroCamba Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[QUESTION] does this executes after trying to connect 10 times (10 minutes) (see comment above)? or on each failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It executes after trying 10m times. But not after 10 minutes, as the delay increases each retry a little until it reaches maxRetryDelay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the comment about the pod being restarted until success, we shouldn't refer to how the app is being run here.

@MfCrizz MfCrizz force-pushed the bugfix/membership-service-retry-db-connection branch 2 times, most recently from 1b3afe5 to 4419fd2 Compare March 13, 2023 14:50
@MfCrizz MfCrizz requested a review from W3D3 March 15, 2023 08:39
When using ArgoCD for deployment, "wait:true" to wait for the database
in the skaffold.yaml, is not applied, instead all services are created
in parallel. The membership-service tries to connect once to the not yet ready db,
and then waits, doing nothing. To mitigate this the option EnableRetryOnFailure
was added. Now the service tries in a timespan of 60 seconds to connect ten times,
and if it still fails exists the program, causing a recreation of the pod.
@MfCrizz MfCrizz force-pushed the bugfix/membership-service-retry-db-connection branch from 0d33363 to 08922f3 Compare March 15, 2023 10:41
@MfCrizz MfCrizz merged commit 4b263d5 into main Mar 15, 2023
@MfCrizz MfCrizz deleted the bugfix/membership-service-retry-db-connection branch March 15, 2023 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants