This repository has been archived by the owner on May 19, 2020. It is now read-only.
Before this PR, we saw a failed Redis ping, but after Redis recovered, the /ping health check never corrected itself.
Incident: https://alerts.newrelic.com/accounts/907948/incidents/6561272
Test case added: to prove this out, I first added a test case in which the Redis instance comes back online and the health check is expected to recover. It did not.
This is because the Redis client that the /ping endpoint used for the ping 1) was a single Go object and 2) used the default config. The single Go object did not know how to reconnect after a failed connection.
Fix: /ping now uses the same Redis pool as the session store.
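A minimal, standard-library-only sketch of why a pool recovers where a single long-lived client object does not. This is not redigo's pool or the session store's actual code; `Pool`, `Conn`, `fakeServer`, `fakeConn` and `newFakePool` are hypothetical names, and the fake connection simply simulates a TCP connection that stays dead after Redis restarts. The idea: the pool closes any connection that returned an error, so the next ping dials a fresh one.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Conn is a hypothetical stand-in for one Redis connection.
type Conn interface {
	Ping() error
	Close() error
}

// Pool is a hypothetical minimal pool: it reuses idle connections but
// discards any connection that returned an error.
type Pool struct {
	Dial func() (Conn, error)

	mu   sync.Mutex
	idle []Conn
}

// Get pops an idle connection if there is one; otherwise it dials.
func (p *Pool) Get() (Conn, error) {
	p.mu.Lock()
	if n := len(p.idle); n > 0 {
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		p.mu.Unlock()
		return c, nil
	}
	p.mu.Unlock()
	return p.Dial()
}

// Put keeps a healthy connection for reuse and closes a broken one.
func (p *Pool) Put(c Conn, err error) {
	if err != nil {
		c.Close()
		return
	}
	p.mu.Lock()
	p.idle = append(p.idle, c)
	p.mu.Unlock()
}

// Ping borrows a connection, pings, and hands it back.
func (p *Pool) Ping() error {
	c, err := p.Get()
	if err != nil {
		return err
	}
	err = c.Ping()
	p.Put(c, err)
	return err
}

// fakeServer simulates Redis going down and coming back.
type fakeServer struct{ up bool }

// fakeConn mimics a dead TCP connection: once failed, it stays failed.
type fakeConn struct {
	srv    *fakeServer
	broken bool
}

func (c *fakeConn) Ping() error {
	if c.broken || !c.srv.up {
		c.broken = true
		return errors.New("connection reset")
	}
	return nil
}

func (c *fakeConn) Close() error { return nil }

// newFakePool builds a Pool whose Dial fails while the fake server is down.
func newFakePool(srv *fakeServer) *Pool {
	return &Pool{Dial: func() (Conn, error) {
		if !srv.up {
			return nil, errors.New("connection refused")
		}
		return &fakeConn{srv: srv}, nil
	}}
}

func main() {
	srv := &fakeServer{up: true}
	single := &fakeConn{srv: srv} // one long-lived client object
	pool := newFakePool(srv)

	srv.up = false // Redis goes down
	fmt.Println("down: single =", single.Ping(), "| pool =", pool.Ping())

	srv.up = true // Redis recovers
	fmt.Println("back: single =", single.Ping(), "| pool =", pool.Ping())
}
```

Running it shows the single object still failing after Redis comes back while the pool's ping succeeds, which mirrors the recovery test case described above.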
The default config also set no timeouts, so any request to Redis could wait forever.
Fix: added timeouts for reads, writes, and connection attempts.
Also, while debugging, I went back through the logs to find the exact error, and added more debug statements on failure to make reviewing logs easier: https://github.com/18F/cg-dashboard/pull/1109/files#diff-661525df772e761479305d75690a20adR192