Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database actions should be tolerant of cache (redis or memcache) connection failures #557

Closed
andrewsg opened this issue Oct 8, 2020 · 2 comments · Fixed by #560
Closed
Assignees
Labels
api: datastore Issues related to the googleapis/python-ndb API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@andrewsg
Copy link
Contributor

andrewsg commented Oct 8, 2020

To make database calls more robust, we should be tolerant of cache connection issues and issue a warning instead of a fatal error; a flag on the cache configuration object could toggle this behavior for users who desire the strict behavior.

@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/python-ndb API. label Oct 8, 2020
@andrewsg
Copy link
Contributor Author

andrewsg commented Oct 8, 2020

Some things to think about before implementing: Are there cache invalidation implications for fault tolerance here? Is there any way to be totally safe about cache invalidation in the event the database call succeeds but the cache call fails? (For that matter, are we 100% robust to cache invalidation issues even with the current behavior, given the error in some cases will be raised after the database call is complete?)

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Oct 9, 2020
@chrisrossi chrisrossi self-assigned this Oct 9, 2020
@chrisrossi
Copy link
Contributor

@andrewsg Yeah, invalidation can be a problem with writes. If a write succeeds in Datastore but then fails in the cache, you can have other clients pulling stale data from the cache. We might need different policies for read/write.

Here's some out loud thinking.

Reads are easy. A failure during a read can be translated to a cache miss without any real problems. Easy case. A default to warn and move on is reasonable

For writes, we could swap the order around so we attempt cache invalidation before writing to Datastore. If the write to Datastore fails, this isn't a problem, clients will just get the stored value from Datastore and repopluate the cache. If the write to the cache fails, we have an opportunity to decide 1) to abort the write, so that the cache state will reflect the database state, or 2) write to Datastore anyway, if clients retrieving stale data is tolerable (depends on the application).

There really is an unavoidable trade off here: data integrity or application stability?

Let's say we go with 2). Adding retry functionality for the cache invalidation can help us avoid the the problem altogether if the problem is transient, so we should probably do that. If we exhaust retries, then there is probably a more sustained outage occurring that is also affecting reads, which is good news, because it means clients aren't getting stale data from the cache. If we take the extra step to set a flag on the Cache when there is a connection error that tells it to the clear the cache next time it tries anything, we should be able to flush any stale data fairly quickly after the cache comes back online. This can't guarantee non-stale data for every request, but it can make instances of stale data rare and short lived.

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Oct 13, 2020
@chrisrossi chrisrossi added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed 🚨 🚨 This issue needs some love. triage me I really want to be triaged. labels Oct 14, 2020
chrisrossi pushed a commit to chrisrossi/python-ndb that referenced this issue Oct 15, 2020
chrisrossi pushed a commit to chrisrossi/python-ndb that referenced this issue Oct 18, 2020
andrewsg pushed a commit that referenced this issue Oct 22, 2020
* feat: fault tolerance for global caches

Closes #557

* Fix spelling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/python-ndb API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants