Improve handling of Google App Engine connection reset events. #620

crwilcox · 2021-03-29T16:57:46Z

The GAE standard environment can experience transient connection reset events. When using redis-py, the usual recommendation is to wrap the redis-py client in something that catches connection resets and retries. It appears that a redis-py instance can be passed to Cloud NDB, but could this experience be improved?

On the whole, connection resets are a fact of life in the GAE standard network stack, and users of redis-py connection pools (the NDB library appears to be one of these "users") frequently grapple with this issue. The ask is whether we could make NDB more resilient to these inevitable resets (for example, could NDB retry?).
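As a rough illustration of the wrapping approach mentioned above, here is a minimal sketch of a retry helper. The name `call_with_retry` and its parameters are hypothetical, not part of redis-py or NDB. It retries on the built-in `ConnectionError` (the base class of `ConnectionResetError`); a redis-py user would likely also pass `redis.exceptions.ConnectionError` via `retry_on`, since, as far as we know, redis-py's exception does not derive from the built-in one.

```python
import time


def call_with_retry(func, *args, max_attempts=5, base_delay=0.05,
                    retry_on=(ConnectionError,), **kwargs):
    """Call func(*args, **kwargs), retrying on transient connection errors.

    Hypothetical helper: not part of redis-py or NDB. The built-in
    ConnectionError covers ConnectionResetError, BrokenPipeError, etc.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(failures):
    """Return a function that raises ConnectionResetError `failures` times."""
    calls = {"n": 0}

    def flaky():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionResetError("connection reset by peer")
        return "OK"

    return flaky


print(call_with_retry(make_flaky(2), base_delay=0))  # two resets, then "OK"
```

The backoff is a common choice for transient network faults; a fixed delay would work too for a sketch like this.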

chrisrossi · 2021-03-29T17:57:46Z

Possibly this has already been addressed and we just need a release. Or do the specific errors you're discussing not get caught by this?

crwilcox · 2021-03-30T16:39:59Z

The customer who reported this says that change does look like the likely cause and solution.

Glad to do a release. Do you think there are any upcoming changes worth waiting on?

crwilcox · 2021-03-30T19:48:43Z

@chrisrossi a thought came up in conversation: what happens if the default pool size is overridden to be more than 5?

The concern is that 5 retries are hardcoded: if a reset happened while there were 5 or more connections in the pool, we would exhaust the retries before creating a new connection. Is this a concern? If so, maybe NDB could calculate the number of retries so that it always exceeds the pool size, e.g. pool size + 5?

Example:

1. 4 connections exist in the pool.
2. A stream loss event occurs, and all connections on an instance need to be reset.
3. The first connection resets, and the next existing connection is then selected from the pool 3 times.
4. This leaves 1 retry.
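A toy model of the scenario above, under loud assumptions: the hypothetical `ToyPool` hands out idle connections before opening new ones, a connection that resets is dropped, and "5 retries" is read as 5 total attempts, which matches the counting in the example. It is not how redis-py's pool is actually implemented.

```python
from collections import deque


class ToyConn:
    """Hypothetical connection; a stale one fails with a reset."""

    def __init__(self, stale):
        self.stale = stale

    def send(self):
        if self.stale:
            raise ConnectionResetError("stale connection")
        return "OK"


class ToyPool:
    """Hypothetical pool: reuses idle connections first, then opens new ones."""

    def __init__(self, idle):
        self.idle = deque(idle)

    def get(self):
        return self.idle.popleft() if self.idle else ToyConn(stale=False)


def stale_pool(n):
    """A pool whose n idle connections were all broken by a stream loss."""
    return ToyPool([ToyConn(stale=True) for _ in range(n)])


def run(pool, max_attempts):
    """Return (succeeded, attempts_used); a failed connection is discarded."""
    for attempt in range(1, max_attempts + 1):
        conn = pool.get()  # popped from the pool, so it is never reused
        try:
            conn.send()
            return True, attempt
        except ConnectionResetError:
            continue
    return False, max_attempts


print(run(stale_pool(4), max_attempts=5))      # last attempt gets a new connection
print(run(stale_pool(5), max_attempts=5))      # retries exhausted
print(run(stale_pool(5), max_attempts=5 + 5))  # budget of pool size + 5
```

Under this model, a fixed retry budget only guarantees reaching a fresh connection when it exceeds the number of idle connections, which is the reasoning behind the pool size + 5 suggestion.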

justinkwaugh · 2021-04-22T19:43:52Z

I notice that with the latest release (v1.8.0), running on an App Engine standard instance with no load other than my own light testing, I still regularly get cache flushes caused by connection reset events. This is definitely edging toward a showstopper for our migration to Python 3.


product-auto-label bot added the `api: datastore` label Mar 29, 2021

crwilcox added the `type: feature request` and `priority: p2` labels and removed the `api: datastore` label Mar 29, 2021

product-auto-label bot added the `api: datastore` label Apr 7, 2021

chrisrossi mentioned this issue in #645, "fix: retry connection errors with memcache" (merged), May 7, 2021

chrisrossi closed this as completed in #645 May 10, 2021

chrisrossi pushed commit 06b466a, "fix: retry connection errors with memcache (#645)", that referenced this issue May 10, 2021

The most common transient error with memcache is `ConnectionResetError`, which wasn't included in exceptions to retry. Now all connection errors are retried. Fixes #620
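The fix relies on Python's exception hierarchy: `ConnectionResetError` and its siblings all subclass the built-in `ConnectionError`, so catching the base class covers every case. The `NARROW` tuple below is a hypothetical example of a too-specific retry list, not the tuple NDB actually used before #645.

```python
# Every built-in connection-level error derives from ConnectionError.
CONNECTION_ERRORS = (
    BrokenPipeError,
    ConnectionAbortedError,
    ConnectionRefusedError,
    ConnectionResetError,
)

# Hypothetical retry lists for comparison (not NDB's actual code).
NARROW = (ConnectionRefusedError,)  # too specific: misses resets
BROAD = (ConnectionError,)          # base class: covers all of the above


def is_retryable(exc, retry_on):
    """True if exc would be caught by `except retry_on:`."""
    return isinstance(exc, retry_on)


for cls in CONNECTION_ERRORS:
    print(cls.__name__, is_retryable(cls(), NARROW), is_retryable(cls(), BROAD))
```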
