Retry batchGet RESOURCE_EXHAUSTED #1387

IchordeDionysos · 2021-01-04T15:23:22Z

I've experienced the RESOURCE_EXHAUSTED error recently a lot. So, I wanted to dig into the error and understand why occurs and how it could be fixed.

I've found this answer from a Google Cloud engineer suggesting that this is a quota error, however, I did not find any quota that is exhausted...

To verify that this is indeed a temporary issue (as this kept occurring for the same read document), I've validated that I could read the document outside of the script where it occurs.

Under that assumption, it is a temporary issue and a quota issue. I thought why doesn't it get retried?

I've also found this issue: #909
Where @schmidt-sebastian confirms that some issues do get retried, but not RESOURCE_EXHAUSTED is there a reason why this error does not get retried?

From another issue for Node PubSub, it suggests that usually also a RESOURCE_EXHAUSTED error will be retried (at least by other libraries).
googleapis/google-cloud-node#2661 (comment)

So I guess we should also retry on RESOURCE_EXHAUSTED errors to make them occur less!

Environment details

OS: Compute Engine image (debian-9-stretch-v20200902)
Node.js version: 12.20.0
npm version: 6.14.8
@google-cloud/firestore version: 4.7.1

The text was updated successfully, but these errors were encountered:

IchordeDionysos · 2021-01-04T15:23:53Z

@schmidt-sebastian what are your thoughts on this?

schmidt-sebastian · 2021-01-14T19:40:52Z

We could retry on Resource Exhausted, but it is a little more complicated then just changing a setting. We currently rely on a shared retry configuration for all of our retries (https://github.com/googleapis/nodejs-firestore/blob/master/dev/src/v1/firestore_client_config.json). This retry configuration configures all error codes that we deem retryable, as well as the backoff between attempts. In this configuration, it is currently not possible to use a longer retry interval for Resource Exhausted, and using the existing retry intervals would further starve your resources.

If we do address this, it would mean that we have to add custom backoff handling for this error code. I am not opposed to this, but I don't know when we will have the bandwidth.

Thanks for filing this issue!

IchordeDionysos · 2021-01-15T00:04:43Z

@schmidt-sebastian is in the config file already such a thing to consider various retry configs based on the error code/group?

Because there would be the option to add additional retry configurations other than default.

{
  "retry_params": {
    "default": {
      "initial_retry_delay_millis": 100,
      "retry_delay_multiplier": 1.3,
      "max_retry_delay_millis": 60000,
      "initial_rpc_timeout_millis": 60000,
      "rpc_timeout_multiplier": 1,
      "max_rpc_timeout_millis": 60000,
      "total_timeout_millis": 600000
    }
  }
}

So I suppose if one would tackle this issue, they would create a new retry config there. That's straightforward.

But it looks like there wouldn't be a straight forward option to tailor the retry config based on the error code, right now.
Or am I missing something?

I am asking to understand what would be amount of work that would go into this.

But from looking at how the retry config works is probably a bigger issue as this would involve changing the service definitions in gax right?

schmidt-sebastian · 2021-01-15T00:10:04Z

We currently don't have the ability to use different retry settings based on error codes. We can use a different retry configuration for different methods, but unfortunately, that is not enough here. Any change here would need to plumbed through to GAX in all of our supported languages. If we just wanted to address this for Firestore and just in Node, we could add special handling for Resource Exhausted in this repo.

yoshi-automation added the triage me I really want to be triaged. label Jan 5, 2021

product-auto-label bot added the api: firestore Issues related to the googleapis/nodejs-firestore API. label Jan 5, 2021

yoshi-automation added the 🚨 This issue needs some love. label Jan 9, 2021

crwilcox assigned schmidt-sebastian Jan 11, 2021

dmahugh added priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. labels Jan 14, 2021

yoshi-automation removed triage me I really want to be triaged. 🚨 This issue needs some love. labels Jan 14, 2021

Sembiance mentioned this issue Jan 19, 2021

Error: 13 INTERNAL: Received RST_STREAM with code 2 googleapis/nodejs-vision#785

Closed

schmidt-sebastian mentioned this issue Jun 24, 2021

feat: retry BatchGetDocuments RPCs that fail with errors #1544

Merged

schmidt-sebastian closed this as completed in #1544 Jun 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry batchGet RESOURCE_EXHAUSTED #1387

Retry batchGet RESOURCE_EXHAUSTED #1387

IchordeDionysos commented Jan 4, 2021

IchordeDionysos commented Jan 4, 2021

schmidt-sebastian commented Jan 14, 2021

IchordeDionysos commented Jan 15, 2021 •

edited

Loading

schmidt-sebastian commented Jan 15, 2021

Retry batchGet RESOURCE_EXHAUSTED #1387

Retry batchGet RESOURCE_EXHAUSTED #1387

Comments

IchordeDionysos commented Jan 4, 2021

Environment details

IchordeDionysos commented Jan 4, 2021

schmidt-sebastian commented Jan 14, 2021

IchordeDionysos commented Jan 15, 2021 • edited Loading

schmidt-sebastian commented Jan 15, 2021

IchordeDionysos commented Jan 15, 2021 •

edited

Loading