Intermittent DefaultCredentialsError on GCE #211
Also from @dmho418 (may be unrelated to this issue, but is probably the root cause): ...

```
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
    retry_strategy=self._retry_strategy)
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
    func, RequestsMixin._get_status_code, retry_strategy)
  File "/usr/local/lib/python2.7/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
    response = func()
  File "/usr/local/lib/python2.7/dist-packages/google/auth/transport/requests.py", line 176, in request
    self._auth_request, method, url, request_headers)
  File "/usr/local/lib/python2.7/dist-packages/google/auth/credentials.py", line 121, in before_request
    self.refresh(request)
  File "/usr/local/lib/python2.7/dist-packages/google/auth/compute_engine/credentials.py", line 93, in refresh
    raise exceptions.RefreshError(exc)
RuntimeError: RefreshError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true
```
@jonparrott Would there be a way to access the number of retries that were exceeded?

It'll be the requests / urllib3 default.
Default retry strategy for the `requests` session:

```python
>>> import requests
>>> session = requests.Session()
>>> adapter1, adapter2 = session.adapters.values()
>>> adapter1.max_retries
Retry(total=0, connect=None, read=False, redirect=None, status=None)
>>> adapter2.max_retries
Retry(total=0, connect=None, read=False, redirect=None, status=None)
```

From the docstring: ...
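For reference, a minimal sketch of how those defaults can be raised on a plain requests `Session` (the `Retry` values here are illustrative, not a recommendation):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Allow a few connection retries with exponential backoff instead of
# the default Retry(total=0, connect=None, ...) shown above.
retry = Retry(total=3, connect=3, backoff_factor=0.5)
session = requests.Session()
session.mount("http://", HTTPAdapter(max_retries=retry))
session.mount("https://", HTTPAdapter(max_retries=retry))
```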
I'm trying to reproduce on GCE with: ... and am not having any luck. If I had to guess, I'd say the only way to reproduce would be to stress out the instance (e.g. high CPU usage) so that the process handling the metadata server fails.
We see this issue in our setup as well: a Python script connecting to BQ and running hundreds of queries in rapid-fire sequence. We'll occasionally see: ... followed by a crash: ...
Awesome data @antoineazar! Thanks for the confirmation. Any code you could share so we could try to reproduce / strategize?
@dhermes any code that runs multiple (dozens usually suffice) queries on BQ in rapid sequence. Some simple sample code (pre-0.28 library, didn't test with 0.28); you can wrap this in a loop:

```python
import uuid

from google.cloud import bigquery

# `query`, `dest_dataset_id`, and `dest_table_id` are assumed to be defined.
client = bigquery.Client()
query_job = client.run_async_query(str(uuid.uuid4()), query)
# Use standard SQL syntax.
query_job.use_legacy_sql = False
# Set a destination table.
dest_dataset = client.dataset(dest_dataset_id)
dest_table = dest_dataset.table(dest_table_id)
query_job.destination = dest_table
# Allow the results table to be overwritten.
query_job.write_disposition = 'WRITE_TRUNCATE'
query_job.begin()
query_job.result()  # Wait for query to finish.
```
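For anyone on 0.28 or later, an untested sketch of the equivalent (same assumed `query`, `dest_dataset_id`, and `dest_table_id` variables):

```python
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
job_config.use_legacy_sql = False
job_config.destination = client.dataset(dest_dataset_id).table(dest_table_id)
job_config.write_disposition = 'WRITE_TRUNCATE'
query_job = client.query(query, job_config=job_config)
query_job.result()  # Wait for the query to finish.
```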
I'm seeing this issue in production. Is there a workaround/fix?
@GEverding the recommendation right now is to use a service account keyfile instead of relying on the GCE metadata service. It's possible that we could retry failed connections to the metadata service, but I'm unsure about that at the moment.
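A minimal sketch of that workaround, using the BigQuery client from the sample above (the keyfile path is a placeholder):

```python
from google.cloud import bigquery
from google.oauth2 import service_account

# Placeholder path; point this at your downloaded service account keyfile.
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/keyfile.json')
client = bigquery.Client(
    credentials=credentials, project=credentials.project_id)
```

Setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the keyfile path achieves the same thing without code changes.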
Thanks. Is that documented somewhere?
I'm getting "[Errno 111] Connection refused" triggering the same exceptions @dhermes mentioned fairly regularly on my appengine flex deployment. Is the best solution getting a keyfile into my flex deployment? I'm going to try passing in a requests adapter that has retry logic configured, but it's frustrating I've already spent this much time on such a fundamental feature of this library. |
@vanpelt yep, that is a completely acceptable approach.
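Something like the following should work: a sketch, assuming you refresh the credentials yourself through a retry-configured session (the retry parameters are illustrative):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

import google.auth
import google.auth.transport.requests

# A session whose adapters retry failed connections, as in the earlier sketch.
retry_session = requests.Session()
adapter = HTTPAdapter(max_retries=Retry(total=3, connect=3, backoff_factor=0.5))
retry_session.mount('http://', adapter)
retry_session.mount('https://', adapter)

credentials, project = google.auth.default()
request = google.auth.transport.requests.Request(session=retry_session)
credentials.refresh(request)  # this token fetch goes through the retrying session
```

Note that refreshes triggered internally by a client library may still use its own transport, so this only hardens the refresh you perform explicitly.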
I'm also seeing these errors. Stack trace below: ...
I'm running a very, very simple App Engine app in the Flexible Python 2.7 environment. It really doesn't do anything more than the sample found here, using the exact code: https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/appengine/flexible/pubsub

I ingest POST data within a Flask app, then publish it to a topic using the pubsub_v1 client. There are no other errors in the logs before or after, and the app/method successfully processes other requests within seconds before the failed attempt and within a minute after.

This is a little frustrating/concerning since I'm basically seeing the issue using the provided GAE sample code. I'm updating my app to push a service account secrets JSON with my code, and will see what happens if I manually create the credentials for the client instead, but would love to see this resolved.
Still seeing this today. Any workaround if storing the credentials file on the server is not possible (riskier than default service account credentials)?

Is this something that I'll need to push up a keyfile with my flex deployment (as @vanpelt mentioned), or will it be fixed in the PR that was just linked?

I would love to see a fix for this issue.

Is this still open 1 year later?
@JustinBeckwith @theacodes I see #323 was merged; should this issue be fixed now? A lot of people will likely land here when they run into this issue, so you should confirm the fix here and, if it's really a fix, tell people to upgrade.
@mike-seekwell Thanks for the call out! #323 merged a fix to retry the ping to the metadata server. If you're seeing this error, please upgrade to version 1.6.3 or greater. |
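To confirm which version is installed, for example:

```python
import pkg_resources

# Upgrade if this prints a version below 1.6.3.
print(pkg_resources.get_distribution('google-auth').version)
```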
I'm still seeing this error fairly regularly running on GAE flex for Python 3.6 with ...
Greetings! Would you mind opening a new issue? It makes tracking these discussions much easier. |
…mpute_engine._metadata.get(): the initial fix of issue googleapis#211 was done in CL googleapis#323, but only for .ping(). This one adds the same behaviour & tests for the .get() method, as the problem still occurs. See the issue for details. Refs: googleapis#323 Resolves: googleapis#211
Original issue: googleapis/google-cloud-python#4358

After successful use of `credentials, _ = google.auth.default()`, an application crashes when credentials cannot be detected.

/cc @dmho418
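A minimal defensive sketch for that failure mode (the `refresh_with_retry` helper is hypothetical, not part of google-auth):

```python
import time

import google.auth
import google.auth.transport.requests
from google.auth import exceptions


def refresh_with_retry(credentials, attempts=3, delay=1.0):
    """Hypothetical helper: retry transient metadata-server refresh failures."""
    request = google.auth.transport.requests.Request()
    for attempt in range(attempts):
        try:
            credentials.refresh(request)
            return
        except exceptions.RefreshError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (2 ** attempt))  # exponential backoff


credentials, _ = google.auth.default()
refresh_with_retry(credentials)
```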