  File "/usr/local/lib/python3.8/dist-packages/dbt/clients/system.py", line 471, in untar_package
    with tarfile.open(tar_path, 'r') as tarball:
  File "usr/lib/python3.8/tarfile.py", line 1608, in open
    raise ReadError("file could not be opened successfully")
For all of these, the root cause was an intermittent GitHub outage: the repo was never pulled, the request timed out, or GitHub returned some kind of bad response.
I would like dbt to raise a clearer error that identifies when a request to GitHub does not come back with the expected response, rather than failing in a follow-up I/O step with an unrelated error.
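To illustrate the kind of check being requested, here is a minimal sketch that fails at the network request itself. It is not dbt's actual code: `fetch_package`, `PackageDownloadError`, and the `opener` parameter are hypothetical names, and the standard library's `urllib` stands in for whatever HTTP client dbt uses internally.

```python
import urllib.request


class PackageDownloadError(Exception):
    """Hypothetical error type: the package could not be downloaded."""


def fetch_package(url, opener=urllib.request.urlopen, timeout=30):
    """Return the response body, or raise an error that names the failed request.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = getattr(response, "status", None)
            body = response.read()
    except OSError as exc:
        # urllib.error.URLError, HTTPError and socket timeouts all derive from OSError.
        raise PackageDownloadError(f"Request to {url} failed: {exc}") from exc
    if status != 200:
        raise PackageDownloadError(f"Request to {url} returned HTTP {status}")
    if not body:
        raise PackageDownloadError(f"Request to {url} returned an empty body")
    return body
```

With a check like this, an outage surfaces as an error that names the URL and the failure, instead of a later `ReadError` from `tarfile`.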
Describe alternatives you've considered
We could write more logic around a dbt deps call to catch and confirm any error state that may have come from this. But since the GitHub network request happens inside the core logic, it is much harder to detect from the outside that the network response came back in an unexpected state.
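For reference, the external workaround might look like the sketch below. It assumes the only signals available from outside are the exit code and captured output of the command; `run_with_retries` and its parameters are hypothetical names, not part of dbt.

```python
import subprocess
import time


def run_with_retries(command, attempts=3, delay_seconds=5.0, runner=subprocess.run):
    """Run `command`, retrying on a non-zero exit code with a growing delay.

    `runner` defaults to subprocess.run and is injectable for testing.
    """
    last = None
    for attempt in range(1, attempts + 1):
        last = runner(command, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        if attempt < attempts:
            # Linear backoff: wait a little longer after each failure.
            time.sleep(delay_seconds * attempt)
    raise RuntimeError(
        f"{' '.join(command)} failed after {attempts} attempts:\n{last.stderr}"
    )
```

A caller would invoke it as `run_with_retries(["dbt", "deps"])`. The wrapper cannot tell a network failure apart from any other failure, which is the limitation described above.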
Who will this benefit?
While I think this could benefit anybody who uses GitHub with their dbt setup, it would especially benefit the Support staff at dbt Labs, as well as the engineers at the company who triaged the problem and tried to identify a root cause. That cause could have been identified more easily at the site of the network request itself.
This was cropping up in the install method of dbt deps. We failed to download the tar package, so when it came time to unpack ("untar") it, there was no tar there:
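A guard along the following lines, run before unpacking, would report the missing or invalid download directly. This is a sketch under assumptions, not the fix that went into dbt: `check_tarball` and `PackageDownloadError` are hypothetical names, and only standard-library calls are used.

```python
import os
import tarfile


class PackageDownloadError(Exception):
    """Hypothetical error type: the downloaded package is missing or unusable."""


def check_tarball(tar_path):
    """Raise a descriptive error if `tar_path` is not a readable tar archive."""
    if not os.path.isfile(tar_path):
        raise PackageDownloadError(
            f"No file at {tar_path}; the package download probably failed."
        )
    if os.path.getsize(tar_path) == 0:
        raise PackageDownloadError(
            f"{tar_path} is empty; the package download probably failed."
        )
    if not tarfile.is_tarfile(tar_path):
        # For example, an HTML error page saved in place of the archive.
        raise PackageDownloadError(
            f"{tar_path} is not a valid tar archive; the server may have "
            "returned an error response instead of the package."
        )
```

Calling `check_tarball(tar_path)` immediately before `tarfile.open(tar_path, 'r')` would turn the generic `ReadError` into a message that points at the download.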
We have just had this become a big issue for us. We run dbt in Docker containers, and dbt deps has to run every time an Airflow DAG runs.
Our DAGs are now constantly failing because of this issue (presumably some kind of latency has been introduced into our network, but we are not sure).
Describe the feature
When running dbt deps, we found multiple users coming across different error states:
2021-06-28 17:02:01.465389 (MainThread): STDERR: "b"fatal: destination path '2570ae56bf9cb34e49fe01aa3bc99195' already exists and is not an empty directory.\n""