Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect GitHub network errors and provide a clean error that defines that a bad response came from GitHub #3546

Closed
heysweet opened this issue Jul 8, 2021 · 4 comments · Fixed by #3729
Assignees
Labels
deps dbt's package manager enhancement New feature or request

Comments

@heysweet
Copy link
Contributor

heysweet commented Jul 8, 2021

Describe the feature

When running dbt deps, we found multiple users coming across different error states:

  • tarfile.ReadError: file could not be opened successfully
  • 2021-06-28 17:02:01.465062 (MainThread): STDOUT: "b''"
    2021-06-28 17:02:01.465389 (MainThread): STDERR: "b"fatal: destination path '2570ae56bf9cb34e49fe01aa3bc99195' already exists and is not an empty directory.\n""
  • HTTPSConnectionPool(host='codeload.github.com', port=443): Read timed out. (read timeout=10.0)
/usr/local/lib/python3.8/dist-packages/dbt/clients/system.py", line 471, in untar_package
  with tarfile.open(tar_path, 'r') as tarball:
File "usr/lib/python3.8/tarfile.py", line 1608, in open
  raise ReadError("file could not be opened successfully")

For all of these, the root cause was an intermittent outage of GitHub which meant that the repo was never pulled, timed out, or gave some kind of bad response.

I would like dbt to provide a cleaner error that properly identifies when a request to GitHub does not come back with the expected response, rather than failing in a followup I/O step with an unrelated error.

Describe alternatives you've considered

We can write more logic around a dbt deps call to catch and confirm any kind of error state that may have come from this, but since the GitHub network request happens inside the core logic, it would be much harder to detect that the network response came back in a state that was not expected.

Who will this benefit?

While I do think this could benefit anybody who is using GitHub with their dbt setup, this would greatly benefit the Support staff at dbt Labs, as well as the engineers at the company who triaged and attempted to identify a root cause, which could have been more easily identified at the site of the network request itself.

@heysweet heysweet added enhancement New feature or request triage labels Jul 8, 2021
@jtcohen6 jtcohen6 added packages Functionality for interacting with installed packages and removed triage labels Jul 15, 2021
@jtcohen6
Copy link
Contributor

Thanks for opening @heysweet!

This was cropping up in the install method of dbt deps. We failed to download the tar package, so when it came time to unpack ("untar") it, there was no tar there:

https://github.com/dbt-labs/dbt/blob/f460d275baa4df8cf8713f27e198a3577d6ea71c/core/dbt/deps/registry.py#L63-L67

Presumably, the error was a timeout in system.download:

https://github.com/dbt-labs/dbt/blob/f460d275baa4df8cf8713f27e198a3577d6ea71c/core/dbt/clients/system.py#L444-L452

I imagine we could do either of:

  • Raise a descriptive exception the first time download times out
  • Try download five times, then raise the exception

The latter would be consistent with how we handle Hub site unavailability, in an earlier deps step.

@jamesohare84
Copy link

jamesohare84 commented Jul 22, 2021

Thanks for this

We have just had this become a big issue for us. We have our dbt running in docker containers and it needs to run dbt deps every time an airflow dag is run.

Our dags are now constantly failing due to this issue (presumably some kind of latency has been introduced into our network, not sure).

cc @BenBalde

@poudrouxj
Copy link

[ Subscribed ] A lot of etl jobs failing today because of this 👎

@leahwicz leahwicz self-assigned this Aug 18, 2021
@leahwicz
Copy link
Contributor

This will be released in v0.21.0RC1 to try out

@jtcohen6 jtcohen6 added deps dbt's package manager and removed packages Functionality for interacting with installed packages labels Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps dbt's package manager enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants