
Duplicate dbt_utils folder when running dbt deps #4372

Closed
1 of 5 tasks
abalila opened this issue Nov 19, 2021 · 25 comments
Labels
bug Something isn't working deps dbt's package manager stale Issues that have gone stale windows Everyone's favorite OS that's sometimes a little weird

Comments

@abalila

abalila commented Nov 19, 2021

Describe the bug

When I run dbt deps, it creates a second dbt_utils folder (named dbt-utils-x.x.x), which results in an error. The only way I've found to keep dbt from breaking is to manually delete dbt-utils-x.x.x and not run dbt deps again.
Each time I run dbt deps, the duplicated folder appears. However, after going back and forth between deleting the folder and rerunning the command, the command suddenly runs successfully. Note: I deleted the duplicated folder each time I got WinError 32.

Steps to reproduce

Simply run the command dbt deps.

Expected results

dbt deps runs successfully and prints Up to date! as the final output.

Actual results

When I run dbt deps, a duplicate dbt_utils folder is created, named with the specific version of the package. The dbt_modules folder then contains two dbt_utils folders: dbt_utils and dbt-utils-x.x.x.
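A minimal sketch for spotting the leftover folder described above, assuming the duplicate always follows the `<name>-<X.Y.Z>` pattern reported in this issue (for example `dbt-utils-0.7.4`). `find_versioned_leftovers` and the regex are hypothetical, not part of dbt, and the helper only lists candidates rather than deleting anything.

```python
import os
import re
from typing import List

# Hypothetical pattern: a package-style name followed by "-X.Y.Z",
# e.g. "dbt-utils-0.7.4". This is an assumption based on the folder
# names reported in this issue, not a naming rule dbt guarantees.
VERSIONED_DIR = re.compile(r"^[A-Za-z0-9_-]+-\d+\.\d+\.\d+$")


def find_versioned_leftovers(packages_dir: str) -> List[str]:
    """Return sorted names of subdirectories that look like un-renamed extractions."""
    if not os.path.isdir(packages_dir):
        return []
    return sorted(
        name
        for name in os.listdir(packages_dir)
        # Only directories count; stray files with a similar name are ignored.
        if os.path.isdir(os.path.join(packages_dir, name))
        and VERSIONED_DIR.match(name)
    )
```

In the state described above, `find_versioned_leftovers("dbt_modules")` would list `dbt-utils-0.7.4` while ignoring the correctly named `dbt_utils` folder.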

Screenshots and log output


System information

The contents of your packages.yml file:
My packages.yml file contains the following:

packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.7.0", "<0.8.0"]

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.21.0
   latest version: 0.21.0

Up to date!

Plugins:
  - bigquery: 0.21.0
  - postgres: 0.21.0
  - redshift: 0.21.0
  - snowflake: 0.21.0

Additional context

As mentioned above, if I delete the duplicated folder and run dbt deps several times, it eventually executes correctly. In the screenshot below, I ran the command repeatedly and deleted the duplicated folder right after each run until the command executed successfully.
[Screenshot: repeated dbt deps runs, deleting the duplicated folder after each one]

Are you interested in contributing the fix?


@radian21

I'm having the exact same problem. Same versions of dbt, dbt_utils, plugins and using Snowflake. Using Windows 10.

However, @abalila's workaround of deleting the dbt-utils-0.7.4 folder is not working for me. I am currently blocked from further development by this issue.

Running dbt clean then dbt deps consistently gives me this error:

Running with dbt=0.21.0
Installing dbt-labs/dbt_utils@0.7.4
Encountered an error:
[WinError 32] The process cannot access the file because it is being used by another process: 'dbt_modules\dbt-utils-0.7.4\integration_tests'

@radian21

Update: I ran the cycle of dbt clean and dbt deps several times, both deleting the dbt-utils-0.7.4 folder and not deleting it. All iterations failed with the error.

I waited a while, closed VS Code and other apps that might(?) have had some kind of hook on the files dbt deps was trying to update, and re-ran it. That time it worked and updated the dependencies.

There is some issue going on here and I wish I could give you a solid set of steps to reproduce it.

@jasnonaz

Hi folks - thank you for posting this.

Question - did this problem crop up in tandem with upgrading to 0.7.4 or had you upgraded and been using successfully before this started to happen?

@radian21

radian21 commented Nov 19, 2021

@jasnonaz Good question. It happened on the initial install of 0.7.4. I didn't pay much attention to it at the time because I thought I was experiencing a contention issue due to a csv seed file I had open at the same time. Probably bad speculation on my part. After closing that file and re-installing deps, it worked and I assumed that was the issue. It was running fine for a few weeks.

Today, I needed to dbt clean and on the re-install of deps I saw the issue again.

@jasnonaz

Hmmm, this is definitely an interesting one!! Can I request that you try pinning your version of utils precisely to 0.7.4 and see if that does anything to resolve this? If not, you could also try pinning directly to an earlier version (0.7.3 or lower) and see if that works out for you.

Also, just to confirm: is that your entire packages.yml? We're working on preventing dependency issues between packages, but they can still creep in sometimes.

Appreciate you working through this with us here - we're investigating what might be the cause of this internally but want to see if we can isolate it a little better on your end. 🙏

@radian21

@jasnonaz Let me clarify my environment during this experience. I first saw the problem repeatedly when the entirety of my packages.yml file looked like this:

packages:
  - package: dbt-labs/dbt_utils
    version: 0.7.4

I updated to include the range like this later:

packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.7.0", "<0.8.0"]

And continued to see the error for at least 3 more dbt deps iterations before it succeeded.

Yes, it is weird, but the first occurrences of the problem were when the package was isolated to 0.7.4.

@abalila
Author

abalila commented Nov 22, 2021

The packages.yml only has the 3 lines I included in the issue description.

I tried pointing to a specific version, and I also tried older versions. No matter which version I chose, it resulted in the same error. As @radian21 wrote, it takes multiple tries before the command executes successfully (deleting the duplicated folder each time).

@joellabes
Contributor

joellabes commented Nov 30, 2021

Caught it in the act! https://www.loom.com/share/c1a000b2bb6544f4b74a46a05b21724b (45 second mark)

It looks like it makes a directory called dbt-utils-0.7.4 and then renames it to dbt_utils a fraction of a second later. The other packages don't behave that way... not sure why!

Edit: We also went back to utils 0.7.3 and observed the same behaviour, and the same happened with hubspot_source. I wonder whether it actually does this for all packages, but those ones are larger? 🤷

I feel reasonably confident that this is actually a Core issue, so will shoot it over there!

@joellabes joellabes transferred this issue from dbt-labs/dbt-utils Nov 30, 2021
@joellabes joellabes added packages Functionality for interacting with installed packages triage labels Nov 30, 2021
@jtcohen6 jtcohen6 added Team: Execution bug Something isn't working and removed triage labels Dec 1, 2021
@jtcohen6
Contributor

jtcohen6 commented Dec 2, 2021

Great video, nice work reproducing @joellabes @jasnonaz!

I think there are two relevant spots to look at. First up, what's happening when you run deps for a Hub package:

download_url = metadata.downloads.tarball
system.download_with_retries(download_url, tar_path)
deps_path = project.packages_install_path
package_name = self.get_project_name(project, renderer)
system.untar_package(tar_path, deps_path, package_name)

Line by line:

  1. Get the URL of the tarball (package) we want to download. This is after version/dependency resolution, so we're installing a versioned tarball (specific version of the package).
  2. Download the tarball (package), with retries, and stick it in a temp location (tar_path). On my machine, that path resolves to something like '/private/var/folders/7h/hj5_fw9j291c58hwfdvy5xbm0000gp/T/dbt-downloads-4xj3oo5t/dbt-labs/dbt_utils.0.7.4.tar.gz'. Note that the final name of this file is dbt_utils.0.7.4.tar.gz, not dbt_utils.tar.gz.
  3. Get the deps_path, the place this package actually wants to go. (In v1, we're renaming its default value from dbt_modules to dbt_packages.)
  4. Get the package_name for the package we just downloaded, i.e. the thing we actually want to call it (dbt_utils)
  5. Use system.untar_package to extract all files from the tarball, and rename the resulting directory to package_name
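To make steps 2-5 concrete, here is a self-contained local simulation that skips the network download. It builds a tarball named like the Hub artifact (`dbt_utils.0.7.4.tar.gz`) whose single top-level folder is `dbt-utils-0.7.4`, extracts it, and then renames that folder to the package name. The helper names (`build_fake_tarball`, `simulate_hub_install`) and the rule that derives the folder name by swapping `_` for `-` are illustrative assumptions, not dbt's code. The point it shows is that a `dbt-utils-0.7.4` folder necessarily exists on disk between extraction and rename.

```python
import io
import os
import shutil
import tarfile
import tempfile
from typing import List


def build_fake_tarball(tar_path: str, top_dir: str) -> None:
    """Create a .tar.gz with one top-level folder, mimicking a Hub download."""
    with tarfile.open(tar_path, "w:gz") as tar:
        # Directory entry for the top-level folder
        dir_info = tarfile.TarInfo(top_dir)
        dir_info.type = tarfile.DIRTYPE
        dir_info.mode = 0o755
        tar.addfile(dir_info)
        # One small file inside it
        data = b"name: 'dbt_utils'\n"
        file_info = tarfile.TarInfo(f"{top_dir}/dbt_project.yml")
        file_info.size = len(data)
        tar.addfile(file_info, io.BytesIO(data))


def simulate_hub_install(deps_path: str, package_name: str, version: str) -> List[str]:
    """Steps 2-5: 'download', extract, rename. Returns the dirs seen right after extraction."""
    download_dir = tempfile.mkdtemp(prefix="dbt-downloads-")
    try:
        # Step 2: the downloaded file name is versioned, e.g. dbt_utils.0.7.4.tar.gz
        tar_path = os.path.join(download_dir, f"{package_name}.{version}.tar.gz")
        # Assumed top-level folder name, as observed in this issue: dbt-utils-0.7.4
        top_dir = f"{package_name.replace('_', '-')}-{version}"
        build_fake_tarball(tar_path, top_dir)

        # Step 5: extract everything, then rename the top-level folder
        os.makedirs(deps_path, exist_ok=True)
        with tarfile.open(tar_path, "r") as tarball:
            tarball.extractall(deps_path)
            tar_dir_name = os.path.commonprefix(tarball.getnames())
        seen_after_extract = sorted(os.listdir(deps_path))
        shutil.move(
            os.path.join(deps_path, tar_dir_name),
            os.path.join(deps_path, package_name),
        )
        return seen_after_extract
    finally:
        shutil.rmtree(download_dir, ignore_errors=True)
```

Calling `simulate_hub_install("dbt_packages", "dbt_utils", "0.7.4")` reports `dbt-utils-0.7.4` as the only folder present after extraction, and leaves only `dbt_utils` once the rename completes.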

Ok! So let's look at system.untar_package in more detail. This is defined in the dbt.clients.system module, which is where we try to define common methods that handle differences across operating systems.

def untar_package(
    tar_path: str, dest_dir: str, rename_to: Optional[str] = None
) -> None:
    tar_path = convert_path(tar_path)
    tar_dir_name = None
    with tarfile.open(tar_path, 'r') as tarball:
        tarball.extractall(dest_dir)
        tar_dir_name = os.path.commonprefix(tarball.getnames())
    if rename_to:
        downloaded_path = os.path.join(dest_dir, tar_dir_name)
        desired_path = os.path.join(dest_dir, rename_to)
        dbt.clients.system.rename(downloaded_path, desired_path, force=True)

Step by step:

  1. Get the path where the package tarball has been stored ('/private/var/folders/...')
  2. We don't yet know the tar_dir_name, that is, the name of the folder that will be created when we extract the tarball
  3. Open the tarball
  4. Extract the tarball into a directory. This is the step that creates a folder named dbt-utils-0.7.4
  5. Get the name of the just-created directory
  6. If we're renaming the just-created directory (and indeed we are):
  7. Get the relative path of the directory we just created ('dbt_packages/dbt-utils-0.7.4')
  8. Get the relative path of the directory we desire to create ('dbt_packages/dbt_utils')
  9. Use dbt.clients.system.rename() to rename the directory from the current name to the desired name. This is where the error must be occurring on Windows. In particular, the rename is disallowed because Windows detects a symlink from 'dbt_packages/dbt-utils-0.7.4/integration_tests'. The rename() method has logic that's meant to handle that symlinking, but it's clearly not working:

def rename(from_path: str, to_path: str, force: bool = False) -> None:
    from_path = convert_path(from_path)
    to_path = convert_path(to_path)
    is_symlink = path_is_symlink(to_path)
    if os.path.exists(to_path) and force:
        if is_symlink:
            remove_file(to_path)
        else:
            rmdir(to_path)
    shutil.move(from_path, to_path)

At this point, we need someone with Windows expertise and/or a Windows machine (ideally both!) to help us figure out the right way to handle that rename.
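One possible direction, sketched purely as an assumption and not as dbt's actual fix: since several people report that simply rerunning dbt deps eventually succeeds, the rename step could retry with a short backoff when Windows reports a sharing violation. WinError 32 surfaces in Python as an `OSError` (typically `PermissionError`) whose `winerror` attribute is 32. `rename_with_retries`, `is_sharing_violation`, and the attempt/delay values are hypothetical. The sketch assumes the destination no longer exists (dbt's `rename()` removes it first when `force=True`), and it would only help if the other process's lock is transient.

```python
import os
import time

# Windows error code behind "The process cannot access the file because it
# is being used by another process" (ERROR_SHARING_VIOLATION).
ERROR_SHARING_VIOLATION = 32


def is_sharing_violation(exc: OSError) -> bool:
    """True if the OSError carries Windows error 32; False when winerror is absent or different."""
    return getattr(exc, "winerror", None) == ERROR_SHARING_VIOLATION


def rename_with_retries(from_path: str, to_path: str,
                        attempts: int = 5, delay: float = 0.5) -> None:
    """Hypothetical: rename from_path to to_path, retrying on WinError 32."""
    for attempt in range(1, attempts + 1):
        try:
            # os.rename instead of shutil.move: a failed attempt never
            # falls back to a partial copy of the directory tree.
            os.rename(from_path, to_path)
            return
        except OSError as exc:
            # Re-raise anything that is not a sharing violation, and give
            # up after the final attempt.
            if not is_sharing_violation(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff between attempts
```

The choice of `os.rename` is deliberate: when `os.rename` fails, `shutil.move` falls back to copying the tree and then deleting the source, so a locked source directory could plausibly leave both folders behind.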

@jtcohen6 jtcohen6 added the windows Everyone's favorite OS that's sometimes a little weird label Dec 2, 2021
@joellabes
Contributor

we need someone with Windows expertise and/or a Windows machine (ideally both!)

I have neither 😭

@sophiad96

Just a heads up, I had a user report this happening with the fivetran/jira package as well.

@joellabes
Contributor

Just a heads up, I had a user report this happening with the fivetran/jira package as well.

@sophiad96 was this on dbt Cloud? That would disprove the assumption that it's a Windows-specific issue, right? Or was it a Cloud user who happened to be on the CLI but reached out to you?

@sophiad96

@joellabes it was on the CLI! I should've mentioned that!

@eli-zarzar

We're experiencing the same issue on the CLI in Windows with the fivetran/jira package. There are no issues when we run dbt deps in the cloud.

@jtcohen6 jtcohen6 added deps dbt's package manager and removed packages Functionality for interacting with installed packages labels Mar 30, 2022
@dweaver33

dweaver33 commented May 25, 2022

Same issue here for some Windows users.

Edit: After some testing, we were able to get this to work for Windows users by closing VSCode and running the dbt deps command on the command line directly.

@JasonMcKenzie1977

I overcame this issue by installing dbt into a new directory. I got a similar error when trying to delete the old directory (Folder In Use - The action can't be completed because the folder or a file in it is open in another program). This prompted me to install docs.microsoft.com/en-us/sysinternals/downloads/… to try to find what was hanging things up... I couldn't see anything, so I tried deleting the old directory again to see if that helped pinpoint things, but that time the deletion succeeded. Very strange...

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 24, 2022
@joellabes
Contributor

lol i wish, but keep it open

@github-actions github-actions bot removed the stale Issues that have gone stale label Nov 25, 2022
@andersbergren

I had this problem using dbt-sqlserver 1.3.1 in VS Code on Windows and got it working by pausing my OneDrive sync.

@joellabes
Contributor

Interesting @andersbergren! Is OneDrive configured to watch/sync/etc your dbt project's directory?

@natej-eb

natej-eb commented Mar 24, 2023

I'm having this same problem on Windows 10. I set up a new project and tried to start with dbt-snowflake==1.4.1 and the following packages.yml:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.0.0

I was using Python 3.10.10, and while debugging I suspected the Python version, but I tried several Python 3.9 and even 3.8 versions and it didn't seem to make any difference.

I never got it to work, but I found that downgrading to dbt-snowflake==1.3.0 worked.

I've since upgraded dbt-snowflake to 1.4.2 via pip install --upgrade dbt-snowflake, and the upgrade was successful. But when I wanted to add a new package, running dbt deps failed again just as it did before.

@tanderson-hp

Not sure if this is what other people are seeing, but I got past this by temporarily disabling the dbt-poweruser extension. It seems to be related to this issue in the dbt-poweruser extension.

@JasonMcKenzie1977

Not sure if this is what other people are seeing but I got past this by temporarily disabling the dbt-poweruser extension. It seems like it's related to this issue in the dbt-poweruser extension

That's interesting. My colleagues and I have been overcoming this issue by closing out of VS Code and running dbt deps in Powershell. It works every time & then we can resume working in VS Code.

Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Feb 26, 2024
Contributor

github-actions bot commented Mar 4, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 4, 2024
Projects
None yet
Development

No branches or pull requests