[CT-664] [Feature] copy local dbt deps deeply, without symlinking #5271

Closed
1 task done
alexrosenfeld10 opened this issue May 18, 2022 · 5 comments
Labels
deps dbt's package manager enhancement New feature or request stale Issues that have gone stale

@alexrosenfeld10
Contributor

alexrosenfeld10 commented May 18, 2022

Is this your first time opening an issue?

Describe the Feature

Currently dbt deps does different things depending on the dependency type. For local dependencies, it creates symlinks pointing to the local dbt project. I have a monorepo use case where I’d like to publish each sub-project separately, but they all reference a top level project. As a result, when I run dbt deps and then publish the directory, the deps aren’t actually included in the artifact; instead, there's just a symlink. When I pull the artifact down and run it in my cron job, the symlink is (obviously) broken.
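To make the failure mode concrete, here is a minimal Python sketch (POSIX only; the monorepo/, domain_a/ and common_utils/ layout is hypothetical, and the symlink is created by hand as a stand-in for what dbt deps produces). It archives only the sub-project with tarfile and extracts it into a separate directory. By default tarfile stores the symlink itself, so the extracted link dangles; with dereference=True the files behind the link are archived instead.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def package_survives_publish(dereference: bool) -> bool:
    """Archive a sub-project whose dependency is a symlink, extract it elsewhere,
    and report whether the dependency's files are still reachable."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical monorepo: a shared top-level project plus one domain sub-project.
        shared = root / "monorepo" / "common_utils"
        shared.mkdir(parents=True)
        (shared / "dbt_project.yml").write_text("name: common_utils\n")
        sub = root / "monorepo" / "domain_a"
        (sub / "dbt_packages").mkdir(parents=True)
        # Stand-in for a symlinked local dep: a (relative) link from
        # dbt_packages/ back to the shared project.
        os.symlink(os.path.join("..", "..", "common_utils"),
                   sub / "dbt_packages" / "common_utils",
                   target_is_directory=True)

        # "Publish" only the sub-project.
        artifact = root / "domain_a.tar.gz"
        with tarfile.open(artifact, "w:gz", dereference=dereference) as tar:
            tar.add(sub, arcname="domain_a")

        # "Pull it down" into a directory that does not contain the monorepo.
        job_dir = root / "cron_job"
        with tarfile.open(artifact, "r:gz") as tar:
            tar.extractall(job_dir)
        return (job_dir / "domain_a" / "dbt_packages" / "common_utils"
                / "dbt_project.yml").exists()
```

Calling package_survives_publish(False) reproduces the broken symlink; package_survives_publish(True) shows the dependency's files travelling with the artifact.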

There should be an option on local-type dependencies that overrides this behavior and actually copies the dependency in, similar to how git/hub-based packages work.
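The requested option could look roughly like the following sketch. This is not dbt's implementation, and deep_copy is a hypothetical per-package setting that dbt does not expose: when it is set, the package is copied with shutil.copytree; otherwise the function tries a symlink and falls back to copying if symlinking fails.

```python
import os
import shutil
from pathlib import Path


def install_local_package(src: Path, dest: Path, deep_copy: bool = False) -> str:
    """Install a local package at `dest` and report how it was installed.

    `deep_copy` is a hypothetical option, not a real dbt setting.
    """
    src = Path(src).resolve()
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not deep_copy:
        try:
            # Default behaviour: point dbt_packages/<name> at the local project.
            os.symlink(src, dest, target_is_directory=True)
            return "symlink"
        except OSError:
            # e.g. Windows without symlink privileges: fall back to copying.
            pass
    # Deep copy: the package files become part of this project's directory,
    # so they travel with it when the directory is archived and moved.
    shutil.copytree(src, dest)
    return "copy"
```

Returning the install kind gives the caller something to log, which matters if symlinks and copies can coexist in one dbt_packages/ directory.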

Describe alternatives you've considered

My workaround here is I'm just going to publish the entire monorepo and then have my execution job navigate to the sub-project directory, OR use GitHub PAT-based auth for cloning. The latter is (slightly) less preferred because it means every developer in my org using this monorepo will have to set up their own PAT and env var in order to run the project, when in reality... the dependency is already on their machine, in the repo itself 😆

Who will this benefit?

No response

Are you interested in contributing this feature?

Sure! If time allows.

Anything else?

No response

@alexrosenfeld10 alexrosenfeld10 added enhancement New feature or request triage labels May 18, 2022
@github-actions github-actions bot changed the title [Feature] copy local dbt deps deeply, without symlinking [CT-664] [Feature] copy local dbt deps deeply, without symlinking May 18, 2022
@jtcohen6 jtcohen6 added Team:Language deps dbt's package manager labels May 19, 2022
@jtcohen6
Contributor

jtcohen6 commented Jun 1, 2022

@alexrosenfeld10 Thanks for the really clear write-up, as always!

We had a chance to discuss this a bit yesterday. dbt already creates "deep" links on Windows, which doesn't support real symlinking:

can_create_symlink = system.supports_symlinks()

if can_create_symlink:
    fire_event(DepsCreatingLocalSymlink())
    system.make_symlink(src_path, dest_path)
else:
    fire_event(DepsSymlinkNotAvailable())
    shutil.copytree(src_path, dest_path)

We are a bit hesitant about mixing symlinks and "deep" links, since these could lead to situations that are confusing to debug, both for users and for us. The "deep" links on Windows are not preferable IMO, as they can cause other related errors (file permissions) when trying to clean/replace installed dependencies: #4372 (comment)
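One concrete reason a mix of symlinks and copies complicates cleanup: the two cases have to be removed differently. A symlink should only be unlinked (deleting its contents would delete files in the shared source project), while a copy has to be removed as a whole tree, and shutil.rmtree raises OSError if it is handed a symlink. The sketch below uses hypothetical helper names, not dbt's API, and assumes POSIX semantics for removing directory symlinks.

```python
import os
import shutil
from pathlib import Path


def installation_kind(path: Path) -> str:
    """Report how an entry under dbt_packages/ was installed."""
    path = Path(path)
    if path.is_symlink():  # check first: is_dir() follows links
        return "symlink"
    if path.is_dir():
        return "copy"
    return "missing"


def remove_installed_package(path: Path) -> str:
    """Remove one installed package, handling symlinks and deep copies differently."""
    path = Path(path)
    kind = installation_kind(path)
    if kind == "symlink":
        # Remove only the link itself; the shared source project is untouched.
        os.unlink(path)
    elif kind == "copy":
        # A deep copy owns its files, so delete the whole tree.
        shutil.rmtree(path)
    return kind
```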

My workaround here is I'm just going to publish the entire monorepo and then have my execution job navigate to the sub-project directory

This feels reasonable to me... alternatively, could all of the projects exist at the same hierarchy level, rather than the one reused package living at the top level?

I have a monorepo use case where I’d like to publish each sub-project separately, but they all reference a top level project.

Out of curiosity, is the top-level project here a project of shared sources / upstream models (pointers to data warehouse objects)? Or shared macros (dbt source code) used by all of those sub-packages? If it's the former, I've got some bigger ideas of how we could better support the pattern of multiple projects, owned by different teams, that roll up to one mono-DAG / monorepo: #5244

The parallels to #4538 here are instructive, insofar as both of these issues are finding rough edges around local deps, and both are trying to solve a use case (multi-project deployment) for which we think we could have more compelling answers.

@jtcohen6 jtcohen6 removed the triage label Jun 1, 2022
@alexrosenfeld10
Contributor Author

Thanks @jtcohen6, and same to you!

alternatively, could all of the projects exist at the same hierarchy level, rather than the one reused package living at the top level?

Yes, that's possible, but we have a project per domain, and each is coupled with other per-domain entities, including some bespoke per-domain configuration (for example, which events we should stream into our data warehouse for that domain). I'd prefer to keep them all co-located in nested folders, since that works better for the various domain teams for things like PR review via CODEOWNERS.

is the top-level project here a project of ... shared macros

It's currently shared macros that override things like default__generate_schema_name and default__generate_database_name to enforce a few standards in the per-domain projects (we've got another project for shared resources / common models, but as you discuss in #5244 it's not super simple to do that right now, and as a result we run that separately and use it as sources in the per-domain projects 😢). It also serves as common utils for things that all domain projects need to consume in a standard way.

I definitely agree with the ideas in #5244, but it's a big lift and there's a lot to do! I think we could soften these rough edges in a much shorter time frame by adding support for "deep copy this package because I explicitly asked you to", without ruling out the more compelling (but longer-term) answers. (I realize there are issues with respect to Windows, but that seems to me like something to debug separately from this issue.)

@alexrosenfeld10
Contributor Author

btw, I decided to go with this as a workaround:

packages:
  - git: "https://{{ env_var('DBT_ENV_SECRET_GITHUB_TOKEN') }}@github.com/my-company/my-data-mesh-project.git"
    warn-unpinned: false
    subdirectory: "my-reporting-team-internal/dbt/common_utils"

This means:

  1. developers have to set DBT_ENV_SECRET_GITHUB_TOKEN as part of onboarding to this repo. Not the end of the world.
  2. I have to do this (kind of goofy thing) in my orchestrated job environment:
        # dbt will require this to be set when it parses the project, because it appears in the packages.yml file.
        # However, we don't need a real value, because the dependencies are resolved in CI
        # and are already part of the artifact, instead of being resolved just prior to runtime.
        - name: DBT_ENV_SECRET_GITHUB_TOKEN
          value: "dummy-value"
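Why a dummy value is enough can be illustrated with a deliberately simplified model of env var rendering. render_env_vars below is hypothetical and only handles the single-argument {{ env_var('NAME') }} form with a regex (dbt itself renders these templates with Jinja). The point, as the comment in the snippet notes, is that the variable is needed whenever packages.yml is parsed, whether or not the packages are already installed, so any value satisfies it.

```python
import os
import re
from typing import Mapping, Optional

# Matches only the single-argument form {{ env_var('NAME') }}.
_ENV_VAR_CALL = re.compile(r"\{\{\s*env_var\(\s*'([A-Za-z_][A-Za-z0-9_]*)'\s*\)\s*\}\}")


def render_env_vars(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Substitute env_var() calls, failing loudly on any variable that is unset."""
    env = os.environ if env is None else env

    def substitute(match) -> str:
        name = match.group(1)
        if name not in env:
            # Rendering fails on a missing variable even if the value
            # would never actually be used (e.g. deps already installed).
            raise KeyError(f"environment variable {name!r} is not set")
        return env[name]

    return _ENV_VAR_CALL.sub(substitute, text)
```

With DBT_ENV_SECRET_GITHUB_TOKEN set to "dummy-value", the git URL from packages.yml renders to a syntactically valid (if useless) URL, which is all parsing needs when no clone happens at runtime.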

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Nov 29, 2022
@github-actions
Contributor

github-actions bot commented Dec 7, 2022

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

Projects
None yet
Development

No branches or pull requests

2 participants