Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-1888] [Bug] Package project yml configs overriding root project configs when used in other yml files #6705

Closed
2 tasks done
Tracked by #7372
rlh1994 opened this issue Jan 24, 2023 · 4 comments · Fixed by #7441
Closed
2 tasks done
Tracked by #7372
Assignees
Labels
bug Something isn't working multi_project vars

Comments

@rlh1994
Copy link
Contributor

rlh1994 commented Jan 24, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When a variable is used within a yaml file within a package (such as one containing model tests), this variable is taking precidence from the package config, not the project config as expected.

It is possible to workaround this currently by not setting the default in the yaml and setting it in each var call, but this is not ideal and it hides variables throughout the code and makes mistakes with changes more likely.

Expected Behavior

The variable would always take precidence from the project, allowing defaults to be set in the package yaml, but could be overwritten in the project yaml.

Steps To Reproduce

  1. Create a package with some basic model, a variable in the package's project yaml named enabled_test (give it the value true), and a yaml file for that model with some test.
  2. Set the enabled: condition of that test to be var('enabled_test')
  3. Create a new project, install this package and run dbt compile - you should see 1 test
  4. Set enabled_test to false in your main project yaml, run dbt compile you should still see 1 test
  5. Set enabled_test to true in your main project yaml, and false in your package (dbt deps if not locally installed), run dbt compile you should see 0 tests.

Relevant log output

No response

Environment

- OS: Mac OS
- Python: 3.9.13
- dbt: 1.3.2

Which database adapter are you using with dbt?

No response

Additional Context

Reopen of: #3698 but with non-source freshness specific scope, later discussed in #5123 as both went stale without a resolution

Specific case in snowplow web package snowplow/dbt-snowplow-web#54

@rlh1994 rlh1994 added bug Something isn't working triage labels Jan 24, 2023
@github-actions github-actions bot changed the title [Bug] Package project yml configs overriding root project configs when used in other yml files [CT-1888] [Bug] Package project yml configs overriding root project configs when used in other yml files Jan 24, 2023
@MichelleArk MichelleArk self-assigned this Jan 27, 2023
@MichelleArk
Copy link
Contributor

Thanks for (re)opening this @rlh1994! I was able to reproduce the precedence for variable declaration you described (package vars preceding project vars) with an example pared down to a model (not tests) by:

  1. Creating a package with a single model and vars: {enabled_test: false} set in its dbt_project.yml. The model was enabled via the enabled config based on the var value:
version: 2
models:
 - name: model_a
  config:
   enabled: '{{ var("enabled_test", true) }}'
  1. Installing the package from (1) in an encompassing project with vars: {enabled_test: true} set in its dbt_project.yml
  2. Running dbt compile from the encompassing project and observing that enabled_test:true set in (2) does not override enabled_test:false set in (1):
❯ dbt compile
15:16:00 Running with dbt=1.3.0
15:16:00 Unable to do partial parsing because a project config has changed
15:16:01 Found 0 models, 0 tests, 0 snapshots, 0 analyses, 500 macros, 0 operations, 3 seed files, 0 sources, 0 exposures, 0 metrics
15:16:01
15:16:02 Concurrency: 4 threads (target='dev')
15:16:02
15:16:02 Done.

From our documentation on variable precedence:

The order of precedence for variable declaration is as follows (highest priority first):

  1. The variables defined on the command line with --vars.
  2. The package-scoped variable declaration in the dbt_project.yml file
  3. The global variable declaration in the dbt_project.yml file.
  4. The variable's default argument (if one is provided).

Unfortunately our docs aren't clear on which dbt_project.yml gets used, the one from the package or the one in the root project. In this case, the package dbt_project.yml always takes precedence regardless of whether global or package-scoped vars are defined in the root dbt_project.yml.

As you've outlined, the vars in root-level dbt_project.yml should actually take precedence over the vars in package-level dbt_project.yml. This is what happens in other cases and is generally expected given how the configuration hierarchy works (= root-level dbt_project.yml config "wins," even over more-specific configurations defined in packages).

All that said, I’m curious to hear more about your use case! Is the idea to set vars from an encompassing project to configure the behaviour of an installed package? Given that vars are currently globals and share a flat namespace, supporting this use case would require rejigging vars to support package-namespaced overrides. I agree that the appropriate workaround for now is to set the default in each var call until this issue is resolved.

@rlh1994
Copy link
Contributor Author

rlh1994 commented Feb 6, 2023

Glad you were able to reproduce!

Sure, our main use case is around enabling/altering tests based on variables (as pretty much all our other varaibles are used directly in models/macros so don't hit this issue). Because we support multi-warehouses but have a different data structure for redshift/postgres we can end up with tests defined on models that aren't enabled, which raises a warning for our users and clogs the logs. We also have one case where a table may be unique across 2 columns instead of one depending on a variable so again that's impacted by a variable:

https://github.com/snowplow/dbt-snowplow-web/blob/deb14ea5aa7cd6238887117849cdc95d4bfe0551/models/page_views/scratch/page_views_scratch.yml#L297-L312

Our main approach to variables more generally is:

  1. Scope them at the package level, and encourage users to do that within their root project.yml
  2. Prefix them with snowplow__ in the majority of cases
  3. Any variables we use in multiple places and/or expect many users to customise, define in our package level project.yml
  4. Any variables only used in one place, with few users customising, define a default in the call but document in our docs.

@MichelleArk
Copy link
Contributor

MichelleArk commented Feb 27, 2023

Gotcha! If I could summarize the desired/expected behaviour with regards to variables precedence, it would be (highest precedence first):

  1. The variables defined on the command line with --vars.
  2. The package-scoped variable declaration in the root dbt_project.yml file (ideally this is where users can customize package-specific 'config-like' vars.
  3. The global variable declaration in the root dbt_project.yml file.
  4. The package-scoped variable declaration in the package dbt_project.yml file
  5. The global variable declaration in the package dbt_project.yml file
  6. The variable's default argument (if one is provided).

4 and 5 seem swappable for the scenarios to resolve the scenarios outlined.

Given the workarounds available and likely complexity of this change, this will probably remain a lower priority issue but should definitely be taken into account for any vars refactoring work done in dbt-core.

@MichelleArk
Copy link
Contributor

MichelleArk commented Apr 28, 2023

@rlh1994 - We ended up prioritizing and closing this issue as part of #7372! It should be released for 1.6.0b1 (May 11)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working multi_project vars
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants