Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support using different dbt paths for DAG rendering and dbt execution #568

Closed
2 tasks
tatiana opened this issue Sep 29, 2023 · 4 comments · Fixed by #634
Closed
2 tasks

Support using different dbt paths for DAG rendering and dbt execution #568

tatiana opened this issue Sep 29, 2023 · 4 comments · Fixed by #634
Assignees
Labels
good first issue Good for newcomers
Milestone

Comments

@tatiana
Copy link
Collaborator

tatiana commented Sep 29, 2023

Context

As of Cosmos 1.1.x, the argument project_dir (passed in the ProjectConfig constructor) is used for two purposes:

  1. To interpret the dbt project & build Airflow DAG
  2. To execute the dbt project nodes

Although this makes sense for the ExecutionMode.LOCAL, it would make more sense to declare different paths when using ExecutionMode.DOCKER and ExecutionMode.KUBERNETES.

This was raised as an issue by @adityasharma1685 in the Apache Airflow #airflow-dbt Slack channel:
https://apache-airflow.slack.com/archives/C059CC42E9W/p1695930777905369

Proposal

Introduce two new configuration arguments to be passed in the constructor

  • RenderConfig(dbt_project_path="airflow-worker-dbt-project-path") (optional, new)
  • ExecutionConfig(dbt_project_path="where-dbt-is-executed") (optional, new)
    To complement:
  • ProjectConfig(dbt_project_path="current-project-path-used-for-parsing-and-execution") (existing)

For the DAG generation (1), we could:
a. Use RenderConfig.dbt_project_path if defined, or
b. Fallback to the existing config (backward compatible) ProjectConfig.dbt_project_path

For dbt execution (2), we could:
a. Use ExecutionConfig.dbt_project_path if defined, or
b. Fallback to the existing config (backward compatible) ProjectConfig.dbt_project_path

Alternative

I'm still trying to figure out alternatives. Any ideas are very welcome!

It would probably not work to set project_dir (

) as an Airflow templated field, because the Kubernetes Pod is instantiated at the Airflow Worker - and would likely have the same value as the Scheduler/DAG Parser.

Acceptance criteria

@tatiana
Copy link
Collaborator Author

tatiana commented Oct 5, 2023

Discussed in the #airflow-dbt channel:
https://apache-airflow.slack.com/archives/C059CC42E9W/p1696308262251219

@tatiana
Copy link
Collaborator Author

tatiana commented Oct 12, 2023

This ticket was brought up in a new thread in the #airflow-dbt slack channel:
https://apache-airflow.slack.com/archives/C059CC42E9W/p1697141246951679

@MrBones757
Copy link
Contributor

I'm happy to start working on this issue; however before i do so, i'd like to discuss some options i believe we have.

Since a majority of the work required to implement this was included in #605, the main changes that need to be made here is the generation of new user-configurable fields.

At the moment, i have the following list of ideas that i think would be worthy of discussion:

1 - add a new field to ProjectConfig, alongside dbt_project_path called dbt_execution_path
2 - add 2 new fields - 1 to RenderConfig called dbt_project_path and one to ExecutionConfig called dbt_project_path
in this case; the existing field in ProjectConfig, and new fields would be mutually exclusive; that is - one or the other
This would need to be validated somehow, and an exception raised if both are defined
3 - The same as 2, except we would allow both to be defined, where the most specific values from ExecutionConfig or RenderConfig would override the ProjectConfig value.

My current thinking is that 2 would be the best approach, and it would get us in a position where #597 could be implemented in the future without being too breaking, as this would, in theory, have a similar interface. It would mean that #597's scope would be reduced to a ProjectConfig deprecation of some sort as opposed to a full interface change. We could also add a deprecation notice for the ProjectConfig field in favor of these two new fields, though it would likely be out of scope for this PR - up for debate though.

If we can get some agreeance on the direction and scope (or indeed another option other than the ones specified above), i could start working on this immediately

Appreciate any feedback / discussion.

@tatiana
Copy link
Collaborator Author

tatiana commented Oct 26, 2023

Hey @MrBones757 , I agree with you on approach (2). This way, we are backward compatible while teaching users how we'd like them to do things from now onwards. Go for it! 🚀

tatiana pushed a commit that referenced this issue Nov 3, 2023
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
tatiana pushed a commit that referenced this issue Nov 6, 2023
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
(cherry picked from commit b64eb9a)
tatiana pushed a commit that referenced this issue Nov 6, 2023
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
(cherry picked from commit b64eb9a)
LennartKloppenburg pushed a commit to LennartKloppenburg/astronomer-cosmos that referenced this issue Nov 17, 2023
… Rendering and Execution (astronomer#634)

This MR finishes the work that was started in astronomer#605 to add full support
for ProjectConfig.dbt_project_path = None, and implements astronomer#568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: astronomer#568
LennartKloppenburg pushed a commit to LennartKloppenburg/astronomer-cosmos that referenced this issue Dec 17, 2023
… Rendering and Execution (astronomer#634)

This MR finishes the work that was started in astronomer#605 to add full support
for ProjectConfig.dbt_project_path = None, and implements astronomer#568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: astronomer#568
tatiana pushed a commit that referenced this issue Jul 4, 2024
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
tatiana pushed a commit that referenced this issue Jul 29, 2024
… Rendering and Execution (#634)

This MR finishes the work that was started in #605 to add full support
for ProjectConfig.dbt_project_path = None, and implements #568.

Within this PR, several things have been updated:
1 - Added project_path fields to RenderConfig and ExecutionConfig
2 - Simplified the consumption of RenderConfig in the dbtGraph class
3 - added option to configure different dbt executables for Rendering vs
Execution.

Closes: #568
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants