Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default behaviour for converting dbt Source nodes into Airflow tasks #630

Closed
Tracked by #1103
tatiana opened this issue Oct 26, 2023 · 5 comments · Fixed by #1107
Closed
Tracked by #1103

Add default behaviour for converting dbt Source nodes into Airflow tasks #630

tatiana opened this issue Oct 26, 2023 · 5 comments · Fixed by #1107
Assignees
Labels
area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:parsing Related to parsing DAG/DBT improvement, issues, or fixes dbt:source Primarily related to dbt source command or functionality triage-needed Items need to be reviewed / assigned to milestone
Milestone

Comments

@tatiana
Copy link
Collaborator

tatiana commented Oct 26, 2023

As of Cosmos 1.2.1, it does not render dbt Source nodes as Airflow tasks by default.

Since Cosmos 1.2.0, we've introduced support for customizing how the library converts any dbt node into Airflow:
https://astronomer.github.io/astronomer-cosmos/configuration/render-config.html#customizing-how-nodes-are-rendered-experimental

This means users can already customise the desired behaviour for Source nodes, using something like:

DbtResourceType("source"): convert_source, # known dbt node type to Cosmos (part of DbtResourceType)

The open questions are:

  1. Do we want Cosmos to have a default behaviour for dbt Source nodes and automatically generate tasks for nodes of this type?
  2. If the answer to (1) is yes, what do we want the behaviour for those nodes to be? Should we run:
dbt source freshness
@tatiana tatiana added this to the 1.3.0 milestone Oct 26, 2023
@tatiana tatiana changed the title Add support converting dbt Source nodes into Airflow tasks Add default behaviour for converting dbt Source nodes into Airflow tasks Oct 26, 2023
@pgoslatara
Copy link
Contributor

In addition to freshness checks users can also (and sometimes do) add standard tests to their sources, i.e. unique, not_null, etc. I think a good default behaviour would be to display sources as a task in the Airflow UI as this provides an Airflow DAG to the end user familiar with dbt that is most similar to dbt's DAG, thereby enabling Cosmos adoption. These tasks can use EmptyOperator and therefore have no impact on compute usage or scheduler resources.

One small note, DummyOperator has been deprecated in favour of EmptyOperator in Airflow 2.4 (see here).

@tatiana tatiana added area:parsing Related to parsing DAG/DBT improvement, issues, or fixes area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc dbt:source Primarily related to dbt source command or functionality labels Nov 8, 2023
@tatiana tatiana modified the milestones: 1.3.0, 1.4.0 Dec 7, 2023
@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 9, 2024
Copy link

dosubot bot commented Mar 9, 2024

Hi, @tatiana,

I'm Dosu, and I'm helping the Cosmos team manage their backlog. I wanted to let you know that I'll be marking this issue as stale. From what I understand, the issue you raised pertains to the default rendering of dbt Source nodes as Airflow tasks in Cosmos 1.2.1, and there's a discussion about the need for default behavior considering the customization options introduced in Cosmos 1.2.0. Additionally, there's a suggestion to display sources as tasks in the Airflow UI and a mention of the deprecation of DummyOperator in favor of EmptyOperator in Airflow 2.4.

Could you please confirm if this issue is still relevant to the latest version of the Cosmos repository? If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding, and please don't hesitate to reach out if you have any questions or need further assistance.

Best,
Dosu

@linchun3
Copy link
Contributor

+1 to this issue as this is something I'm interested in.

  1. Do we want Cosmos to have a default behaviour for dbt Source nodes and automatically generate tasks for nodes of this type?

I'm in support for having each source render as an Airflow task, and running all tests attached to it. This improves the visibility of a source and whether a model update failure is due to an upstream failure.

  1. If the answer to (1) is yes, what do we want the behaviour for those nodes to be?

dbt test --select source:jaffle_shop.orders

This command should run all the tests associated to a source. I'm not sure how we should handle dbt source freshness though.

Add on questions:
3. How do we handle models that share the same source? Should we render 2 source nodes or should the models share the same source node?

@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 11, 2024
Copy link

dosubot bot commented Mar 11, 2024

@tatiana, could you please help with this issue? The user has indicated that the issue is still relevant and has provided additional context and questions. Thank you!

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@tatiana tatiana modified the milestones: 1.4.0, 1.5.0 Apr 25, 2024
@tatiana tatiana added the triage-needed Items need to be reviewed / assigned to milestone label May 17, 2024
@tatiana tatiana modified the milestones: Cosmos 1.5.0, 1.6.0 May 17, 2024
@pankajastro
Copy link
Contributor

Look like we have a PR #661 from @arojasb3 to fix it

@tatiana tatiana mentioned this issue Jul 31, 2024
18 tasks
tatiana pushed a commit that referenced this issue Aug 14, 2024
Re-Opening of PR #661

This PR features a new way of rendering source nodes:
- Check freshness for sources with freshness checks
- Source tests
- Empty operators for nodes without tests or freshness.

One of the main limitations I found while using the `custom_callback`
functions on source nodes to check freshness is that nodes were being
created on 100% of sources but not all of them required freshness
checks, this made workers waste compute time.

I'm adding a new variable into the DbtNode class called has_freshness
which would be True for sources with freshness checks and False for any
other resource type.

If this feature is enabled with the option `ALL`:
All sources with the has_freshness == False will be rendered as Empty
Operators, to keep the dbt's behavior of showing sources as suggested in
issue #630
<!-- Add a brief but complete description of the change. -->

A new rendered template field is included too: `freshness` which is the
sources.json generated by dbt when running `dbt source freshness`

This adds a new node type (source), which changes some tests behavior.
This PR also updates the dev dbt project jaffle_shop to include source
nodes when enabled.

![image](https://github.com/user-attachments/assets/e972ac58-8741-4c13-9905-e78775f9cc80)

As seen in the image, source nodes with freshness checks are rendered
with a blue color, while the ones rendered as EmptyOperator show a
white/light green color

Closes: #630
Closes: #572
Closes: #875
<!-- If this PR closes an issue, you can use a keyword to auto-close.
-->
<!-- i.e. "closes #0000" -->

This won't be a breaking change since the default behavior will still be
ignoring this new feature. That can be changed with the new RenderConfig
variable called `source_rendering_behavior`.

Co-authored-by: Pankaj <pankaj.singh@astronomer.io>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:parsing Related to parsing DAG/DBT improvement, issues, or fixes dbt:source Primarily related to dbt source command or functionality triage-needed Items need to be reviewed / assigned to milestone
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants