Add DatabricksTaskOperator #40013

Merged 4 commits into apache:main on Jun 4, 2024

@pankajkoti (Member) commented Jun 2, 2024

This pull request introduces the DatabricksTaskOperator
to the Airflow Databricks provider from the astro-provider-databricks
repository. Unlike the DatabricksNotebookOperator, which can only
run notebook tasks, the DatabricksTaskOperator can run any task
type that the Jobs API supports in its tasks attribute, including
notebooks, JAR files, Python scripts, Databricks SQL queries
and dashboards, Delta Live Tables pipelines, and dbt tasks.
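
To make the range of task types concrete, here is a minimal sketch of the payloads involved. The keys `notebook_task`, `spark_jar_task`, `spark_python_task`, `sql_task`, and `pipeline_task` are field names from the Jobs API `tasks` attribute; the `TASK_CONFIGS` table, the `build_task_config` helper, and every path, class name, and ID are hypothetical placeholders, not part of this PR.

```python
from __future__ import annotations

import copy
from typing import Any

# Task-type keys follow the Databricks Jobs API `tasks` attribute.
# All paths, class names, and IDs below are illustrative placeholders.
TASK_CONFIGS: dict[str, dict[str, Any]] = {
    "notebook": {
        "notebook_task": {
            "notebook_path": "/Shared/example_notebook",
            "source": "WORKSPACE",
        }
    },
    "jar": {
        "spark_jar_task": {
            "main_class_name": "com.example.Main",
            "parameters": ["--run"],
        }
    },
    "python_script": {
        "spark_python_task": {"python_file": "dbfs:/scripts/example.py"}
    },
    "sql_query": {
        "sql_task": {
            "query": {"query_id": "<query-id>"},
            "warehouse_id": "<warehouse-id>",
        }
    },
    "dlt_pipeline": {"pipeline_task": {"pipeline_id": "<pipeline-id>"}},
}


def build_task_config(kind: str, **extra: Any) -> dict[str, Any]:
    """Return an independent copy of the payload for `kind`, plus extra task-level fields."""
    if kind not in TASK_CONFIGS:
        raise ValueError(f"unknown task kind: {kind!r}")
    config = copy.deepcopy(TASK_CONFIGS[kind])
    config.update(extra)
    return config


# Assumed usage inside a DAG (needs apache-airflow-providers-databricks;
# cluster settings omitted, and the connection ID is a placeholder):
#
# DatabricksTaskOperator(
#     task_id="run_sql",
#     databricks_conn_id="databricks_default",
#     task_config=build_task_config("sql_query"),
# )
```

The commented usage reflects my understanding that the operator takes the task payload through a `task_config` dictionary; check the provider documentation for the exact signature before relying on it.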

This is another in a series of pull requests contributing operators
and features from that repository to the Airflow Databricks
provider. This PR also moves the implementation shared by
the DatabricksNotebookOperator and DatabricksTaskOperator into an
abstract base class, DatabricksTaskBaseOperator, so that the common
methods are not duplicated.
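
The refactoring pattern described above can be sketched roughly as follows. This is an illustration of the abstract-base-class idea only, not the provider's code: the class names `TaskBaseSketch`, `NotebookSketch`, and `GenericTaskSketch`, and the methods `task_body` and `build_payload`, are all hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class TaskBaseSketch(ABC):
    """Hypothetical stand-in for a shared base class; holds the common logic once."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id

    @abstractmethod
    def task_body(self) -> dict[str, Any]:
        """Return the task-type-specific part of the payload."""

    def build_payload(self) -> dict[str, Any]:
        # Shared by every subclass: wrap the specific body with the task key.
        return {"task_key": self.task_id, **self.task_body()}


class NotebookSketch(TaskBaseSketch):
    """Only knows how to describe a notebook task."""

    def __init__(self, task_id: str, notebook_path: str) -> None:
        super().__init__(task_id)
        self.notebook_path = notebook_path

    def task_body(self) -> dict[str, Any]:
        return {"notebook_task": {"notebook_path": self.notebook_path}}


class GenericTaskSketch(TaskBaseSketch):
    """Accepts any task payload the caller supplies."""

    def __init__(self, task_id: str, task_config: dict[str, Any]) -> None:
        super().__init__(task_id)
        self.task_config = task_config

    def task_body(self) -> dict[str, Any]:
        return dict(self.task_config)
```

With this layout, a change to the shared behavior in `build_payload` applies to both subclasses, and the base class itself cannot be instantiated because `task_body` is abstract.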

Example of a successful DAG run using the
DatabricksTaskOperator with the implementation from this PR:

[Screenshot 2024-06-03 at 1:44:04 AM]


@pankajkoti force-pushed the add-databricks-task-operator branch from 2fb3bb0 to 058d082 on June 2, 2024 20:49
@pankajkoti marked this pull request as ready for review on June 3, 2024 06:32
@pankajkoti (Member Author) commented:

cc: @tatiana for helping review the PR.

@pankajastro (Member) left a comment:

LGTM

@pankajkoti requested a review from Lee-W on June 3, 2024 11:04
@pankajkoti force-pushed the add-databricks-task-operator branch from e26fa40 to c5c3d34 on June 3, 2024 11:26
@tatiana (Contributor) left a comment:

It looks great, @pankajkoti, thanks for all the refactoring and improvements!
I added a minor comment on the documentation.

@pankajkoti merged commit 68bd42a into apache:main on Jun 4, 2024
49 checks passed
@pankajkoti deleted the add-databricks-task-operator branch on June 4, 2024 10:22
fdemiane pushed a commit to fdemiane/airflow that referenced this pull request Jun 6, 2024
This pull request introduces the [DatabricksTaskOperator](https://github.com/astronomer/astro-provider-databricks/blob/main/src/astro_databricks/operators/common.py#L26)
to the Airflow Databricks provider from the [astro-provider-databricks](https://github.com/astronomer/astro-provider-databricks/tree/main)
repository. Unlike the `DatabricksNotebookOperator`, which can only
run notebook tasks, the `DatabricksTaskOperator` can run any task
type supported by the [Jobs API in its tasks attribute](https://docs.databricks.com/api/workspace/jobs/create#tasks), including
notebooks, JAR files, Python scripts, Databricks SQL queries
and dashboards, Delta Live Tables pipelines, and dbt tasks.

This is another in a series of pull requests contributing operators
and features from that repository to the Airflow Databricks
provider. This PR also moves the implementation shared by
the `DatabricksNotebookOperator` and `DatabricksTaskOperator` into an
abstract base class, `DatabricksTaskBaseOperator`, so that the common
methods are not duplicated.
syedahsn pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Jun 7, 2024
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
pankajkoti added a commit to astronomer/astro-provider-databricks that referenced this pull request Aug 8, 2024
As part of Astronomer's internal plans and decisions, we've decided to contribute the existing functionality provided by the operators and plugins in this repository to the official Apache Airflow Databricks provider. To achieve this, we submitted the following PRs to the Airflow provider:

1. apache/airflow#39178
2. apache/airflow#39771
3. apache/airflow#40013
4. apache/airflow#40724
5. apache/airflow#39295

All functionality has now been contributed to the Airflow Databricks provider, and ongoing support will be maintained there. As a result, we're deprecating the operators and plugins in this repository. Users are encouraged to transition to the official Apache Airflow Databricks provider as soon as possible. Migration is straightforward: update the import path to point to the Airflow provider and install `apache-airflow-providers-databricks>=6.8.0`, which includes all the contributions mentioned above.

closes: astronomer/issues-airflow#715
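
As a rough aid for the migration described in the commit message above, the sketch below checks whether the installed provider meets the stated `>=6.8.0` requirement. The helper names `release_tuple` and `provider_ready` are hypothetical, and the parser is deliberately simplified: it reads only the leading digits of each component, so a pre-release such as `6.8.0rc1` is treated the same as `6.8.0`. The import paths in the comments are my reading of the two packages and should be verified against their documentation.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

PROVIDER = "apache-airflow-providers-databricks"
MINIMUM = "6.8.0"  # minimum version stated in the commit message


def release_tuple(raw: str, width: int = 3) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string, zero-padded to `width`."""
    numbers: list[int] = []
    for piece in raw.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers + [0] * (width - len(numbers)))


def provider_ready(installed: str | None = None) -> bool:
    """Return True if the given (or installed) provider version is at least MINIMUM."""
    if installed is None:
        try:
            installed = version(PROVIDER)
        except PackageNotFoundError:
            return False
    return release_tuple(installed) >= release_tuple(MINIMUM)


# Assumed import change once the check passes:
#   before: from astro_databricks.operators.common import DatabricksTaskOperator
#   after:  from airflow.providers.databricks.operators.databricks import DatabricksTaskOperator
```

Calling `provider_ready()` with no argument inspects the current environment and returns `False` when the provider is not installed at all.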