Draft: #756 - implement python workflow submissions #762
Conversation
@kdazzle can you rebase/target your PR against 1.9.latest? I have a couple of things that I need to wrap up, but I'm planning to take some version of this into the 1.9 release.
```
@@ -247,97 +247,3 @@ def test_build_job_spec_with_post_hooks(self, mock_api_client):
        assert len(result["tasks"]) == 2
        assert result["tasks"][1]["task_key"] == "task_b"
        assert result["tasks"][1]["new_cluster"]["spark_version"] == "14.3.x-scala2.12"

    @patch("dbt.adapters.databricks.python_models.python_submissions.DatabricksApiClient")
    def test_build_job_spec_with_post_hooks_serverless_job_cluster(self, mock_api_client):
```
Removing these since the logic to muck around with the cluster settings in additional tasks was removed here
Going to merge in 1.9.latest changes (which is basically only 1.8 changes), ensure tests still pass, then merge.
WIP - Stubs out implementation for #756
This pretty much implements what a workflow job submission type would look like, though I'm sure I'm missing something. Tests haven't been added yet.
Sample
Outside of the new submission type, models are the same. Here is what one could look like:
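The model example itself didn't survive the page extraction, so the snippet below is only a stand-in sketch of an ordinary Python model. It assumes the new submission type is selected through a `submission_method` config value named `workflow_job`; that name and the model/column names are illustrative assumptions, not taken from this PR.

```python
# models/my_python_model.py
# Minimal sketch of a Python model; the body is the same as for any other
# submission method. "workflow_job" is an assumed name for the new type.
def model(dbt, session):
    dbt.config(
        materialized="table",
        submission_method="workflow_job",  # assumed name for the new submission type
    )
    # Ordinary PySpark/dbt logic, unchanged by the submission type
    df = dbt.ref("upstream_model")
    return df
```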
The config for a model could look like (forgive my jsonification...yaml data structures still freak me out):
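The original config sample is also missing from this excerpt. As a rough illustration only, the sketch below shows the shape such a config might take, written as a plain Python dict so the nesting is explicit. Every key name (`workflow_job_config`, `post_hook_tasks`, etc.) and every value is an assumption inferred from the attributes discussed in the Explanation section, not the PR's actual schema; the `task_b` / `14.3.x-scala2.12` values echo the test snippet above.

```python
# Illustrative config shape for a workflow-job submission; all keys and values
# are assumptions based on the attributes described in the Explanation section.
workflow_job_config = {
    "name": "my_dbt_workflow",           # optional; a default name is generated if omitted
    "existing_job_id": "123456789",      # optional; reuse an existing workflow instead of creating one
    "grants": {                          # extra permissions on the workflow (see Explanation)
        "view": [{"group_name": "analysts"}],
    },
    "additional_task_settings": {        # merged into / overriding the default dbt model task
        "task_key": "task_a",
    },
    "post_hook_tasks": [                 # extra tasks attached to the same workflow
        {
            "task_key": "task_b",
            "depends_on": [{"task_key": "task_a"}],
            "notebook_task": {"notebook_path": "/Workspace/path/to/notebook"},
            # Provide "new_cluster" or "existing_cluster_id"; leaving both out
            # means the task runs serverless.
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }
    ],
}
```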
Explanation
For all of the dbt configs that I added (in addition to the Databricks API attributes), I tried to strike a balance between the dbt convention of requiring minimal configuration and allowing the full flexibility of the Databricks API. Attribute names try to split the difference between the Databricks API and the dbt API. Happy to change the approach on anything.
- `existing_job_id` - in case users want to reuse an existing workflow. If no `name` is provided in this config, the job will get renamed to the default job name (currently `f"dbt__{self.database}-{self.schema}-{self.identifier}"`) if `existing_job_id` is also provided
- `task_a` - configurable in `additional_task_settings`
- `new_cluster` or `existing_cluster_id` - leaving both blank means serverless
- `post_hook` - might be a misnomer, because you could technically set the dbt model to depend on one of these tasks, making it also a pre hook
- `grants` - allow permissions to be set on the workflow so that additional users/teams can run the job ad hoc if needed (for initial runs/backfills, etc.). The owner is the user/service principal that deployed the job, and the format needs to follow the Databricks API, where you specify whether each principal is a user, group, or service principal (see the sketch after this list)
- `additional_task_settings` - to add to/override the default dbt model task
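To make the principal-type distinction concrete, here is a small sketch of how grant entries might be written. The permission-level keys and all identities are placeholders, not values from this PR; the only grounded rule is that each entry names exactly one of `user_name`, `group_name`, or `service_principal_name`, following the Databricks permissions format described above.

```python
# Illustrative grants mapping: each entry identifies exactly one principal type,
# mirroring the Databricks permissions format. Keys and identities are placeholders.
grants = {
    "view": [{"group_name": "data-consumers"}],
    "run": [
        {"user_name": "teammate@example.com"},
        {"service_principal_name": "9f0b8d42-0000-0000-0000-000000000000"},
    ],
    "manage": [],  # the deploying user/service principal remains the owner
}
```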
Todo:
- `all_purpose_cluster` attribute, similar to `job_cluster_config`?
Description
Checklist
- I have updated the `CHANGELOG.md` and added information about my change to the "dbt-databricks next" section.