Handle multi-steps Custom DBT Transformations #5590

Closed
ChristopheDuong opened this issue Aug 24, 2021 · 11 comments
Labels
area/connectors (Connector related issues), autoteam, team/platform-move, type/enhancement (New feature or request)

Comments

@ChristopheDuong
Contributor

ChristopheDuong commented Aug 24, 2021

Tell us about the problem you're trying to solve

One typical use case for multi-step custom dbt transformations arises when the transformations require additional dbt packages: the user creates two (or more) custom transformations, where the first step installs the dependencies (dbt deps) and the following ones actually run the transformations (using those dependency packages). However, this usually fails, with the dbt run transformation reporting that the dependencies are not installed.

(Note that custom transformations are not currently working in Kube deployments, see #5091, and multi-step execution across multiple pods is even more difficult.)

Describe the solution you’d like

Allow the user in the UI to specify a bash script to execute as a single operation instead of a single dbt command.
The user would then be able to configure a sequence of dbt commands (multiple steps) within the same operation run instead of splitting them over multiple operations.
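As a rough sketch, assuming such a script input existed and Airbyte exposed the profiles/project directories to it (the environment variable names below are hypothetical; the flags are the same ones visible in the dbt logs later in this thread), a multi-step operation could look like:

```bash
#!/usr/bin/env bash
set -euo pipefail

# DBT_PROFILES_DIR and DBT_PROJECT_DIR are hypothetical variables that Airbyte
# would need to provide to the script.

# Step 1: install the dbt packages declared in packages.yml
dbt deps --profiles-dir "$DBT_PROFILES_DIR" --project-dir "$DBT_PROJECT_DIR"

# Step 2: run the transformations, with the packages now present in the same workspace
dbt run --profiles-dir "$DBT_PROFILES_DIR" --project-dir "$DBT_PROJECT_DIR"
```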

Describe the alternative you’ve considered or used

A current trick (when running through Docker, not Kube) is to make sure the following variable is specified in the dbt_project.yml file of the user's custom step:
modules-path: "../dbt_modules"
This ensures that the first dbt deps step persists the cloned package in the workspace folder of the sync (outside of the git_repo folder), where it can then be accessed by a second dbt run custom transformation step.
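For illustration, a minimal dbt_project.yml for the custom step could carry the override as follows (the name and version values are placeholders; only the modules-path line is the relevant part):

```yaml
# dbt_project.yml of the custom transformation step (dbt < 1.0.0)
name: "my_custom_transformations"  # placeholder
version: "1.0.0"                   # placeholder

# Persist the packages installed by `dbt deps` one level above git_repo,
# inside the sync workspace, so that a later `dbt run` step can find them.
modules-path: "../dbt_modules"
```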

Additional context

When building the normalization docker image:

The dbt deps command performs a git clone of the package (once) when the docker image is built, as part of Airbyte's CI process for releasing a new normalization docker image.

If the user were to re-use the generated normalization project by exporting it and including it back as a custom step of the sync, things become confusing, because the /tmp/dbt_modules directory is not persisted between two custom dbt transformation steps.

The solution in this scenario is for the user to tweak the exported project by editing the dbt_project.yml file to make the same change as above: modules-path: "../dbt_modules".
This ensures that the first dbt deps step persists the cloned package in the workspace folder of the sync (outside of the git_repo folder), where it can then be accessed by a second dbt run custom transformation step.

(Note that this change can't be made in the airbyte repository itself, or every sync that runs normalization would have to run dbt deps to download the package into the sync workspace, instead of doing it only once at "compile" time of the docker image.)
Documentation should probably be updated to reflect this in #4351.

@ChristopheDuong ChristopheDuong added the type/enhancement New feature or request label Aug 24, 2021
@ChristopheDuong ChristopheDuong changed the title from "Improve how to handle where dbt deps install dbt modules" to "Handle multi-steps Custom DBT Transformations" Aug 24, 2021
@ChristopheDuong
Contributor Author

Btw @zestyping, I think you mentioned something about custom dbt transformations to John recently; this issue might be of interest to you as an FYI.

@jd-sanders

We have the same use case for this issue: our models have a dependency on the dbt package dbt_utils, and in Airbyte's execution, the state created by dbt deps doesn't persist to the next run command.
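(For context, a typical packages.yml declaring that dependency might look like the snippet below; the version pin is purely illustrative.)

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.6  # illustrative pin; installing it still requires a `dbt deps` step
```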

I've used the workaround as described (compiling the package locally and then persisting the dbt_modules contents into GitHub), but it's fairly awkward and doesn't work well in several scenarios (if you aren't on Docker, if you aren't using the same version of dbt as Airbyte, etc.).

Our desired outcome (however it is achieved) is to be able to easily use packages and dependencies that require the dbt deps installation step.

cc: @zestyping, who I work with.

@ChristopheDuong
Contributor Author

Adding a bash script input to your custom operation should give you the freedom to run multiple dbt commands in the same operation and to sequence a dbt deps with a dbt run.

In the meantime, you can follow that trick, or let us know on Slack if we can help you get it working.

@sherifnada sherifnada added the area/connectors Connector related issues label Nov 15, 2021
@philippeboyd
Contributor

Any news on this? If I understand correctly, there's no way of running a DBT project with dependencies which makes Custom DBT Transformations almost useless.

According to dbt-labs/dbt-core#4784 and its comments, even if we create our own Docker image with dbt deps inside, dependencies won't be persisted.

@ChristopheDuong
Contributor Author

Any news on this? If I understand correctly, there's no way of running a DBT project with dependencies which makes Custom DBT Transformations almost useless.

Does the trick with modules-path: "../dbt_modules" not work for you?

@philippeboyd
Contributor

philippeboyd commented Mar 28, 2022

@ChristopheDuong it works, but it means that locally we have to work with a dbt_packages/dbt_modules folder one folder level up, which is not the cleanest solution.

What if we use other tools using the same git repo that are not compatible with this workaround?

Side note: in dbt >= 1.0.0, the config has been renamed from modules-path to packages-install-path, with a new default value of dbt_packages.

https://docs.getdbt.com/reference/project-configs/packages-install-path
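For example, under dbt >= 1.0.0 the equivalent override in dbt_project.yml would presumably be (the path value is only an example; see the later comments for what worked on specific Airbyte versions):

```yaml
# dbt_project.yml (dbt >= 1.0.0): replaces the deprecated modules-path setting
packages-install-path: "../dbt_packages"
```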

@apostoltego
Contributor

Hey 👋 just wanted to add here, as we've also recently run across this particular issue since updating to dbt v1.0.
We have a monorepo structure, and the prior approach of ../dbt_modules no longer works with dbt Cloud on jobs using that version. Hence we had to commit dbt_utils to the packages we're using in order for the existing transformations to work.

@machariamuguku

FYI, for the latest Airbyte version (and with dbt >= v1.0.0), override this instead: packages-install-path: ../dbt. This works with Airbyte v0.40.7 and dbt v1.0.0.

@MiguelMadero
Contributor

This is very annoying.
The workaround works, but it's still annoying.

@cgardens
Contributor

cgardens commented Feb 9, 2024

We want to stay out of the business of orchestrating complex dbt workflows. We recommend using Airflow or Dagster to do this. Docs

@cgardens cgardens closed this as completed Feb 9, 2024
@brettallred

brettallred commented Mar 19, 2024

Is this still the suggested workaround?

I've generated and exported the dbt files.
Changed dbt_project.yml to set packages-install-path: '../dbt_packages'.

And I get this error:

2024-03-19 02:55:20 INFO i.a.c.i.LineGobbler(voidCall):149 - ----- START DBT TRANSFORMATION -----
2024-03-19 02:55:20 INFO i.a.c.i.LineGobbler(voidCall):149 - 
2024-03-19 02:55:20 INFO i.a.c.i.LineGobbler(voidCall):149 - Checking if airbyte/custom-transformation-prep:1.0 exists...
2024-03-19 02:55:20 INFO i.a.c.i.LineGobbler(voidCall):149 - airbyte/custom-transformation-prep:1.0 was found locally.
2024-03-19 02:55:20 INFO i.a.w.p.DockerProcessFactory(create):140 - Creating docker container = custom-transformation-prep-custom-1142-0-tmati with resources io.airbyte.config.ResourceRequirements@76374c7d[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=8g,additionalProperties={}] and allowedHosts null
2024-03-19 02:55:20 INFO i.a.w.p.DockerProcessFactory(create):187 - Preparing command: docker run --rm --init -i -w /data/1142/0/transform --log-driver none --name custom-transformation-prep-custom-1142-0-tmati --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_VERSION=0.50.34 --memory=8g airbyte/custom-transformation-prep:1.0 configure-dbt --integration-type postgres --config destination_config.json --git-repo https://brettallred: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@github.com/mozr-data/r360_airbyte_dbt.git
2024-03-19 02:55:20 dbt > WARNING: The requested image's platform (linux/arm64) does not match the detected host platform (linux/amd64/v4) and no specific platform was requested
2024-03-19 02:55:20 dbt > Running: git clone --depth 5 --single-branch  $GIT_REPO git_repo
2024-03-19 02:55:20 dbt > Cloning into 'git_repo'...
2024-03-19 02:55:21 dbt > Last 5 commits in git_repo:
2024-03-19 02:55:21 dbt > 5c09f02 add path
2024-03-19 02:55:21 dbt > bc6c107 change path
2024-03-19 02:55:21 dbt > fe8bc85 remove modules
2024-03-19 02:55:21 dbt > 3c9398b try modules-path
2024-03-19 02:55:21 dbt > 43086df remove
2024-03-19 02:55:21 dbt > /data/1142/0/transform
2024-03-19 02:55:21 dbt > Running: transform-config --config destination_config.json --integration-type postgres --out /data/1142/0/transform
2024-03-19 02:55:22 dbt > Namespace(config='destination_config.json', integration_type=<destinationtype.postgres:>, out='/data/1142/0/transform')
2024-03-19 02:55:22 dbt > transform_postgres
2024-03-19 02:55:22 INFO i.a.c.i.LineGobbler(voidCall):149 - Checking if fishtownanalytics/dbt:1.0.0 exists...
2024-03-19 02:55:22 INFO i.a.c.i.LineGobbler(voidCall):149 - fishtownanalytics/dbt:1.0.0 was found locally.
2024-03-19 02:55:22 INFO i.a.w.p.DockerProcessFactory(create):140 - Creating docker container = dbt-custom-1142-0-zsbhi with resources io.airbyte.config.ResourceRequirements@76374c7d[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=8g,additionalProperties={}] and allowedHosts null
2024-03-19 02:55:22 INFO i.a.w.p.DockerProcessFactory(create):187 - Preparing command: docker run --rm --init -i -w /data/1142/0/transform --log-driver none --name dbt-custom-1142-0-zsbhi --network host -v airbyte_workspace:/data -v /tmp/airbyte_local:/local -e DEPLOYMENT_MODE=OSS -e AIRBYTE_ROLE= -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_VERSION=0.50.34 --entrypoint /bin/bash --memory=8g fishtownanalytics/dbt:1.0.0 entrypoint.sh run
2024-03-19 02:55:22 dbt > Running from /data/1142/0/transform/git_repo
2024-03-19 02:55:22 dbt > detected no config file for ssh, assuming ssh is off.
2024-03-19 02:55:22 dbt > Running: dbt run --profiles-dir=/data/1142/0/transform --project-dir=/data/1142/0/transform/git_repo
2024-03-19 02:55:26 dbt > 02:55:26  Running with dbt=1.0.0
2024-03-19 02:55:26 dbt > 02:55:26  Encountered an error:
2024-03-19 02:55:26 dbt > Compilation Error
2024-03-19 02:55:26 dbt >   dbt found 1 package(s) specified in packages.yml, but only 0 package(s) installed in ../dbt_packages. Run "dbt deps" to install package dependencies.
2024-03-19 02:55:27 INFO i.a.w.t.s.a.AppendToAttemptLogActivityImpl(log):56 - Retry State: RetryManager(completeFailureBackoffPolicy=BackoffPolicy(minInterval=PT10S, maxInterval=PT30M, base=3), partialFailureBackoffPolicy=null, successiveCompleteFailureLimit=5, totalCompleteFailureLimit=10, successivePartialFailureLimit=1000, totalPartialFailureLimit=10, successiveCompleteFailures=1, totalCompleteFailures=1, successivePartialFailures=0, totalPartialFailures=0)
 Backoff before next attempt: 10 seconds

It doesn't appear that dbt deps ever gets run. I'm changing the dbt_project.yml in the exported project and not in any default files in airbyte.
