Write logs to a temp dir when using dbt_ls parsing mode #411
Users could resolve this by setting the environment variable
I just found the original bug report by Adam Underwood: the problem goes beyond the logs directory, since the
A solution that fixes both issues, if the user does not set dbt-specific environment variables, could be to create a temporary directory from which we run `dbt ls`. This assumes that the user/process running Cosmos has write permissions to create and write to temporary directories.
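The approach described above could be sketched as follows. This is an illustrative sketch, not Cosmos's actual implementation; the function name and signature are made up:

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_from_tmp_copy(project_dir, cmd=("dbt", "ls")):
    """Copy the dbt project into a temporary directory and run `cmd`
    from there, so that any artifacts (logs/, target/) land in the
    disposable copy and never touch the original, possibly read-only,
    project directory.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_project = Path(tmp_dir) / "project"
        shutil.copytree(project_dir, tmp_project)
        result = subprocess.run(
            list(cmd), cwd=tmp_project, capture_output=True, text=True
        )
        return result.stdout
```

The trade-off, as noted above, is that the worker must be able to create temporary directories, and copying a large project adds some parsing-time overhead.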
As of Cosmos 1.0.0, `LoadMode.DBT_LS` ran `dbt ls` from within the original dbt project directory. The `dbt ls` command outputs files to the directory it's running from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified. Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This PR changes the behavior of `dbt ls` to make a copy of the original project directory into a temporary directory and run the command `dbt ls` from there. Closes: #411
@jlaneve, let's align here on the approach, given the feedback you gave in:

A proposal:
Are you happy with this approach?

yes, this sounds great - thank you!
Some additional comments on this. I played a bit with:

Logs path

Target path
So, the one thing to remember with this approach is that if users use an older version of

What we agreed is that we're going to attempt to run
As of Cosmos 1.0.0, `LoadMode.DBT_LS` runs `dbt ls` from within the original dbt project directory. The `dbt ls` command outputs files to the directory it's running from unless the environment variables `DBT_LOG_PATH` and `DBT_TARGET_PATH` are specified (as of dbt 1.6). Depending on the deployment, the Airflow worker does not have write permissions to the dbt project directory. This can lead to an error message similar to the following:

```
20:43:06 Encountered an error: [Errno 30] Read-only file system: '/usr/local/airflow/dags/dbt_grindr/logs/dbt.log'
```

This PR changes the behavior of `dbt ls` so that its artifacts (the logs and target directories) are not written to the original project directory.

In addition to the introduced test, this change was validated using Airflow 2.6 and dbt 1.6, by following these steps:

1. Delete the folders `logs` and `target` from `astronomer-cosmos/dev/dags/dbt/jaffle_shop`
2. Add a breakpoint after `stdout, stderr = process.communicate()` in `dbt/graph.py`
3. Run a DAG that uses `astronomer-cosmos/dev/dags/dbt/jaffle_shop`, e.g.:

```
airflow dags test basic_cosmos_dag `date -Iseconds`
```

4. When the breakpoint is hit, check that no `target` or `logs` folder was created in `astronomer-cosmos/dev/dags/dbt/jaffle_shop` after running `dbt ls`

A limitation of the current approach is that, although `dbt ls` does not create these directories under the given circumstances, if the user is using the local execution mode or an earlier version of `dbt`, the files will still be written to the project directory. Closes: #411
After further debugging I was able to get it to work. It turns out the issue was with dbt packages: the code needs to run `dbt deps` before running
@DanMawdsleyBA I did a hot fix for my work here:

And config in dbt_project.yml:
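The `dbt_project.yml` config referenced above is not shown in the thread; a hypothetical example of that kind of override might look like the following (the keys are real dbt project settings, but the paths and the exact config used by this commenter are assumptions):

```
# Hypothetical dbt_project.yml snippet redirecting artifacts to a
# writable location; these keys are supported in older dbt versions,
# while newer dbt prefers the DBT_LOG_PATH / DBT_TARGET_PATH env vars.
log-path: /tmp/dbt_logs
target-path: /tmp/dbt_target
```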
In Slack, a user reported an issue where the `dbt ls` command used to generate the manifest was trying to write logs to the project directory, which is typically read-only. We should instead have dbt write to a temp dir and print the logs to stdout if any are written to disk.