
run_query works in run-operation mode but doesn't seem to work during dbt_project.yaml querying #1760

Closed
whisperstream opened this issue Sep 17, 2019 · 7 comments

@whisperstream

whisperstream commented Sep 17, 2019

Describe the bug

I have a case where I'm trying to get the user's roles from the database when dbt runs (in either compile or run mode). I have a macro called get_user_roles and have added the following to my dbt_project.yml:

models:
   vars:
     USER_ROLES: {{ get_user_roles() }}

When I execute dbt run or dbt compile, dbt does connect to the database, but it does not return any results from the run_query call in my macro. If I run the same macro in run-operation mode, it works just fine.
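
For reference, a minimal sketch of what such a macro might look like (the query, table, and column names here are illustrative guesses, not the actual code from this project):

{% macro get_user_roles() %}

    {# Illustrative query only: the real SQL depends on the warehouse and how roles are stored #}
    {% set results = run_query("select role_name from role_grants where user_name = current_user") %}

    {# run_query returns None during parsing, so guard before reading the results #}
    {% if results %}
        {% do return(results.columns[0].values()) %}
    {% else %}
        {% do return([]) %}
    {% endif %}

{% endmacro %}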

I was trying to find documentation describing when run_query does and doesn't work, but was unable to find any.

Steps To Reproduce

models:
   vars:
     MY_VAR: {{ run_query('select 1 as one') }}

The value of MY_VAR should be the result of the query, but instead it's None.

Expected behavior

Expected dbt to connect to the database and execute the query


System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.14.2
   latest version: 0.14.2

The operating system you're using:
OSX
The output of python --version:
Python 3.7.3


whisperstream added the bug and triage labels Sep 17, 2019
@drewbanin
Contributor

Hey @whisperstream - this won't work for two reasons:

  1. macros are not loaded into the dbt context when the dbt_project.yml file is being evaluated
  2. dbt doesn't run queries during "parsing", which is when the dbt vars are evaluated

The first item here is subject to change, but the second one probably is not. Can you tell me more about why you're storing the results of this query in a dbt variable? Is it to cache the results so they can be used across multiple different models? If so, there might be alternative ways to accomplish this that we can explore!
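
For contrast, run_query does return results once dbt is actually executing (i.e. when the execute context flag is true), for example inside a model or a run-operation macro. A minimal sketch (hypothetical model, not from this issue):

-- models/run_query_example.sql (hypothetical file name)
{% set one = 1 %}
{% if execute %}
    {# During execution run_query returns an agate table; during parsing it returns None #}
    {% set results = run_query('select 1 as one') %}
    {% set one = results.columns[0].values()[0] %}
{% endif %}

select {{ one }} as one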

drewbanin added the help_wanted and discussion labels and removed the bug, triage, and help_wanted labels Sep 17, 2019
@whisperstream
Author

Yes, I am trying to cache the results. Again, all of this goes back to solving for different roles having access to different parts of the dbt model.

In order to rewrite the models correctly (as discussed here: #1695 (comment)), I'm trying to figure out which roles the user has, because I need to know them in my xref function. So I was hoping to get them as early as possible by storing them in a value in the dbt_project.yml file.

@whisperstream
Author

Any other alternatives are greatly welcomed :)

@drewbanin
Contributor

drewbanin commented Sep 18, 2019

Ok - thanks for that context! It's hard for me to remember which issues are correlated - I appreciate you referencing #1695 here!

I think that it would be totally reasonable for something like this to work in dbt:

models:
   vars:
     MY_VAR: {{ run_query('select 1 as one') }}

Unfortunately, there's a whole lot of bootstrapping that dbt needs to do at startup, and at the point where the models: block is evaluated, dbt doesn't yet know anything about macros (including run_query()) or your database connection. While I hope that we can support something like this in the future, I'm unsure that we're going to be able to get there any time particularly soon.

A much more direct path to supporting this type of use case would be to provide an arbitrary key/value cache in the dbt context. I think we'd just need to expose a dictionary context variable, called run_cache, which could be manipulated and read by models. This might look like:

{% macro get_permissions() %}

    {% if run_cache.get('permissions') %}
        {{ log("cache hit!") }}
    {% else %}
        {{ log("Running query!") }}
        {% set permissions = run_query('select 1 as id') %}
        {% do run_cache.update({'permissions': permissions}) %}
    {% endif %}

    {% do return(run_cache.get('permissions')) %}

{% endmacro %}

So, the first time you call get_permissions, dbt will execute a query, populate the cache, and return the results. In subsequent calls to get_permissions, dbt will just return the value directly from the cache.
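
A model could then call the macro wherever the permissions are needed, e.g. (sketch, still assuming the hypothetical run_cache variable existed):

-- models/some_model.sql (sketch)
{% set permissions = get_permissions() %}
{# permissions holds whatever run_query returned; repeated calls hit the cache #}

select 1 as id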

There are other ways to implement something like this too -- the run_query macro could accept a cache boolean argument which would cache the query results for the given SQL statement. If an identical SQL statement is made, then dbt could return the cached results instead of re-querying the database.
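
As a user-space sketch of that second idea (again assuming the hypothetical run_cache dictionary above existed; cached_run_query is an invented name):

{% macro cached_run_query(sql) %}

    {# Hypothetical: key the cache by the SQL text so identical statements are only run once #}
    {% if run_cache.get(sql) is none %}
        {% do run_cache.update({sql: run_query(sql)}) %}
    {% endif %}

    {% do return(run_cache.get(sql)) %}

{% endmacro %}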

Both of these are interesting ideas, but there's one very big problem: they don't play well with parallelism. If you run 8 dbt models in parallel, all of which call the get_permissions macro at the same time, then they will all end up querying the database and overwriting the cache 8 times over. That might be appropriate for certain use cases, but it's not a very good general solution to this problem.

So, this is all to say: the thing you want to do is reasonable, but we don't have a great answer for how to address it in dbt today. In general, the problem is that the state that dbt needs to operate on lives in your database, and not inside of a dbt resource. This is definitely reasonable (and almost necessary) for permissions, but it's not really a pattern that we've designed for explicitly.

Curious to hear what you think about all this!

Edit: maybe there's a way to cache the results in the on-run-start query? Then we can avoid the parallelism issue...
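
That might look roughly like this in dbt_project.yml (a sketch; it assumes the hypothetical get_permissions/run_cache machinery above, and that the hook renders some harmless SQL to execute):

on-run-start:
  - "{% do get_permissions() %} select 1"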

@LVHermosa

LVHermosa commented Dec 16, 2021

I am trying to create a database clone before all models start; the models are supposed to use that clone as their target. I cannot achieve this with "on-run-start" for the reason explained above. I am trying to do a blue-green flow, where a clone is temporarily created as a staging database (Snowflake).
@drewbanin how could I run a "CREATE DATABASE ... CLONE <target_model_database>" statement before all models start?
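
For reference, the kind of statement being described would look roughly like this as an on-run-start hook (a sketch; the database names are placeholders, and whether on-run-start runs early enough for this blue-green flow is exactly the open question above):

on-run-start:
  - "create database if not exists {{ target.database }}_clone clone {{ target.database }}"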

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the stale label Nov 12, 2022
@github-actions
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.
