(feat): add support for matview #123

Closed
wants to merge 1 commit into from

israelg99

Introduction

Adds support for ClickHouse materialized views. A materialized view can write to an existing target_table (the TO table); otherwise it uses its own table, which is fully configurable (engine, order by, partition by, etc.) and defaults to MergeTree() order by tuple().
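
For readers less familiar with ClickHouse, the two cases described above correspond roughly to the two DDL shapes below. This is only an illustrative sketch, not the SQL generated by this PR: the table and view names (roles_raw, actor_movies, actor_movies_mv, actor_movies_inner_mv) are hypothetical and chosen just for the example.

```sql
-- Hypothetical source table, used only for this illustration.
CREATE TABLE IF NOT EXISTS roles_raw
(
    actor_id   UInt64,
    movie_id   UInt64,
    created_at DateTime
)
ENGINE = MergeTree()
ORDER BY (actor_id, movie_id);

-- Case 1: an explicit target table, referenced with the TO clause.
-- The engine, ordering and partitioning live on the target table itself.
CREATE TABLE IF NOT EXISTS actor_movies
(
    actor_id   UInt64,
    movie_id   UInt64,
    updated_at DateTime
)
ENGINE = MergeTree()
ORDER BY (actor_id, movie_id);

CREATE MATERIALIZED VIEW IF NOT EXISTS actor_movies_mv
TO actor_movies
AS
SELECT actor_id, movie_id, created_at AS updated_at
FROM roles_raw;

-- Case 2: no TO clause. ClickHouse stores the rows in an implicit inner
-- table, so the engine and ordering are declared on the view itself.
CREATE MATERIALIZED VIEW IF NOT EXISTS actor_movies_inner_mv
ENGINE = MergeTree()
ORDER BY tuple()
AS
SELECT actor_id, movie_id, created_at AS updated_at
FROM roles_raw;
```

In both cases the view only receives rows inserted into roles_raw after the view is created.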

Usage

A model to define a target table:

{{ config(order_by='(updated_at, id, name)', engine='MergeTree()', materialized='incremental', unique_key='id', inserts_only=True) }}

with actor_summary_table as (
    select id,
        name,
        num_movies,
        avg_rank,
        genres,
        directors,
        updated_at
    from {{ ref('actor_summary_transform') }}
) select *
from actor_summary_table

A model to define a materialized view:

{{ config(materialized='view', matview=True, target_table='actor_summary_table') }}

with actor_summary_transform as (
    SELECT id,
        any(actor_name) as name,
        uniqExact(movie_id)    as num_movies,
        avg(rank)                as avg_rank,
        uniqExact(genre)         as genres,
        uniqExact(director_name) as directors,
        max(created_at) as updated_at
    FROM (
        SELECT {{ source('imdb', 'actors') }}.id as id,
            concat({{ source('imdb', 'actors') }}.first_name, ' ', {{ source('imdb', 'actors') }}.last_name) as actor_name,
            {{ source('imdb', 'movies') }}.id as movie_id,
            {{ source('imdb', 'movies') }}.rank as rank,
            genre,
            concat({{ source('imdb', 'directors') }}.first_name, ' ', {{ source('imdb', 'directors') }}.last_name) as director_name,
            created_at
        FROM {{ source('imdb', 'actors') }}
            JOIN {{ source('imdb', 'roles') }} ON {{ source('imdb', 'roles') }}.actor_id = {{ source('imdb', 'actors') }}.id
            LEFT OUTER JOIN {{ source('imdb', 'movies') }} ON {{ source('imdb', 'movies') }}.id = {{ source('imdb', 'roles') }}.movie_id
            LEFT OUTER JOIN {{ source('imdb', 'genres') }} ON {{ source('imdb', 'genres') }}.movie_id = {{ source('imdb', 'movies') }}.id
            LEFT OUTER JOIN {{ source('imdb', 'movie_directors') }} ON {{ source('imdb', 'movie_directors') }}.movie_id = {{ source('imdb', 'movies') }}.id
            LEFT OUTER JOIN {{ source('imdb', 'directors') }} ON {{ source('imdb', 'directors') }}.id = {{ source('imdb', 'movie_directors') }}.director_id
    )
    GROUP BY id
)
select *
from actor_summary_transform

@genzgd
Contributor

genzgd commented Jan 1, 2023

Thanks for your PR! It would be extremely helpful if you could add a test for the new Materialized View functionality; if you don't have a chance to do that, I'll try to get to it later this week.

@israelg99
Author

@genzgd Sure, it would be great if you could add tests; otherwise I can try to get to it when I have time. I'm already using this feature on a local branch for a project, and it works well.

@genzgd
Contributor

genzgd commented Jan 14, 2023

Thanks -- at the moment the PR is failing a test and I haven't had the chance to dig into it. If you have any thoughts as to why that failure is happening, that would be helpful. In any case I'll try to get to it this coming week.

@israelg99
Author

@genzgd The test fails with this error:

Table dbt_clickhouse_4727_test_basic_1672594971825.swappable already exists. (TABLE_ALREADY_EXISTS)

I don't think my PR should cause that failure. No new code is actually running unless matview=True:

{% if matview %}
  {{ clickhouse__create_matview_as(target_relation, sql) }}
{% else %}
  {{ get_create_view_as_sql(target_relation, sql) }}
{% endif %}

And tests aren't using matview so why should it affect tests?

@genzgd
Contributor

genzgd commented Jan 14, 2023

I felt the same way :) but I haven't had a chance to figure out the problem. There are no code changes since tests last successfully ran against the main branch, but maybe something has changed in our test framework. Have you tried running tests locally?

{% if matview %}
{{ clickhouse__create_matview_as(target_relation, sql) }}
{% else %}
{{ get_create_view_as_sql(target_relation, sql) }}
Contributor

This line is what's breaking tests -- the original references backup_relation, but the new line uses target_relation.

@timgl

timgl commented May 30, 2023

Hey! Any updates on this PR?

@nailele

nailele commented Jun 14, 2023

Hey! Any updates on this PR?

@genzgd
Contributor

genzgd commented Jun 14, 2023

The short answer is there are no updates. The broken test was not fixed and the ClickHouse team hasn't prioritized this functionality in dbt due to other work. There is also concern about how materialized views will work on replicated tables, and materialized views are almost certain to break dbt incremental materializations (DELETEs, for example, are not propagated to materialized views in ClickHouse), so official support would require testing around these use cases.
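
To make the DELETE concern concrete, here is a minimal sketch that can be run against a scratch database. The names events_src, events_copy and events_copy_mv are hypothetical and not taken from dbt-clickhouse or its tests; the point is only that a materialized view fires on inserts into its source table, so a later delete on the source leaves the target untouched.

```sql
-- Hypothetical source and target tables, for illustration only.
CREATE TABLE IF NOT EXISTS events_src
(
    id    UInt64,
    value UInt32
)
ENGINE = MergeTree()
ORDER BY id;

CREATE TABLE IF NOT EXISTS events_copy
(
    id    UInt64,
    value UInt32
)
ENGINE = MergeTree()
ORDER BY id;

-- The materialized view copies every block inserted into events_src.
CREATE MATERIALIZED VIEW IF NOT EXISTS events_copy_mv
TO events_copy
AS
SELECT id, value
FROM events_src;

INSERT INTO events_src VALUES (1, 10), (2, 20);

-- Wait for the mutation to finish before running the checks below.
SET mutations_sync = 1;
ALTER TABLE events_src DELETE WHERE id = 1;

-- The source should no longer contain id = 1 ...
SELECT count() FROM events_src WHERE id = 1;

-- ... while the target should still hold the row written at insert time.
SELECT count() FROM events_copy WHERE id = 1;
```

An incremental model that deletes and re-inserts rows by unique_key in the source would therefore leave stale or duplicated rows in the view's target unless that is handled separately.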

@genzgd
Contributor

genzgd commented Nov 30, 2023

Closed in favor of #207

@genzgd genzgd closed this Nov 30, 2023