Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standard table materialization does not work with ReplicateMergeTree engine #95

Closed
mharrisb1 opened this issue Sep 8, 2022 · 6 comments

Comments

@mharrisb1
Copy link

Not sure if this is related to #14 since I didn't explore the attached PR, but I wanted to call out this problem specifically.

It is my understanding that the way dbt-clickhouse handles table materializations is to:

  1. Rename table (if exists) to {{ this }}__dbt_tmp or something similar (I don't have the logs in front of me)
  2. Create a new table with name {{ this }}
  3. Drop temp table

With the Replication tables, this results in an error because there already exists a table with the replica path of /clickhouse/tables/{shard}/{database}/{table_name}.

I am currently getting around this using an incremental materialization with a config like:

{{ config(
    materialized="incremental",
    engine="ReplicatedMergeTree('/clickhouse/tables/{database}/{table}', '{replica}')",
    partition_by=[...],
    order_by=[...],
    unique_key=[...],
    pre_hook="truncate table if exists {{ this }} on cluster '...'",
    inserts_only=true
) }}

I'm not sure how we could handle a simple table materialization using the current rename-create-drop strategy since I don't believe you can update the replica path after creation and I don't believe there is a random value substitution provided which could potentially allow a replica path like /clickhouse/tables/{shard}/{database}/{table_name}__{random_uuid}

@genzgd
Copy link
Contributor

genzgd commented Sep 8, 2022

Some of this may be solved by using the Replicated database engine. See ClickHouse/ClickHouse#31661. I'll be doing some testing with Replicated databases over the next couple of weeks.

@genzgd
Copy link
Contributor

genzgd commented Sep 19, 2022

Tests work with with Replicated database engine so explicit support for ReplicatedMergeTree engines will probably not be a priority.

@mharrisb1
Copy link
Author

For anyone that stumbles across this issue and is still using ReplicatedMergeTree engine like us, here is a better workaround:

Project config

# dbt_project.yml

vars:
  # Creates a unique ID for this dbt run using the run's timestamp
  run_id: "{{ run_started_at.strftime('%Y%m%d%H%M%S%f') }}"

  # Use this engine template for all models materialized as `table` that use `ReplicatedMergeTree` engine
  # Zookeeper path will use default layout but the table name is suffixed with the run's ID
  engine_template: >
    ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}_{{ var("run_id") }}', '{replica}')

Model config

-- my_table.sql

{{ config(
    materialized="table",
    engine=var("engine_template"),
    partition_by=[...],
    order_by=[...],
    unique_key=[...]
) }}

SELECT ...
FROM ...

@tema-popov
Copy link

@mharrisb1 Is run_id scheme somehow differs from using standard {uuid} clickhouse have?
i.e.
engine = "ReplicatedReplacingMergeTree('/clickhouse/tables/{layer}-{shard}/{database}/{table}/{uuid}', '{replica}')"

@mharrisb1
Copy link
Author

@tema-popov I wasn't aware of that template. That would be a much better solution than mine. I will test it out and update this issue if it works. Thanks!

@genzgd
Copy link
Contributor

genzgd commented Jan 8, 2023

@mharrisb1 Do you know if this is fixed by the latest update + the {uuid} macro?

@genzgd genzgd closed this as completed Nov 29, 2023
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants