Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt relationships test incorrectly work with ClickHouse #19

Closed
artamoshin opened this issue Oct 2, 2021 · 1 comment
Closed

dbt relationships test incorrectly work with ClickHouse #19

artamoshin opened this issue Oct 2, 2021 · 1 comment

Comments

@artamoshin
Copy link
Contributor

A dbt generic test relationships doesn't work with ClickHouse DBMS.
For example, let's see the model:

schema.yml:

version: 2

models:
  - name: customers
    columns:
      - name: id
      - name: name

  - name: orders
    columns:
      - name: id
      - name: good
      - name: customer_id
        tests:
          - relationships:
              to: ref('customers')
              field: id

customers.sql:

SELECT 1 AS id, 'Alice' AS name
UNION ALL
SELECT 2, 'Bob'

orders.sql:

SELECT 101 AS id, 'TV' AS good, 1 AS customer_id
UNION ALL
SELECT 102, 'Computer', 2
UNION ALL
SELECT 103, 'Smartphone', 3

There is an incorrect foreign key customer_id=3, but dbt test passes that integrity test.
Cause of that is ClickHouse default join behavior:

SELECT *
FROM orders
LEFT OUTER JOIN customers ON orders.customer_id = customers.id

┌──id─┬─good───────┬─customer_id─┬─customers.id─┬─name──┐
│ 101 │ TV         │           11 │ Alice │
│ 102 │ Computer   │           22 │ Bob   │
│ 103 │ Smartphone │           30 │       │
└─────┴────────────┴─────────────┴──────────────┴───────┘

There is empty cells are filled with the default value, but not NULLs.
dbt macros relationships do same query and expects NULL cells.

To set NULL for this cells set join_use_nulls = 1:

┌──id─┬─good───────┬─customer_id─┬─customers.id─┬─name──┐
│ 101 │ TV         │           11 │ Alice │
│ 102 │ Computer   │           22 │ Bob   │
│ 103 │ Smartphone │           3 │         ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ  │
└─────┴────────────┴─────────────┴──────────────┴───────┘

With join_use_nulls = 1 generic test relationships work as expected:

Failure in test relationships_orders_customer_id__id__ref_customers_ (models\schema.yml)
  Got 1 result, configured to fail if != 0

ClickHouse allow to set settings by different ways:

  • in configuration file;
  • for session;
  • inplace with SELECT query.

My proposal: add a dispatched macro test_relationships in the dbt-clickhouse package, like this:

{% macro clickhouse__test_relationships(model, column_name, to, field) %}

    {{ default__test_relationships(model, column_name, to, field) }}
    settings join_use_nulls = 1

{% endmacro %}

Another way is set join_use_nulls for session, but it will affect other queries.
Or, at least this behavior should be documented with README.

@silentsokolov
Copy link
Collaborator

I think, we could override the default test_relationships test and either use settings right in the SELECT query or write SELECT with expected effect.

The session and the configuration could confuse a user.

artamoshin added a commit to artamoshin/dbt-clickhouse that referenced this issue Oct 3, 2021
artamoshin added a commit to artamoshin/dbt-clickhouse that referenced this issue Oct 3, 2021
silentsokolov added a commit that referenced this issue Oct 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants