Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2998] [SPIKE] Handle unit testing of JSON and ARRAY data types #8423

Closed
Tracked by #8283
gshank opened this issue Aug 16, 2023 · 6 comments
Closed
Tracked by #8283

[CT-2998] [SPIKE] Handle unit testing of JSON and ARRAY data types #8423

gshank opened this issue Aug 16, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request Impact: Adapters
Milestone

Comments

@gshank
Copy link
Contributor

gshank commented Aug 16, 2023

Description

We should enable adding unit test mock inputs and outputs that contain fields of type JSON or type ARRAY.

Example using in-line yaml dict format:

unit-tests:
  - name: test_json_unpack # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    given: # optional: list of inputs to provide as fixtures
      - input: ref('employees')
        rows:
         - {id: 1, empl_info: {"employee":{"name":"John", "age":30, "city":"New York"}}}
         - {id: 2, empl_info: {"employee":{"name":"Doug", "age":97, "city":"Flagstaff"}}}
      - input: ref('office_locations')
        rows:
         - {id: 1, offices: [1,3]}
         - {id: 2: offices: [2,4]}
    expect: # required: the expected output given the inputs above
      rows:
      - {employee_id: 1, name: John, age: 30, city: New York, offices: [1,3]}
      - {employee_id: 2, name: Doug, age: 97, city: Flagstaff, offices: [2,4]}
    ...

Example using csv format:

unit-tests:
  - name: test_json_unpack # this is the unique name of the test
    model: my_model # name of the model I'm unit testing
    given: # optional: list of inputs to provide as fixtures
      - input: ref('employees')
        format: csv
        rows:  |
        id, empl_info
        1,{"employee":{"name":"John", "age":30, "city":"New York"}}
        2,{"employee":{"name":"Doug", "age":97, "city":"Flagstaff"}}
      - input: ref('office_locations')
        format: csv 
        rows: |
        id,offices
        1,[1,3]
        2,[2,4]
    expect: # required: the expected output given the inputs above
      format: csv
      rows: |
      employee_id,name,age,city,offices
      1,John,30,New York,[1,3]
      2,Doug,97,Flagstaff,[2,4]
    ...

Note: we can assume we already know the data types and columns of the inputs.

Acceptance criteria

FOR SPIKE:

  • Understand the technical complexities, edge cases, and how we would go about testing this
  • Draft implementation

FOR IMPLEMENTATION:

  • You can specify inputs or expected outputs for unit tests with columns of JSON data type (in at least one of the available formats, ideally all)
  • You can specify inputs or expected outputs for unit tests with columns of ARRAY data type (in at least one of the available formats, ideally all)
  • The error message for when a unit test fails should still be readable when using JSON data types.
  • The error message for when a unit test fails should still be readable when using ARRAY data types.
  • Bonus if it works for STRUCT to (lol, if not it's chill we'll open another issue)

Impact to other teams

Impact adapter teams

Will backports be required?

No

Context

@github-actions github-actions bot changed the title Handle unit testing of complex data types [CT-2998] Handle unit testing of complex data types Aug 16, 2023
@graciegoheen
Copy link
Contributor

graciegoheen commented Sep 18, 2023

From refinement:

  • we don't allow complex data types for seeds
  • we should enumerate what all the "complex" types are and what all the tests are that need to be added in acceptance criteria
  • design could be similar to contracts, so if they provide us something that can be cast-ed to a user-inputed data type
  • we already have an array_construct macro

Example for testing constraints - https://github.com/dbt-labs/dbt-core/blob/main/tests/adapter/dbt/tests/adapter/constraints/test_constraints.py#L70-L73

@graciegoheen
Copy link
Contributor

Relevant to #8499

@graciegoheen
Copy link
Contributor

graciegoheen commented Oct 6, 2023

Example for why someone might want to unit test a model that has inputs with JSON data type:

  • JSON data type in snowflake for testing JSON unpacking logic

@graciegoheen graciegoheen changed the title [CT-2998] Handle unit testing of complex data types [CT-2998] Handle unit testing of initial complex data types Oct 12, 2023
@graciegoheen graciegoheen changed the title [CT-2998] Handle unit testing of initial complex data types [CT-2998] Handle unit testing of JSON and ARRAY data types Oct 12, 2023
@graciegoheen graciegoheen added the enhancement New feature or request label Oct 12, 2023
@aranke
Copy link
Member

aranke commented Oct 31, 2023

@aranke to document edge cases (will update this ticket)

@martynydbt martynydbt changed the title [CT-2998] Handle unit testing of JSON and ARRAY data types [CT-2998] [SPIKE] Handle unit testing of JSON and ARRAY data types Oct 31, 2023
@graciegoheen
Copy link
Contributor

Reason this is higher priority:

  • you can't currently define a unit test on a model that depends on an input with 1 or more complex data type (even if that column isn't relevant to the unit test / not defined in the mock data)

This is in addition to not being able to use input data with complex data type (the initial reason we opened the issue)

@martynydbt martynydbt assigned gshank and emmyoop and unassigned gshank and emmyoop Jan 31, 2024
@dbeatty10 dbeatty10 mentioned this issue Jan 31, 2024
3 tasks
@martynydbt martynydbt added this to the v1.8 milestone Feb 12, 2024
@MichelleArk
Copy link
Contributor

MichelleArk commented Feb 28, 2024

Spike Report / Update

Current State

dbt-snowflake, dbt-bigquery, and dbt-spark support mocking inputs with complex types in unit testing, including json and array types. tests here:

dbt-postgres and dbt-redshift support json, but not arrays.

These implementations largely required very minor & precise changes to safe_cast so that it handled more input value types and conformed them as appropriate before attempting to cast to the desired type.

However, dbt-postgres and dbt-redshift will require a different approach to support complex types. This is because the strategy for obtaining the column schemas in unit testing is adapter.get_columns_in_relation. adapter.get_columns_in_relation works flawlessly for the 3 adapters above, but is lossy for dbt-postgres and dbt-redshift. For example, array types are simply 'ARRAY' and don't include the type of array which is necessary for safe casting to an appropriate type.

An alternative strategy we can use for these adapters is adapter.get_column_schema_from_query, which was implemented for model contracts and spiked for unit testing here. However, this strategy makes support for complex types for the other adapters unreasonably complex (e.g. bigquery struct type would need to be reconstructed into its 'sql' type from the obtained column schema).

So: we'll need to support both mechanisms of retrieving the column schema, as dictated by a specific adapter implementation because one strategy will work for many adapters but not others, and vice versa.

Proposal

  1. At the dbt-adapters level, in the unit materialization: support both mechanisms for obtaining the column schema based on an adapter-dispatch macro unit_testing_column_schema_strategy, that defaults to relation.
  2. Implement unit_testing_column_schema_strategy to return empty_query in dbt-postgres and dbt-redshift
  • new strategy uses: get_column_schema_from_query
  • Additional safe_cast or type mapping modifications may be necessary in either adapter to get this entirely working based on the new column schema strategy
  1. Extend functional tests to include test cases for complex types in unit testing in dbt-postgres and dbt-redshift. Existing tests are in place for primative types:

New Issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Impact: Adapters
Projects
None yet
Development

No branches or pull requests

6 participants