Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2035] [Spike] Parsing: model ‘version’ attribute #6866

Closed
Tracked by #6747
MichelleArk opened this issue Feb 6, 2023 · 8 comments
Closed
Tracked by #6747

[CT-2035] [Spike] Parsing: model ‘version’ attribute #6866

MichelleArk opened this issue Feb 6, 2023 · 8 comments

Comments

@MichelleArk
Copy link
Contributor

MichelleArk commented Feb 6, 2023

Introduce an optional ‘versions’ attribute on model yaml entry. versions is a list of 'version' structures. Pseudocode:

ModelVersion:
    version: str
    implemented_by: Optional[str] = None # defaults to <model_name>.v<version>
    # all model properties (e.g. config, description, etc) 
    columns: Optional[List[Union[ColumnInfo, IncludeExclude]]]

versions will be specified within a single model yaml entry, but each version should maps to an individual node in the manifest with a unique_id: model.<project_name>.<model_name>.v<version>.

Example:

models:
  - name: dim_customers
    latest_version: 2
    config:
      materialized: table
    columns:
      - name: customer_id
        description: This is the primary key
        data_type: int

    versions: # new!
      - version: 2
        # implemented_by: dim_customers.v2 - optional, this is the default naming convention
      - version: 1
        # implemented_by: dim_customers.v1 - optional, this is the default naming convention
        config:
          alias: dim_customers
        columns:
          - include: * 
            exclude: customer_id
          - name: customer_id
            data_type: float  # this was previously a float in version 1 for some reason

Notes:

  • The model name is defined in the yaml file, and is globally unique but shared among versions of that model
  • Each version of this model is "implemented by" a model SQL/python file. The implemented_by property points to the name of a .sql (or .py) file that implements that model version. By convention, this file is <model_name>.v<version>.sql, e.g. dim_customers.v2.sql. However, users could set implemented_by to point to a different SQL file that doesn't meet the default naming convention. (These file names must be globally unique, even though they are not the actual model name.)
@github-actions github-actions bot changed the title Parsing: model ‘version’ attribute [CT-2035] Parsing: model ‘version’ attribute Feb 6, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 6, 2023

version should be a valid integer (for now)

#6736 (comment)

We should leave space for adding other versioning schemes in the future (namely SemVer). For now, that can be as simple as creating lightweight abstractions over the logic to:

@jtcohen6
Copy link
Contributor

jtcohen6 commented Mar 1, 2023

@MichelleArk assigning us for refinement :)

@emmyoop
Copy link
Member

emmyoop commented Mar 6, 2023

Are we going to have multiple models with the same name separated only by the version? What would the unique id look like in this case? It's a departure from how we're handling unique id now. Would it be handled similar to how we handle disabled models?

Do all versions of the same model get linked into the DAG?

@jtcohen6
Copy link
Contributor

Conversation during Language team estimation yesterday:

  • Scariest thing here is updating a model's unique_id between "parsing" (SQL file name, model.<project>.<implemented_by>) and "patching" (model.<project>.<name>.<vX>). That said, we agree that it's long-term better, and more correct for the interface, if the unique_id includes the canonical model name and the version number, versus being tied to the file name of the SQL implementation.
  • We should spike this to start, to understand where the rough edges might be.
  • We might want to add an abstraction over looking up nodes by unique_id, rather than just indexing a dictionary it should be a method with an understanding of versions.

@jtcohen6
Copy link
Contributor

Idea from @heysweet: Now that we're requiring users to tell us which model version is latest, should we hoist it up to the top-level? So instead of latest: true nested within versions (and validate that there's exactly one with that value), the user specifies latest: 2 up top.

@MichelleArk
Copy link
Contributor Author

MichelleArk commented Mar 16, 2023

I like it! Less validation necessary and it's easier on the eyes in terms of sifting through versions to find the latest one. I've updated the description.

@jtcohen6
Copy link
Contributor

I've proposed v: instead of name: as the uniquely identifying field for each version

    versions:
      - v: 2
      - v: 1
        config:
          alias: dim_customers
          ...
      - v: 3

Relevant comment in discussion:

@MichelleArk MichelleArk changed the title [CT-2035] Parsing: model ‘version’ attribute [CT-2035] [Spike] Parsing: model ‘version’ attribute Apr 3, 2023
@MichelleArk
Copy link
Contributor Author

MichelleArk commented Apr 5, 2023

closing in favor of implementation issue: #7263

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants