Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2635] [Feature] Node selection on versioned models should work with underscore-delimiting #7639

Closed
3 tasks done
Tracked by #7372
joellabes opened this issue May 16, 2023 · 7 comments · Fixed by #7995
Closed
3 tasks done
Tracked by #7372
Assignees
Labels
enhancement New feature or request jira model_versions node selection Functionality and syntax for selecting DAG nodes

Comments

@joellabes
Copy link
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Triggered by dbt-labs/docs.getdbt.com#3316

I can select the model with the dot, but I can't with the underscore:

$ dbt --quiet ls -s dim_customers.v1
my_dbt_project.dim_customers.v1
$ dbt --quiet ls -s dim_customers_v1
... nothing ...

Should we just fix that, and support selecting versioned models as <model_name>_v?

Describe alternatives you've considered

Staying strong with the dots, which are technically more correct but aren't something most people interact with unless they poke around in dbt's artifacts

Who will this benefit?

No response

Are you interested in contributing this feature?

No response

Anything else?

No response

@joellabes joellabes added enhancement New feature or request triage labels May 16, 2023
@dbeatty10
Copy link
Contributor

Thanks for raising this @joellabes!

Let's assume the user has a models directory with files for versions 3, 1, and 2, respectively):

models/
├── dim_customers.sql
├── dim_customers_v1.sql
├── dim_customers_v2.sql

It seems like the issue you are trying to raise is that using the file name (without the extension) won't select the anticipated node?

i.e., folks are used to being able to do this:

dbt ls -s dim_customers_v1

But currently they would need to add a .sql suffix to select it:

dbt ls -s dim_customers_v1.sql

@joellabes
Copy link
Contributor Author

That’s correct - although I hadn't thought of filename selection, and assumed they would only be able to do dbt ls -s dim_customers.v1 per jerco's example.

The bones of my argument are that selection has always looked like you're using the file name, even if that isn't technically true, and so it is an (IMO unnecessary + unpleasant) surprise to discover that it doesn't work like that in this case

@dbeatty10
Copy link
Contributor

@joellabes Agreed that it could be a little jarring to typically do dbt ls -s dim_customers and all the sudden dbt ls -s dim_customers_v1 doesn't "work the same way" once this model is converted to being a versioned model.

I'd think that using the fqn selector method (like dim_customers.v2) would still be preferable for code examples though.

Potential solution

The default selection methods are path, file, and fqn, and the current file selector method implementation uses Path.name.

I didn't test it, but I'm guessing that if two more lines* are added that use Path.stem, then it might behave the way you describe:

            if fnmatch(Path(real_node.original_file_path).name, selector):
                yield node
            if fnmatch(Path(real_node.original_file_path).stem, selector):
                yield node

* Note: there's multiple ways to express logic similar to the above.

Flip side

Suppose the user has a models directory with files for versions 1, 2, and 3, respectively):

models/
├── dim_customers_v1.sql
├── dim_customers_v2.sql
├── dim_customers.sql

The flip side is that it might be surprising to those same people that dbt ls -s dim_customers_v3 won't select any nodes given this scenario. It's logical that they wouldn't be selected, but you could see how easy it would be to confuse alias name with file name.

Refinement

@jtcohen6 how would you feel about us enabling all of these:

Already working:

dbt ls -s dim_customers  # fqn
dbt ls -s dim_customers.v1  # fqn
dbt ls -s dim_customers.v2  # fqn
dbt ls -s dim_customers.v3  # fqn
dbt ls -s dim_customers_v1.sql  # file
dbt ls -s dim_customers_v2.sql  # file
dbt ls -s dim_customers.sql  # file

and newly proposed:

dbt ls -s dim_customers_v1  # file
dbt ls -s dim_customers_v2  # file

but not enable this one?

dbt ls -s dim_customers_v3 # alias

I'd feel comfortable with the above, but want to double-check with you.

@dbeatty10 dbeatty10 added Refinement Maintainer input needed and removed triage labels May 16, 2023
@joellabes
Copy link
Contributor Author

joellabes commented May 17, 2023

I'd think that using the fqn selector method (like dim_customers.v2) would still be preferable for code examples though.

Yeah this could be a place where it's time to teach people about fqns in earnest, which would make it a documentation issue

I like your workaround as long as this sort of thing is already prevented:

versions: 
  - v: 1
    implemented_in: dim_customers_v2.sql #woke up and chose violence

With two models in different directories
models/customers/dim_customers_v2.sql - creates analytics.dim_customers_v1
models/another_directory/dim_customers_v2.sql - creates analytics.dim_customers_v2

There's now two file names that are dim_customers_v2, but they create different node names so I don't think anything would care up to now? I haven't tested this

@dbeatty10
Copy link
Contributor

@joellabes I think we would be okay with allowing this rather than doing extra work to prevent it:

models:
  - name: dim_customers
    latest_version: 3
    versions:
      - v: 3
      - v: 2
        defined_in: dim_customers_v1  # no .sql extension here
      - v: 1
        defined_in: dim_customers_v2  # `defined_in` rather than `implemented_in`

And I think we'd want the output to look like this (which it does currently):

$ dbt ls -s dim_customers_v1
16:06:13  Running with dbt=1.5.0
16:06:14  Found 5 models, 0 tests, 0 snapshots, 0 analyses, 313 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
my_project.dim_customers.v2

You can actually try this out right now by adopting the YAML above and then doing the following:

dbt ls -s dim_customers_v1.sql

@jtcohen6
Copy link
Contributor

@dbeatty10 I buy what you're proposing above! Let's add it to the multi-project epic.

@jtcohen6 jtcohen6 added node selection Functionality and syntax for selecting DAG nodes model_versions and removed Refinement Maintainer input needed labels May 23, 2023
@jtcohen6 jtcohen6 removed their assignment May 23, 2023
@dbeatty10
Copy link
Contributor

Acceptance criteria

Suppose the user has a models directory with files for versions 1, 2, and 3, respectively):

models/
├── dim_customers_v1.sql
├── dim_customers_v2.sql
├── dim_customers.sql

Then each of these should select one node each:

dbt ls -s dim_customers_v1  # file
dbt ls -s dim_customers_v2  # file

But this should not select any nodes:

dbt ls -s dim_customers_v3 # alias

Other resources

@dbeatty10 dbeatty10 removed the triage label May 23, 2023
@gshank gshank self-assigned this Jun 5, 2023
@gshank gshank added the jira label Jun 5, 2023
@github-actions github-actions bot changed the title [Feature] Node selection on versioned models should work with underscore-delimiting [CT-2635] [Feature] Node selection on versioned models should work with underscore-delimiting Jun 5, 2023
@martynydbt martynydbt assigned MichelleArk and unassigned gshank Jun 28, 2023
@jtcohen6 jtcohen6 modified the milestone: v1.5.x Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request jira model_versions node selection Functionality and syntax for selecting DAG nodes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants