Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-589] [Bug] dbt deps does not precisely match the regex for SemVer #5201

Closed
1 task done
dbeatty10 opened this issue Apr 30, 2022 · 6 comments · Fixed by #5370
Closed
1 task done

[CT-589] [Bug] dbt deps does not precisely match the regex for SemVer #5201

dbeatty10 opened this issue Apr 30, 2022 · 6 comments · Fixed by #5370
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors!

Comments

@dbeatty10
Copy link
Contributor

dbeatty10 commented Apr 30, 2022

Is there an existing issue for this?

  • I have searched the existing issues

TL;DR

There's probably not a very impactful issue here -- opening up this issue at risk of being very nit picky.

Current Behavior

This regular expression is used within dbt deps to determine valid installable versions. It might also be utilized during dbt --version to determine if the installed adapters are up-to-date with the latest versions available.

If I did the string substitution correctly, it produces these results for a range of input strings:
https://regex101.com/r/YyaxyD/1

Expected Behavior

There is a suggested regular expression (RegEx) to check a SemVer string.

It produces these results:
https://regex101.com/r/Ly7O1x/3/

Steps To Reproduce

  1. Compile the regex here (manually or via Python)
    • ^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
  2. Drop it into regex101.com
  3. Re-use the test cases from here
  4. Browse through the output of valid semantic versions (for false negatives) and invalid semantic versions (for false positives)

Relevant log output

image

image

Additional Context

Summary

Why it's not that impactful

There's two cases of errors (false negatives and false positives). The former is more impactful to users; there are very few known theoretical cases (and zero known actual cases in the wild).

False negative - a valid semantic version doesn't match the regex

This is the case that would impact users the most.
There are only a couple of these, and it's easy to change to one that is recognized:

  • 1.0.0-alpha.0valid
  • 1.0.0-0A.is.legal

False positive - an invalid semantic version does matches the regex

There are quite a few examples of these, but maybe many/most wouldn't be seen in the wild. If so, maybe "no harm, no foul?"

See "Current Behavior" section for examples of invalid semantic versions.

Disclosure: I regularly flip-flop my false negatives with my false positives 😅 So sorry if I got it backwards.

What is impactful though...

An example error message is:

Could not parse version "1.0.3.post1"

One user got this message and didn't know what action they should take. The error message should be more clear about what the user should do to fix the problem.

@dbeatty10 dbeatty10 added bug Something isn't working triage labels Apr 30, 2022
@github-actions github-actions bot changed the title [Bug] <title> [CT-589] [Bug] <title> Apr 30, 2022
@dbeatty10 dbeatty10 changed the title [CT-589] [Bug] <title> [CT-589] [Bug] dbt deps does not precisely match the regex for SemVer Apr 30, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented May 2, 2022

@dbeatty10 Thanks for opening! You're right, that dbt.semver._VERSION_REGEX is used for both dbt deps version resolution and for dbt --version version reporting on adapter plugins.

I'd be very supportive of taking two steps here:

  1. Updating our regex to use one of the official/recommended semver regexes (regices?)
  2. Updating this error message to be clearer. Rather than Could not parse version "{}", something more like "{}" is not a valid semantic version. Does that sound right to you?
    raise dbt.exceptions.SemverException(
    'Could not parse version "{}"'.format(version_string)
    )

Going to mark this a good first issue!

@jtcohen6 jtcohen6 added good_first_issue Straightforward + self-contained changes, good for new contributors! Team:Language and removed triage labels May 2, 2022
@dbeatty10
Copy link
Contributor Author

Yes, "{}" is not a valid semantic version. would be a nice upgrade in clarity! 🏆

💡 As part of implementing #1, one suggestion would be to create unit tests for each of the cases listed here. That would serve to help catch any accidental regressions.

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 2, 2022

Right on! We've got a bunch already over in test/unit/test_semver.py. We should have 'em all :)

@fivetran-joemarkiewicz
Copy link
Contributor

Hey @jtcohen6 and @dbeatty10 I see this is marked as a good_first_issue 👀.

I have been wanting to pick up a good first issue and was wondering if you would be open if I attempt a PR for this issue?

To confirm my understanding of what is ultimately desired here:

  • Someone can include an improper semantic version of a package in their packages.yml
packages:
  - package: fivetran/fivetran_log
    version: 0.5.0.b1
  • The error message returns Could not parse version "{0.5.0.b1}" which is not very descriptive of the issue at hand. Rather, it isn't a valid semantic version and should provide that context to help users understand that they may need to provide closer inspection on the version provided. (ie. 0.5.0-b1 is the correct version).
  • Because of the above, we want to provide "0.5.0.b1" is not a valid semantic version. to aid users.

Therefore to address this, we want to apply to following adjustments:

  1. Leverage (RegEx) to check a SemVer string within the packages.yml (and maybe in the dbt --version command execution? Or it is already addressed there?)
  2. Update the error message to be more descriptive of a SemVer error if that truly is the case on dbt deps
  3. Add unit testing to help catch any accidental regressions.

Let me know if you would mind if I tried to tackle this. Full disclosure, I would definitely need require support with the unit testing, but happy to give it a go! 😄

@dbeatty10
Copy link
Contributor Author

Yep, you pretty much nailed it, @fivetran-joemarkiewicz . Would definitely welcome a PR for this issue, and I'd be happy to provide assistance for the unit testing and anything else.

Here's my recommended order:

  1. Follow these instructions to run all the unit tests locally (everything should pass ✅ ).
  2. Add each of these cases to the SemVer unit tests here. Re-run the tests. There should be some failures now ❌ (to be fixed by your PR).
  3. Update the RegEx here to utilize the logic from the recommended SemVer RegEx. You'll probably want to break the SemVer RegEx into chunks to fit it into the existing code.
  4. Re-run the tests and iterate until they all pass again ✅
  5. Follow the rest of the contributing guidelines to finalize the PR

I'll try to write up an example of editing the RegEx unit tests and re-running just the one you want (rather than the entire suite).

@fivetran-joemarkiewicz
Copy link
Contributor

Hey @dbeatty10 thanks so much for the recommended order! 🙌

This was so helpful in figuring out what exactly needed to be updated for this issue to be addressed. I actually ended up having time this afternoon to dig into this and I think I got the issue addressed in the above linked PR! I did go a bit rogue and created a new function to leverage in the unit test. I have a feeling this was not needed and I was missing something right in front of me. So please let me know regarding that haha.

Finally, I noticed the RegEx rules didn't change too much once I broke them out like you suggested, but did see the difference between the current and this patch unit tests. Let me know if I missed something. We can probably move any convo for this to the PR. Thank you so much though for being so helpful with guiding me to opening the PR! 🕺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good_first_issue Straightforward + self-contained changes, good for new contributors!
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants