-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 9656 Raise an error when check_cols has duplicated values #9698
base: main
Are you sure you want to change the base?
Issue 9656 Raise an error when check_cols has duplicated values #9698
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
1 similar comment
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
@@ -52,6 +52,10 @@ def validate(cls, data): | |||
if data.get("materialized") and data.get("materialized") != "snapshot": | |||
raise ValidationError("A snapshot must have a materialized value of 'snapshot'") | |||
|
|||
# Validate if there are duplicate column names in check_cols. | |||
if len(data.get("check_cols")) != len(set(data.get("check_cols"))): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though .get()
is used, if check_cols
is missing, this will result in an error because NoneType has no len()
.
if len(data.get("check_cols")) != len(set(data.get("check_cols"))): | |
if "check_cols" in data and len(data["check_cols"]) != len(set(data["check_cols"])): |
But, judging by the code above it, this check should only apply in case of the check
strategy, where the existence of this check_cols
has been validated beforehand. I suggest moving this new validation to after line 40.
I expect the current implementation to break the breaks the timestamp
strategy, because of the absence of check_cols
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jordivandooren Could you please review the latest changes? I updated my code, and everything is now up-to-date. The duplicated columns validation has been moved inside the final_validate function.
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: ariosramirez.
|
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: ariosramirez.
|
2 similar comments
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: ariosramirez.
|
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: ariosramirez.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #9698 +/- ##
==========================================
+ Coverage 88.90% 88.99% +0.09%
==========================================
Files 180 181 +1
Lines 22856 22974 +118
==========================================
+ Hits 20319 20445 +126
+ Misses 2537 2529 -8
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@ariosramirez did you do these steps already as well? |
Hi @ariosramirez, I think I understand what the problem is. I checked our CLA database, and it does look like you've signed it. What I believe is happening is that cla-bot checks every commit in the PR. A majority of your commits are signed with the email ariosramirez.data@gmail.com, and that email address seems correct. However, four commits (8082fa7, 5343684, 67d5cb4, and 7bd6cb5) are signed with the email andres@beek.io. It appears that email address is no longer associated with your github account. If you still have access to that email address, I recommend re-adding it to your list of associated email address in github (you can have multiple, I myself have five email address associated with my github account). If you no longer have access to that email address, you may need to modify the signature on those commits. |
@ariosramirez Alternatively, you can squash the problematic (or all the commits) into one new commit. This should give all the work your current email, I believe |
f6a7335
to
b859dea
Compare
…_cols_has_duplicated_values
resolves #9656
Problem
Problem described in #9656 (comment)
Solution
This PR adds validation to detect duplicate column names in the
check_cols
configuration of a snapshot. It raises an error if duplicates are found.Checklist