fix(dataset-import): support empty strings for extra fields #24663
* fix(dataset-import): support empty strings for extra fields
* Adding unit test
* black update
Codecov Report
@@ Coverage Diff @@
## master #24663 +/- ##
==========================================
- Coverage 69.05% 68.97% -0.08%
==========================================
Files 1907 1902 -5
Lines 74151 74011 -140
Branches 8182 8186 +4
==========================================
- Hits 51204 51049 -155
- Misses 20824 20841 +17
+ Partials 2123 2121 -2
... and 7 files with indirect coverage changes
Nice!
superset/datasets/schemas.py
Outdated
    if extra.strip():
        data["extra"] = json.loads(extra)
    else:
        data["extra"] = {}
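For context, the guard in this diff is needed because `json.loads` raises `json.JSONDecodeError` on an empty or whitespace-only string. A minimal standalone sketch of the same look-before-you-leap check follows; `load_extra` is a hypothetical helper name used only for illustration, not a function from the Superset codebase:

```python
import json
from typing import Any, Dict


def load_extra(extra: str) -> Dict[str, Any]:
    """Hypothetical helper: parse `extra`, treating blank input as an empty dict."""
    if extra.strip():
        # Non-blank input is parsed as JSON; malformed JSON still raises here.
        return json.loads(extra)
    # Blank or whitespace-only input would make json.loads raise, so skip it.
    return {}


if __name__ == "__main__":
    print(load_extra(""))
    print(load_extra('{"certification": {"certified_by": "Data team"}}'))
```

Note that this guard only covers blank strings; a non-blank string that is not valid JSON would still raise.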
You may want to try the EAFP approach here; it could simplify this logic. Something like:
    - data["extra"] = {}
    + if isinstance(data.get("extra"), str):
    +     try:
    +         data["extra"] = json.loads(data["extra"])
    +     except json.JSONDecodeError:
    +         data["extra"] = {}
@john-bodley had also suggested that we could try to fix the underlying code in order to make the data structures consistent. Unless you're feeling ambitious in this PR we could also fix the default in the import to be the correct format in a subsequent PR. I assume it should be an empty dictionary?
Of course @betodealmeida's solution with the ternary is even shorter. :)
I ended up using `try/except`, because if we use the walrus operator and the `extra` field is an empty string, the `if` block wouldn't be executed and the import would still fail. I also defaulted the `extra` value to `None`, since the schema actually accepts it.
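To illustrate the difference described above, here is a rough sketch comparing the two approaches on a plain dictionary. Both function names are hypothetical, the walrus variant is only a guess at what such a check could look like, and neither is the exact code merged in this PR:

```python
import json
from typing import Any, Dict


def fix_extra_walrus(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical walrus variant: an empty string is falsy, so it is left as-is."""
    if isinstance(extra := data.get("extra"), str) and extra:
        data["extra"] = json.loads(extra)
    return data


def fix_extra_try_except(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical EAFP variant: any string that is not valid JSON becomes None."""
    if isinstance(data.get("extra"), str):
        try:
            data["extra"] = json.loads(data["extra"])
        except json.JSONDecodeError:
            # Covers the empty string as well as arbitrary non-JSON text.
            data["extra"] = None
    return data


if __name__ == "__main__":
    # The empty string survives the walrus check and would still fail later
    # wherever a dictionary (or None) is expected.
    print(fix_extra_walrus({"extra": ""}))
    print(fix_extra_try_except({"extra": ""}))
```

With the walrus variant the empty string is passed through unchanged, whereas the `try/except` variant replaces it with `None`.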
I 100% agree this PR only fixes the "symptom" but doesn't address the cause. I can try to work on that in a future PR. Some thoughts to consider:
- Currently the dataset modification modal doesn't validate the text added to the `extra` field, so users can add strings.
- The string would be successfully exported, but then the import would fail until this PR. Once this gets merged, if the `extra` is set to a string in the YAML, its value would be discarded and `extra` would be set to `None` during import.
We could either:
- Apply validation in the dataset modification form so that it only accepts JSON/dict data in the `extra` field: I think it's a stable solution and prevents future data type variations. However, users that currently have a string set could struggle to update the datasets in case the error message is not clear.
- Modify the dataset schema so that the `extra` field accepts either a dictionary or a string: not really helpful in ensuring a data type, but should allow users with strings to import their existing datasets and reflect the information.
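As a rough sketch of the first option, form-side validation could look something like the following. `validate_extra` is a hypothetical function, treating blank input as an empty dictionary is an assumption, and the error messages are only examples of wording aimed at users who currently have a plain string stored:

```python
import json
from typing import Any, Dict


def validate_extra(raw: str) -> Dict[str, Any]:
    """Hypothetical form validator: accept only text that parses to a JSON object."""
    if not raw.strip():
        # Assumption: a blank field is treated as "no extra metadata".
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as ex:
        raise ValueError(
            f"The extra field must contain valid JSON, e.g. {{\"key\": \"value\"}}: {ex.msg}"
        ) from ex
    if not isinstance(parsed, dict):
        raise ValueError(
            "The extra field must be a JSON object (key/value pairs), "
            f"but got {type(parsed).__name__}."
        )
    return parsed


if __name__ == "__main__":
    print(validate_extra('{"warning_markdown": "Deprecated"}'))
    try:
        validate_extra("just a note")
    except ValueError as ex:
        print(ex)
```

The second option would instead relax the check so that a string is stored as-is, which keeps existing data importable at the cost of a less predictable type.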
@betodealmeida @eschutho @john-bodley any thoughts? Also if you could approve the tests, that would be awesome! Thanks!
* fix(dataset-import): support empty strings for extra fields
* Adding unit tests
* black update
* Simplifying logic
* Updating tests
This is great, thanks @Vitor-Avila!
(cherry picked from commit 65fb8e1)
SUMMARY
During the import process, the `extra` value for datasets is loaded as a dictionary. This operation fails in case the `extra` key has an empty string (`extra: ""`).
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
[...] `extra` value (in the UI).
[...] `extra: ""`.
ADDITIONAL INFORMATION