-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modularize default load and save argument handling #15
Conversation
See my comment in #14 for an alternative proposition edit: "alternative" is a strong word here, as what I proposed is almost identical to what you presented 🤦♂ |
@idanov @tolomea @tsanikgr Updated with emojis (and changes discussed with Ivan in #14). Let me know if it looks good! I'm pretty confident in the implementation, less so in the added tests (I generally picked arguments that don't affect anything to cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Just a couple of small comments:
- Could we keep the
DEFAULT_...
class properties to the classes extending theAbstractDataSet
for core? - Could you make sure we use
copy.deepcopy()
instead of.copy()
? - We should be aware that if someone decides to do
object.DEFAULT_LOAD_ARGS["arg1"] = True
, they might ruin the default args to all other objects created after that. Not sure if it's worth preventing that, but at least we need to keep that in mind.
…into fix/default-args
@idanov @tolomea @tsanikgr Made the aforementioned changes, and tests pass locally, but running into Java errors on the build. I see some StackOverflow answers around forcing Java 1.8 (i.e. Spark not supporting Java 11), but since lot of other build are failing on this and I feel you all must have encountered this, putting it off till morning. :) |
The Spark issue will be resolved when you update to latest develop. Regarding the ordering of base classes, you are correct about the version mixins, there is work underway to fix that as part of merging the two different mixins into a proper base class. |
Types specified as required by 3.5. 😿 That should be everything! |
else default_save_args | ||
) | ||
|
||
# Handle default load and save arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this (and all other datasets in kedro.io
that are eligible) are not inheriting from the new MixIn
? (maybe I missed something here, sorry about that!)
The code feels inconsistent now (+ all this duplication can go away!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK thanks! @idanov, since the mix in was introduced, and is solving the problem "of the wrong abstraction", should we just leverage it in the core datasets as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@idanov @tsanikgr @tolomea Let me know what you all decide as the core team here; I’m OK with it as is (a current marginal improvement to core
with potential to move the mix-in from contrib
after proving value in the future) or pushing the mix-in to core
now. I’m more keen on merging to develop
sooner than later due to the number of datasets touched, if possible. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current version is good for now. Thanks @deepyaman for accommodating for all the comments!
if save_args is not None | ||
else default_save_args | ||
) | ||
super().__init__(load_args, save_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer calling super
at the top of the constructor, so the subclass would overwrite stuff from the parent, as a "specialisation" of the superclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer calling
super
at the top of the constructor, so the subclass would overwrite stuff from the parent, as a "specialisation" of the superclass.
Fair argument. I just left it in the same place where default arguments were previously handled (as close to the original as I could), but that makes sense.
Notice
I acknowledge and agree that, by checking this box and clicking “Submit Pull Request”:
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.
Motivation and Context
Why was this PR created?
Close #14
How has this been tested?
What testing strategies have you used?
Unit tests still pass, plus limited manual testing. More testing of edge cases possible.
Checklist
RELEASE.md
fileType
label to the PR