Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modularize default load and save argument handling #15

Merged
merged 24 commits into from
Jul 23, 2019
Merged

Modularize default load and save argument handling #15

merged 24 commits into from
Jul 23, 2019

Conversation

deepyaman
Copy link
Member

@deepyaman deepyaman commented Jun 9, 2019

Notice

  • I acknowledge and agree that, by checking this box and clicking “Submit Pull Request”:

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

Motivation and Context

Why was this PR created?

Close #14

How has this been tested?

What testing strategies have you used?

Unit tests still pass, plus limited manual testing. More testing of edge cases possible.

image

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes
  • Assigned myself to the PR
  • Added Type label to the PR

@tsanikgr
Copy link
Contributor

tsanikgr commented Jun 10, 2019

See my comment in #14 for an alternative proposition


edit: "alternative" is a strong word here, as what I proposed is almost identical to what you presented 🤦‍♂

kedro/io/core.py Outdated Show resolved Hide resolved
@deepyaman
Copy link
Member Author

@idanov @tolomea @tsanikgr Updated with emojis (and changes discussed with Ivan in #14). Let me know if it looks good!

I'm pretty confident in the implementation, less so in the added tests (I generally picked arguments that don't affect anything to cover load_args/save_args, but I haven't been using most of these datasets).

kedro/contrib/io/azure/csv_blob.py Outdated Show resolved Hide resolved
kedro/contrib/io/core.py Outdated Show resolved Hide resolved
Copy link
Member

@idanov idanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Just a couple of small comments:

  • Could we keep the DEFAULT_... class properties to the classes extending the AbstractDataSet for core?
  • Could you make sure we use copy.deepcopy() instead of .copy()?
  • We should be aware that if someone decides to do object.DEFAULT_LOAD_ARGS["arg1"] = True, they might ruin the default args to all other objects created after that. Not sure if it's worth preventing that, but at least we need to keep that in mind.

kedro/io/core.py Outdated Show resolved Hide resolved
@deepyaman
Copy link
Member Author

@idanov @tolomea @tsanikgr Made the aforementioned changes, and tests pass locally, but running into Java errors on the build. I see some StackOverflow answers around forcing Java 1.8 (i.e. Spark not supporting Java 11), but since lot of other build are failing on this and I feel you all must have encountered this, putting it off till morning. :)

@tolomea
Copy link
Contributor

tolomea commented Jul 10, 2019

The Spark issue will be resolved when you update to latest develop.

Regarding the ordering of base classes, you are correct about the version mixins, there is work underway to fix that as part of merging the two different mixins into a proper base class.

@ghost ghost assigned deepyaman Jul 10, 2019
@deepyaman
Copy link
Member Author

deepyaman commented Jul 10, 2019

Types specified as required by 3.5. 😿 That should be everything!

else default_save_args
)

# Handle default load and save arguments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this (and all other datasets in kedro.io that are eligible) are not inheriting from the new MixIn? (maybe I missed something here, sorry about that!)

The code feels inconsistent now (+ all this duplication can go away!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks! @idanov, since the mix in was introduced, and is solving the problem "of the wrong abstraction", should we just leverage it in the core datasets as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idanov @tsanikgr @tolomea Let me know what you all decide as the core team here; I’m OK with it as is (a current marginal improvement to core with potential to move the mix-in from contrib after proving value in the future) or pushing the mix-in to core now. I’m more keen on merging to develop sooner than later due to the number of datasets touched, if possible. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current version is good for now. Thanks @deepyaman for accommodating for all the comments!

if save_args is not None
else default_save_args
)
super().__init__(load_args, save_args)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer calling super at the top of the constructor, so the subclass would overwrite stuff from the parent, as a "specialisation" of the superclass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer calling super at the top of the constructor, so the subclass would overwrite stuff from the parent, as a "specialisation" of the superclass.

Fair argument. I just left it in the same place where default arguments were previously handled (as close to the original as I could), but that makes sense.

@idanov idanov merged commit 9733fc6 into kedro-org:develop Jul 23, 2019
@deepyaman deepyaman deleted the fix/default-args branch July 23, 2019 18:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modularize default argument handling for datasets
4 participants