Modularize default load and save argument handling #15

deepyaman · 2019-06-09T21:35:26Z

Notice

I acknowledge and agree that, by checking this box and clicking “Submit Pull Request”:
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

Motivation and Context

Why was this PR created?

Close #14

How has this been tested?

What testing strategies have you used?

Unit tests still pass, plus limited manual testing. More testing of edge cases possible.

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added new entries to the RELEASE.md file
Added tests to cover my changes
Assigned myself to the PR
Added Type label to the PR

kedro/contrib/io/pyspark/spark_jdbc.py

tsanikgr · 2019-06-10T10:05:55Z

See my comment in #14 for an alternative proposition

edit: "alternative" is a strong word here, as what I proposed is almost identical to what you presented 🤦‍♂

kedro/io/core.py

deepyaman · 2019-07-03T23:09:19Z

@idanov @tolomea @tsanikgr Updated with emojis (and changes discussed with Ivan in #14). Let me know if it looks good!

I'm pretty confident in the implementation, less so in the added tests (I generally picked arguments that don't affect anything to cover load_args/save_args, but I haven't been using most of these datasets).

kedro/contrib/io/azure/csv_blob.py

kedro/contrib/io/core.py

idanov

Looks good to me. Just a couple of small comments:

Could we keep the DEFAULT_... class properties to the classes extending the AbstractDataSet for core?
Could you make sure we use copy.deepcopy() instead of .copy()?
We should be aware that if someone decides to do object.DEFAULT_LOAD_ARGS["arg1"] = True, they might ruin the default args to all other objects created after that. Not sure if it's worth preventing that, but at least we need to keep that in mind.
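To make the last two points concrete, here is a minimal sketch of the pattern under discussion. The class names `DefaultArgumentsSketch` and `ExampleCSVDataSet`, and the nested `options` default, are hypothetical and chosen only for illustration; this is not necessarily the code that was merged.

```python
import copy
from typing import Any, Dict, Optional


class DefaultArgumentsSketch:
    """Hypothetical mix-in; names follow the discussion, not necessarily the merged code."""

    DEFAULT_LOAD_ARGS: Dict[str, Any] = {}
    DEFAULT_SAVE_ARGS: Dict[str, Any] = {}

    def __init__(
        self,
        load_args: Optional[Dict[str, Any]] = None,
        save_args: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Deep-copy so nested values are not shared with the class-level defaults.
        self._load_args = copy.deepcopy(self.DEFAULT_LOAD_ARGS)
        if load_args is not None:
            self._load_args.update(load_args)
        self._save_args = copy.deepcopy(self.DEFAULT_SAVE_ARGS)
        if save_args is not None:
            self._save_args.update(save_args)


class ExampleCSVDataSet(DefaultArgumentsSketch):
    # Hypothetical dataset; the nested dict shows why a shallow .copy() would not be enough.
    DEFAULT_SAVE_ARGS = {"index": False, "options": {"sep": ","}}


first = ExampleCSVDataSet(save_args={"index": True})
first._save_args["options"]["sep"] = ";"  # touches only this instance's deep copy

second = ExampleCSVDataSet()  # still sees the untouched class defaults

# The caveat from the review: mutating the class attribute affects later instances.
ExampleCSVDataSet.DEFAULT_SAVE_ARGS["index"] = True
third = ExampleCSVDataSet()
```

With a shallow `.copy()`, the nested `options` dict would be shared between the class default and every instance, so the change made through `first` would leak into `second`. The deep copy prevents that, but it cannot protect against someone mutating the class-level dict directly, which is the risk flagged in the third comment.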

kedro/io/core.py

deepyaman · 2019-07-10T02:48:10Z

@idanov @tolomea @tsanikgr Made the aforementioned changes, and tests pass locally, but I'm running into Java errors on the build. I see some StackOverflow answers around forcing Java 1.8 (i.e. Spark not supporting Java 11), but since a lot of other builds are failing on this and I feel you all must have encountered it, I'm putting it off till morning. :)

tolomea · 2019-07-10T09:34:15Z

The Spark issue will be resolved when you update to latest develop.

Regarding the ordering of base classes, you are correct about the version mixins, there is work underway to fix that as part of merging the two different mixins into a proper base class.

deepyaman · 2019-07-10T17:01:29Z

Types specified as required by 3.5. 😿 That should be everything!

tsanikgr · 2019-07-11T11:08:54Z

kedro/io/hdf_s3.py

-            else default_save_args
-        )
+
+        # Handle default load and save arguments


Why are this dataset (and all other eligible datasets in kedro.io) not inheriting from the new MixIn? (maybe I missed something here, sorry about that!)

The code feels inconsistent now (+ all this duplication can go away!)

@tsanikgr #14 (comment)

OK thanks! @idanov, since the mix in was introduced, and is solving the problem "of the wrong abstraction", should we just leverage it in the core datasets as well?

@idanov @tsanikgr @tolomea Let me know what you all decide as the core team here; I’m OK with it as is (a current marginal improvement to core with potential to move the mix-in from contrib after proving value in the future) or pushing the mix-in to core now. I’m more keen on merging to develop sooner than later due to the number of datasets touched, if possible. :)

I think the current version is good for now. Thanks @deepyaman for accommodating all the comments!

ghost · 2019-07-23T08:51:27Z

kedro/contrib/io/bioinformatics/sequence_dataset.py

-            if save_args is not None
-            else default_save_args
-        )
+        super().__init__(load_args, save_args)


I prefer calling super at the top of the constructor, so the subclass would overwrite stuff from the parent, as a "specialisation" of the superclass.

Fair argument. I just left it in the same place where default arguments were previously handled (as close to the original as I could), but that makes sense.
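The ordering point can be illustrated with a small sketch. The classes `Base`, `SuperFirst`, and `SuperLast` and the `_description` attribute are hypothetical, invented only to show the effect; they do not come from the PR.

```python
class Base:
    def __init__(self, load_args=None):
        # The parent sets generic values.
        self._load_args = dict(load_args or {})
        self._description = "generic dataset"


class SuperFirst(Base):
    def __init__(self, load_args=None):
        # Parent runs first, so the subclass value below is the one that survives.
        super().__init__(load_args)
        self._description = "specialised dataset"


class SuperLast(Base):
    def __init__(self, load_args=None):
        self._description = "specialised dataset"
        # Parent runs last and overwrites the value set above.
        super().__init__(load_args)


super_first = SuperFirst({"sep": ","})
super_last = SuperLast({"sep": ","})
```

Here `super_first._description` ends up as the specialised value, while `super_last._description` is reset to the generic one, which is the behaviour the reviewer wants to avoid by calling `super().__init__` at the top.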

Modularize default load and save argument handling

b9bf25d

deepyaman requested review from idanov and tsanikgr as code owners June 9, 2019 21:35

deepyaman mentioned this pull request Jun 9, 2019

Modularize default argument handling for datasets #14

Closed

2 tasks

deepyaman commented Jun 9, 2019

kedro/contrib/io/pyspark/spark_jdbc.py

Suppress super-init-not-called pylint messages

ba18548

tolomea reviewed Jun 13, 2019

kedro/io/core.py Outdated

deepyaman added 9 commits June 14, 2019 14:52

Copy default args to prevent accidental mutation

41b40b2

Restore super().__init__ given default arg fix

c10a654

Merge branch 'develop' into fix/default-args

bf2643f

Refactor abstract base class modification as mixin

e83502c

Homogenize default load and save argument handling

63fda57

Demarcate load and save argument handling 🐉

0505773

Cover load and save argument handling 🐾

a93abf2

Add tests to cover load/save argument conditionals

4226c2e

Fix non-ASCII characters in legal header ✏️

a17ae9e

tolomea reviewed Jul 5, 2019

kedro/contrib/io/azure/csv_blob.py Outdated

kedro/contrib/io/core.py Outdated

idanov approved these changes Jul 5, 2019

kedro/io/core.py Outdated

deepyaman added 6 commits July 6, 2019 23:00

Remove load/save defaults from AbstractDataSet

f7b2373

Call super().__init__ in mix-in implementation

124d663

Fix MRO when subclassing DefaultArgumentsMixIn

d3c7153

Merge branch 'fix/default-args' of https://github.com/deepyaman/kedro into fix/default-args

da10346

Copy default argument dicts with copy.deepcopy

cac0c78

Merge branch 'develop' into fix/default-args

681beb0

deepyaman added 2 commits July 10, 2019 06:20

Merge branch 'develop' of https://github.com/quantumblacklabs/kedro into fix/default-args

0d31b7c

Merge branch 'develop' into fix/default-args

473d725

Merge branch 'develop' into fix/default-args

2a575d6

ghost assigned deepyaman Jul 10, 2019

deepyaman added 3 commits July 10, 2019 12:11

Annotate types for default load and save arguments

5896daa

Revert "Annotate types for default load and save arguments"

3931744

This reverts commit 5896daa.

Annotate types for default load and save arguments

b2e4c1c

tsanikgr reviewed Jul 11, 2019

Merge branch 'develop' into fix/default-args

184d9f7

ghost approved these changes Jul 23, 2019

ghost reviewed Jul 23, 2019

idanov merged commit 9733fc6 into kedro-org:develop Jul 23, 2019

deepyaman deleted the fix/default-args branch July 23, 2019 18:27

deepyaman mentioned this pull request Oct 31, 2022

Easier CustomDataset Creation #1936

Open

deepyaman mentioned this pull request Jun 7, 2023

[kedro-datasets] fsspec mixin kedro-org/kedro-plugins#200

Open


tsanikgr Jul 11, 2019

deepyaman Jul 11, 2019

tsanikgr Jul 11, 2019

deepyaman Jul 16, 2019

idanov Jul 23, 2019

ghost Jul 23, 2019

deepyaman Jul 23, 2019
