Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace and deprecate DataSet use in class names #2500

Merged
merged 81 commits into from
Jun 16, 2023
Merged

Conversation

deepyaman
Copy link
Member

@deepyaman deepyaman commented Apr 10, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Deprecate core "DataSet" names and warn about the impending rename to "Dataset".

Once 0.19.0 is released, the deprecated aliases (and even the utils, if so desired) can be removed.

Development notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

@merelcht
Copy link
Member

@deepyaman I'm not sure it actually makes sense to the rename on the Kedro framework side. It's a breaking change and we're removing all datasets from here when releasing 0.19.0 anyway.

@deepyaman
Copy link
Member Author

deepyaman commented Apr 11, 2023

@deepyaman I'm not sure it actually makes sense to the rename on the Kedro framework side. It's a breaking change and we're removing all datasets from here when releasing 0.19.0 anyway.

@merelcht Won't the datasets in kedro.io.core stay in core framework, even in 0.19.0? Looking at #2129, are you suggesting that we would keep DataSet in the core repo and Dataset in plugins? Or that we change DataSet -> Dataset in core repo without deprecation in 0.19.0, since it's a breaking release anyway?

@merelcht
Copy link
Member

Ignore me, I should've had my morning coffee before commenting 😂

We should rename all of them, but it will be a breaking change so maybe it's better to wait until we start work on 0.19.0 to make sure develop and main don't diverge too much until then.

@merelcht
Copy link
Member

If we do a deprecation warning what will be users see and how often? It might be annoying if there's nothing they can do about it..

@deepyaman
Copy link
Member Author

If we do a deprecation warning what will be users see and how often? It might be annoying if there's nothing they can do about it..

If this functionality were to be released, they would get a warning if they used MemoryDataSet (or pandas.CSVDataSet in an updated kedro-datasets); but they could change their code to MemoryDataset (or pandas.CSVDataset) and avoid the warning. So the idea would be, yes, they can do something about it, and they have a grace period until it's removed that their projects wouldn't be broken on a newer version.

kedro/utils.py Outdated Show resolved Hide resolved
kedro/utils.py Outdated Show resolved Hide resolved
@stichbury
Copy link
Contributor

You are truly the patron saint of consistency @deepyaman and I'm happy to approve this change because it seems wholesale and straightforward. However, I'm not able to speak for @merelcht and @idanov in terms of whether it's the right point to merge this change or the implications thereof. But if they're happy with the intent, I'm happy with the execution.

@merelcht merelcht removed the request for review from idanov June 13, 2023 14:19
kedro/utils.py Outdated
if alias is not None:
warn(
f"{mcs.__name__} has been renamed to {alias.__name__}, and the "
f"alias will be removed Kedro 0.19.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"alias will be removed Kedro 0.19.0",
f"alias will be removed in Kedro 0.19.0",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks! I fixed it in a separate place but forgot to add it back in here. :)

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for persisting with this @deepyaman ! I'm happy for this to get merged ⭐

Copy link
Contributor

@stichbury stichbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's OK with @merelcht it's a 👍 from me too.

@astrojuanlu
Copy link
Member

RTD failing with:

sphinx.errors.SphinxWarning: /home/docs/checkouts/readthedocs.org/user_builds/kedro/checkouts/2500/kedro/extras/datasets/api/api_dataset.py:docstring of kedro.io.core.AbstractDataSet.exists:1:py:exc reference target not found: DatasetError

@deepyaman deepyaman merged commit 1fb1ff7 into main Jun 16, 2023
@deepyaman deepyaman deleted the rename-data-set branch June 16, 2023 04:27
astrojuanlu pushed a commit that referenced this pull request Jun 21, 2023
* Replace and deprecate `DataSet` use in class names

* Replace another format string with an f-string

* Perform deprecations for cached, lambda, and partitioned datasets

* Deprecated `MemoryDataSet` in favor of `MemoryDataset`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Fix keyword argument to specify metaclass on `CachedDataSet`

* Fix reference to `PartitionedDataset`

* Keep `AbstractDataSet` subscriptable

* Update __init__.py files, __all__ definitions, etc

* Warn of impending Kedro 0.19 (not abstract future)

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

* Update `VideoDataSet` to `VideoDataset` (and refs)

* Add missing `kedro.utils.DeprecatedClassMeta` imps

* Change deprecated references to `AbstractDataSet`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Warn of impending Kedro 0.19 (not abstract future)

* Rename `pandas.CSVDataSet` to `pandas.CSVDataset`

* Fix some pylint errors and blacken code

* Update `dask.ParquetDataSet`

* Undo changes for `VideoDataSet`, inherit from new base

* Undo changes to `APIDataSet`, inherit from new base

* Fix some imports and missed references

* Undo changes to `BioSequenceDataSet`, inherit from new base

* Undo changes to Dask and Pandas datasets, inherit from new bases

* Remove the `AbstractDataset` and `AbstractVersionedDataset` alias, update `kedro.io.core`

* Undo changes in `kedro/extras/datasets`

* Update branch

* Change `DataSetError` to `DatasetError`

* Remove deprecated aliases for Abstract*DataSet

* Change `DataSetError` to `DatasetError` in tests/

* Change DataSet*Error to Dataset*Error in tests/

* Fix references to DataSet in a lot of tests

* Change `CSVDataset` back to `CSBVDataSet`

* Rename core datasets used across `tests` directory

* Fix "Saving 'None' to a 'DataSet' is not allowed." messages

* Fix `test_http_filesystem_no_versioning` everywhere

* Fix removal of "data"

* Deprecate `_SharedMemoryDataSet` in favor of `_SharedMemoryDataset`

* Fix tests/pipeline/test_pipeline_from_missing.py

* Fix list datasets test

* Change patched IncrementalDataSet to IncrementalDataset

* Fix default checkpoint dataset

* Fix data catalog tests

* Fix error message

* Use `MemoryDataset`, not `MemoryDataSet`, by default

* Use `MemoryDataset`, not `MemoryDataSet`, for missing datasets in data catalog

* Rename DefaultDataSet key to DefaultDataset

* Change `LambdaDataSet` to `LambdaDataset` in `test_node_run.py`

* Update error message--but should I?

* Update error message--but should I?

* Update error message in kedro/io/core.py--but should I?

* Update RELEASE.md

* Fix remaining tests

* Fix lint issues

* Align capitalization

* Add `DeprecatedClassMeta` tests from StackOverflow

* Blacken kedro/utils.py

* Ignore "No value for argument 'subclass' in unbound method call"

* Rename 'foo' to 'value' to satisfy linter

* Disable pylint messages on deprecated class definitions

* Blacken kedro/utils.py

* Wrap `kedro.io.core` to fix error deprecation

* Simplify deprecation of error names to try to fix docs

* Undo attempt to make docs pass

Revert "Simplify deprecation of error names to try to fix docs"

This reverts commit e9294be.

Revert "Wrap `kedro.io.core` to fix error deprecation"

This reverts commit db9b7ac.

* Replace `DataSetError` with `DatasetError` in test

* Add missing "in" to a `DeprecationWarning` message

* Add "Dataset" versions of errors to `kedro.io` doc

* Add updated "Dataset" names to `kedro.io.rst` and sort the entries

* Add `_SharedMemoryDataset` to type targets in conf

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
astrojuanlu pushed a commit that referenced this pull request Jun 21, 2023
* Replace and deprecate `DataSet` use in class names

* Replace another format string with an f-string

* Perform deprecations for cached, lambda, and partitioned datasets

* Deprecated `MemoryDataSet` in favor of `MemoryDataset`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Fix keyword argument to specify metaclass on `CachedDataSet`

* Fix reference to `PartitionedDataset`

* Keep `AbstractDataSet` subscriptable

* Update __init__.py files, __all__ definitions, etc

* Warn of impending Kedro 0.19 (not abstract future)

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

* Update `VideoDataSet` to `VideoDataset` (and refs)

* Add missing `kedro.utils.DeprecatedClassMeta` imps

* Change deprecated references to `AbstractDataSet`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Warn of impending Kedro 0.19 (not abstract future)

* Rename `pandas.CSVDataSet` to `pandas.CSVDataset`

* Fix some pylint errors and blacken code

* Update `dask.ParquetDataSet`

* Undo changes for `VideoDataSet`, inherit from new base

* Undo changes to `APIDataSet`, inherit from new base

* Fix some imports and missed references

* Undo changes to `BioSequenceDataSet`, inherit from new base

* Undo changes to Dask and Pandas datasets, inherit from new bases

* Remove the `AbstractDataset` and `AbstractVersionedDataset` alias, update `kedro.io.core`

* Undo changes in `kedro/extras/datasets`

* Update branch

* Change `DataSetError` to `DatasetError`

* Remove deprecated aliases for Abstract*DataSet

* Change `DataSetError` to `DatasetError` in tests/

* Change DataSet*Error to Dataset*Error in tests/

* Fix references to DataSet in a lot of tests

* Change `CSVDataset` back to `CSBVDataSet`

* Rename core datasets used across `tests` directory

* Fix "Saving 'None' to a 'DataSet' is not allowed." messages

* Fix `test_http_filesystem_no_versioning` everywhere

* Fix removal of "data"

* Deprecate `_SharedMemoryDataSet` in favor of `_SharedMemoryDataset`

* Fix tests/pipeline/test_pipeline_from_missing.py

* Fix list datasets test

* Change patched IncrementalDataSet to IncrementalDataset

* Fix default checkpoint dataset

* Fix data catalog tests

* Fix error message

* Use `MemoryDataset`, not `MemoryDataSet`, by default

* Use `MemoryDataset`, not `MemoryDataSet`, for missing datasets in data catalog

* Rename DefaultDataSet key to DefaultDataset

* Change `LambdaDataSet` to `LambdaDataset` in `test_node_run.py`

* Update error message--but should I?

* Update error message--but should I?

* Update error message in kedro/io/core.py--but should I?

* Update RELEASE.md

* Fix remaining tests

* Fix lint issues

* Align capitalization

* Add `DeprecatedClassMeta` tests from StackOverflow

* Blacken kedro/utils.py

* Ignore "No value for argument 'subclass' in unbound method call"

* Rename 'foo' to 'value' to satisfy linter

* Disable pylint messages on deprecated class definitions

* Blacken kedro/utils.py

* Wrap `kedro.io.core` to fix error deprecation

* Simplify deprecation of error names to try to fix docs

* Undo attempt to make docs pass

Revert "Simplify deprecation of error names to try to fix docs"

This reverts commit e9294be.

Revert "Wrap `kedro.io.core` to fix error deprecation"

This reverts commit db9b7ac.

* Replace `DataSetError` with `DatasetError` in test

* Add missing "in" to a `DeprecationWarning` message

* Add "Dataset" versions of errors to `kedro.io` doc

* Add updated "Dataset" names to `kedro.io.rst` and sort the entries

* Add `_SharedMemoryDataset` to type targets in conf

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
astrojuanlu pushed a commit that referenced this pull request Jun 21, 2023
* Replace and deprecate `DataSet` use in class names

* Replace another format string with an f-string

* Perform deprecations for cached, lambda, and partitioned datasets

* Deprecated `MemoryDataSet` in favor of `MemoryDataset`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Fix keyword argument to specify metaclass on `CachedDataSet`

* Fix reference to `PartitionedDataset`

* Keep `AbstractDataSet` subscriptable

* Update __init__.py files, __all__ definitions, etc

* Warn of impending Kedro 0.19 (not abstract future)

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

* Update `VideoDataSet` to `VideoDataset` (and refs)

* Add missing `kedro.utils.DeprecatedClassMeta` imps

* Change deprecated references to `AbstractDataSet`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Warn of impending Kedro 0.19 (not abstract future)

* Rename `pandas.CSVDataSet` to `pandas.CSVDataset`

* Fix some pylint errors and blacken code

* Update `dask.ParquetDataSet`

* Undo changes for `VideoDataSet`, inherit from new base

* Undo changes to `APIDataSet`, inherit from new base

* Fix some imports and missed references

* Undo changes to `BioSequenceDataSet`, inherit from new base

* Undo changes to Dask and Pandas datasets, inherit from new bases

* Remove the `AbstractDataset` and `AbstractVersionedDataset` alias, update `kedro.io.core`

* Undo changes in `kedro/extras/datasets`

* Update branch

* Change `DataSetError` to `DatasetError`

* Remove deprecated aliases for Abstract*DataSet

* Change `DataSetError` to `DatasetError` in tests/

* Change DataSet*Error to Dataset*Error in tests/

* Fix references to DataSet in a lot of tests

* Change `CSVDataset` back to `CSBVDataSet`

* Rename core datasets used across `tests` directory

* Fix "Saving 'None' to a 'DataSet' is not allowed." messages

* Fix `test_http_filesystem_no_versioning` everywhere

* Fix removal of "data"

* Deprecate `_SharedMemoryDataSet` in favor of `_SharedMemoryDataset`

* Fix tests/pipeline/test_pipeline_from_missing.py

* Fix list datasets test

* Change patched IncrementalDataSet to IncrementalDataset

* Fix default checkpoint dataset

* Fix data catalog tests

* Fix error message

* Use `MemoryDataset`, not `MemoryDataSet`, by default

* Use `MemoryDataset`, not `MemoryDataSet`, for missing datasets in data catalog

* Rename DefaultDataSet key to DefaultDataset

* Change `LambdaDataSet` to `LambdaDataset` in `test_node_run.py`

* Update error message--but should I?

* Update error message--but should I?

* Update error message in kedro/io/core.py--but should I?

* Update RELEASE.md

* Fix remaining tests

* Fix lint issues

* Align capitalization

* Add `DeprecatedClassMeta` tests from StackOverflow

* Blacken kedro/utils.py

* Ignore "No value for argument 'subclass' in unbound method call"

* Rename 'foo' to 'value' to satisfy linter

* Disable pylint messages on deprecated class definitions

* Blacken kedro/utils.py

* Wrap `kedro.io.core` to fix error deprecation

* Simplify deprecation of error names to try to fix docs

* Undo attempt to make docs pass

Revert "Simplify deprecation of error names to try to fix docs"

This reverts commit e9294be.

Revert "Wrap `kedro.io.core` to fix error deprecation"

This reverts commit db9b7ac.

* Replace `DataSetError` with `DatasetError` in test

* Add missing "in" to a `DeprecationWarning` message

* Add "Dataset" versions of errors to `kedro.io` doc

* Add updated "Dataset" names to `kedro.io.rst` and sort the entries

* Add `_SharedMemoryDataset` to type targets in conf

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Signed-off-by: Juan Luis Cano Rodríguez <juan_luis_cano@mckinsey.com>
deepyaman added a commit that referenced this pull request Jun 22, 2023
* Replace and deprecate `DataSet` use in class names (#2500)

* Replace and deprecate `DataSet` use in class names

* Replace another format string with an f-string

* Perform deprecations for cached, lambda, and partitioned datasets

* Deprecated `MemoryDataSet` in favor of `MemoryDataset`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Fix keyword argument to specify metaclass on `CachedDataSet`

* Fix reference to `PartitionedDataset`

* Keep `AbstractDataSet` subscriptable

* Update __init__.py files, __all__ definitions, etc

* Warn of impending Kedro 0.19 (not abstract future)

Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com>

* Update `VideoDataSet` to `VideoDataset` (and refs)

* Add missing `kedro.utils.DeprecatedClassMeta` imps

* Change deprecated references to `AbstractDataSet`

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Warn of impending Kedro 0.19 (not abstract future)

* Rename `pandas.CSVDataSet` to `pandas.CSVDataset`

* Fix some pylint errors and blacken code

* Update `dask.ParquetDataSet`

* Undo changes for `VideoDataSet`, inherit from new base

* Undo changes to `APIDataSet`, inherit from new base

* Fix some imports and missed references

* Undo changes to `BioSequenceDataSet`, inherit from new base

* Undo changes to Dask and Pandas datasets, inherit from new bases

* Remove the `AbstractDataset` and `AbstractVersionedDataset` alias, update `kedro.io.core`

* Undo changes in `kedro/extras/datasets`

* Update branch

* Change `DataSetError` to `DatasetError`

* Remove deprecated aliases for Abstract*DataSet

* Change `DataSetError` to `DatasetError` in tests/

* Change DataSet*Error to Dataset*Error in tests/

* Fix references to DataSet in a lot of tests

* Change `CSVDataset` back to `CSBVDataSet`

* Rename core datasets used across `tests` directory

* Fix "Saving 'None' to a 'DataSet' is not allowed." messages

* Fix `test_http_filesystem_no_versioning` everywhere

* Fix removal of "data"

* Deprecate `_SharedMemoryDataSet` in favor of `_SharedMemoryDataset`

* Fix tests/pipeline/test_pipeline_from_missing.py

* Fix list datasets test

* Change patched IncrementalDataSet to IncrementalDataset

* Fix default checkpoint dataset

* Fix data catalog tests

* Fix error message

* Use `MemoryDataset`, not `MemoryDataSet`, by default

* Use `MemoryDataset`, not `MemoryDataSet`, for missing datasets in data catalog

* Rename DefaultDataSet key to DefaultDataset

* Change `LambdaDataSet` to `LambdaDataset` in `test_node_run.py`

* Update error message--but should I?

* Update error message--but should I?

* Update error message in kedro/io/core.py--but should I?

* Update RELEASE.md

* Fix remaining tests

* Fix lint issues

* Align capitalization

* Add `DeprecatedClassMeta` tests from StackOverflow

* Blacken kedro/utils.py

* Ignore "No value for argument 'subclass' in unbound method call"

* Rename 'foo' to 'value' to satisfy linter

* Disable pylint messages on deprecated class definitions

* Blacken kedro/utils.py

* Wrap `kedro.io.core` to fix error deprecation

* Simplify deprecation of error names to try to fix docs

* Undo attempt to make docs pass

Revert "Simplify deprecation of error names to try to fix docs"

This reverts commit e9294be.

Revert "Wrap `kedro.io.core` to fix error deprecation"

This reverts commit db9b7ac.

* Replace `DataSetError` with `DatasetError` in test

* Add missing "in" to a `DeprecationWarning` message

* Add "Dataset" versions of errors to `kedro.io` doc

* Add updated "Dataset" names to `kedro.io.rst` and sort the entries

* Add `_SharedMemoryDataset` to type targets in conf

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>

* Update `DataSetError` to `DatasetError` in test_api_dataset.py

---------

Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
Co-authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
@noklam noklam mentioned this pull request Jun 28, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants