Delta Datasets: Loosen up delta-spark requirement #808

Closed
rwpurvis opened this issue Aug 17, 2024 · 7 comments
@rwpurvis
Contributor

Description

Currently, the delta-base dataset requires delta-spark~=1.2.1, which in turn requires Spark 3.2. This means that any Databricks users are limited to Spark 3.2. Can we loosen this requirement?
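For readers unfamiliar with the `~=` operator, here is a minimal sketch of why `delta-spark~=1.2.1` is restrictive. Under PEP 440, a compatible-release clause such as `~=1.2.1` is equivalent to `>=1.2.1, ==1.2.*`. The helper names below are hypothetical and the sketch only handles plain numeric versions (no pre-releases or local versions); the candidate version strings are purely illustrative, not a list of actual delta-spark releases.

```python
def parse_release(version: str) -> tuple:
    """Split a plain numeric version such as '1.2.1' into (1, 2, 1)."""
    return tuple(int(part) for part in version.split("."))


def matches_compatible_release(candidate: str, pinned: str) -> bool:
    """Approximate PEP 440 '~=' for purely numeric versions.

    '~=X.Y.Z' means: at least X.Y.Z, and the same X.Y prefix.
    """
    pin = parse_release(pinned)
    if len(pin) < 2:
        raise ValueError("'~=' needs at least two release segments")
    cand = parse_release(candidate)

    # Pad both versions with zeros so they compare segment by segment.
    width = max(len(cand), len(pin))
    cand_padded = cand + (0,) * (width - len(cand))
    pin_padded = pin + (0,) * (width - len(pin))

    # All segments except the last one of the pin must match exactly.
    prefix = pin[:-1]
    return cand_padded >= pin_padded and cand_padded[: len(prefix)] == prefix


# Illustrative candidates checked against the pin from this issue.
for candidate in ["1.2.1", "1.2.5", "1.3.0", "2.0.0"]:
    print(candidate, matches_compatible_release(candidate, "1.2.1"))
```

Only versions within the `1.2.*` series satisfy the pin, so any release line that supports a newer Spark is excluded by the resolver.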

Context

How has this bug affected you? What were you trying to accomplish?

  • Unable to use databricks.ManagedTableDataset in current project

Steps to Reproduce

  1. Create a project requiring pyspark>=3.3
  2. Attempt to install kedro-datasets[databricks.ManagedTableDataset]

Expected Result

Able to use the latest version of Spark or Delta Lake.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.19.3
  • Kedro plugin and kedro plugin version used (pip show kedro-airflow): any
  • Python version used (python -V): 3.10
  • Operating system and version: Windows/Ubuntu
@rwpurvis
Contributor Author

rwpurvis commented Aug 17, 2024

It appears this may be due to the update of the dependency names quoted below. I have updated my dependencies to the PEP 685 compliant names and am testing.

EDIT: I am able to get dependencies to resolve but this is still pinned to pyspark~=3.2, which should be loosened IMO.

From kedro-datasets version 3.0.0 onwards, the names of the optional dataset-level dependencies have been normalised to follow PEP 685. The '.' character has been replaced with a '-' character and the names are in lowercase. For example, if you had kedro-datasets[pandas.ExcelDataset] in your requirements file, it would have to be changed to kedro-datasets[pandas-exceldataset].
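The renaming described in the quote can be sketched as follows. This is a minimal illustration of the PEP 685 normalisation rule (lowercase the name and collapse runs of `-`, `_` and `.` into a single `-`); the function name `normalize_extra` is hypothetical and is not part of kedro-datasets.

```python
import re


def normalize_extra(name: str) -> str:
    """Normalise an extra name following the PEP 685 rule."""
    # Collapse any run of '-', '_' or '.' into one '-' and lowercase the result.
    return re.sub(r"[-_.]+", "-", name).lower()


# Old-style extra names taken from this thread.
for old_name in ["pandas.ExcelDataset", "databricks.ManagedTableDataset"]:
    print(f"kedro-datasets[{old_name}] -> kedro-datasets[{normalize_extra(old_name)}]")
```

Applied to the extra from the reproduction steps, this yields `databricks-managedtabledataset`.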

@DimedS
Contributor

DimedS commented Aug 20, 2024

Thank you for bringing this up, @rwpurvis! This issue was actually discussed earlier in issue #571, but for some reason it wasn't implemented at that time.

Would you be interested in making a pull request to address this?

@rwpurvis
Contributor Author

Yes, I can take it up.

@rwpurvis
Contributor Author

Looks like @felipemonroy beat me to it with #780! 🚀

@astrojuanlu
Member

Uh, as discussed there, not quite. We'll need another bump.

@rwpurvis
Contributor Author

Opened #814

@astrojuanlu
Member

Fixed in #814
