feat(datasets): create separate ibis.FileDataset #842

Open
wants to merge 6 commits into main

Conversation

deepyaman
Member

@deepyaman deepyaman commented Sep 20, 2024

Description

Resolves #828

Development notes

So far, I copied ibis.TableDataset, removing code paths for reading database tables, and adding support for file export.

Just wanted to put this out there for early feedback, but what else should be done?

  • Deprecate file I/O in TableDataset
  • Support versioning (I don't think there's any reason why this wouldn't work out of the box for FileDataset)
  • Docs stuff (add FileDataset to toctree, etc.)

Update: Versioning is actually not trivial, because backends don't implement a consistent interface for checking whether a file exists. I plan to handle this with the PyArrow filesystem interface in a follow-up PR (to limit the complexity added here); this PR handles local versioning.

Checklist

  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the relevant RELEASE.md file
  • Added tests to cover my changes

@deepyaman deepyaman force-pushed the feat/datasets/ibis-filedataset branch 2 times, most recently from fbcf8ff to 814c514 Compare September 20, 2024 15:47
Signed-off-by: Deepyaman Datta <deepyaman.datta@utexas.edu>
@deepyaman deepyaman force-pushed the feat/datasets/ibis-filedataset branch 2 times, most recently from 0bfc761 to 2a633a2 Compare September 23, 2024 21:37
Refs: b7ff0c7

@deepyaman deepyaman marked this pull request as ready for review September 24, 2024 05:04
Contributor

@datajoely datajoely left a comment

This is great!


@property
def connection(self) -> BaseBackend:
    def hashable(value):
Contributor

I think this section needs comments: it's super clever but complicated, and someone else maintaining the class won't be able to understand what's going on without a lot of reverse engineering.
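
For anyone following the thread, here is a rough, commented sketch of what a nested `hashable` helper inside a `connection` property is typically for: turning a nested connection config into a dictionary key so connections can be cached per class. This is an assumption about the intent, not the code from this PR; `FakeDataset`, `_connections`, `_connection_config`, and the placeholder connection object are hypothetical names used only for illustration.

```python
from copy import deepcopy
from typing import Any, ClassVar


def hashable(value: Any) -> Any:
    """Recursively turn dicts and lists into tuples so the result can be a dict key."""
    if isinstance(value, dict):
        # Sort by key so configs that differ only in key order map to the same key.
        return tuple((k, hashable(v)) for k, v in sorted(value.items()))
    if isinstance(value, list):
        return tuple(hashable(v) for v in value)
    # Scalars (str, int, None, ...) are already hashable.
    return value


class FakeDataset:
    """Hypothetical stand-in for the dataset class; not the PR's implementation."""

    # Shared by all instances: one cached connection per distinct config.
    _connections: ClassVar[dict] = {}

    def __init__(self, connection: dict) -> None:
        self._connection_config = connection

    @property
    def connection(self) -> Any:
        cls = type(self)
        key = hashable(self._connection_config)
        if key not in cls._connections:
            # Copy before popping so the stored config is left untouched.
            config = deepcopy(self._connection_config)
            backend = config.pop("backend")
            # Assumption: real code would do something like
            # `getattr(ibis, backend).connect(**config)` here.
            cls._connections[key] = ("connection", backend, key)
        return cls._connections[key]
```

With this shape, two datasets configured with the same backend and arguments reuse one connection, even if the config keys were written in a different order.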

Development

Successfully merging this pull request may close these issues.

Add write support for ibis.TableDataset to files
2 participants