Rethink decision to expose the public interface in namespaces #1900

Closed
MrPowers opened this issue Nov 22, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@MrPowers
Collaborator

Description

The public interface is currently exposed in different namespaces, which can make imports a little tricky. Users currently have two options when writing code. They can do something like this:

import deltalake as dl

dl.writer.write_deltalake("some_path", df)
dt = dl.DeltaTable("some_path")
dt.optimize.z_order(["some_col"])

Or they can do something like this:

from deltalake.writer import write_deltalake
from deltalake import DeltaTable

write_deltalake("some_path", df)
dt = DeltaTable("some_path")
dt.optimize.z_order(["some_col"])

We should consider following the pandas style and exposing the entire public API in the deltalake namespace. Here's some idiomatic pandas code:

import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3], "letter": ["a", "b", "c"]})
df.to_csv("a_place")
pd.read_csv("a_place")

I think this end-user interface could be nicer:

import deltalake as dl

dl.write_deltalake("some_path", df)
dt = dl.DeltaTable("some_path")
dt.z_order(["some_col"])

This would also allow for a nice single-line import if users prefer this style:

from deltalake import write_deltalake, DeltaTable

write_deltalake("some_path", df)
dt = DeltaTable("some_path")
dt.z_order(["some_col"])
@MrPowers MrPowers added the enhancement New feature or request label Nov 22, 2023
@MrPowers
Collaborator Author

Turns out that write_deltalake can also be accessed like this:

from deltalake import write_deltalake

So the only issue we need to think about is if we want to keep this:

dt.optimize.z_order(["some_col"])

Or if we'd like to switch it to something like this:

dt.z_order(["some_col"])
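
For anyone wondering why both import paths resolve to the same function: a package can re-export a name from a submodule in its `__init__.py`. Below is a minimal sketch of that mechanism, not deltalake's actual source. The package name `toylake` and the function `write_table` are hypothetical, and the package is assembled in memory so the snippet runs without creating files.

```python
import sys
import types

# Hypothetical package "toylake", built in memory so the sketch needs no files.
writer = types.ModuleType("toylake.writer")


def write_table(path, rows):
    """Stand-in for a writer function; it only reports what it would do."""
    return f"wrote {len(rows)} rows to {path}"


writer.write_table = write_table

pkg = types.ModuleType("toylake")
pkg.writer = writer
# Equivalent of `from .writer import write_table` inside toylake/__init__.py.
pkg.write_table = writer.write_table

# Register both modules so the import statements below can find them.
sys.modules["toylake"] = pkg
sys.modules["toylake.writer"] = writer

from toylake import write_table as top_level
from toylake.writer import write_table as nested

# Both names are bound to the same function object.
same_object = top_level is nested
message = top_level("some_path", [1, 2, 3])
```

With a re-export like this, the short import and the fully qualified import are interchangeable, so nothing breaks for users who already import from the submodule.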

@ion-elgreco
Collaborator

I think we should keep these namespaces such as optimize, and in the future the alter namespace. It makes it easy for users to find a set of interactions on the table clustered together.

Example:
DeltaTable.alter.set_table_properties etc.
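
To make the trade-off concrete, here is a rough sketch of how a method namespace such as `optimize` can hang off a table object without any extra import. This is a toy illustration, not deltalake's implementation; `ToyTable` and `ToyOptimizer` are hypothetical names, and the methods only return strings describing what they would do.

```python
class ToyOptimizer:
    """Groups maintenance operations for one table (hypothetical helper)."""

    def __init__(self, table):
        self._table = table

    def compact(self):
        return f"compact {self._table.path}"

    def z_order(self, columns):
        return f"z_order {self._table.path} by {', '.join(columns)}"


class ToyTable:
    """Minimal table object exposing its operations through a namespace."""

    def __init__(self, path):
        self.path = path

    @property
    def optimize(self):
        # Each access returns a small helper bound to this table.
        return ToyOptimizer(self)


dt = ToyTable("some_path")
result = dt.optimize.z_order(["some_col"])
```

Because the namespace is a property on the table, related operations stay discoverable together (for example via tab completion on `dt.optimize.`) while users still import only the table class.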

@MrPowers
Collaborator Author

@ion-elgreco - Yea, that sounds fine with me, especially because optimize and alter don't require any additional imports.

Let's make sure the names of any new namespaces use the same verb form as optimize and alter.

I do think a namespace should contain at least three methods to justify being separate. Right now, optimize only has compact and z_order, which is fine because I'm sure more will be added in the future. Let me know what you think!

@ion-elgreco
Collaborator

@MrPowers yeah sounds good!

@ion-elgreco
Collaborator

@MrPowers shall we close this one?

@MrPowers
Collaborator Author

MrPowers commented Jan 2, 2024

@ion-elgreco - yep, we can close this one, thank you!!

@MrPowers MrPowers closed this as completed Jan 2, 2024