Support writing to a branch #306

Open
Fokko opened this issue Jan 26, 2024 · 13 comments · May be fixed by #941

Comments

@Fokko
Contributor

Fokko commented Jan 26, 2024

Feature Request / Improvement

Right now, writes are hardcoded to always target the main branch. It would be great to make this configurable.

@kevinjqliu
Contributor

kevinjqliu commented Jan 27, 2024

The API to write to a branch should look something like:

def append(self, df: pa.Table, branch: str = MAIN_BRANCH) -> None:
    ...
def overwrite(self, df: pa.Table, overwrite_filter: BooleanExpression = ALWAYS_TRUE, branch: str = MAIN_BRANCH) -> None:
    ...

But in order to write to a branch, the branch needs to be created first.

From https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-branches:

> The branch must exist before performing the write. The operation does not create the branch if it does not exist.
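
To make the "branch must exist" rule concrete, here is a minimal sketch using a toy in-memory table. `ToyTable` and its `branches` field are hypothetical stand-ins invented for illustration, not PyIceberg classes, and the `overwrite_filter` parameter from the proposal is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAIN_BRANCH = "main"


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table; not PyIceberg's Table."""

    # Maps branch name -> list of row batches committed to that branch.
    branches: Dict[str, List[list]] = field(
        default_factory=lambda: {MAIN_BRANCH: []}
    )

    def _require_branch(self, branch: str) -> None:
        # Mirror the quoted Spark behavior: never create a branch implicitly.
        if branch not in self.branches:
            raise ValueError(f"Branch does not exist: {branch}")

    def append(self, rows: list, branch: str = MAIN_BRANCH) -> None:
        self._require_branch(branch)
        self.branches[branch].append(list(rows))

    def overwrite(self, rows: list, branch: str = MAIN_BRANCH) -> None:
        self._require_branch(branch)
        # Replace everything on the target branch; other branches are untouched.
        self.branches[branch] = [list(rows)]
```

Defaulting `branch` to `MAIN_BRANCH` keeps existing callers working unchanged, which is presumably why the proposed signatures use a default rather than a required argument.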

@kevinjqliu
Contributor

First pass, just refactoring
#312

@kevinjqliu
Contributor

We first need a create branch API.

Then update places currently gated by MAIN_BRANCH.

  1. if MAIN_BRANCH not in table_metadata.refs:
  2. if update.ref_name == MAIN_BRANCH:

And finally, add a test for writing to a branch.
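
A rough sketch of how those steps could fit together, assuming simplified metadata. The thread does not show what the two `MAIN_BRANCH` checks do in the real code, so the behavior below is a guess: `ToyMetadata`, `create_branch`, and `commit_snapshot` are hypothetical names, and the only Iceberg fact relied on is that the table's current snapshot follows the main branch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAIN_BRANCH = "main"


@dataclass
class ToyMetadata:
    """Hypothetical, simplified table metadata (not PyIceberg's TableMetadata)."""

    current_snapshot_id: Optional[int] = None
    refs: Dict[str, int] = field(default_factory=dict)  # ref name -> snapshot id


def create_branch(meta: ToyMetadata, name: str, snapshot_id: int) -> None:
    # Step 1: an explicit create-branch operation.
    if name in meta.refs:
        raise ValueError(f"Ref already exists: {name}")
    meta.refs[name] = snapshot_id


def commit_snapshot(
    meta: ToyMetadata, snapshot_id: int, branch: str = MAIN_BRANCH
) -> None:
    # Check 1, generalized (assumption): look up the target branch instead of
    # only main. Main may be absent so the first write to a new table works.
    if branch != MAIN_BRANCH and branch not in meta.refs:
        raise ValueError(f"Branch does not exist: {branch}")
    meta.refs[branch] = snapshot_id
    # Check 2, kept as is (assumption): only main moves the current snapshot.
    if branch == MAIN_BRANCH:
        meta.current_snapshot_id = snapshot_id
```

A test for writing to a branch would then assert that a commit to a non-main branch moves that branch's ref while leaving `current_snapshot_id` alone.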

@kevinjqliu
Contributor

Also note that dev/provision.py, which is used for integration tests, already has statements to create tags and branches:

spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_deletes CREATE TAG tag_12")
spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_deletes CREATE BRANCH without_5")

@kevinjqliu kevinjqliu removed their assignment Feb 14, 2024
@Gowthami03B
Contributor

I had an offline chat with @kevinjqliu. I will work on this, building off the PR Kevin created.

@Gowthami03B
Contributor

Hello, can I hop back on this train if no one else is actively working on this (again building off of Kevin's work)? @kevinjqliu @Fokko

@kevinjqliu
Contributor

@Gowthami03B definitely, I've assigned the issue to you!

@Gowthami03B Gowthami03B removed their assignment Jun 27, 2024
@Gowthami03B
Contributor

Gowthami03B commented Jun 27, 2024

> @Gowthami03B definitely, I've assigned the issue to you!

Opening this up to the community, as I am going to be out for the next month! @kevinjqliu

@vinjai

vinjai commented Jul 6, 2024

@kevinjqliu @Fokko
I can take this forward if no one is actively working on this.

@kevinjqliu
Contributor

@vinjai yes! please go ahead.

@vinjai vinjai linked a pull request Jul 18, 2024 that will close this issue
@sungwy sungwy added this to the PyIceberg 0.8.0 release milestone Jul 20, 2024
@sungwy
Collaborator

sungwy commented Sep 24, 2024

Hi @vinjai thank you very much for working on this issue. I'm just working through the list of open items to check if they are still actively being worked on. Are you still interested in contributing this feature to PyIceberg? 🧊

@vinjai

vinjai commented Sep 29, 2024

Hey @sungwy,
I am working on this at the moment.
I will open the PR for review by next week.

@vinjai

vinjai commented Oct 17, 2024

PR is ready for review
