Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: move unity catalog integration to it's own crate #3044

Merged
merged 4 commits into from
Dec 16, 2024

Conversation

hntd187
Copy link
Collaborator

@hntd187 hntd187 commented Dec 2, 2024

Description

This moves the catalog integration of unity catalog to it's own crate. It also replaces the mock server with the httpmock crate and the custom retry logic with request_middleware's retry middleware. I think this will be easier to manage as a client and I do not believe many people were using this integration anyways.

Related Issue(s)

Documentation

@github-actions github-actions bot added binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Dec 2, 2024
Copy link

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 53.72340% with 87 lines in your changes missing coverage. Please review.

Project coverage is 72.40%. Comparing base (a51d75a) to head (93701f4).

Files with missing lines Patch % Lines
crates/catalog-unity/src/client/retry.rs 14.89% 40 Missing ⚠️
crates/catalog-unity/src/credential.rs 70.90% 7 Missing and 9 partials ⚠️
crates/catalog-unity/src/lib.rs 67.39% 11 Missing and 4 partials ⚠️
crates/catalog-unity/src/client/mod.rs 64.70% 4 Missing and 2 partials ⚠️
crates/catalog-unity/src/datafusion.rs 0.00% 3 Missing ⚠️
crates/core/src/delta_datafusion/expr.rs 0.00% 3 Missing ⚠️
crates/core/src/kernel/models/actions.rs 50.00% 3 Missing ⚠️
crates/core/src/operations/merge/filter.rs 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3044      +/-   ##
==========================================
- Coverage   72.65%   72.40%   -0.25%     
==========================================
  Files         129      128       -1     
  Lines       41584    41314     -270     
  Branches    41584    41314     -270     
==========================================
- Hits        30211    29915     -296     
- Misses       9405     9430      +25     
- Partials     1968     1969       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

crates/catalog-unity/src/client/retry.rs Outdated Show resolved Hide resolved
crates/catalog-unity/src/credential.rs Outdated Show resolved Hide resolved
r#"
let client = reqwest_middleware::ClientBuilder::new(Client::new()).build();

let _retry_config = RetryConfig::default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if I was going to remove it at that point from the testing, so it's a side effect left over from that.

Response::new(Body::from(
r#"
let client = reqwest_middleware::ClientBuilder::new(Client::new()).build();
let _retry_config = RetryConfig::default();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another unused variable that should be removed? 🤔

crates/core/Cargo.toml Outdated Show resolved Hide resolved
crates/core/Cargo.toml Outdated Show resolved Hide resolved
@@ -137,4 +138,4 @@ datafusion = [
datafusion-ext = ["datafusion"]
json = ["parquet/json"]
python = ["arrow/pyarrow"]
unity-experimental = ["reqwest", "hyper"]
unity-experimental = ["reqwest", "reqwest-middleware"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this feature come off the core crate now that this code is moved into its own crate?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a discussion point I wanted to talk about. The main trait for the catalog and the errors remained in the core crate. So core can build I assume generic catalog support and catalog crates just depend on core and implement. At first glance I kinda didn't have a cohesive error handling thing here, but I think thinking about it. There should be a generic catalog error in core, but have no conditionals features on it. And individual crates implement a conversion to that error type. Sound good?

@hntd187 hntd187 marked this pull request as ready for review December 10, 2024 20:44
let compare_expr = match generalize_filter(
let compare_expr = generalize_filter(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed a few warnings along the way, this statement could be just written with an ? operator instead.

Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just some minor questions.

More gene4rally speaking though, I think we need to make some more deliberate choices around how we want to integrate with catalogs.

AS I have been out for a while, I am not sure where we are in terms of eagerness to push the next release. I would hope that we can do some more cleaning and most importantly testing before we release get "more serious" with unity ...

w.r.t. testing I hope that we can even do some integration tests using OSS UC. Happy the help with that.

@@ -216,7 +226,10 @@ impl FromStr for UnityCatalogConfigKey {
"workspace_url" | "unity_workspace_url" | "databricks_workspace_url" => {
Ok(UnityCatalogConfigKey::WorkspaceUrl)
}
_ => Err(UnityCatalogError::UnknownConfigKey(s.into()).into()),
_ => Err(DataCatalogError::UnknownConfigKey {
catalog: "",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in this case the "catalog" would be "Unity".

Btw... a lot fi the client impl was just stolen from Object store, where this field identifies the specific implementation (azure, gcp, ...). If we have this error live in a sepate create, this dioes not make much sense anymore.


Ok(resp.json().await?)
Ok(resp.json().await.map_err(UnityCatalogError::from)?)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have an impl From<...> for the error, why do we need the map? does the function return a different kind of err.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of jank, the base error type in core no longer has a from for rewqest::Error so we convert to our own error type that does have a From to DataCatalogError

"GOOGLE_SERVICE_ACCOUNT",
account_path.as_path().to_str().unwrap(),
);
unsafe {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something, or did we remove unsafe around set_var further above, specifically in crates/aws/src/storage.rs, and here we are adding it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to add it back to the other one, thanks for pointing it out

roeap
roeap previously approved these changes Dec 13, 2024
Copy link
Collaborator

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

One ore thing though, and not sure if this would actually happen, but it seems GH wants to re-apply a bunch of commits already on main. If all of these were duplicated, this would give us (IMO) a fairly dirty history. Could we squash these to make it a bit cleaner?

@hntd187
Copy link
Collaborator Author

hntd187 commented Dec 14, 2024

@roeap yes, I'll squash prior to merge

@hntd187 hntd187 added this pull request to the merge queue Dec 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Dec 16, 2024
@rtyler
Copy link
Member

rtyler commented Dec 16, 2024

@hntd187 unfortunately it looks like this needs to be rebased again 🤦

hntd187 and others added 4 commits December 16, 2024 15:04
Signed-off-by: Stephen Carman <shcarman@gmail.com>
Signed-off-by: Stephen Carman <hntd187@users.noreply.github.com>
Signed-off-by: Stephen Carman <hntd187@users.noreply.github.com>
Signed-off-by: Stephen Carman <hntd187@users.noreply.github.com>
@rtyler
Copy link
Member

rtyler commented Dec 16, 2024

I have resolved conflicts, I will push and land this before I cut another release today

@rtyler rtyler enabled auto-merge December 16, 2024 15:13
@rtyler rtyler added this pull request to the merge queue Dec 16, 2024
Merged via the queue into delta-io:main with commit a5d0097 Dec 16, 2024
21 of 22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants