-
Notifications
You must be signed in to change notification settings - Fork 433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: clean up dependencies and feature flags #1014
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-ran the failed CI job, which was an unrelated cloud timeout 🤷
rust/Cargo.toml
Outdated
# AWS_PROFILE is experimental in object_store, and comes with a hefty load of dependencies. | ||
# must be used in conjunction with s3 or s3-rustls feature | ||
aws-profile = ["object_store/aws_profile"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind putting this functionality behind a feature flag, but this sort of change would be easier for us to discuss as a community if it were submitted separately.
Since @fvaleye just landed this functionality, I would like him to take a look at this pull request before merging too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, this would have been better in a separate PR - here I was working in WSL and struggling with some memory issues, which is why this change made my live a bit easier there and then 😆.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized, that the current setup is likely slightly broken anyhow. IF we include the feature in the object-store crate by default, this will enable aws / s3 support in the object store crate as well, however we are not enabling either s3 or s3-rustls by default on our end, in which case delta.-rs will not be able to make use of these features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the previous comment, we could set this aws-profile
feature with S3 features since it's only related to the AWS S3 backend storage.
|
||
- `s3` - enable the S3 storage backend to work with Delta Tables in AWS S3. | ||
- `s3-rustls` - enable the S3 storage backend but rely on [rustls](https://github.com/ctz/rustls) rather than OpenSSL (`native-tls`). | ||
- `glue` - enable the Glue data catalog to work with Delta Tables with AWS Glue. | ||
- `azure` - enable the Azure storage backend to work with Delta Tables in Azure Data Lake Storage Gen2 accounts. | ||
- `gcs` - enable the Google storage backend to work with Delta Tables in Google Cloud Storage. | ||
- `datafusion-ext` - enable the `datafusion::datasource::TableProvider` trait implementation for Delta Tables, allowing them to be queried using [DataFusion](https://github.com/apache/arrow-datafusion). | ||
- `datafusion` - enable the `datafusion::datasource::TableProvider` trait implementation for Delta Tables, allowing them to be queried using [DataFusion](https://github.com/apache/arrow-datafusion). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
))] | ||
compile_error!("Feature aws-profile must be used together with s3 or s3-rustls feature"); | ||
|
||
#[cfg(all(feature = "s3", feature = "s3-rustls"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
rust/tests/datafusion_test.rs
Outdated
let expected = vec![ | ||
"+----+----+----+", | ||
"| c3 | c1 | c2 |", | ||
"+----+----+----+", | ||
"| 5 | 4 | c |", | ||
"| 6 | 5 | b |", | ||
"| 4 | 6 | a |", | ||
"+----+----+----+", | ||
]; | ||
|
||
assert_batches_sorted_eq!(&expected, &batches); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to understand this code, this macro in datafusion is a very surprising test helper to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I like these helpers quite a bit, especially once we dive into more complex operations, I believe this makes it quite straight forward to reason about the tests.
Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
@@ -77,14 +77,15 @@ glibc_version = { path = "../glibc_version", version = "0.1" } | |||
|
|||
[features] | |||
default = ["arrow", "parquet"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also add aws-profile in the default feature list? profile based auth is a very common practice in production environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add it as default to s3* features, since those are the only ones where it is useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other cleanup piece we might want: we don't have a CI job that runs cargo check --no-default-features
and on the main branch it doesn't succeed. Do you want to look at addressing that in this PR? Or should we create a separate issue for that?
Good point - happy to add this in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great cleanup, thank you!
# Description ~~This PR updates datafusion and related dependencies to their latest versions. Since datafusion now has improved support for loading partition columns with non string types, we update our scan methods to take advantage of that.~~ While working on dependencies, I took the opportunity to do some housekeeping. - do not use chrono with default features - make `aws-profile` from object_store optional. The upstream create explicitly discourages its usage, and it brings quite a few new dependencies, as it pulls in some aws sdk. - rename `datafusion-ext` feature to `datafusion`. The ext suffix is still from a time where there were less options to define features. I kept the ols feature around as an alias. # Related Issue(s) closes delta-io#914 # Documentation <!--- Share links to useful documentation ---> Co-authored-by: R. Tyler Croy <rtyler@brokenco.de>
Description
This PR updates datafusion and related dependencies to their latest versions. Since datafusion now has improved support for loading partition columns with non string types, we update our scan methods to take advantage of that.While working on dependencies, I took the opportunity to do some housekeeping.
aws-profile
from object_store optional. The upstream create explicitly discourages its usage, and it brings quite a few new dependencies, as it pulls in some aws sdk.datafusion-ext
feature todatafusion
. The ext suffix is still from a time where there were less options to define features. I kept the ols feature around as an alias.Related Issue(s)
closes #914
Documentation