-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for reading remote storage systems #811
Changes from all commits
0821202
6f59715
5545ac7
b0a353c
42b6f43
9779395
9a8614b
90e4d88
af106a9
2c2650b
cf2c203
0596c9c
9869c69
908f445
a9f9a5e
7d894cc
f6239b5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -269,8 +269,8 @@ mod test { | |
}; | ||
} | ||
|
||
#[test] | ||
fn distributed_hash_aggregate_plan() -> Result<(), BallistaError> { | ||
#[tokio::test] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In case anyone else is interested, this is what happens if you don't have
|
||
async fn distributed_hash_aggregate_plan() -> Result<(), BallistaError> { | ||
let mut ctx = datafusion_test_context("testdata")?; | ||
|
||
// simplified form of TPC-H query 1 | ||
|
@@ -352,8 +352,8 @@ mod test { | |
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn distributed_join_plan() -> Result<(), BallistaError> { | ||
#[tokio::test] | ||
async fn distributed_join_plan() -> Result<(), BallistaError> { | ||
let mut ctx = datafusion_test_context("testdata")?; | ||
|
||
// simplified form of TPC-H query 12 | ||
|
@@ -523,8 +523,8 @@ order by | |
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn roundtrip_serde_hash_aggregate() -> Result<(), BallistaError> { | ||
#[tokio::test] | ||
async fn roundtrip_serde_hash_aggregate() -> Result<(), BallistaError> { | ||
let mut ctx = datafusion_test_context("testdata")?; | ||
|
||
// simplified form of TPC-H query 1 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,9 +56,9 @@ paste = "^1.0" | |
num_cpus = "1.13.0" | ||
chrono = "0.4" | ||
async-trait = "0.1.41" | ||
futures = "0.3" | ||
futures = { version = "0.3", features = ["executor"] } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we already have I tried removing this change locally and it seems to work There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you can use So something like Handle::current()
.block_on(async { .... }); There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While using
Since
Do you have any suggestions on this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the only real suggestion is to plumb There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
How about this alternative to reduce the scope of this PR? i.e. implement both sync and async, but only use sync API to migrate existing code to the new IO abstraction, then work on async propagation as a fast follow up. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The other thing I was thinking about was what about adding in the ObjectStore interfaces in one PR and then start hooking that up into the rest of the system / rewrite the existing data sources (like Parquet, etc) as separate PRs. I think @yjshen has done a great job with this PR showing how everything would hook together, but I do feel like this PR is slightly beyond my ability to comprehend given its size and scope. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am onboard with further reducing the scope by focusing only on the ObjectStore interface :) |
||
pin-project-lite= "^0.2.0" | ||
tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync"] } | ||
tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync", "fs"] } | ||
tokio-stream = "0.1" | ||
log = "^0.4" | ||
md-5 = { version = "^0.9.1", optional = true } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are always returning a single path for the partitions field? This changes the behavior doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior is unchanged indeed, the origin filenames all comes from the root_path, and here I just use the root_path instead, to avoid touching too much code in ballista proto definition as well as its serde (to_proto and from_proto)