Add range and ObjectMeta to GetResult (#4352) (#4495) #4677
/// The [`ObjectMeta`] for this object
pub meta: ObjectMeta,
/// The range of bytes returned by this request
pub range: Range<usize>,
I opted to make this required, as it allows for accurate buffer sizing, among other things.
The alternative would be to make it optional (and read the file length directly from the metadata if it was set to `None`)?
If that is the tradeoff, I agree that always including the sometimes-redundant `range` is a good choice.
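The buffer-sizing argument can be illustrated with a minimal sketch. It uses a hypothetical, simplified `SimpleGetResult` type whose payload arrives as in-memory chunks; this is not the object_store API. Because the range is always present, the collector can allocate its buffer once at the final size and then use the same range to check that the payload has the expected length.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a get result (not the object_store type):
/// the byte range the store says it returned, plus the payload as chunks.
pub struct SimpleGetResult {
    pub range: Range<usize>,
    pub chunks: Vec<Vec<u8>>,
}

impl SimpleGetResult {
    /// Collects the payload into a single buffer.
    pub fn bytes(self) -> Result<Vec<u8>, String> {
        // A required range means the final size is known up front,
        // so the buffer is allocated exactly once.
        let expected = self.range.len();
        let mut buf = Vec::with_capacity(expected);
        for chunk in self.chunks {
            buf.extend_from_slice(&chunk);
        }
        // The advertised range doubles as a length check on the payload.
        if buf.len() != expected {
            return Err(format!("expected {} bytes, got {}", expected, buf.len()));
        }
        Ok(buf)
    }
}
```

With an optional range, a helper like this would have to grow the buffer as chunks arrive. Alternatively, it could fall back to the object size from the metadata, but that value is only correct for whole-object reads.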
The code in this PR looks very good to me. Thank you @tustvold.
Is this a breaking API change to object_store (as it changes `GetResult`)?
I also didn't see any tests for this new feature. I think we should add some, both to cover the chunking as I mentioned inline and to ensure that we don't accidentally break this API or its implementation during future refactors.
@@ -729,54 +719,64 @@ impl GetOptions {
}

/// Result for a get request
#[derive(Debug)]
pub struct GetResult {
What is the reason for making the fields in this struct `pub`? If they are all `pub`, we can't add fields to `GetResult` in the future (such as optional object-store-specific metadata) without it being a breaking change.
What do you think about leaving the fields non-`pub` and adding accessors and an `into_parts` method like:
fn into_parts(self) -> (GetResultPayload, ObjectMeta) {
...
}
🤔
The problem is that the various implementations need to be able to construct this, so this just seemed simpler.
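The accessor approach from the review could look roughly like the sketch below. `GetResultSketch`, `Meta`, and `Payload` are hypothetical simplified types, not the object_store definitions. The fields are private, implementations construct values through `new`, and callers read through accessors or take ownership via `into_parts`.

```rust
use std::ops::Range;

/// Hypothetical simplified metadata, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Meta {
    pub location: String,
    pub size: usize,
}

/// Hypothetical simplified payload, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Payload {
    InMemory(Vec<u8>),
}

/// Private fields: callers cannot depend on the exact field set.
#[derive(Debug)]
pub struct GetResultSketch {
    payload: Payload,
    meta: Meta,
    range: Range<usize>,
}

impl GetResultSketch {
    /// Store implementations build results through this constructor.
    pub fn new(payload: Payload, meta: Meta, range: Range<usize>) -> Self {
        Self { payload, meta, range }
    }

    pub fn meta(&self) -> &Meta {
        &self.meta
    }

    pub fn range(&self) -> &Range<usize> {
        &self.range
    }

    /// Consumes the result, handing ownership of payload and metadata to the caller.
    pub fn into_parts(self) -> (Payload, Meta) {
        (self.payload, self.meta)
    }
}
```

This shows the tradeoff raised in the reply: every implementation must go through `new`, so adding a field still changes that signature unless a builder or defaulted fields are used. Rust's `#[non_exhaustive]` attribute is another option. It allows public fields to be added without a breaking change, but it also prevents crates other than the defining one from constructing the struct with a literal.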
object_store/src/lib.rs
)
.boxed()
GetResultPayload::File(file, path) => {
local::chunked_stream(file, path, self.range, 8 * 1024)
I think keeping the name `CHUNK_SIZE` for the `8 * 1024` would improve this code's readability.
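Below is a minimal sketch of the suggested named constant. `CHUNK_SIZE` is the name proposed in the comment. `chunk_ranges` is a hypothetical helper, not part of object_store, that shows how a requested byte range splits into reads of at most `CHUNK_SIZE` bytes, with the last read possibly shorter.

```rust
use std::ops::Range;

/// Size of each read when streaming a local file; named instead of an inline `8 * 1024`.
pub const CHUNK_SIZE: usize = 8 * 1024;

/// Hypothetical helper: splits `range` into consecutive sub-ranges of at most
/// `chunk_size` bytes, in order, covering the whole range.
pub fn chunk_ranges(range: Range<usize>, chunk_size: usize) -> Vec<Range<usize>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut out = Vec::new();
    let mut start = range.start;
    while start < range.end {
        // The final chunk is clamped to the end of the requested range.
        let end = (start + chunk_size).min(range.end);
        out.push(start..end);
        start = end;
    }
    out
}
```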
range: Range<usize>,
chunk_size: usize,
) -> BoxStream<'static, Result<Bytes, super::Error>> {
futures::stream::once(async move {
I was wondering about using `tokio::fs`, but it seems like the warnings on that page are still fairly significant.
Yeah, at least currently `tokio::fs` has pretty terrible performance characteristics; I would not recommend using it for anything, really. Perhaps at some point io_uring will become sufficiently stable, but that will be Linux-specific.
})
.await?;

let stream = futures::stream::try_unfold(
Do you know if the object_store tests have coverage for files larger than 8 KB, i.e. is this code covered by tests?
The chunked store tests should provide good coverage of this and make use of various chunk sizes smaller than 8KB - https://github.com/apache/arrow-rs/blob/master/object_store/src/chunked.rs#L210
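The property those tests check can be illustrated with a small, std-only sketch. `read_range_chunked` is a hypothetical synchronous helper, not the object_store implementation (which streams asynchronously). `std::io::Cursor` stands in for a file, and `check_chunked_roundtrip` asserts that the chunks reassemble to the requested slice for chunk sizes both smaller and larger than 8 KB.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::ops::Range;

/// Hypothetical synchronous analogue of a chunked range read: seek to
/// `range.start`, then read at most `chunk_size` bytes at a time.
pub fn read_range_chunked<R: Read + Seek>(
    mut src: R,
    range: Range<usize>,
    chunk_size: usize,
) -> io::Result<Vec<Vec<u8>>> {
    src.seek(SeekFrom::Start(range.start as u64))?;
    let mut remaining = range.end.saturating_sub(range.start);
    let mut chunks = Vec::new();
    while remaining > 0 {
        let mut buf = vec![0u8; remaining.min(chunk_size)];
        src.read_exact(&mut buf)?;
        remaining -= buf.len();
        chunks.push(buf);
    }
    Ok(chunks)
}

/// Reads `range` of `data` with several chunk sizes and checks that no chunk
/// is empty or oversized and that the chunks reassemble to the expected slice.
pub fn check_chunked_roundtrip(data: &[u8], range: Range<usize>) {
    for chunk_size in [1, 7, 1024, 8 * 1024, 64 * 1024] {
        let chunks = read_range_chunked(Cursor::new(data), range.clone(), chunk_size).unwrap();
        assert!(chunks.iter().all(|c| !c.is_empty() && c.len() <= chunk_size));
        assert_eq!(chunks.concat(), &data[range.clone()]);
    }
}
```

For example, calling `check_chunked_roundtrip(&data, 100..9_000)` on a 20,000-byte buffer exercises a range that crosses an 8 KB chunk boundary.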
object_store/src/memory.rs

let (range, data) = match options.range {
Some(range) => {
ensure!(range.end <= data.len(), OutOfRangeSnafu);
It occurs to me that these errors would be improved if they included the ranges and lengths as values. I understand that this PR doesn't change the behavior.
LGTM -- thank you @tustvold
#[snafu(display(
"Requested range {}..{} is out of bounds for object with length {}", range.start, range.end, len
))]
OutOfRange { range: Range<usize>, len: usize },
❤️
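Here is a std-only sketch of the same idea. It uses a hypothetical `OutOfRange` struct and `get_range` helper rather than the crate's snafu-based error. The error carries the requested range and the object length, so the message reports the actual values. The `range.start > range.end` check is an addition not present in the diff; it guards against a slicing panic.

```rust
use std::fmt;
use std::ops::Range;

/// Hypothetical std-only analogue of the `OutOfRange` variant: the error
/// carries the requested range and the object length as values.
#[derive(Debug, PartialEq)]
pub struct OutOfRange {
    pub range: Range<usize>,
    pub len: usize,
}

impl fmt::Display for OutOfRange {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Requested range {}..{} is out of bounds for object with length {}",
            self.range.start, self.range.end, self.len
        )
    }
}

impl std::error::Error for OutOfRange {}

/// Slices `data` by `range`, mirroring the `range.end <= data.len()` check from
/// the in-memory store, but reporting the offending values on failure.
pub fn get_range(data: &[u8], range: Range<usize>) -> Result<&[u8], OutOfRange> {
    // Also reject inverted ranges, which would otherwise panic when slicing.
    if range.start > range.end || range.end > data.len() {
        return Err(OutOfRange { range, len: data.len() });
    }
    Ok(&data[range])
}
```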
Which issue does this PR close?
Closes #4352
Relates to #4495
Rationale for this change
Not including the byte range results in unexpected behaviour of `GetResult::bytes`. Additionally, it is beneficial to return the `ObjectMeta` alongside the returned data, as this is effectively free and can be useful for additional data validation.
What changes are included in this PR?
Are there any user-facing changes?