Add range and ObjectMeta to GetResult (#4352) (#4495) #4677

tustvold · 2023-08-10T13:39:21Z

Which issue does this PR close?

Closes #4352
Relates to #4495

Rationale for this change

Not including the byte range results in unexpected behaviour of GetResult::bytes.

Additionally, it is beneficial to return the ObjectMeta alongside the returned data, as this is effectively free and can be useful for additional data validation.

What changes are included in this PR?

Are there any user-facing changes?

tustvold · 2023-08-10T13:40:41Z

object_store/src/lib.rs

+    /// The [`ObjectMeta`] for this object
+    pub meta: ObjectMeta,
+    /// The range of bytes returned by this request
+    pub range: Range<usize>,


I opted to make this required, as it allows for accurate buffer sizing, among other things.

The alternative would be to make it optional (and read the file length directly from the metadata if it was set to None)?

If that is the tradeoff, I agree that always including the sometimes redundant range is a good choice.
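To illustrate the tradeoff discussed in this thread, here is a minimal sketch, not the actual `object_store` code. `ObjectMeta` and `GetResult` are reduced to simplified stand-ins (the payload is omitted), and `buffer_for` and `len_with_optional_range` are hypothetical helpers introduced only for this example. It assumes `meta.size` is the full object length.

```rust
use std::ops::Range;

/// Simplified stand-in for `ObjectMeta`; only the object size is modelled.
#[derive(Debug, Clone)]
pub struct ObjectMeta {
    pub size: usize,
}

/// Simplified stand-in for `GetResult`, with the payload left out.
#[derive(Debug)]
pub struct GetResult {
    pub meta: ObjectMeta,
    pub range: Range<usize>,
}

/// Hypothetical helper: with a required range, the buffer can be sized
/// for exactly the bytes that were returned.
pub fn buffer_for(result: &GetResult) -> Vec<u8> {
    Vec::with_capacity(result.range.len())
}

/// The alternative design: with an optional range, every caller has to
/// carry the fallback to the full object size.
pub fn len_with_optional_range(range: Option<&Range<usize>>, meta: &ObjectMeta) -> usize {
    match range {
        Some(r) => r.len(),
        None => meta.size,
    }
}
```

With the required field, a full-object read simply carries `0..meta.size`, which is the "sometimes redundant" case mentioned above.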

alamb

The code in this PR looks very good to me. Thank you @tustvold

Is this a breaking API change to object_store (as it changes GetResult)?

I also didn't see any tests for this new feature -- I think we should add some, both to cover the chunking, as I mentioned inline, and to ensure that we don't accidentally break this API or its implementation during future refactors.

alamb · 2023-08-11T10:31:13Z

object_store/src/lib.rs

+    /// The [`ObjectMeta`] for this object
+    pub meta: ObjectMeta,
+    /// The range of bytes returned by this request
+    pub range: Range<usize>,


The alternative would be to make it optional (and read the file length directly from the metadata if it was set to None)?

If that is the tradeoff, I agree that always including the sometimes redundant range is a good choice.

alamb · 2023-08-11T10:35:15Z

object_store/src/lib.rs

@@ -729,54 +719,64 @@ impl GetOptions {
 }

 /// Result for a get request
+#[derive(Debug)]
+pub struct GetResult {


What is the reason for making the fields in this struct pub? If they are all pub we can't add fields to GetResult in the future (such as optional object store specific metadata, for example) without it being a breaking change.

What do you think about leaving the fields non-pub and adding accessors and a

fn into_parts(self) -> (GetResultPayload, ObjectMeta) { ... }

🤔

The problem is the various implementations need to be able to construct this, and so this just seemed simpler
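For comparison, a minimal sketch of what the non-pub variant could look like. This is not what the PR does (it keeps the fields `pub`). The types are simplified stand-ins, and `new`, `meta()` and `range()` are hypothetical names chosen for this example; only `into_parts` follows the signature suggested above. A constructor like `new` is one way the various implementations could still build the struct.

```rust
use std::ops::Range;

/// Simplified stand-in for `ObjectMeta`.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectMeta {
    pub size: usize,
}

/// Simplified stand-in for `GetResultPayload`, with a single in-memory variant.
#[derive(Debug, PartialEq)]
pub enum GetResultPayload {
    Bytes(Vec<u8>),
}

/// Fields are private, so new ones can be added later without breaking callers.
#[derive(Debug)]
pub struct GetResult {
    payload: GetResultPayload,
    meta: ObjectMeta,
    range: Range<usize>,
}

impl GetResult {
    /// Hypothetical constructor used by the store implementations.
    pub fn new(payload: GetResultPayload, meta: ObjectMeta, range: Range<usize>) -> Self {
        Self { payload, meta, range }
    }

    /// Hypothetical accessor for the object metadata.
    pub fn meta(&self) -> &ObjectMeta {
        &self.meta
    }

    /// Hypothetical accessor for the returned byte range.
    pub fn range(&self) -> &Range<usize> {
        &self.range
    }

    /// Consume the result, following the signature proposed in the review.
    pub fn into_parts(self) -> (GetResultPayload, ObjectMeta) {
        (self.payload, self.meta)
    }
}
```

The cost is the extra constructor and accessor surface, which is the simplicity argument made in the reply above.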

alamb · 2023-08-11T10:36:32Z

object_store/src/lib.rs

-                )
-                .boxed()
+            GetResultPayload::File(file, path) => {
+                local::chunked_stream(file, path, self.range, 8 * 1024)


I think keeping the name of CHUNK_SIZE for the 8 * 1024 would increase this code's readability

alamb · 2023-08-11T10:39:56Z

object_store/src/local.rs

+    range: Range<usize>,
+    chunk_size: usize,
+) -> BoxStream<'static, Result<Bytes, super::Error>> {
+    futures::stream::once(async move {


I was wondering about using tokio::fs but it seems like the warnings on that page are still fairly significant

Yeah, at least currently tokio::fs has pretty terrible performance characteristics; I would not recommend using it for anything, really. Perhaps at some point io_uring will get sufficiently stable, but that will be Linux-specific.

alamb · 2023-08-11T10:40:58Z

object_store/src/local.rs

+        })
+        .await?;
+
+        let stream = futures::stream::try_unfold(


Do you know if the object_store tests have coverage for files that are greater than 8KB in size? In other words, is this code covered by tests?

The chunked store tests should provide good coverage of this and make use of various chunk sizes smaller than 8KB - https://github.com/apache/arrow-rs/blob/master/object_store/src/chunked.rs#L210
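As a rough illustration of the chunking behaviour being asked about, here is a blocking, std-only sketch. It is not the `local::chunked_stream` implementation from the PR, which returns an async stream; `read_range_chunked` is a hypothetical helper, and `CHUNK_SIZE` is the name suggested earlier in this review for the `8 * 1024` literal.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::ops::Range;

/// The `8 * 1024` literal from the diff, given the name suggested in review.
pub const CHUNK_SIZE: usize = 8 * 1024;

/// Hypothetical blocking helper: read `range` from `file` in chunks of at
/// most `chunk_size` bytes. Returns an `UnexpectedEof` error if the range
/// extends past the end of the file.
pub fn read_range_chunked(
    file: &mut File,
    range: Range<usize>,
    chunk_size: usize,
) -> io::Result<Vec<Vec<u8>>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");

    // Start at the beginning of the requested range, not at byte 0.
    file.seek(SeekFrom::Start(range.start as u64))?;

    let mut remaining = range.end.saturating_sub(range.start);
    let mut chunks = Vec::new();
    while remaining > 0 {
        // The final chunk may be shorter than `chunk_size`.
        let to_read = remaining.min(chunk_size);
        let mut buf = vec![0u8; to_read];
        file.read_exact(&mut buf)?;
        remaining -= to_read;
        chunks.push(buf);
    }
    Ok(chunks)
}
```

A test for the more-than-8KB case would write a file larger than `CHUNK_SIZE`, request a range that does not start at zero, and check that the concatenated chunks equal the expected slice.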

alamb · 2023-08-11T10:42:32Z

object_store/src/memory.rs


+        let (range, data) = match options.range {
+            Some(range) => {
+                ensure!(range.end <= data.len(), OutOfRangeSnafu);


It occurs to me that these errors would be improved if they included the ranges and lengths as values. I understand that this PR doesn't change the behavior.

alamb

LGTM -- thank you @tustvold

alamb · 2023-08-14T10:17:24Z

object_store/src/memory.rs

+    #[snafu(display(
+        "Requested range {}..{} is out of bounds for object with length {}", range.start, range.end, len
+    ))]
+    OutOfRange { range: Range<usize>, len: usize },
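A std-only sketch of the same idea, for readers without `snafu` at hand. The message text is taken from the diff above; the standalone `OutOfRange` struct and the `slice_range` helper are hypothetical and do not mirror the in-memory store's actual code.

```rust
use std::fmt;
use std::ops::Range;

/// Error carrying the requested range and the object length as values.
#[derive(Debug, PartialEq)]
pub struct OutOfRange {
    pub range: Range<usize>,
    pub len: usize,
}

impl fmt::Display for OutOfRange {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Requested range {}..{} is out of bounds for object with length {}",
            self.range.start, self.range.end, self.len
        )
    }
}

impl std::error::Error for OutOfRange {}

/// Hypothetical helper: return the requested sub-slice or a descriptive error.
pub fn slice_range(data: &[u8], range: Range<usize>) -> Result<&[u8], OutOfRange> {
    let len = data.len();
    // `get` returns `None` for out-of-bounds ranges and for inverted ranges
    // (start > end); this sketch reports both with the same error.
    data.get(range.clone()).ok_or(OutOfRange { range, len })
}
```

With the values in the message, a failing request reports exactly which range was asked for and how long the object was.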


Add range and ObjectMeta to GetResult (apache#4352) (apache#4495)

ee615e3

github-actions bot added the object-store Object Store Interface label Aug 10, 2023

tustvold commented Aug 10, 2023

alamb reviewed Aug 11, 2023

tustvold added the api-change Changes to the arrow API label Aug 11, 2023

tustvold added 3 commits August 11, 2023 13:13

Review feedback

4f00d6b

Merge remote-tracking branch 'upstream/master' into get-result-range

89b6ac6

Fix docs

f449bca

tustvold requested a review from alamb August 11, 2023 15:19

alamb approved these changes Aug 14, 2023

tustvold merged commit 820e40a into apache:master Aug 14, 2023

alamb mentioned this pull request Aug 14, 2023

New object_store release - 0.7.0 #4696

Closed

tustvold mentioned this pull request Aug 15, 2023

Add Range to GetResult::File #4352

Closed

JackKelly mentioned this pull request Jan 23, 2024

Does LSIO need to exist?! Does object_store already do everything we need? If not, can we extend object_store instead of creating LSIO? JackKelly/light-speed-io#27

Closed
