Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync reader with Seek+Read implementation #5

Merged
merged 14 commits into from
Aug 31, 2023
Merged

Sync reader with Seek+Read implementation #5

merged 14 commits into from
Aug 31, 2023

Conversation

pka
Copy link
Owner

@pka pka commented Aug 29, 2023

A first attempt implementing a sync API with Seek+Read support.

To have a similar API I've also changed the get_range API. What do you think @michaelkirk?

Module doc examples:

// Async API (without feature `sync`):
let mut client = BufferedHttpRangeClient::new("https://flatgeobuf.org/test/data/countries.fgb");
let bytes = client.min_req_size(256).get_range(0, 3).await?;
assert_eq!(bytes, "fgb".as_bytes());
let version = client.get_bytes(1).await?; // From buffer - no HTTP request!
assert_eq!(version, &[3]);

// Blocking API (with feature `sync`):
let mut client = BufferedHttpRangeClient::new("https://flatgeobuf.org/test/data/countries.fgb");
let bytes = client.min_req_size(256).get_range(0, 3)?;
assert_eq!(bytes, "fgb".as_bytes());
let version = client.get_bytes(1)?; // From buffer - no HTTP request!
assert_eq!(version, &[3]);

// Seek+Read API (with feature `sync`):
use std::io::{Read, Seek, SeekFrom};
let mut client = BufferedHttpRangeClient::new("https://flatgeobuf.org/test/data/countries.fgb");
client.seek(SeekFrom::Start(3)).ok();
let mut version = [0; 1];
client.min_req_size(256).read_exact(&mut version)?;
assert_eq!(&version, &[3]);
let mut bytes = [0; 3];
client.read_exact(&mut bytes)?;
assert_eq!(&bytes, b"fgb");

@michaelkirk
Copy link
Contributor

This seems reasonable. Are these in pursuit of some changes you're hoping to make to the FGB client? Or some other project?

I can understand why Seek might be useful, and I can understand why someone might prefer a sync client, but are they necessarily dependendent on eachother? e.g. Is there a reason that Seek isn't supported for the async context?

// Seek+Read API (with feature `sync`):
use std::io::{Read, Seek, SeekFrom};
let mut client = BufferedHttpRangeClient::new("https://flatgeobuf.org/test/data/countries.fgb");
client.seek(SeekFrom::Start(3)).ok();
let mut version = [0; 1];
client.min_req_size(256).read_exact(&mut version)?;
assert_eq!(&version, &[3]);
let mut bytes = [0; 3];
client.read_exact(&mut bytes)?;
+ // This confused me for a minute! 😆 
+ // The "fgb" we're reading here is not MAGIC_BYTES[0..3], but rather MAGIC_BYTES[5..7] (right?)
+ // I forgot that "fgb" was in the MAGIC_BYTES twice!
assert_eq!(&bytes, b"fgb");

@pka
Copy link
Owner Author

pka commented Aug 30, 2023

Implementing Seek+Read makes it a drop-in replacement for local files in most libraries. AsyncSeek+AsyncRead isn't very common yet. The motivation was HTTP support for cloud optimized formats like GeoTIFF (pka/georaster@34617dc) and COPC (pka/copc-rs@ea5d662). For FlatGeoBuf we have a specialized async HTTP reader, so there is no benefit for it the only benefit is the possibilty for reading from HTTP in a sync environment.

@pka pka force-pushed the seek-read branch 2 times, most recently from 8ce3464 to 6af6d3a Compare August 30, 2023 21:18
@pka
Copy link
Owner Author

pka commented Aug 30, 2023

Made features additive and fixed #3.

New Seek+Read example is hopefully less puzzling:

// Seek+Read API
use std::io::{Read, Seek, SeekFrom};
let mut client = HttpReader::new("https://www.rust-lang.org/static/images/favicon-32x32.png");
client.seek(SeekFrom::Start(1)).ok();
let mut bytes = [0; 3];
client.read_exact(&mut bytes)?;
assert_eq!(&bytes, b"PNG");

Copy link
Contributor

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a minor hangup on the logging feature, but it's not critical if you want to leave it. Overall this seems like a useful addition, nice work.

Cargo.toml Outdated
@@ -11,15 +11,22 @@ license = "MIT/Apache-2.0"
keywords = ["http", "reader", "buffer"]

[features]
default = ["reqwest"]
default = ["reqwest-sync", "reqwest-async"]
logging = ["log"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More typically, logging output is handled by configuring LOG_LEVEL, e.g. LOG_LEVEL=debug,http-range-client-debug=trace.

If you are worried about bloat from the log messages, the end user can compile out certain levels altogether via: https://docs.rs/log/latest/log/index.html#compile-time-filters

I appreciate that log is so ubiquitous that I don't need to look up documentation for every crate to know how to enable it's logging. Is there a reason to diverge?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do a little bit more than logging by aggregating some read statstics, which is not necessary in some cases. But I'm fine with removing ths feature flag, since reqwest depends on the logger anyway.

let mut bytes = [0; 8];
// We could only read 4 bytes in this case
client.min_req_size(4).read(&mut bytes)?;
assert_eq!(bytes, [b'f', b'g', b'b', 3, b'f', b'g', b'b', 0]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ultimately the client is waiting to issue a second http request here within the call to read? I guess that surprises me a little, but it is what it is.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just one HTTP request, but in this case read would allow us to return 4 bytes only. For that, we had to distinguish between read_all and read In the buffer logic, which isn't worth it, IMO.

@pka pka merged commit f9ec7ca into main Aug 31, 2023
6 checks passed
@pka pka deleted the seek-read branch August 31, 2023 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants