
core: Bring Streaming Read Back #4672

Closed
6 tasks done
Tracked by #4640
Xuanwo opened this issue Jun 2, 2024 · 2 comments

Comments

@Xuanwo
Member

Xuanwo commented Jun 2, 2024

In opendal v0.46, we've introduced chunked reading as the sole option.

I designed this API with a focus on high-throughput applications such as databend, greptime and iceberg-rust. The new API allows those applications to read a file concurrently with a simple call:

let r = op.reader_with(path).concurrent(16).chunk(4 * 1024 * 1024).await?;

However, chunked reading increases latency and memory usage in use cases such as reading files from start to finish and processing content in a streaming manner, as demonstrated by another user, risingwave.

Risingwave will concurrently read hundreds of files, perform a merge sort on the retrieved content, and then compact and upload the results. Without enabling streaming read, they will continue to encounter out-of-memory errors, even in the simplest scenarios.

So the only way forward is to bring streaming read back. I introduced two different readers, controlled by chunk:

  • If chunk is not set, data will be read in a streaming manner, requiring only a minimal internal buffer in memory.
  • If chunk is set, data will be read in chunks. Users can also enable concurrent processing to fetch multiple chunks simultaneously.
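The two modes above can be sketched side by side. This is an illustrative sketch only: it relies on the builder calls shown in this issue (reader_with, concurrent, chunk); the surrounding function shape and variable names are assumptions, not opendal's documented API.

```rust
use opendal::{Operator, Result};

// Sketch of the two reading modes, assuming opendal v0.46+.
async fn read_modes(op: Operator, path: &str) -> Result<()> {
    // Streaming mode: chunk is not set, so data is read in a
    // streaming manner with only a minimal internal buffer —
    // suitable for start-to-finish scans like risingwave's.
    let streaming = op.reader_with(path).await?;

    // Chunked mode: 4 MiB chunks fetched with up to 16 concurrent
    // requests, trading memory for throughput — suitable for
    // high-throughput query engines.
    let chunked = op
        .reader_with(path)
        .concurrent(16)
        .chunk(4 * 1024 * 1024)
        .await?;

    let _ = (streaming, chunked);
    Ok(())
}
```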

Tasks

@wcy-fdu
Contributor

wcy-fdu commented Jun 3, 2024

This refactor is very meaningful: it allows users to choose the reading mode that fits their scenario.

@Xuanwo
Member Author

Xuanwo commented Jun 5, 2024

All needed features have been implemented.
