
core: Bring Streaming Read Back #4672

Closed
6 tasks done
Tracked by #4640
Xuanwo opened this issue Jun 2, 2024 · 2 comments

Comments

@Xuanwo
Member

Xuanwo commented Jun 2, 2024

In opendal v0.46, we've introduced chunked reading as the sole option.

I designed this API with a focus on high-throughput applications such as databend, greptime and iceberg-rust. The new API allows those applications to read a file concurrently with a simple call:

let r = op.reader_with(path).concurrent(16).chunk(4 * 1024 * 1024).await?;

However, chunked reading increases latency and memory usage in use cases such as reading files from start to finish and processing content in a streaming manner, as demonstrated by another user, risingwave.

Risingwave will concurrently read hundreds of files, perform a merge sort on the retrieved content, and then compact and upload the results. Without enabling streaming read, they will continue to encounter out-of-memory errors, even in the simplest scenarios.

So the only way forward is to bring streaming read back. I introduced two different readers, controlled by chunk:

  • If chunk is not set, data will be read in a streaming manner, requiring only a minimal internal buffer in memory.
  • If chunk is set, data will be read in chunks. Users can also enable concurrent processing to fetch multiple chunks simultaneously.
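The two modes above can be sketched side by side. This is an illustrative sketch only: it relies on the builder calls shown in this issue (reader_with, concurrent, chunk); the surrounding function shape and variable names are assumptions, not opendal's documented API.

```rust
use opendal::{Operator, Result};

// Sketch of the two reading modes, assuming opendal v0.46+.
async fn read_modes(op: Operator, path: &str) -> Result<()> {
    // Streaming mode: chunk is not set, so data is read in a
    // streaming manner with only a minimal internal buffer —
    // suitable for start-to-finish scans like risingwave's.
    let streaming = op.reader_with(path).await?;

    // Chunked mode: 4 MiB chunks fetched with up to 16 concurrent
    // requests, trading memory for throughput — suitable for
    // high-throughput query engines.
    let chunked = op
        .reader_with(path)
        .concurrent(16)
        .chunk(4 * 1024 * 1024)
        .await?;

    let _ = (streaming, chunked);
    Ok(())
}
```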

Tasks

@wcy-fdu
Contributor

wcy-fdu commented Jun 3, 2024

This refactor is very meaningful: it allows users to choose the reading mode that fits their scenario.

@Xuanwo
Member Author

Xuanwo commented Jun 5, 2024

All needed features have been implemented.
