In opendal v0.46, we've introduced chunked reading as the sole option.
I designed this API with a focus on high-throughput applications such as databend, greptime and iceberg-rust. The new API allows those applications to read a file concurrently with a single call:
```rust
let r = op.reader_with(path).concurrent(16).chunk(4 * 1024 * 1024).await?;
```
However, chunked reading increases latency and memory usage in use cases such as reading files from start to finish and processing content in a streaming manner, as demonstrated by another user, risingwave.
Risingwave will concurrently read hundreds of files, perform a merge sort on the retrieved content, and then compact and upload the results. Without enabling streaming read, they will continue to encounter out-of-memory errors, even in the simplest scenarios.
So the only way forward is to bring streaming read back. I introduced two different readers, controlled by `chunk`:
- If `chunk` is not set, data will be read in a streaming manner, requiring only a minimal internal buffer in memory.
- If `chunk` is set, data will be read in chunks. Users can also enable `concurrent` to fetch multiple chunks simultaneously.
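The two modes side by side can be sketched roughly as below. This is a hedged sketch, not the final API: it assumes a configured `Operator` named `op` and a path `"large-file"`, and the exact `Reader` consumption methods may differ across OpenDAL versions; only the builder calls (`reader_with`, `concurrent`, `chunk`) are taken from the snippet above.

```rust
use opendal::{Operator, Result};

// Illustrative only: `op` is assumed to be an already-configured Operator.
async fn demo(op: Operator) -> Result<()> {
    // Streaming mode: no chunk configured, so data flows through a
    // minimal internal buffer instead of being fetched in large pieces.
    let streaming_reader = op.reader_with("large-file").await?;
    // ... consume `streaming_reader` incrementally, e.g. via a
    // bytes-stream adapter, keeping memory usage flat.

    // Chunked mode: 4 MiB chunks, with up to 16 fetched concurrently.
    // Higher throughput, at the cost of latency and buffered memory.
    let chunked_reader = op
        .reader_with("large-file")
        .concurrent(16)
        .chunk(4 * 1024 * 1024)
        .await?;
    // ... read ranges from `chunked_reader`; chunks are prefetched
    // in parallel up to the configured concurrency.

    let _ = (streaming_reader, chunked_reader);
    Ok(())
}
```

The design choice here is that one knob (`chunk`) selects the reader implementation, so callers like risingwave get streaming behavior by default, while throughput-oriented callers opt into chunking explicitly.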
Tasks

- [ ] `concurrent` limit needs to adapt to the new logic