Btw, I had a quick look into the DataFusion sinks, and I believe they may not be the best fit for us, considering the work Delta needs to do on write. More specifically, I had a look at whether it would make sense to implement a FileFormat for Delta...
The TableProvider does, however, have more methods available now that integrate into the framework. They also did some really great work integrating with @wjones127's multi-part writer....
Description
The Rust writer in its current state keeps a buffer in memory instead of streaming to disk, which causes the writer to use quite a lot of extra memory.
We need to address this memory usage issue.
@wjones127 mentioned what needs to happen here: https://delta-users.slack.com/archives/C013LCAEB98/p1700330750311529?thread_ts=1700325888.484149&cid=C013LCAEB98
"
What we want to do instead is stream out to disk. Right now the writer is ArrowWriter:
delta-rs/crates/deltalake-core/src/writer/record_batch.rs, line 252 in daa700e
We should change it so it uses put_multipart and AsyncArrowWriter. That would make the type AsyncArrowWriter<Box<dyn AsyncWrite + Unpin + Send>>
What we want to do instead is combine
"
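To make the buffering vs. streaming difference concrete, here is a minimal std-only sketch. It is not the delta-rs, parquet, or object_store API: `PartWriter`, `part_size`, `parts_sent` and `peak_buffered` are hypothetical names made up for illustration, and the sketch is synchronous where the real change would be async. It only shows the idea behind a multipart upload, namely that the writer hands each full part to the sink right away, so memory held by the writer is bounded by the part size rather than by the file size.

```rust
use std::io::{self, Write};

/// Hypothetical writer that forwards data to `sink` in fixed-size parts,
/// so at most `part_size` bytes are buffered at any time.
struct PartWriter<W: Write> {
    sink: W,
    part_size: usize,
    buf: Vec<u8>,
    parts_sent: usize,    // number of parts handed to the sink so far
    peak_buffered: usize, // largest number of bytes ever held in `buf`
}

impl<W: Write> PartWriter<W> {
    fn new(sink: W, part_size: usize) -> Self {
        assert!(part_size > 0, "part_size must be non-zero");
        Self {
            sink,
            part_size,
            buf: Vec::with_capacity(part_size),
            parts_sent: 0,
            peak_buffered: 0,
        }
    }

    /// Send whatever is buffered as one part and clear the buffer.
    fn send_part(&mut self) -> io::Result<()> {
        if !self.buf.is_empty() {
            self.sink.write_all(&self.buf)?;
            self.buf.clear();
            self.parts_sent += 1;
        }
        Ok(())
    }

    /// Send the final (possibly short) part and return the sink,
    /// the number of parts sent, and the peak buffer size.
    fn finish(mut self) -> io::Result<(W, usize, usize)> {
        self.send_part()?;
        self.sink.flush()?;
        Ok((self.sink, self.parts_sent, self.peak_buffered))
    }
}

impl<W: Write> Write for PartWriter<W> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        // Accept only as much as fits into the current part.
        let room = self.part_size - self.buf.len();
        let n = room.min(data.len());
        self.buf.extend_from_slice(&data[..n]);
        self.peak_buffered = self.peak_buffered.max(self.buf.len());
        // A full part goes out immediately instead of accumulating.
        if self.buf.len() == self.part_size {
            self.send_part()?;
        }
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.send_part()?;
        self.sink.flush()
    }
}
```

Wrapping a `Vec<u8>` sink with a `part_size` of 4 and writing 15 bytes through `write_all` sends four parts (4, 4, 4 and 3 bytes), and `peak_buffered` never exceeds 4. In the actual writer, the sink would be the `Box<dyn AsyncWrite + Unpin + Send>` returned by put_multipart, with AsyncArrowWriter on top of it.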
Another thing is to explore the DataFusion sink functionality, as suggested by @roeap