feat(rust): lazy commit, draft #2601
Conversation
@alexwilcoxson-rel, sorry for leaving you hanging so long. Please help refresh my memory :). The use case would be that we want to write into a table where we MIGHT require additional actions? I.e. for append only we could use The I guess my main question - and bear in mind that I need a refresher on our discussion :) - can we serve your use case with
Thanks @roeap ! In our use case we have the following:
Our implementation is getting more requirements and complexity, though. I worry that with Snapshot only we'll miss some edge case where we need to check for conflicts when the version already exists.
After much more digging, the primary issue we're seeing is the amount of memory used during checkpointing, not commit. I tested with some of our larger tables (many small files, pre-optimize), and those tables use GBs of memory, which is actually reported here: #2628. That said, we've already done some work on our end to pre-optimize files prior to commit to reduce noise in the commit log, and that is helping. Therefore I don't think we necessarily need to act on this for our use case right now, so I'm happy to close this. Also landed this yesterday to help with larger checkpoints: #2717
Yeah, our checkpoint writing needs some updating as well. There is a lot of code we would no longer need since we are now dependent on arrow anyhow. With the new arrow backend we "might" be able to do more efficient replay / checkpoint writing; however, we will always need a full replay :(.
If there is no use case right now I would hold off, and just hope we can get our "regular" handling as efficient as possible... there are some opportunities after all.
I'm closing this but am going to open something else to track improving checkpoint serialization.
Description
This is just a draft to see what you all think about this way to achieve commits without fully loading table state. This is useful for append-only, low/no-concurrency workflows, since we would be unlikely to hit a version conflict and need to check for conflicts. Going further, perhaps the conflict checker could lazily load the state only if the type of conflict resolution requires it.
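To make the idea concrete, here is a minimal, self-contained sketch of the control flow, not the code in this PR. Everything in it is hypothetical and for illustration only: `LogStore` is an in-memory stand-in for a log store with put-if-absent semantics, and `load_state`, `conflicts_with`, and `lazy_commit` are made-up names rather than delta-rs APIs. The point is only that the table state is loaded when a version conflict actually occurs, not before.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a commit log (version -> actions).
#[derive(Default)]
struct LogStore {
    commits: BTreeMap<u64, Vec<String>>,
}

#[derive(Debug, PartialEq)]
struct Conflict {
    version: u64,
}

impl LogStore {
    /// Writes the commit only if that version does not exist yet.
    fn put_if_absent(&mut self, version: u64, actions: Vec<String>) -> Result<(), Conflict> {
        if self.commits.contains_key(&version) {
            return Err(Conflict { version });
        }
        self.commits.insert(version, actions);
        Ok(())
    }

    /// Highest committed version, if any.
    fn latest_version(&self) -> Option<u64> {
        self.commits.keys().next_back().copied()
    }
}

/// The expensive step: replay the whole log into a flat list of actions.
fn load_state(store: &LogStore) -> Vec<String> {
    store.commits.values().flatten().cloned().collect()
}

/// Hypothetical conflict rule: an action that is already in the log conflicts.
fn conflicts_with(state: &[String], actions: &[String]) -> bool {
    actions.iter().any(|a| state.contains(a))
}

/// Tries to commit at `read_version + 1` without loading any state.
/// State is loaded only after a version conflict.
/// Returns the committed version and how many times state was loaded.
fn lazy_commit(
    store: &mut LogStore,
    read_version: u64,
    actions: Vec<String>,
    max_retries: usize,
) -> Result<(u64, usize), Conflict> {
    let mut version = read_version + 1;
    let mut state_loads = 0;
    for _ in 0..=max_retries {
        match store.put_if_absent(version, actions.clone()) {
            // Fast path: no conflict, so the state was never needed.
            Ok(()) => return Ok((version, state_loads)),
            Err(conflict) => {
                // Slow path: pay for the state only now.
                state_loads += 1;
                let state = load_state(store);
                if conflicts_with(&state, &actions) {
                    return Err(conflict);
                }
                // No logical conflict: retry on the next free version.
                version = store.latest_version().map_or(0, |v| v + 1);
            }
        }
    }
    Err(Conflict { version })
}
```

In this sketch a writer that sees no concurrent commits never calls `load_state`, which is the append-only, low-concurrency case described above. A stale writer pays for one state load per conflict before retrying.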
Putting this out as a draft as I've talked with @roeap about it and it could help a case that came up on Slack today.
Related Issue(s)
Documentation