Add clone command to move data between stores #66
Conversation
rfs/src/fungi/meta.rs (Outdated)

@@ -268,6 +268,14 @@ impl Reader {
        Ok(results)
    }

    pub async fn all_blocks(&self) -> Result<Vec<Block>> {
This can be A LOT for big flists that have a lot of big files. You need to do this with pages. It can also be implemented as an iterator or an async stream.
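A minimal sketch of the paged alternative the comment asks for. The `blocks_page` method and the in-memory `Vec<Block>` are illustrative assumptions, not the real database-backed `Reader` API: the point is that callers pull a bounded page at a time and stop on an empty page, instead of materializing every block of a large flist at once.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Block {
    id: u64,
}

struct Reader {
    // In-memory stand-in for the real database-backed flist metadata.
    blocks: Vec<Block>,
}

impl Reader {
    // Hypothetical paged variant of all_blocks: returns up to `limit`
    // blocks starting at `offset`; an empty page signals iteration is done.
    fn blocks_page(&self, offset: usize, limit: usize) -> Vec<Block> {
        self.blocks
            .iter()
            .skip(offset)
            .take(limit)
            .cloned()
            .collect()
    }
}

fn main() {
    let reader = Reader {
        blocks: (0..25).map(|id| Block { id }).collect(),
    };

    // Caller drives the pagination loop; memory stays bounded by the
    // page size (10 here) regardless of how many blocks the flist has.
    let mut offset = 0;
    let mut seen = 0;
    loop {
        let page = reader.blocks_page(offset, 10);
        if page.is_empty() {
            break; // no more rows
        }
        seen += page.len();
        offset += page.len();
    }
    println!("visited {seen} blocks");
}
```

The same shape maps directly onto an async stream: each `blocks_page` call becomes one awaited query, and the loop becomes the stream's `poll_next`.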
const WORKERS: usize = 10;
const BUFFER: usize = 10;

pub async fn clone<S: Store>(reader: Reader, store: S, cache: Cache<S>) -> Result<()> {
In general, the logic looks okay. You're trying to do parallel downloads and uploads, which is a great idea of course. My only concern is that there is no way to report an error except by logging it: there is no way to terminate early when you hit an error, and no way to signal through the exit code that the clone failed (the clone command will probably exit with code 0 even while one or more blocks failed to upload).
So my suggestion to fix this is as follows:
- You only need a single worker pool (not two). A worker takes a Block and does both the download from the source and the upload to the destination (sequentially). All you need to do then is loop over the blocks and feed the workers.
- Since the workers are async and there is no way for you to check their results, all workers can have access to a shared "failures list": a worker that fails to upload a block sets an error in the list with enough information.
- While feeding the workers with blocks, check the failure list periodically so you can exit early (instead of waiting to fail on all blocks), since the clone should not continue anyway once one or more blocks have failed.
- After feeding all the blocks, do a pool.close() and wait for the workers to finish.
- Once the workers are done, check the failures list one last time.
- Return an error (maybe also print the ids of the blocks that failed and why).

Note: check the pack implementation for inspiration on the failure list.
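The suggested shape can be sketched as follows. This is a simplified illustration, not the real rfs code: std threads and an mpsc channel stand in for the async worker pool, and `transfer_block` is a hypothetical combined download-then-upload step. The parts that matter are the single pool, the shared failures list, the periodic early-exit check while feeding, and the final check after the pool is closed.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Failures = Arc<Mutex<Vec<String>>>;

// Hypothetical worker body: download from source and upload to destination
// sequentially. Here block 13 is hard-coded to fail for demonstration.
fn transfer_block(id: u64) -> Result<(), String> {
    if id == 13 {
        Err(format!("upload of block {id} failed"))
    } else {
        Ok(())
    }
}

fn clone_blocks(blocks: Vec<u64>, workers: usize) -> Result<(), Vec<String>> {
    let failures: Failures = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = mpsc::channel::<u64>();
    let rx = Arc::new(Mutex::new(rx));

    // Single worker pool: each worker pulls blocks until the channel closes.
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let failures = Arc::clone(&failures);
            thread::spawn(move || loop {
                let block = match rx.lock().unwrap().recv() {
                    Ok(b) => b,
                    Err(_) => break, // sender dropped: pool is closed
                };
                if let Err(e) = transfer_block(block) {
                    // Record the failure instead of just logging it.
                    failures.lock().unwrap().push(e);
                }
            })
        })
        .collect();

    // Feed the workers, checking the failure list periodically so we can
    // exit early instead of pushing the remaining blocks of a doomed clone.
    for block in blocks {
        if !failures.lock().unwrap().is_empty() {
            break;
        }
        tx.send(block).unwrap();
    }

    drop(tx); // the "pool.close()": workers drain the queue and exit
    for h in handles {
        h.join().unwrap();
    }

    // Final check, so the command can return a real error (with the
    // failed blocks' details) and exit non-zero.
    let failures = Arc::try_unwrap(failures).unwrap().into_inner().unwrap();
    if failures.is_empty() {
        Ok(())
    } else {
        Err(failures)
    }
}
```

In the real async version the channel would be bounded (matching the `BUFFER` constant) so feeding applies backpressure, and joining the threads becomes awaiting the worker tasks.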
Description
Add clone command to move data between stores.