Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clone command to move data between stores #66

Merged
merged 4 commits into from
Sep 22, 2024

Conversation

AbdelrahmanElawady
Copy link
Contributor

Description

Add clone command to move data between stores.

@@ -268,6 +268,14 @@ impl Reader {
Ok(results)
}

pub async fn all_blocks(&self) -> Result<Vec<Block>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be A LOT for big flists. that has a lot of big files. You need to do this with pages. It can be implemented as an iterator as well or async stream

const WORKERS: usize = 10;
const BUFFER: usize = 10;

pub async fn clone<S: Store>(reader: Reader, store: S, cache: Cache<S>) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the logic looks okay. You trying to do parallel downloads and uploads which is a great idea of course. But my only concern is that there is no way you can report an error except by logging error. But there is no way you can actually do an early termination when you hit an error, and no way to report with error codes (the clone command will probably exit with 0 exit code) that the clone has been done. While one or multiple blocks has failed to upload.

So my suggestion to fix this is as follows:

  • You only need a single worker pool (not 2) a worker takes a Block, the worker does both the download from source and the upload to destination (sequentially). This means that all what u need to do is only loop over the blocks and feed the workers.
  • Since workers are async and there is no way for you to check their results, All workers can has access to a shared "failures list" a worker that fails to upload a single block can set an error in the failure list with enough information.
  • While feeding the workers with blocks the failure list can be checked periodically to do an early exit (instead of waiting to fail on all blocks) since the clone should not continue anyway if one or more blocks failed.
  • After feeding all the blocks you do a pool.close() and wait on workers to finish
  • Once workers are done, you check the failures list one last time.
  • Return an error (maybe also print id of blocks that failed and why)

Note: Check pack implementation for inspiration on the failure list

@rawdaGastan rawdaGastan merged commit ad73f6c into master Sep 22, 2024
2 checks passed
@rawdaGastan rawdaGastan deleted the development_add_clone_command branch September 22, 2024 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants