Add clone command to move data between stores #66

AbdelrahmanElawady · 2024-07-18T13:47:20Z

Description

Add clone command to move data between stores.

Copy data from one location (e.g. a dir) to another location (e.g. s3) #50

muhamadazmy · 2024-07-23T08:00:17Z

rfs/src/fungi/meta.rs

@@ -268,6 +268,14 @@ impl Reader {
        Ok(results)
    }

+    pub async fn all_blocks(&self) -> Result<Vec<Block>> {


This can be a lot for big flists that have many big files. You need to do this with pages. It can also be implemented as an iterator or as an async stream.
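A minimal sketch of the iterator variant of paging, assuming a hypothetical `FakeReader` with a `blocks_page(offset, limit)` method. A real `Reader` would presumably issue a `LIMIT`/`OFFSET` query against the flist metadata instead; `Block`, `FakeReader` and `BlockPages` are illustrative names, not the rfs API. Only one page of blocks is held in memory at a time.

```rust
/// Hypothetical block record; the real rfs `Block` type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub id: Vec<u8>,
    pub key: Vec<u8>,
}

/// Hypothetical stand-in for the metadata reader, backed by a Vec.
pub struct FakeReader {
    rows: Vec<Block>,
}

impl FakeReader {
    /// Return at most `limit` blocks starting at `offset`
    /// (the in-memory equivalent of `LIMIT limit OFFSET offset`).
    pub fn blocks_page(&self, offset: usize, limit: usize) -> Vec<Block> {
        self.rows.iter().skip(offset).take(limit).cloned().collect()
    }
}

/// Iterator that yields blocks one page at a time.
pub struct BlockPages<'a> {
    reader: &'a FakeReader,
    offset: usize,
    page_size: usize,
    done: bool,
}

impl<'a> BlockPages<'a> {
    pub fn new(reader: &'a FakeReader, page_size: usize) -> Self {
        assert!(page_size > 0, "page_size must be positive");
        Self { reader, offset: 0, page_size, done: false }
    }
}

impl<'a> Iterator for BlockPages<'a> {
    type Item = Vec<Block>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let page = self.reader.blocks_page(self.offset, self.page_size);
        if page.len() < self.page_size {
            // A short (or empty) page means there is nothing left to fetch.
            self.done = true;
        }
        if page.is_empty() {
            return None;
        }
        self.offset += page.len();
        Some(page)
    }
}

fn main() {
    let rows = (0u8..25).map(|i| Block { id: vec![i], key: vec![i] }).collect();
    let reader = FakeReader { rows };
    let sizes: Vec<usize> = BlockPages::new(&reader, 10).map(|p| p.len()).collect();
    println!("{:?}", sizes); // [10, 10, 5]
}
```

An async-stream version would follow the same shape, with each page fetched by an awaited query instead of a synchronous call.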

muhamadazmy · 2024-07-23T08:23:15Z

rfs/src/clone.rs

+const WORKERS: usize = 10;
+const BUFFER: usize = 10;
+
+pub async fn clone<S: Store>(reader: Reader, store: S, cache: Cache<S>) -> Result<()> {


In general, the logic looks okay. You are trying to do parallel downloads and uploads, which is of course a great idea. My only concern is that there is no way to report an error other than logging it. There is no way to terminate early when you hit an error, and no way to report failure through the exit code: the clone command will probably exit with code 0, signalling that the clone is done, while one or more blocks have failed to upload.

So my suggestion to fix this is as follows:

You only need a single worker pool (not two). A worker takes a Block and does both the download from the source and the upload to the destination, sequentially. This means all you need to do is loop over the blocks and feed the workers.
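A minimal sketch of the per-block job a single worker would run, assuming hypothetical synchronous `BlockSource`/`BlockSink` traits. The real rfs `Store` trait is async and its method names may differ. Because download and upload happen in the same function, one `Result` covers both steps, and the error is tagged with the block id.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical synchronous source store (illustration only).
pub trait BlockSource {
    fn get(&self, id: &str) -> Result<Vec<u8>, String>;
}

/// Hypothetical synchronous destination store (illustration only).
pub trait BlockSink {
    fn set(&self, id: &str, data: &[u8]) -> Result<(), String>;
}

/// The whole job for one block: download from the source, then upload to
/// the destination. Errors carry the block id and the failing step.
pub fn copy_block<S: BlockSource, D: BlockSink>(src: &S, dst: &D, id: &str) -> Result<(), String> {
    let data = src.get(id).map_err(|e| format!("download {id}: {e}"))?;
    dst.set(id, &data).map_err(|e| format!("upload {id}: {e}"))
}

// In-memory implementations used only to exercise `copy_block`.
pub struct MemSource(pub HashMap<String, Vec<u8>>);

impl BlockSource for MemSource {
    fn get(&self, id: &str) -> Result<Vec<u8>, String> {
        self.0.get(id).cloned().ok_or_else(|| "not found".to_string())
    }
}

pub struct MemSink(pub Mutex<HashMap<String, Vec<u8>>>);

impl BlockSink for MemSink {
    fn set(&self, id: &str, data: &[u8]) -> Result<(), String> {
        self.0.lock().unwrap().insert(id.to_string(), data.to_vec());
        Ok(())
    }
}

fn main() {
    let mut blocks = HashMap::new();
    blocks.insert("a".to_string(), vec![1u8, 2, 3]);
    let src = MemSource(blocks);
    let dst = MemSink(Mutex::new(HashMap::new()));

    println!("{:?}", copy_block(&src, &dst, "a")); // Ok(())
    println!("{:?}", copy_block(&src, &dst, "b")); // Err("download b: not found")
}
```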

Since workers are async and there is no way for you to check their results, all workers can have access to a shared "failures list". A worker that fails to upload a block can add an error to the failure list with enough information to identify it.
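One possible shape for the shared failures list, sketched with `Arc<Mutex<Vec<_>>>` and plain threads. `Failure` and `Failures` are hypothetical names; this is not taken from the pack implementation, and an async version would more likely use an async-aware mutex.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// One failed block and the reason it failed.
#[derive(Debug, Clone, PartialEq)]
pub struct Failure {
    pub block_id: String,
    pub reason: String,
}

/// Shared failure list. Cloning is cheap, and every clone points at the
/// same underlying Vec, so each worker can hold its own handle.
#[derive(Clone, Default)]
pub struct Failures(Arc<Mutex<Vec<Failure>>>);

impl Failures {
    /// Record a failed block (called by a worker).
    pub fn push(&self, block_id: &str, reason: impl Into<String>) {
        self.0.lock().unwrap().push(Failure {
            block_id: block_id.to_string(),
            reason: reason.into(),
        });
    }

    /// Cheap check the feeding loop can use for an early exit.
    pub fn is_empty(&self) -> bool {
        self.0.lock().unwrap().is_empty()
    }

    /// Copy of all recorded failures, for the final report.
    pub fn snapshot(&self) -> Vec<Failure> {
        self.0.lock().unwrap().clone()
    }
}

fn main() {
    let failures = Failures::default();
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let failures = failures.clone();
            thread::spawn(move || {
                // Pretend odd-numbered blocks fail to upload.
                if i % 2 == 1 {
                    failures.push(&format!("block-{i}"), "upload failed");
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let mut ids: Vec<String> = failures.snapshot().into_iter().map(|f| f.block_id).collect();
    ids.sort();
    println!("{ids:?}"); // ["block-1", "block-3"]
}
```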

While feeding the workers with blocks, the failure list can be checked periodically to exit early (instead of waiting to fail on all blocks), since the clone should not continue anyway if one or more blocks failed.

After feeding all the blocks, call pool.close() and wait for the workers to finish.

Once the workers are done, check the failures list one last time.

If it is not empty, return an error (and maybe also print the IDs of the blocks that failed and why).

Note: check the pack implementation for inspiration on the failure list.
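The suggested flow (single pool, shared failure list, early exit while feeding, close, join, final check) can be sketched as a synchronous analogue using `std::thread` and a bounded `sync_channel`. This is an assumption-laden illustration, not the merged rfs code: `clone_blocks` and the `copy` closure (standing in for "download then upload one block") are hypothetical, `WORKERS`/`BUFFER` mirror the constants in the diff, and dropping the sender plays the role of `pool.close()`.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

const WORKERS: usize = 10;
const BUFFER: usize = 10;

/// Run `copy` for every block id on one pool of WORKERS threads.
/// Returns the number of blocks copied, or an error naming every failed block.
pub fn clone_blocks<I, F>(blocks: I, copy: F) -> Result<usize, String>
where
    I: IntoIterator<Item = String>,
    F: Fn(&str) -> Result<(), String> + Send + Sync + 'static,
{
    let copy = Arc::new(copy);
    let failures: Arc<Mutex<Vec<(String, String)>>> = Arc::new(Mutex::new(Vec::new()));
    // Bounded channel: `send` blocks while the buffer is full (backpressure).
    let (tx, rx) = sync_channel::<String>(BUFFER);
    let rx: Arc<Mutex<Receiver<String>>> = Arc::new(Mutex::new(rx));

    let workers: Vec<_> = (0..WORKERS)
        .map(|_| {
            let (rx, copy, failures) = (rx.clone(), copy.clone(), failures.clone());
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so it is held only while receiving, not while copying.
                let next = rx.lock().unwrap().recv();
                let Ok(id) = next else { break }; // channel closed and drained
                if let Err(e) = (*copy)(&id) {
                    failures.lock().unwrap().push((id, e));
                }
            })
        })
        .collect();

    let mut fed = 0;
    for id in blocks {
        // Early exit: stop feeding as soon as any block has failed.
        if !failures.lock().unwrap().is_empty() {
            break;
        }
        if tx.send(id).is_err() {
            break; // every worker has exited
        }
        fed += 1;
    }
    drop(tx); // "pool.close()": workers stop once the queue is drained

    for w in workers {
        w.join().expect("worker panicked");
    }

    // Final check after all workers are done.
    let failures = failures.lock().unwrap();
    if failures.is_empty() {
        Ok(fed)
    } else {
        let details: Vec<String> = failures.iter().map(|(id, e)| format!("{id}: {e}")).collect();
        Err(format!("{} block(s) failed: {}", failures.len(), details.join("; ")))
    }
}

fn main() {
    let ids = (0..100).map(|i| format!("block-{i}"));
    println!("{:?}", clone_blocks(ids.clone(), |_| Ok(()))); // Ok(100)
    let res = clone_blocks(ids, |id| {
        if id == "block-42" { Err("upload failed".into()) } else { Ok(()) }
    });
    println!("{}", res.is_err()); // true
}
```

Blocks already queued when a failure is noticed are still processed, so the final report may list more than one failure. Returning `Err` lets the CLI exit with a non-zero code instead of 0.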


AbdelrahmanElawady added 2 commits July 18, 2024 16:45

Add clone command to move data between stores

56a2c15

Use mpsc channel to parallelize I/O

a8058ff

muhamadazmy requested changes Jul 23, 2024


Use a single worker instead of two

370bdc3

rawdaGastan requested a review from muhamadazmy September 22, 2024 08:05

Merge branch 'master' of github.com:threefoldtech/rfs into development_add_clone_command

8cfeeb7

muhamadazmy approved these changes Sep 22, 2024


rawdaGastan merged commit ad73f6c into master Sep 22, 2024
2 checks passed

rawdaGastan deleted the development_add_clone_command branch September 22, 2024 13:04
