Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify zarr upload process design #1464

Merged
merged 3 commits into from
Feb 6, 2023
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 16 additions & 7 deletions doc/design/zarr-performance-redesign.md
Original file line number Diff line number Diff line change
Expand Up @@ -105,20 +105,22 @@ sequenceDiagram
Client->>+S3: Upload individual file using signed URL
end

Client->>+Server: Finalize Zarr Archive (w/ local checksum) ⚡⚡⚡
Client->>+Server: Finalize Zarr Archive ⚡⚡⚡
Server-->>-Client: PENDING Zarr Archive

rect rgb(179, 209, 95)
Server->>+Worker: Compute tree checksum for Zarr Archive, <br/> compare against provided checksum.
Server->>+Worker: Compute tree checksum for Zarr Archive
end

loop
Client->>+Server: Check for COMPLETE Zarr Archive
Server-->>-Client: PENDING/INGESTING/COMPLETE/MISMATCH Zarr Archive
Server-->>-Client: PENDING/INGESTING/COMPLETE Zarr Archive
end

Client->>+Client: Verify zarr checksum with local
```

(Step 1): **`dandi-cli` computes the Zarr checksum locally**. (Note that this step can actually be taken any time before step 7; it is listed here arbitrarily.)
(Step 1): **`dandi-cli` computes the Zarr checksum locally**. (Note that this step can actually be taken any time before step 12; it is listed here arbitrarily.)

(Steps 2 and 3): `dandi-cli` asks the server to create a new Zarr archive, which is put into the `PENDING` state.

Expand All @@ -129,11 +131,11 @@ Important notes:

(Step 6): `dandi-cli` uses these URLs to upload the files **using S3's `Content-MD5` header to verify the uploaded file's integrity**. **Instead of finalizing a batch (since there is no longer a batch concept), `dandi-cli` repeats these steps until all files are uploaded (repeating steps 4, 5, and 6).** (Note that `dandi-cli`'s actual strategy here may be more nuanced than a simple loop as depicted above; instead, it might maintain a queue of files and a set of files "in flight", replenishing them according to some dynamic batching strategy, etc. In any such strategy, some combination of steps 4, 5, and 6 will repeat until all files are uploaded.)

(Steps 7 and 8): **`dandi-cli` asks the server to finalize the Zarr, providing the locally computed zarr checksum to compare against**.
(Steps 7 and 8): **`dandi-cli` asks the server to finalize the Zarr**.

(Step 9): The server kicks off an asynchronous task to compute the treewise Zarr checksum, comparing it against the checksum provided in step 7 and updating the status of the Zarr once it's finished.
(Step 9): The server kicks off an asynchronous task to compute the treewise Zarr checksum, updating the status of the Zarr once it's finished.

(Steps 10 and 11): Meanwhile, `dandi-cli` sits in a loop checking for the Zarr archive to reach either the `COMPLETE` state, or if there was an error, the `MISMATCH` state.
(Steps 10 and 11): Meanwhile, `dandi-cli` sits in a loop checking for the Zarr archive to reach the `COMPLETE` state.

### Benefits

Expand Down Expand Up @@ -200,3 +202,10 @@ with this proposal. Some of the work in `dandi-archive` has already been done,
and the rest is on par with the changes needed for `dandi-cli`. If all concerns
with the plan are allayed, then it should not be difficult to execute the plan
and gain significant performance for Zarr upload.

---
## Previous Design

We previously included extra functionality, which involved *including* the locally computed checksum when finalizing the zarr archive (step 7), and adding a `MISMATCH` state to the zarr `status` field, which would be set if the checksum produced by the asynchronous zarr checksum task didn't match the checksum provided in step 7.

This addition was later reverted in the interest of simplicity, as well as the fact that it is funtionally equivalent to the current design.