Add embargo re-design doc #1772

Merged (1 commit) on Dec 11, 2023
118 changes: 118 additions & 0 deletions doc/design/embargo-redesign.md
# Embargo Redesign

Author: Jacob Nesbitt

The current embargo infrastructure is both inefficient and error-prone, as we store embargoed data in one bucket and regular data in another. To unembargo data, it must be copied from one bucket to the other, which costs both money and time. Moreover, when it comes to the intersection of Zarrs and embargo (“Zarrbargo”), this approach is a non-starter. Therefore, a new approach is required.

## Problems with the Existing Approach

### Inefficiency

Embargoed data is currently uploaded to a bucket separate from the main sponsored bucket. Unembargoing involves copying data from that bucket into the sponsored bucket. Performing this copy with individually managed copy-object commands has shown major performance problems, both because errors must be monitored and because of the sheer number of objects that need to be copied.

### Error-proneness

As Dandisets grow in size, comprising more and more assets, the probability rises that the unembargo process, which copies objects from bucket to bucket one at a time, will fail. Such failures are recoverable, but recovery requires further engineering effort to make the process self-healing. This adds complexity and thus a continued risk of failed unembargoes.

### Zarrbargo non-starterness

The drawbacks described in the previous two sections are all compounded by Zarr archives, which so far have involved data scales one to two orders of magnitude larger than non-Zarr data. The upload time for the largest Zarr we have processed was on the order of one month; copying such a Zarr to the sponsored bucket would take additional time of the same order of magnitude.

Zarr archives are by nature made of many small files; large Zarr archives may encompass 100,000 files or more, raising the probability of a failure during unembargo.

## In-Place Object Tagging

With a bucket policy that denies public access to any object carrying an `embargoed` tag, an object can be restricted from public access simply by adding that tag to it.

```json
{
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": "arn:aws:s3:::dandiarchive/*",
    "Condition": {
        "StringEquals": {
            "s3:ExistingObjectTag/embargoed": "true"
        },
        "StringNotEquals": {
            "aws:PrincipalAccount": "769362853226"
        }
    }
}
```

With this bucket policy in place, all data in the bucket remains public by default. However, any object that carries the `embargoed=true` tag is restricted from public access. Authorized users (dandiset owners, etc.) who wish to access such an object can obtain a pre-signed URL from the API, as sketched below.
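For illustration, here is a minimal boto3 sketch (the bucket and key names are hypothetical) of how the API can hand an authorized user a time-limited URL for an embargoed object. Because the URL is signed with the archive's own AWS credentials, the `aws:PrincipalAccount` exception in the policy above lets the request through:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical names; the real values come from the asset blob record.
BUCKET = "dandiarchive"
KEY = "blobs/abc/123/example.nwb"


def embargoed_object_url(bucket: str, key: str, expires_in: int = 3600) -> str:
    """Return a time-limited URL for an embargoed object.

    The bucket policy only denies requests made from outside the archive's
    AWS account, so a URL pre-signed with the archive's own credentials
    still works for objects tagged ``embargoed=true``.
    """
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


url = embargoed_object_url(BUCKET, KEY)
```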

## Change to Upload Procedure

This new approach requires a change to the upload procedure, so that uploads to embargoed dandisets are tagged with the `embargoed` S3 tag. This can be achieved by including the `embargoed=true` tag as part of the pre-signed put-object URL issued to the user when uploading an embargoed file, such that if the **client** does not include that tag, the upload will fail.
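As a rough sketch (boto3; the bucket, key, and file names are hypothetical), the server can sign the `Tagging` parameter into the pre-signed PUT URL; the client then has to send the matching `x-amz-tagging` header, otherwise the signature check fails and the upload is rejected:

```python
import boto3
import requests  # used only for the example client-side upload

s3 = boto3.client("s3")

# Hypothetical names; the real key is derived from the upload record.
BUCKET = "dandiarchive"
KEY = "blobs/abc/123/example.nwb"

# Server side: sign the PUT URL with the embargoed tag baked in.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": BUCKET, "Key": KEY, "Tagging": "embargoed=true"},
    ExpiresIn=3600,
)

# Client side: the x-amz-tagging header must match the signed value,
# otherwise S3 rejects the request with a signature mismatch.
with open("example.nwb", "rb") as f:
    response = requests.put(
        upload_url,
        data=f,
        headers={"x-amz-tagging": "embargoed=true"},
    )
response.raise_for_status()
```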

A diagram of the upload procedure is shown below:

```mermaid
sequenceDiagram
    autonumber
    participant S3
    participant Client as dandi-cli
    participant Server

    Client ->> Server: Request pre-signed S3 upload URL for embargoed dandiset
    Server ->> Client: Pre-signed URL with embargoed tag included
    Client ->> S3: Upload file with embargoed tag
    Client ->> Server: Finalize embargoed file upload
    Server ->> S3: Server verifies access to embargoed file and mints new asset blob
    rect rgb(235, 64, 52)
        Client -->> S3: Unauthorized access is denied
    end
    Client ->> Server: Request pre-signed URL for embargoed file access
    Server ->> Client: If user is permitted, a pre-signed URL is returned
    rect rgb(179, 209, 95)
        Client ->> S3: Embargoed file is successfully accessed
    end
```

## Change to Un-Embargo Procedure

Once the time comes to *un-embargo* those files, all that is required is to remove the `embargoed` tag from all of the objects. This can be achieved with an [S3 Batch Operations Job](https://docs.aws.amazon.com/AmazonS3/latest/userguide/batch-ops-create-job.html), which takes the list of files to operate on (all files belonging to the dandiset) and the desired action (delete/replace tags).

The benefit of this approach is that once the files are uploaded, no further movement is required to change the embargo state, eliminating the storage, egress, and time costs associated with unembargoing from a second bucket. Using S3 Batch Operations to perform the untagging also means we can rely on AWS’s own error reporting mechanisms, while retrying any failed operations requires only minimal engineering effort within the Archive codebase.
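A hedged sketch of creating such a job with boto3's `s3control` client; the account ID, role ARN, manifest location, and report prefix below are placeholders, not the archive's real values:

```python
import boto3

s3control = boto3.client("s3control")

ACCOUNT_ID = "123456789012"  # placeholder AWS account ID
ROLE_ARN = "arn:aws:iam::123456789012:role/batch-tagging"  # placeholder
MANIFEST_BUCKET_ARN = "arn:aws:s3:::dandiarchive-manifests"  # placeholder
MANIFEST_KEY = "unembargo/000123/manifest.csv"  # placeholder CSV of Bucket,Key rows
MANIFEST_ETAG = "REPLACE_WITH_MANIFEST_ETAG"  # ETag of the uploaded manifest object

response = s3control.create_job(
    AccountId=ACCOUNT_ID,
    ConfirmationRequired=False,
    # S3DeleteObjectTagging removes the tag set of each listed object,
    # clearing the ``embargoed`` tag.
    Operation={"S3DeleteObjectTagging": {}},
    Report={
        "Bucket": MANIFEST_BUCKET_ARN,
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "Prefix": "unembargo/000123/report",
        "ReportScope": "FailedTasksOnly",
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": f"{MANIFEST_BUCKET_ARN}/{MANIFEST_KEY}",
            "ETag": MANIFEST_ETAG,
        },
    },
    Priority=10,
    RoleArn=ROLE_ARN,
)
job_id = response["JobId"]  # persisted on the dandiset model (step 1 below)
```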

### Object Tag Removal Workflow

1. [Create the job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3control/client/create_job.html) from a celery task, storing the resulting Job ID on the dandiset model
2. Use a recurring celery cron task to check every dandiset with a status of “unembargoing” and a non-null Job ID field, using [describe_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3control/client/describe_job.html) to see whether its batch job has finished (sketched after this list)
3. Once a completed S3 Batch job is found, the manifest is downloaded from S3 to ensure that there were no failures
4. If there are no failures, the Job ID is set to null in the DB model, and the embargo status, metadata, etc. are updated to reflect that the dandiset is now `OPEN`.
5. Otherwise, an exception is raised and attended to by the developers.
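A minimal sketch of steps 2-5 (boto3 `describe_job`; the `Dandiset` import path, field names, and status values are assumptions about the Archive's Django models, and for brevity the failure check inspects the job's progress summary rather than downloading the report):

```python
import boto3

from dandiapi.api.models import Dandiset  # hypothetical import path

s3control = boto3.client("s3control")
ACCOUNT_ID = "123456789012"  # placeholder AWS account ID


def check_unembargo_jobs():
    """Recurring celery task: poll in-flight S3 Batch Operations jobs."""
    in_flight = Dandiset.objects.filter(
        embargo_status="UNEMBARGOING",  # assumed status value
        unembargo_job_id__isnull=False,  # assumed field name
    )
    for dandiset in in_flight:
        job = s3control.describe_job(
            AccountId=ACCOUNT_ID, JobId=dandiset.unembargo_job_id
        )["Job"]
        if job["Status"] != "Complete":
            continue  # still running; check again on the next cron tick

        failed = job["ProgressSummary"]["NumberOfTasksFailed"]
        if failed:
            # Step 5: surface the failure to developers instead of
            # silently marking the dandiset as open.
            raise RuntimeError(
                f"Un-embargo job {dandiset.unembargo_job_id} had {failed} failed tasks"
            )

        # Step 4: clear the job ID and mark the dandiset as open.
        dandiset.unembargo_job_id = None
        dandiset.embargo_status = "OPEN"
        dandiset.save()
```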

A diagram of the un-embargo procedure (pertaining to just the objects) is shown below:

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant Server
    participant Worker
    participant S3

    Client ->> Server: Un-embargo dandiset
    Server ->> Worker: Dispatch un-embargo task
    Worker ->> S3: List of all dandiset objects is aggregated into a manifest
    Worker ->> S3: S3 Batch Operations job is created
    S3 ->> Worker: Job ID is returned
    Worker ->> Server: Job ID is stored in the database
    S3 ->> S3: Tags on all objects in the supplied manifest are removed
    Note over Worker,S3: After some time, a cron job is run <br> which checks the status of the S3 job
    Worker ->> Server: Job ID is retrieved
    Worker ->> S3: Job status retrieved, worker observes that <br> the job has finished and was successful
    Worker ->> Server: Job ID is cleared, dandiset embargo status is set to OPEN

    rect rgb(179, 209, 95)
        Client ->> S3: Data is now publicly accessible
    end
```

## Experimental Results

- Deleting tags of 10,000 objects took ~18 seconds
- Deleting tags of 100,000 objects took ~2 minutes