Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing zarr_id in cancel_zarr_upload #1154

Merged
merged 3 commits into from
Jul 1, 2022
Merged

Conversation

jjnesbitt
Copy link
Member

I believe this is the root cause of dandi/dandi-cli#1028, it seems that zarr upload cancelling has never worked from the API.

This was never caught, as while the function did result in an exception being raised, celery didn't propagate the exception up to the test function, and so all seemed good. To mitigate anything like this happening in the future, I've added the CELERY_TASK_EAGER_PROPAGATES setting to our testing configuration. This allows exceptions to propagate up.

Adding that setting also caused an error in the test_dandiset_rest_unembargo function to show up, which this PR fixes.

@jjnesbitt jjnesbitt added release Create a release when this pr is merged bug Something isn't working patch Increment the patch version when merged and removed bug Something isn't working labels Jul 1, 2022
@@ -227,7 +227,7 @@ def upload_cancel(self, request, zarr_id):
raise ValidationError('No upload to cancel.')
# Cancelling involves deleting any data uploaded to S3, which involves a batch of S3 API
# requests. These are done in a task to avoid Heroku request timeouts.
cancel_zarr_upload.delay()
cancel_zarr_upload.delay(zarr_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh , that is the answer to my question here I guess.

Since it is .delayed, would it mean that here we would schedule a task to delete those uploaded files (BTW why???) i.e. some file {zarr_id}/sub/path might be removed later? What if right after canceling prior upload, I immediately perform a new upload of {zarr_id}/sub/path -- couldn't it get removed by this delayed task?

overall -- I am not sure if there is a point in "cleansing" the space of zarr upon canceling the upload. dandi-cli already should take care about removing unnecessary or "replacing" incomplete files on S3 (@jwodder an correct me if I am wrong).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is .delayed, would it mean that here we would schedule a task to delete those uploaded files (BTW why???) i.e. some file {zarr_id}/sub/path might be removed later?

The reason it's delayed is because there might be a lot of active uploads we need to delete, and we don't want our request to timeout due to heroku's 30 seconds request limit (it's also just good practice to have short requests). Only one "batch" of files is ever allowed to be uploaded at once, so if you cancelled a zarr upload, you'd only be "deleting" any files in that batch.

What if right after canceling prior upload, I immediately perform a new upload of {zarr_id}/sub/path -- couldn't it get removed by this delayed task?

If you try to upload before the cancel task has finished, it will fail, since it checks if there are any active uploads still present in the system.

overall -- I am not sure if there is a point in "cleansing" the space of zarr upon canceling the upload. dandi-cli already should take care about removing unnecessary or "replacing" incomplete files on S3 (@jwodder an correct me if I am wrong).

I don't think the generated presigned URLs will allow for deletion (or really anything besides upload), so afaik this is still a concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha -- so we are safe but we will need to wait for cancel to complete and delete all those files. So let's get to that 2nd aspect since I don't think we should do that since there is no need really since client can do that jst as well as part of its upload and may be even avoid delete/reuploading some which successfully finished uploading before upload was canceled:

There is an endpoint DELETE ​/zarr​/{zarr_id}​/files​/ Delete files from a zarr archive. zarr_delete_files to be used to delete files within zarr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that the client could just call that endpoint, but if the client doesn't, then that could cause problems if trying to re-upload the same file, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it? I would have expected just file getting overwritten (well, placing a newer version on top of older since bucket is versioned).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you're right, the object is just overwritten.

@jjnesbitt jjnesbitt merged commit d9f41b3 into master Jul 1, 2022
@jjnesbitt jjnesbitt deleted the fix-cancel-zarr-upload branch July 1, 2022 20:57
@dandibot
Copy link
Member

dandibot commented Jul 1, 2022

🚀 PR was released in v0.2.30 🚀

@dandibot dandibot added the released This issue/pull request has been released. label Jul 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged release Create a release when this pr is merged released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants