Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix asset update/deletion when associated with zarr #1338

Merged
merged 4 commits into from
Oct 24, 2022

Conversation

jjnesbitt
Copy link
Member

Closes #1336

The reported issue was caused by the following:

  • When a zarr is created and associated to an asset (but not ingested) it initially has a size of zero
  • Ingest can then be performed, giving it a non-zero size
  • If the containing asset is then updated / deleted, all parent paths are updated, decrementing the size of the asset. However, now that size is non-zero, and was never updated, so a possibly zero value is being decremented by a non-zero value, causing the value to be negative, and throwing an integrity error.

This PR updates the ingest_zarr_archive task to delete all asset paths that are associated with the zarr, and then re-add them after the ingestion is finished. This is essentially the same process as updating an asset path (path deletion and immediate addition), but with zarr ingestion in the middle.

@jjnesbitt
Copy link
Member Author

It seems this is still failing for some reason. Investigating...

@jjnesbitt
Copy link
Member Author

@jwodder I can no longer reproduce this error locally, using your MVCE. I was originally able to, but it seems that inexplicably (no local code changes), those now succeed. I've also reset my local DB to no avail. Yet, they still fail in CI.

Would #1339 fix this issue? Even the master tests fail, so it would seem the answer is no, but I'm not totally sure on the inner-workings of those tests.

Are you still able to reproduce this locally (with the changes from this branch)?

@jwodder
Copy link
Member

jwodder commented Oct 21, 2022

@AlmightyYakob Testing locally:

  • The MVCEs I posted fail with a 500 error when run against staging
  • The dandi-cli tests fail when run against a locally-built image of dandi-archive's master branch and pass when using this branch

Note that the CLI integration tests here currently use the master branch of dandi-archive regardless of what PR is being tested. Once #1339 and dandi/dandi-cli#1141 are merged, the tests will go back to using the current PR.

@jjnesbitt
Copy link
Member Author

@AlmightyYakob Testing locally:

  • The MVCEs I posted fail with a 500 error when run against staging
  • The dandi-cli tests fail when run against a locally-built image of dandi-archive's master branch and pass when using this branch

Note that the CLI integration tests here currently use the master branch of dandi-archive regardless of what PR is being tested. Once #1339 and dandi/dandi-cli#1141 are merged, the tests will go back to using the current PR.

I see, that makes sense now. Thanks!

Copy link
Member

@mvandenburgh mvandenburgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about using .iterator(), but this LGTM. I do think that it's not ideal to have to delete and recreate the asset paths on zarr ingest, but this seems more like a flaw in the overall design of Zarr/how we kind of consider zarrs Assets, but not really in some places.

@yarikoptic
Copy link
Member

This PR updates the ingest_zarr_archive task to delete all asset paths that are associated with the zarr, and then re-add them after the ingestion is finished.

are we talking about paths within zarr, ie thousands of them? do you have an estimate of runtime necessary to delete and add e.g. 200k paths?

@jjnesbitt
Copy link
Member Author

are we talking about paths within zarr, ie thousands of them? do you have an estimate of runtime necessary to delete and add e.g. 200k paths?

No, we don't store the paths within zarr files. This is referring to all of the assets that point to zarrs. Zarrs have to be handled differently here, since zarr ingestion can update the asset size, yet subverts the normal asset update process.

@jjnesbitt jjnesbitt merged commit 7c153de into master Oct 24, 2022
@jjnesbitt jjnesbitt deleted the asset-paths-hotfix branch October 24, 2022 16:46
@dandibot
Copy link
Member

dandibot commented Nov 7, 2022

🚀 PR was released in v0.3.0 🚀

@dandibot dandibot added the released This issue/pull request has been released. label Nov 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New AssetPaths lead to dandi-cli integration testing failing
5 participants