Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add embargo re-design doc #1772

Merged
merged 1 commit into from
Dec 11, 2023
Merged

Add embargo re-design doc #1772

merged 1 commit into from
Dec 11, 2023

Conversation

jjnesbitt
Copy link
Member

This PR adds the design doc that outlines the new embargo design, with unified bucket storage.

@satra
Copy link
Member

satra commented Dec 1, 2023

@AlmightyYakob - a few folks have asked for unembargoing a subset of files. they haven't gone into discussing whether this becomes a new dandiset (probably initial approach) or whether a dandiset contains both open and embargoed files (future approach, also related to restricted data in dandiset). also, in the current implementation, one can add a public file to an embargoed dandiset and it won't copy it over to the embargoed bucket.

and this doesn't need to be implemented right away, but this use case may help think about design and ReBAC that we will likely need to implement in the future.

@jjnesbitt
Copy link
Member Author

@AlmightyYakob - a few folks have asked for unembargoing a subset of files. they haven't gone into discussing whether this becomes a new dandiset (probably initial approach) or whether a dandiset contains both open and embargoed files (future approach, also related to restricted data in dandiset). also, in the current implementation, one can add a public file to an embargoed dandiset and it won't copy it over to the embargoed bucket.

and this doesn't need to be implemented right away, but this use case may help think about design and ReBAC that we will likely need to implement in the future.

As far as I can tell, this design allows for that use case, as public read access is permitted on a per-object basis.

@waxlamp
Copy link
Member

waxlamp commented Dec 11, 2023

@AlmightyYakob - a few folks have asked for unembargoing a subset of files. they haven't gone into discussing whether this becomes a new dandiset (probably initial approach) or whether a dandiset contains both open and embargoed files (future approach, also related to restricted data in dandiset). also, in the current implementation, one can add a public file to an embargoed dandiset and it won't copy it over to the embargoed bucket.

and this doesn't need to be implemented right away, but this use case may help think about design and ReBAC that we will likely need to implement in the future.

@satra, I assume these ideas are not blocking development of Jake's proposal. If you would like, we can make mention of these things in Jake's doc as future directions to address, especially if that lets Jake get moving on this proposed work.

@satra
Copy link
Member

satra commented Dec 11, 2023

no, these are non-blocking.

the only comment i have is on the worker job for unembargoing. is there something that catches failure modes there, and what happens when it fails/partially fails?

@jjnesbitt
Copy link
Member Author

the only comment i have is on the worker job for unembargoing. is there something that catches failure modes there, and what happens when it fails/partially fails?

The S3 Batch job returns a report on if there were any failures per object, and what the reason was (after the auto-retrying of trivial errors). So we would be able to see if there was a problem with anything.

@satra
Copy link
Member

satra commented Dec 11, 2023

perhaps it should return those in some form to the user in the validation page.

@jjnesbitt
Copy link
Member Author

perhaps it should return those in some form to the user in the validation page.

I don't think so. Failures in this process indicate an error internally, and should be hidden from the user (we could just say "there was an error un-embargoing this data). There's nothing the user can do with the details of the failure to resolve the situation, and would just lead to more clutter, confusion, etc.

If it were to fail, we'd get a sentry error about it, or in the worst case scenario, the user could say "hey this failed", and we'd look at the details of it to see what happened.

@satra
Copy link
Member

satra commented Dec 11, 2023

Failures in this process indicate an error internally

isn't this the same thing we are seeing with some of the validation logic, and nothing informs us or the user really when failures exist (e.g. stuck in pending validation). hence it would be good to have some kind of timeout or process so the users don't have to manually check when the unembargoing is done. i.e. something that monitors the process and has some sanity checks in it based on size/number of objects being unembargoed.

the human thing hasn't been very robust given the number of issues i have myself filed about a weird state.

i agree the user shouldn't have to bother with this, but the system should have something that notifies/handles errors or stuck states for us.

@jjnesbitt jjnesbitt merged commit 50abf21 into master Dec 11, 2023
10 checks passed
@jjnesbitt jjnesbitt deleted the new-embargo-design-doc branch December 11, 2023 21:22
@dandibot
Copy link
Member

🚀 PR was released in v0.3.67 🚀

@dandibot dandibot added the released This issue/pull request has been released. label Dec 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants