Simple support for remote web stores #7324
Commits referencing this issue:
- pdurbin added a commit to GlobalDataverseCommunityConsortium/dataverse (Aug 5, 2022)
- qqmyers pushed a commit to GlobalDataverseCommunityConsortium/dataverse (Aug 5, 2022)
- pdurbin added a commit to GlobalDataverseCommunityConsortium/dataverse (Aug 8, 2022)
- pdurbin added a commit to GlobalDataverseCommunityConsortium/dataverse (Aug 8, 2022)
- pdurbin added a commit to GlobalDataverseCommunityConsortium/dataverse (Aug 9, 2022)
- pdurbin added a commit to GlobalDataverseCommunityConsortium/dataverse (Aug 10, 2022)
Based on use cases from Odum's TRSA (see #5213) and recent work to support remote uploads, I'm suggesting an extension of the current StorageIO mechanism that would allow Dataverse to manage a file at a remote URL 'as though' it were in S3. Recognizing that there are many potential remote stores that might fit this model, I've created a design document and a proof-of-concept implementation (#7325) to share with the community.
The basic concept is to treat the file URL as read-only and to retrieve its bytes or provide a download redirect in the same way that the S3 store manages file access. However, since the remote web store is assumed to be ~read-only in this design, any and all derived files (thumbnails, ingested versions, provenance files, etc.) are managed by an underlying S3 or File store.
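To make the split concrete, here is a minimal, hypothetical sketch of the idea. The class, interface, and method names are illustrative only (they are not the actual Dataverse StorageIO API or the code in the draft PR): primary bytes are streamed or redirected from the remote URL, while derived files are delegated to a backing store.

```java
// Illustrative sketch only: primary file bytes come from the read-only remote
// URL, while derived/auxiliary files go to a backing (e.g. S3 or file) store.
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RemoteStoreSketch {

    private final URI remoteUrl;          // read-only source of the primary file
    private final BackingStore backing;   // stand-in for an underlying S3/file store

    public RemoteStoreSketch(URI remoteUrl, BackingStore backing) {
        this.remoteUrl = remoteUrl;
        this.backing = backing;
    }

    /** Stream the primary file's bytes from the remote URL (read-only). */
    public InputStream getInputStream() throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(remoteUrl).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofInputStream()).body();
    }

    /** For direct downloads, hand the remote URL back as a redirect target. */
    public URI getRedirectUrl() {
        return remoteUrl;
    }

    /** Derived files (thumbnails, ingested versions, provenance) use the backing store. */
    public void saveAuxObject(String tag, InputStream content) throws IOException {
        backing.save(tag, content);
    }

    /** Minimal interface standing in for the underlying S3/file store. */
    public interface BackingStore {
        void save(String tag, InputStream content) throws IOException;
    }
}
```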
This mechanism works for public URLs. I've also suggested/implemented a URL presigning mechanism, roughly analogous to what S3 stores use to provide secure download URLs, that could be used by a remote store to verify that Dataverse was the source of the request (which would only be made if the user is allowed access per Dataverse's configured controls) and to only allow access when Dataverse has 'pre-approved' the request. (The PR includes Java code to sign and validate these requests; a validation module/filter for common web servers could simplify what needs to be done at the remote store.)
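For illustration, here is a minimal sketch of what such presigning could look like: an HMAC over the URL plus an expiry, computed with a secret shared between Dataverse and the remote store. This is an assumption made for the sketch, not necessarily how the draft PR implements signing; the method names and URL parameter names are hypothetical.

```java
// Illustrative presigning sketch: Dataverse appends an expiry and an HMAC
// signature; the remote store recomputes the HMAC and rejects expired or
// mismatched requests. Assumes the base URL has no existing query string.
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class UrlSigningSketch {

    private static String hmac(String message, String secret)
            throws NoSuchAlgorithmException, InvalidKeyException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        byte[] raw = mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    /** Dataverse side: add an expiry and a signature to the download URL. */
    public static String presign(String url, String secret, long validForSeconds)
            throws NoSuchAlgorithmException, InvalidKeyException {
        long expires = System.currentTimeMillis() / 1000 + validForSeconds;
        String toSign = url + "?expires=" + expires;
        return toSign + "&signature=" + hmac(toSign, secret);
    }

    /** Remote-store side: recompute the HMAC and check the expiry. */
    public static boolean validate(String signedUrl, String secret)
            throws NoSuchAlgorithmException, InvalidKeyException {
        int sigIndex = signedUrl.lastIndexOf("&signature=");
        if (sigIndex < 0) {
            return false;
        }
        String toSign = signedUrl.substring(0, sigIndex);
        String signature = signedUrl.substring(sigIndex + "&signature=".length());
        int expIndex = toSign.lastIndexOf("?expires=");
        if (expIndex < 0) {
            return false;
        }
        long expires = Long.parseLong(toSign.substring(expIndex + "?expires=".length()));
        boolean notExpired = System.currentTimeMillis() / 1000 <= expires;
        return notExpired && hmac(toSign, secret).equals(signature);
    }
}
```

A validation module/filter at the remote web server would only need the shared secret and this same HMAC check, which is what could keep the burden on the remote store small.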
I've created this issue, design doc, and draft PR to encourage discussion and get feedback. Does this mechanism, or extensions of it, support use cases from other community members? Are there alternative designs that could be made general and fit well into Dataverse's architecture? Are there concerns about the mechanism proposed?