-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
9002 allow direct upload setting #9003
9002 allow direct upload setting #9003
Conversation
@@ -545,7 +545,8 @@ List of S3 Storage Options | |||
dataverse.files.<id>.label <?> **Required** label to be shown in the UI for this storage (none) | |||
dataverse.files.<id>.bucket-name <?> The bucket name. See above. (none) | |||
dataverse.files.<id>.download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false`` | |||
dataverse.files.<id>.upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false`` | |||
dataverse.files.<id>.upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset in the S3 store. ``false`` | |||
dataverse.files.<id>.api-direct-upload ``true``/``false`` Enable direct upload of files added to a dataset via API only. ``false`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left notes in the issue discussing whether this might be better as an allow-out-of-band option that could be moved outside of the S3 section. In that case it might be restricted to superuser if desired. In any case, it would work as in this PR w.r.t. having upload-redirect on supercedes it (you need to be able to call the /addFiles endpoint for directupload so it doesn't make sense to have upload-redirect=true and api-direct-upload (or allow-out-of-band-upload)=false.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have renamed the option to allow-out-of-band-upload. I have also removed it from the s3 section and added this documentation in the general section:
When using integration tools, dataverse installation can be configured to allow out of band upload by setting the dataverse.files.\<id\>.allow-out-of-band-upload
JVM option to true
.
Files can be then uploaded by an integration tool with datasets/{id}/add
api call, or uploaded directly to the storage and registerd in a dataset afterwards using the datasets/{id}/addFiles
api call.
Notice that using S3-storage with dataverse.files.\<id\>.upload-redirect
JVM option enabled supersedes the allow-out-of-band-upload
and will enable direct upload even with allow-out-of-band-upload
not set (or set to false).
In other words, dataverse.files.\<id\>.allow-out-of-band-upload
option opens the datasets/{id}/add
and datasets/{id}/addFiles
api endpoints without redirecting uploads in the UI.
Enabling the upload-redirect
option allows then direct upload automatically, without the need of enabling the allow-out-of-band-upload
(setting it to false
does not have any effect in that case).
FWIW: This may be something that gets removed as I don't think there is much/any current use. That said, I think we've tried to keep it working for the specific store type it was originally developed for (posix or AWS S3). It's definitely possible that bugs have been introduced but I don't think just the option of direct upload should break it (enabling direct upload doesn't stop normal uploads). If you are trying to use it and see a specific problem, feel free to add an issue/PR - at a minimum if we know for sure it is broken, that may add priority to removing it. (Alternately, if you think DCM needs to stay because it supports some use case that is hard to address with other methods, let us know why and/or if there are ways it could/should be updated to be more useful). |
We are not using DCM, it was just an example. I was just wondering if I am the only one that run into this problem, I did not manage to upload files to posix system with the API (this is the reason for this pull request). Maybe nobody uses posix anymore, or maybe there is a workaround to upload files with the API without using S3? This is blocking for the integration tool I am working on. |
Ah - I don't think DCM knows about the direct upload API at all - it was built long before that came in. It uses it's own mechanism to associate the uploaded files with the dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of this PR as this might solve a problem for using direct upload at FZJ, using a Ceph-based S3 storage which is not exposed to the internet.
Code looks good to me, but it would be nice to polish the docs a bit. Thank you for your work on this!
@ErykKul forgive me being encroaching and converting this to a draft. Doing this to reflect the nature of this PR not being ready for valuation and being sent into the funnel. Trying to play with a better view of what is happening with community contributions and how to get them merged faster. See also https://github.com/orgs/IQSS/projects/36/views/1 |
OK, thanks! I have asked @DieuwertjeBloemen to help me with the documentation. We will request the review when we will be ready. |
I have tested adding files with SWORD API, and it works as a workaround. It is only native API that throws "Dataset store configuration does not allow provided storageIdentifier". |
@qqmyers Thanks for the hint, I will investigate it. Also, if you believe it is better to make the isDirectUploadEnabled non-static and initialize the objects when none can be passed, I will do the refactoring. Other than that, I think that the "waiting" label can be removed. Thank you for your review! |
@qqmyers, the upload redirect did not work for us because of the CORS policy, and it has nothing to do with tags. I have tried to set it up with the "put-bucket-cors", like the Dataverse documentation explains, but it is not implemented on our S3 (501 error: The requested resource is not implemented). I will test it locally with a CORS proxy and check with our IT department if they can set the CORS policy somehow. Thanks again for pointing me in the right direction! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - asked for a couple potential changes, but overall this addresses #9002.
@@ -507,6 +507,10 @@ A Dataverse installation can alternately store files in a Swift or S3-compatible | |||
|
|||
A Dataverse installation may also be configured to reference some files (e.g. large and/or sensitive data) stored in a web-accessible trusted remote store. | |||
|
|||
A Dataverse installation can be configured to allow out of band upload by setting the ``dataverse.files.\<id\>.upload-out-of-band`` JVM option to ``true``. | |||
By default, Dataverse support uploading files via the :ref:`add-file-api`. With S3 stores, a direct upload process can be enabled to allow sending the file directly to the S3 store (without any intermediate copies on the Dataverse server). | |||
With the upload-out-of-band option enabled, it is also possible for file upload to be managed manually or via third-party tools, with the :ref:`add-file-metadata-api` call used to add metadata and inform Dataverse that a new file has been added to the relevant store. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc build is failing because the add-file-metadata-api ref doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it such that it is not a 'ref' anymore. I was not sure if I should add some references here. Right now, it just states literally add-file-metadata-api, without it being any reference, etc.
@ErykKul now that we've released 6.0, I'm moving waiting 6.1 issues back to the board. But could you first handle the 6.0 merge and address any EE10 issues. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I updated the direct upload api reference and fixed a typo. Ready for QA.
@ErykKul Would you please add a release note snippet mentioning the jvm option and an example of how to set it, and that the id refers to the name of the s3 store configured in Dataverse to which you want to apply this policy to? Thanks. I'm just missing some context when testing and so think it might help end users and when writing the main release notes. |
Thanks, @ErykKul |
Adding release notes snippet, directly taken from installation guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's convert the release note to Markdown.
Co-authored-by: Philip Durbin <philip_durbin@harvard.edu>
@kcondon Thanks! |
What this PR does / why we need it:
In some situations, direct upload might not work from the UI, e.g., when s3 storage is not accessible from the internet. This pull request adds an option to allow direct uploads via API only. This way, a third party application can use direct upload from within the internal network, while there is no direct download available to the users via UI.
Which issue(s) this PR closes:
Closes #9002
Special notes for your reviewer:
I have replaced
Boolean.getBoolean("...")
withBoolean.parseBoolean(System.getProperty("..."))
in some places, as I think these were bugs. Let me know if I need to revert it.Is there a release notes update needed for this change?:
yes