S3 URL Endpoint #4690
@sparrowjack63 thanks for this suggestion. @mbamouni was asking related questions at https://groups.google.com/d/msg/dataverse-community/py0UMJV9lDg/A7Lk0YtsCQAJ and this gives me a better idea of a possible fix. Out of curiosity, would you be interested in making a pull request? 😄
Thanks,
In addition, I looked at the Google group, and I know @mbamouni ;). It's the same issue, so we could test it together with Michel.
@sparrowjack63 ah, good to know that you and @mbamouni work together. No worries about the pull request. It looks like the only time "awscli" appears in the git repo is in the docs (rather than the code) at https://github.com/IQSS/dataverse/blob/v4.8.6/doc/sphinx-guides/source/installation/config.rst#set-up-access-configuration-via-command-line-tools . But the section above that describes how to set things up manually: https://github.com/IQSS/dataverse/blob/v4.8.6/doc/sphinx-guides/source/installation/config.rst#set-up-access-configuration-via-command-line-tools I'm confused, though. When setting up manually, it seems like one would need the ability to add "endpoint-url" to the "~/.aws/config" file. @sparrowjack63 can you please read through aws/aws-cli#1270 and see if there's anything in there that explains what to do?
I have read aws/aws-cli#1270, but unfortunately it is still not possible to add an S3 endpoint in the awscli config file; it's still a pending feature request (and has been for a long time...)
@sparrowjack63 ok, do you want to try the Swift API for CEPH? Here are the docs:
As a last resort we will try Swift, but it would be regrettable to close the door on all S3-compatible storage for Dataverse just because of this missing endpoint parameter. Thanks,
@sparrowjack63 sure, I believe all the S3 code is here: https://github.com/IQSS/dataverse/blob/v4.8.6/src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java If that doesn't help, please let me know what else would be helpful. Please keep asking questions. Thanks!
I'd also appreciate support for alternative endpoints. I'm trying to specify a config file with endpoints at 3 different URLs (EC2, ELB and RDS each have their own unique endpoints). There is no way to do it short of passing the --endpoint-url flag for now.
@aries-strato thanks for commenting. I'm new to AWS but since you mentioned RDS I thought I'd say that it's already in use by Harvard Dataverse. #4709 is about an upgrade to Postgres 9.6 in general but for anyone on RDS, they'll need to upgrade as well. Come to think of it, Harvard Dataverse is using ELB as well. And the instances run on EC2. Can you perhaps give an example of how the config would ideally look from your perspective?
Simply adding the field to the config file would be enough, along with a way to specify a no-verify option as well, e.g. under the [default] section.
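For illustration, the requested config might look something like this. Note that `endpoint_url` is the key being *asked for* in this issue, not one awscli actually supported at the time, and the credential values are placeholders:

```ini
[default]
# Standard keys awscli already understands:
aws_access_key_id = AKIAEXAMPLEKEY
aws_secret_access_key = exampleSecretKey
# Hypothetical key requested here; awscli did not support it at the time,
# so --endpoint-url had to be passed on every command instead:
endpoint_url = https://mys3compatiblestorage.org
```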
@aries-strato thanks. Any news from your end? I'm wondering what the status of this issue is. @sparrowjack63, are you interested in making a pull request?
Hey @pdurbin, this really is a showstopper for our internal test deployment... Our centrally offered Swift object store is not GA yet and building a SAIO instance just for testing seems like overkill. (And using NFS for shared storage is the root of all evil... Let's try to avoid that in the age of 2018, right?) Minio is a very lightweight alternative and should be sufficient for testing purposes (maybe even beyond that?), or some Ceph RADOS S3 gateway... I just found https://github.com/minio/cookbook/blob/master/docs/aws-sdk-for-java-with-minio.md - it doesn't seem to be that hard to get custom URL support into the Dataverse S3 storage driver class. Wondering if you would like to get a PR for this? I would suggest using a separate storage driver class to make it more distinguishable and less error-prone for existing installations... (even though it introduces some code duplication.) (After having our testbed up and running, we can get to Payara testing... 😉 ) Cheers,
@poikilotherm yes! We're interested in a pull request. I just invited you to join https://github.com/orgs/IQSS/teams/dataverse-readonly so I can assign this issue to you and dragged it to the new "Community Dev" column on our kanban board: https://waffle.io/IQSS/dataverse We're happy to answer questions here, on https://groups.google.com/forum/#!forum/dataverse-dev or http://chat.dataverse.org (#dataverse on freenode), or during community calls ( https://dataverse.org/community-calls ). Please just get in touch if there's any walkthrough of the code or docs you need. Thanks! I'd feel remiss if I didn't say that we used NFS in production for about a decade and it worked fine. Only within the last year did we (Harvard Dataverse) switch to S3. 😄
Thx! Joined 😄 What approach would you guys prefer? A separate driver with a custom URL prefix or an extension of the existing driver? Or should I just come up with something?
@poikilotherm thanks. I'll mention this issue at standup in about an hour but I guess I'm thinking a separate driver would be a good place to start. So rather than S3AccessIO.java it would be called MinioAccessIO.java maybe? I hope I'm understanding what you're building. I think it's a Minio driver.
I am fairly sure supporting custom endpoints would only require adding one line to S3AccessIO.java, where the client builder is initialized, to take in a custom URL. That, and a new config variable.
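A sketch of what that one-line change could look like, using the AWS SDK for Java v1 builder. The endpoint URL and region are placeholders, and the system property name is a hypothetical stand-in for whatever config variable gets added:

```java
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Sketch only: if a custom endpoint is configured, point the client at it;
// otherwise fall back to the standard region-based configuration.
String customEndpoint = System.getProperty("dataverse.s3.custom-endpoint-url"); // hypothetical name
AmazonS3ClientBuilder builder = AmazonS3ClientBuilder.standard();
if (customEndpoint != null && !customEndpoint.isEmpty()) {
    builder.withEndpointConfiguration(
                new AwsClientBuilder.EndpointConfiguration(customEndpoint, "us-east-1"))
           // Many S3-compatible stores (Minio, Ceph RGW) expect path-style access:
           .withPathStyleAccessEnabled(true);
}
AmazonS3 s3 = builder.build();
```

Note that `withEndpointConfiguration` and `withRegion` are mutually exclusive on the builder, so the custom-endpoint branch has to replace, not supplement, the region setting.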
@poikilotherm Feel free to ping me here if you need more info!
@matthew-a-dunlap sure, will do so. Maybe you can give #5068 a look, too? This might be a dependency for this...
…s using a custom EndpointConfiguration. Will default to previous standard configuration.
Hey @matthew-a-dunlap and @pdurbin, I think I got this ready for review. Beware of the monsters in the shadows:
@sparrowjack63 and @mbamouni, could you test my feature branch with your S3 services and report back? Thank you!!!
Hi @djbrooke, this is true - it's not really blocked, only connected.
Thanks @poikilotherm - I'll move it over to QA.
Hey @kcondon, |
@poikilotherm Sorry, I have been a bit backlogged. I'm working on this now.
Made some edits to structure and content to match the rest of the page.
Changed first paragraph, fixed list formatting.
@poikilotherm Hi, I've regression tested this against S3 and that part is working. I've tried setting up against the public Minio URL but was not able to get it working. I know it must be something simple I'm missing, but what is the minimum set of options that a Minio config needs? Also, should any existing S3 config options be disabled? Thanks.
@poikilotherm Got it working - there was a typo on "id" in the guide, and I had not created the bucket first. Phil updated the doc for those small changes.
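For anyone else testing against Minio, a minimal setup sketch along the lines discussed above. The JVM option name is an assumption based on this feature branch and should be verified against the Installation Guide; the endpoint URL is a placeholder, and (as noted) the bucket must already exist:

```shell
# Hypothetical sketch: point Dataverse's S3 driver at a Minio server.
# Option name assumed from the feature branch; verify against the docs.
# Colons must be escaped in Glassfish create-jvm-options.
./asadmin create-jvm-options "-Ddataverse.files.s3-custom-endpoint-url=https\://minio.example.org\:9000"
# Credentials still come from the standard AWS profile files:
#   ~/.aws/credentials -> aws_access_key_id / aws_secret_access_key
```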
@poikilotherm sure. @kcondon and I were talking about improvements we could make to the docs in the future but we didn't want to hold up merging just for docs. Thank you for your contribution!
@pdurbin I just saw this issue is not assigned to a milestone... Should this be added to a 4.9.5 or 4.10 release?
@poikilotherm if I knew for sure we were planning to release 4.9.5 I would add it there, but I don't know what the next release will be or when. A number of 4.x releases are mentioned at https://dataverse.org/goals-roadmap-and-releases
Hi all,
Dataverse seems to use awscli to put data on S3 storage, but unfortunately there is no way to put a specific S3 endpoint URL in the awscli config file.
The only way to use another S3 endpoint URL with awscli is to pass it as a parameter on every command, like this: aws s3 --endpoint-url=https://mys3compatiblestorage.org ...
Could you please add the option to pass the parameter "--endpoint-url=URL" to every awscli command used by Dataverse?
It would be very useful for all users who have S3-compatible storage (like Ceph or Scality) and want to use it with Dataverse.
Thanks!