Import from Dataverse #536

Closed
pdurbin opened this issue May 13, 2019 · 15 comments · Fixed by #626

@pdurbin

pdurbin commented May 13, 2019

Dataverse is open source research data repository software with 43 installations around the world. I'm one of the developers and we'd be very happy for Renku to have "import from Dataverse" functionality. If you have any questions about Dataverse APIs, we're happy to answer them.

Once import has been implemented, we'd be happy to have Renku listed under "Analysis and Computation" at http://guides.dataverse.org/en/4.14/admin/integrations.html#analysis-and-computation . If you'd like to create an issue at https://github.com/IQSS/dataverse/issues to update the integrations.rst file in the Dataverse repo, please go ahead. We could use that issue to answer any questions you may have.

I seem to remember that Renku is written in Python, so you might want to try pyDataverse, a new Python library for Dataverse by @skasberger at https://github.com/AUSSDA/pyDataverse . It is so new that we haven't yet listed it at http://guides.dataverse.org/en/4.14/api/client-libraries.html

Alternatively, you can just write your own implementation. You might find inspiration in whole-tale/girder_wholetale#175 by @Xarthisius, who implemented "import" from Dataverse for @whole-tale, which is also written in Python. For a more human-readable discussion of how to download all the files in a Dataverse dataset using the DOI of the dataset, please see whole-tale/girder_wholetale#179 . I suggested testing with https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/TJCLKP since it's a dataset of mine, but you are of course welcome to pick any dataset from any installation of Dataverse, including the demo site at https://demo.dataverse.org
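
Editor's sketch of the "download all the files by DOI" idea, assuming the Dataverse native API route `/api/datasets/:persistentId/` for metadata and the Data Access route `/api/access/datafile/{id}` for downloads. The function names are hypothetical, and this is not the code from the Whole Tale pull requests. It only builds the download URLs and leaves the actual downloading to the caller.

```python
import json
import urllib.parse
import urllib.request


def dataset_metadata_url(site_url: str, doi: str) -> str:
    """Build the native API URL that returns metadata for a persistent ID."""
    # Keep ':' and '/' unescaped so the DOI stays readable in the URL.
    query = urllib.parse.urlencode({"persistentId": doi}, safe=":/")
    return f"{site_url.rstrip('/')}/api/datasets/:persistentId/?{query}"


def file_download_urls(site_url: str, metadata: dict) -> dict:
    """Map each file name in the latest version to its download URL."""
    base = site_url.rstrip("/")
    files = metadata["data"]["latestVersion"]["files"]
    return {
        f["dataFile"]["filename"]: f"{base}/api/access/datafile/{f['dataFile']['id']}"
        for f in files
    }


def list_downloads(site_url: str, doi: str) -> dict:
    """Fetch the metadata over the network and return {filename: URL}."""
    with urllib.request.urlopen(dataset_metadata_url(site_url, doi)) as resp:
        return file_download_urls(site_url, json.load(resp))
```

For the dataset suggested above, the call would be `list_downloads("https://dataverse.harvard.edu", "doi:10.7910/DVN/TJCLKP")`. Files that share a name in different folders would collide in this dictionary, so a real importer would probably key on the file ID instead.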

I'm looking at my notes from when @rokroskar and @ciyer visited @IQSS and I'm reminded that Renku has the ability to create PROV-JSON files. Perhaps a future integration would be to push these files into Dataverse using the Dataverse "prov-json" API endpoint: http://guides.dataverse.org/en/4.14/api/native-api.html#provenance
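
Editor's sketch of what such a push might look like, assuming the endpoint has the shape `POST /api/files/{id}/prov-json?entityName=...` with the API token sent in an `X-Dataverse-key` header, as I read the linked provenance section. Please verify against the guide before relying on it. The function name and its arguments are hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def prov_json_request(site_url, file_id, entity_name, prov, api_token):
    """Build (but do not send) a request that uploads PROV-JSON for one file."""
    query = urllib.parse.urlencode({"entityName": entity_name})
    url = f"{site_url.rstrip('/')}/api/files/{file_id}/prov-json?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(prov).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Dataverse-key": api_token,  # assumed auth header
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload.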

Of course we would be thrilled if you choose a dataset hosted on an installation of Dataverse when you work on SwissDataScienceCenter/renku/issues/543 😄

@rokroskar
Member

@rokroskar rokroskar transferred this issue from SwissDataScienceCenter/renku May 14, 2019
@skasberger

The first release of pyDataverse is now online; the next will come in 2-3 weeks, with classes for the metadata of dataverses, datasets and datafiles.

https://github.com/AUSSDA/pyDataverse/releases/tag/v0.1.0

@rokroskar
Member

Great, thanks for the update @skasberger!

@pdurbin
Author

pdurbin commented May 24, 2019

Here's how a Renku button (or should it be RENKU?) could look in Dataverse:

[Screenshot: Screen Shot 2019-05-24 at 11 48 49 AM]

It would be added with a curl command something like this:

curl http://localhost:8080/api/admin/externalTools -X POST --upload-file renku.json

{
  "displayName": "Renku",
  "description": "Analyze in Renku",
  "type": "explore",
  "toolUrl": "https://renkulab.io/FIXME",
  "contentType": "application/x-ipynb+json",
  "toolParameters": {
    "queryParameters": [
      {
        "fileId": "{fileId}"
      },
      {
        "siteUrl": "{siteUrl}"
      },
      {
        "key": "{apiToken}"
      }
    ]
  }
}

@pdurbin
Author

pdurbin commented Jul 15, 2019

Heads up that there's a new pull request at jupyterhub/repo2docker#739 by @Xarthisius (thanks!!) for downloading files from Dataverse into repo2docker which is a Python library used by both Binder and Whole Tale for spinning up Docker containers running Jupyter Notebooks and other compute environments.

In other news, I've been having great success with pyDataverse, the new Python client library mentioned by @skasberger who is also the author. He gave a great talk about it the other week at the 5th annual Dataverse Community Meeting: https://osf.io/ur2q7/

Finally, at the same meeting I demoed launching a Jupyter Notebook from Dataverse using Whole Tale. Next year (if not sooner!) I'd love to demo a similar trick with Renku! Here are screenshots of the demo and a full transcription: https://scholar.harvard.edu/pdurbin/blog/2019/jupyter-notebooks-and-crazy-ideas-for-dataverse

@pdurbin
Author

pdurbin commented Aug 9, 2019

Over at IQSS/dataverse#6059 I recently created a new pull request to support external tools at the dataset level for Dataverse. (In the screenshot above, I showed how external tools at the file level are already supported.)

My question for the Renku team is this:

What is an ideal URL on an installation of Renku that Dataverse users should be sent to when they click "Explore" and then "Renku"?

Given the pull request right now, a URL like the following can be constructed on the Dataverse side:

https://renkulab.io?datasetPid=doi:10.7910/DVN/RLLL1V

The external tool manifest would look like this:

{
  "displayName": "Renku",
  "description": "Analyze in Renku",
  "type": "explore",
  "scope": "dataset",
  "toolUrl": "https://renkulab.io",
  "toolParameters": {
    "queryParameters": [
      {
        "datasetPid": "{datasetPid}"
      }
    ]
  }
}

Would that URL work for you? More query parameters are also supported. It could be longer and more specific, like this:

https://renkulab.io?datasetPid=hdl:10864/10798&siteUrl=https://dataverse.scholarsportal.info
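
Editor's sketch of how a manifest like the one above could be expanded into such a URL. It illustrates the placeholder substitution only and is not Dataverse's actual implementation; `build_tool_url` and its `values` argument are hypothetical names.

```python
from urllib.parse import urlencode


def build_tool_url(manifest: dict, values: dict) -> str:
    """Expand {placeholders} in queryParameters into a launch URL."""
    params = {}
    for entry in manifest["toolParameters"]["queryParameters"]:
        for name, template in entry.items():
            # A template such as "{datasetPid}" is filled from `values`.
            params[name] = template.format(**values)
    # Leave ':' and '/' unescaped to match the URLs shown in this thread.
    return f"{manifest['toolUrl']}?{urlencode(params, safe=':/')}"


manifest = {
    "toolUrl": "https://renkulab.io",
    "toolParameters": {
        "queryParameters": [
            {"datasetPid": "{datasetPid}"},
            {"siteUrl": "{siteUrl}"},
        ]
    },
}
values = {
    "datasetPid": "hdl:10864/10798",
    "siteUrl": "https://dataverse.scholarsportal.info",
}
print(build_tool_url(manifest, values))
```

Each extra entry in `queryParameters` becomes one more `name=value` pair, in manifest order.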

Thoughts are welcome here or on the pull request above or its corresponding issue about supporting dataset level external tools: IQSS/dataverse#5028

@rokroskar
Member

hey @pdurbin thanks for the heads up on this - there are two possible scenarios I can imagine:

  1. the user seeing the dataset on Dataverse would want to see if the dataset is present in a particular instance of renku
  2. the user on Dataverse wants to either create a project to use a dataset from Dataverse or add it to an existing project

For the first case, a URL like https://renkulab.io/datasets?datasetPid=doi:10.7910/DVN/RLLL1V could potentially work - however, for the second case the user would have to be prompted/redirected to provide the project they want to add the data to. We don't currently have the functionality to add data to a project via an API, but it is presently being worked on.
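
Editor's sketch of the receiving side for the first scenario: pulling the `datasetPid` query parameter out of an incoming URL and splitting it into its scheme (`doi`, `hdl`) and identifier. The function name is hypothetical and nothing here reflects Renku's actual routing.

```python
from urllib.parse import parse_qs, urlsplit


def parse_dataset_pid(url: str):
    """Return (scheme, identifier) from the datasetPid parameter, or None."""
    query = parse_qs(urlsplit(url).query)
    pids = query.get("datasetPid")
    if not pids:
        return None
    # "doi:10.7910/DVN/RLLL1V" -> ("doi", "10.7910/DVN/RLLL1V")
    scheme, _, identifier = pids[0].partition(":")
    if not identifier:
        return None  # no "scheme:" prefix, so not a usable persistent ID
    return scheme, identifier
```

A lookup page could use the result to search its known datasets, and fall back to a "create or pick a project" prompt for the second scenario.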

cc @vfried @lorenzo-cavazzi @cchoirat @ciyer @jsam @jachro who may have other opinions...

@rokroskar
Member

closed by #626

@rokroskar
Member

@pdurbin the discussion in this issue has sprawled a bit, but the import part is now implemented in #626 - would be fantastic if you could test it out and let us know if you find any issues. And please feel free to open follow-up issues. Thanks!

@pdurbin
Author

pdurbin commented Aug 28, 2019

@rokroskar this is fantastic news! 🎉 Unfortunately, I'm struggling a bit with importing my dataset. I just left a screenshot of the commands I tried over at SwissDataScienceCenter/renku#593 (comment)

@rokroskar
Member

@pdurbin: the dataverse import is not a part of a release yet. To install it (in your interactive environment running on renkulab), you can run:

pipx runpip renku install -e https://github.com/SwissDataScienceCenter/renku-python#egg=renku

@rokroskar
Member

@pdurbin also note that we are still cleaning the import features up a bit (right now you get 1 commit per imported file... not optimal) - but they will be fixed soon.

@pdurbin
Author

pdurbin commented Aug 28, 2019

@rokroskar thanks. I tried that pipx command but I got an error:

https://github.com/SwissDataScienceCenter/renku-python#egg=renku is not a valid editable requirement. It should either be a path to a local project or a VCS URL (beginning with svn+, git+, hg+, or bzr+).

Here's a screenshot for context:

[Screenshot: Screen Shot 2019-08-28 at 11 51 16 AM]

@rokroskar
Member

Sorry, I made a typo: the URL should start with git+https instead of https.
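
Editor's sketch of the rule behind that error, based only on the prefixes listed in the message quoted above (`svn+`, `git+`, `hg+`, `bzr+`). It mirrors the message and is not pip's own code; both function names are hypothetical.

```python
VCS_PREFIXES = ("git+", "hg+", "svn+", "bzr+")


def is_vcs_url(requirement: str) -> bool:
    """True if the requirement starts with one of the listed VCS prefixes."""
    return requirement.startswith(VCS_PREFIXES)


def as_git_url(requirement: str) -> str:
    """Prepend git+ to a plain URL so it reads as a Git VCS URL."""
    return requirement if is_vcs_url(requirement) else "git+" + requirement


url = "https://github.com/SwissDataScienceCenter/renku-python#egg=renku"
print(as_git_url(url))
```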

@pdurbin
Author

pdurbin commented Aug 29, 2019

@rokroskar it works! Thanks! I posted a screenshot to SwissDataScienceCenter/renku#593 (comment)

I also let the Dataverse community know about this exciting new integration: https://groups.google.com/d/msg/dataverse-community/2H21moBIRgU/PUuai7UNBgAJ 🎉

As you suggested, follow-up issues specific to Dataverse would probably be best. I'm excited that this initial integration "just works". Thanks!

@rokroskar rokroskar added this to the sprint-2019-08-16 milestone Sep 7, 2019