Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ingesting data from Dataverse. Fixes #179 #175

Merged
merged 14 commits into from
Nov 27, 2018
Merged

Conversation

Xarthisius
Copy link
Collaborator

@Xarthisius Xarthisius commented Nov 14, 2018

This PR introduces initial support for registering data from Dataverse using DOI resolver, e.g.:

A set of dataverse installation can be changed via DATAVERSE_URL config variable.

TODO:

  • add tests
  • allow to register data using internal ids
  • allow to point to a single dataverse instance via DATAVERSE_URL

@codecov
Copy link

codecov bot commented Nov 14, 2018

Codecov Report

Merging #175 into master will increase coverage by 0.96%.
The diff coverage is 93.22%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #175      +/-   ##
=========================================
+ Coverage   83.84%   84.8%   +0.96%     
=========================================
  Files          31      32       +1     
  Lines        1826    2001     +175     
=========================================
+ Hits         1531    1697     +166     
- Misses        295     304       +9
Impacted Files Coverage Δ
server/constants.py 85.71% <100%> (+0.42%) ⬆️
server/__init__.py 86.34% <100%> (+0.49%) ⬆️
server/lib/__init__.py 100% <100%> (ø) ⬆️
server/lib/null_provider.py 78.57% <33.33%> (+1.64%) ⬆️
server/lib/dataverse/provider.py 93.86% <93.86%> (ø)
server/lib/entity.py 83.33% <0%> (+4.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86fa180...bd05405. Read the comment docs.

Copy link
Member

@pdurbin pdurbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

PluginSettings.DATAONE_URL: 'https://cn.dataone.org/cn/v2/node'
PluginSettings.DATAONE_URL: 'https://cn.dataone.org/cn/v2/node',
PluginSettings.DATAVERSE_URL:
'https://services.dataverse.harvard.edu/miniverse/map/installations-json'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "miniverse" URL may change some day but we'll try to let you know if it does. If we forget and it changes under you, please email support@dataverse.org about it.

server/lib/dataverse/provider.py Show resolved Hide resolved
@Xarthisius Xarthisius changed the title [WIP] add support for ingesting data from Dataverse Add support for ingesting data from Dataverse Nov 23, 2018
@Xarthisius Xarthisius changed the title Add support for ingesting data from Dataverse Add support for ingesting data from Dataverse. Fixes #179 Nov 23, 2018
Copy link

@craig-willis craig-willis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks great and works as expected. One minor comment about supported API formats, which it seems unlikely we'll ever hit. I'll mention again here that I think there is an opportunity to split some of this out into a separate library that could be used outside of WT.

'mimeType': item['file_content_type'],
'filesize': item['size_in_bytes'],
'id': item['file_id'],
'doi': 'fixme'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the reason for the doi fixme noted somewhere? I gather it isn't returned by Dataverse, but required by our framework?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. I wasn't actually setting the identifier attribute, that's why it went unnoticed. For now I'm gonna use parent DOI, but I left note pointing to IQSS/dataverse#5339 (comment)


Handles: {siteURL}/api/access/datafile/{fileId}
"""
fileId = os.path.basename(url.path)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are trying to be comprehensive about what users can specify for lookup, Dataverse appears to support two additional methods of accessing data via the API. From http://guides.dataverse.org/en/latest/api/dataaccess.html#basic-file-access:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former is now handled. The latter returns a zipfile, which I'd say is out of scope for this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants