Add support in datasets API for persistent id (doi) #1837

scolapasta · 2015-04-01T22:39:29Z

The APIs currently use the database id, but we also need to support the persistent ID.

scolapasta · 2015-04-01T22:39:59Z

However, the DOI naming scheme contains slashes, so it does not play well with a REST API. We could introduce a scheme where the DOI is escaped, the slashes are replaced with dashes, or the whole identifier is base64-encoded. I'm not sure any of these is a good idea; at the very least, none of them is very intuitive. We could also offer another endpoint that converts global ids to local ones.
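To make the trade-offs concrete, here is a minimal Python sketch of the three encodings mentioned above. The function names are hypothetical, the DOI is only an example, and none of this is Dataverse code.

```python
import base64
from urllib.parse import quote, unquote

DOI = "doi:10.7910/DVN/UXTXA"  # example identifier, for illustration only


def escape_doi(doi: str) -> str:
    """Percent-encode every reserved character, including ':' and '/'."""
    return quote(doi, safe="")


def dash_doi(doi: str) -> str:
    """Replace slashes with dashes. Lossy if the DOI itself contains '-'."""
    return doi.replace("/", "-")


def base64_doi(doi: str) -> str:
    """URL-safe base64: reversible, but unreadable to humans."""
    return base64.urlsafe_b64encode(doi.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(escape_doi(DOI))
    print(dash_doi(DOI))
    print(base64_doi(DOI))
    # Percent-encoding is reversible with the standard library.
    assert unquote(escape_doi(DOI)) == DOI
```

Of the three, only the dash replacement loses information: two different DOIs can map to the same string when one of them already contains a dash.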

pdurbin · 2015-04-09T14:04:06Z

I definitely have a need to figure out internal dataset ID numbers when working with APIs.

For a long time I've been doing this at https://github.com/IQSS/dataverse/blob/master/scripts/search/assumptions

export FIRST_FINCH_DATASET_ID=$(curl -s "http://localhost:8080/api/dataverses/finches/contents?key=$FINCHKEY" | jq '.data[0].id')
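For anyone scripting this outside the shell, a rough Python equivalent of the jq filter might look like the sketch below. The function name is hypothetical, and the sample response body is made up: its shape is only inferred from the `.data[0].id` filter, not taken from real API output.

```python
import json
from typing import Optional


def first_content_id(response_body: str) -> Optional[int]:
    """Return the id of the first item under "data", like jq '.data[0].id'."""
    payload = json.loads(response_body)
    items = payload.get("data") or []
    return items[0]["id"] if items else None


# Made-up response body with the shape the jq filter expects.
SAMPLE = '{"status": "OK", "data": [{"id": 42, "type": "dataset"}]}'

if __name__ == "__main__":
    print(first_content_id(SAMPLE))
```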

And more recently I've been using an undocumented feature of the Search API to expose database IDs (looking them up by globalId/persistentId/DOI) but this requires turning on an experimental feature I haven't fully implemented at #1299 -

dataverse/src/test/java/edu/harvard/iq/dataverse/api/SearchIT.java, line 218 in 60e82b1

Anyway, my point is that this is an important endpoint for sure. /cc @rliebz

garthg · 2015-05-14T16:03:38Z

Hi,

This is a blocker for my project: without the ids I can't perform metadata updates, and I can't get the ids because the get_contents() call takes too long to complete. I will try some of the workarounds described here, so thank you to the folks who posted them!

One possible suggestion for a simple solution here would be to URL-escape the DOIs and then use them in the REST format as usual, so you'd get something like https://dataverse.harvard.edu/api/datasets/doi%3A10.7910%2FDVN%2FUXTXA/versions/:latest

Anyway, if anyone has any additional suggestions for how to find the IDs or how to perform metadata updates using only DOI, I would love to hear them!

Thanks,

Garth

pdurbin · 2015-05-29T19:27:08Z

While I was just trying to write a test for #2222 it was driving me crazy (again) that I can't see the dataset entity/database IDs from SWORD. I just pushed a proof of concept to correct this in 639d8c3.

Disabled because we still need a way to find a dataset id based on a DOI: IQSS/dataverse#1837

pdurbin · 2015-06-29T13:38:48Z

the get_contents() call takes too long to complete

Right, get_contents is a method @garthg is calling from https://github.com/IQSS/dataverse-client-python and the corresponding issue about this slowness on the API side is #2122

pdurbin · 2015-07-15T20:24:48Z

Without the ability to look up datasets by DOI, the native "datasets" API ( http://guides.dataverse.org/en/4.0/api/native-api.html#datasets ) is far less useful. An example use case today from @aawinburn was "How do I get the file ID of this PDF in my unpublished dataset?" Good question, and #1795 was supposed to be the answer, but you have to know the database id of the dataset. I've also answered this question at https://groups.google.com/d/msg/dataverse-community/fFrJi7NnBus/JUdOlOmhtQgJ, encouraging people (for now) to get a list of file IDs via the SWORD statement ( http://guides.dataverse.org/en/latest/api/sword.html#display-a-dataset-statement ), mostly because SWORD operates via DOIs. See also infsci2711/MultiDBs-FilesAPIs2DBs-WebClient#6

pdurbin · 2015-09-04T17:11:31Z

As I just mentioned in a thread on the Dataverse Google Group, #2416, which was opened recently, is about how hard it is to discover file IDs from the GUI.

In addition, #2438 is a new issue about what persistent IDs we could/should use for files.

pdurbin · 2015-10-10T14:02:09Z

Developers of the Dataverse client for Python would like the ability to use DOIs (not just database IDs) to operate on the native API. IQSS/dataverse-client-python#28 has some discussion on this.

leeper · 2015-11-14T13:39:38Z

This would also be useful for the R client.

leeper · 2015-11-14T13:41:35Z

I should elaborate: there's a tension between the Native API's ability to get versions of a dataset (but only by dataset ID) and the SWORD API's ability to retrieve a dataset by DOI. It would be nice for these to be able to play together, particularly given that the Native API doesn't require an API key to view the contents of a public dataset, but the SWORD API does.

RinkeHoekstra · 2015-12-02T08:47:46Z

This is a blocker for my project as well, and I do not see why the search API does not expose the dataset IDs by default.

As it turns out, several Dataverse installations I've tested do provide the IDs when the 'show_entity_ids=true' parameter is passed in the URL. However, this feature is not documented in the API docs.
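As a rough illustration, a search URL carrying that parameter could be built as in the sketch below. The `/api/search` path, the `q` parameter, and the host name are assumptions made for this example; only `show_entity_ids=true` comes from the comment above, and since it is undocumented it may not work on every installation.

```python
from urllib.parse import urlencode


def search_url(base_url: str, query: str, show_entity_ids: bool = True) -> str:
    """Build a search URL, optionally asking for database IDs in the results."""
    params = {"q": query}
    if show_entity_ids:
        # Undocumented parameter mentioned in this thread.
        params["show_entity_ids"] = "true"
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host, not a real installation.
    print(search_url("https://demo.example.org", "finch"))
```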

pdurbin · 2015-12-02T14:02:09Z

See also #1717 which spawned this ticket. I think @michbarsinai @scolapasta and I need to get together and decide on an approach to try. Options include:

put the DOI in a query parameter: /api/datasets?persistentId=doi:10.7910/DVN/UXTXA
escape the DOI, keeping it where it is in the path: /api/datasets/doi%3A10.7910%2FDVN%2FUXTXA
put the DOI at the end of path: /api/datasets/versions/:latest/doi:10.7910/DVN/UXTXA

@garthg means well when he suggests escaping the DOI in the URL like /api/datasets/doi%3A10.7910%2FDVN%2FUXTXA/versions/:latest (and @michbarsinai suggested the same at #1717 (comment) ) but my goodness is that hard on the eyes. I would much prefer using a query parameter like this: /api/datasets?persistentId=doi:10.7910/DVN/UXTXA which is exactly what we do on the dataset page: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UXTXA

Another approach would be to put the DOI at the end of the URL, like we do with SWORD ( /dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.7910/DVN/UXTXA ) but I favor the query parameter approach.

Whatever we decide on we would, of course, continue to support the old way for a while. And I think we should continue to support looking up a dataset by id, even if we use a query parameter (/api/datasets?id=42).
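A client-side sketch of the query-parameter proposal, covering both lookups, might look like this. The helper name is hypothetical and the URL shapes are the ones proposed in this comment, not a description of any released API.

```python
from typing import Optional
from urllib.parse import urlencode


def dataset_url(base_url: str,
                persistent_id: Optional[str] = None,
                dataset_id: Optional[int] = None) -> str:
    """Build /api/datasets?persistentId=... or /api/datasets?id=... (proposed shapes)."""
    if (persistent_id is None) == (dataset_id is None):
        raise ValueError("pass exactly one of persistent_id or dataset_id")
    if persistent_id is not None:
        # Keep ':' and '/' unescaped so the DOI stays readable in the URL.
        query = urlencode({"persistentId": persistent_id}, safe=":/")
    else:
        query = urlencode({"id": dataset_id})
    return f"{base_url.rstrip('/')}/api/datasets?{query}"


if __name__ == "__main__":
    base = "https://dataverse.harvard.edu"
    print(dataset_url(base, persistent_id="doi:10.7910/DVN/UXTXA"))
    print(dataset_url(base, dataset_id=42))
```

Leaving ':' and '/' unescaped is what keeps this form easier on the eyes than the path-escaped variant; both characters are permitted in a URL query string.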

michbarsinai · 2015-12-02T20:53:27Z

Another option is to have a DOI endpoint. This would also make it possible to point to different types of items from a DOI, which is, I think, one of the main goals of the DOI project.

Something along the lines of:

/api/doi/10.7910/DVN/UXTXA

Not sure how to deal with versions there - we could append them (/api/doi/12.3456/DVNE/UXTXA/versions/:latest) and use some semi-clever URL parsing. Or we could return a list of the versions, and have the client access a specific version via the existing API.
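One way the "semi-clever URL parsing" could work is sketched below in Python. The `/api/doi/` prefix and the `/versions/` suffix are the hypothetical shapes from this comment, and the function name is made up for illustration.

```python
from typing import Optional, Tuple

PREFIX = "/api/doi/"
VERSIONS_MARKER = "/versions/"


def parse_doi_path(path: str) -> Tuple[str, Optional[str]]:
    """Split a hypothetical /api/doi/... path into (doi, version)."""
    if not path.startswith(PREFIX):
        raise ValueError(f"not a DOI path: {path!r}")
    rest = path[len(PREFIX):]
    # Split on the last "/versions/" so slashes inside the DOI are preserved.
    doi, marker, version = rest.rpartition(VERSIONS_MARKER)
    if not marker:
        # No version suffix: the whole remainder is the DOI.
        return rest, None
    return doi, version


if __name__ == "__main__":
    print(parse_doi_path("/api/doi/10.7910/DVN/UXTXA"))
    print(parse_doi_path("/api/doi/10.7910/DVN/UXTXA/versions/:latest"))
```

The obvious weakness is that a DOI that itself contains "/versions/" would be split in the wrong place, which is an argument for the alternative of returning a list of versions instead.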

garthg · 2015-12-03T01:04:29Z

@RinkeHoekstra In case it's helpful, I wrote some Python that does cached lookup of dataverse IDs to make it slightly easier to manage this issue. Some code is on pastebin at: http://pastebin.com/ipdhEPXA . Obviously that's not a substitute for proper implementation through the API, but I wanted to pass it along just in case it's helpful.
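For readers who cannot reach the pastebin link, the general idea of a cached lookup can be sketched as follows. This is not the code from the link: the class name is hypothetical, and the `lookup` callable stands in for whatever slow call resolves a DOI to a database id.

```python
from typing import Callable, Dict


class DatasetIdCache:
    """Remember DOI -> database id results so each DOI is resolved only once."""

    def __init__(self, lookup: Callable[[str], int]) -> None:
        self._lookup = lookup  # slow resolver supplied by the caller
        self._ids: Dict[str, int] = {}

    def get(self, doi: str) -> int:
        if doi not in self._ids:
            # Cache miss: pay the cost of the slow lookup once.
            self._ids[doi] = self._lookup(doi)
        return self._ids[doi]


if __name__ == "__main__":
    # Stand-in resolver with a made-up id, for demonstration only.
    cache = DatasetIdCache(lambda doi: 42)
    print(cache.get("doi:10.7910/DVN/UXTXA"))
```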

RinkeHoekstra · 2015-12-03T08:23:50Z

@garthg thanks! I found similar code somewhere on Github and now have a workaround.

A separate issue is that the search API is rather picky about how the DOI is quoted. For instance, Python's requests library always quotes the query parameters in a GET request, but the API then searches for the quoted string rather than unquoting it first. But that belongs in its own ticket ...
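The quoting round trip can be illustrated with the Python standard library alone. This sketch only shows what a percent-encoded query looks like and what it decodes back to; it makes no claim about how the search API handles it internally, and the `q` parameter name is an assumption.

```python
from urllib.parse import parse_qs, urlencode

doi = "doi:10.7910/DVN/UXTXA"

# What a client library typically sends: reserved characters percent-encoded.
encoded_query = urlencode({"q": doi})

# What one would expect the receiving side to search for: the decoded value.
decoded = parse_qs(encoded_query)["q"][0]

if __name__ == "__main__":
    print(encoded_query)
    print(decoded)
```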

michbarsinai · 2015-12-16T20:00:47Z

URL scheme for external persistent ids:

http://dataverse.org/api/datasets/:persistentid/:draft?persistentid=doi:10.2.3.4./open/ended/notation*

This works as long as every character in the identifier is legal in a URL parameter, so we can't support, e.g., &.
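The problem with & can be seen with Python's standard query-string parser. This is a sketch with a made-up identifier; it shows generic query-string behavior, not how the Dataverse server parses requests.

```python
from urllib.parse import parse_qs, quote

raw_id = "doi:10.1234/A&B"  # made-up identifier containing '&'

# Unescaped, the '&' ends the parameter early and the rest is lost.
broken = parse_qs("persistentId=" + raw_id)

# Percent-encoded, the whole identifier survives the round trip.
intact = parse_qs("persistentId=" + quote(raw_id, safe=":/"))

if __name__ == "__main__":
    print(broken)
    print(intact)
```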


pdurbin · 2016-01-04T19:21:23Z

@scolapasta this is one of the issues I mentioned this morning for which code has been pushed to a branch made from 4.2.3, and a decision should be made whether to merge it into the 4.2.3 branch or not.

pdurbin · 2016-03-01T02:13:19Z

Most recently, this issue is affecting this user:

I'm replying with workarounds but really we should just fix this issue. @michbarsinai implemented a fix at #1837 (comment) and it has since become pull request #2893.

Issue #1837 implemented and ready to be merged.

scolapasta · 2016-03-16T18:07:39Z

Tested and merged.

pdurbin · 2016-03-16T18:20:09Z

You can see the fix in production at https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=doi:10.7910/DVN/ARKOTI

(That's the dataset @monogan said we could test with at IQSS/dataverse-client-r#2 (comment) .)

Docs at http://guides.dataverse.org/en/4.3/api/native-api.html#datasets
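For client authors, a minimal Python sketch of calling the endpoint shown above could look like this. The helper names are hypothetical, the response is treated as opaque JSON because its structure is not described in this thread, and `fetch_dataset` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def persistent_id_url(base_url: str, persistent_id: str) -> str:
    """Build the :persistentId form of the datasets endpoint."""
    # Keep ':' and '/' unescaped so the URL matches the form shown above.
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{base_url.rstrip('/')}/api/datasets/:persistentId?{query}"


def fetch_dataset(base_url: str, persistent_id: str, timeout: float = 30.0) -> dict:
    """Perform the GET request and parse the JSON body (needs network access)."""
    url = persistent_id_url(base_url, persistent_id)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(persistent_id_url("https://dataverse.harvard.edu", "doi:10.7910/DVN/ARKOTI"))
```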

scolapasta added this to the 4.0.1 milestone Apr 1, 2015

scolapasta mentioned this issue Apr 1, 2015

List public files in dataset by API call #1717

Closed

scolapasta modified the milestones: 4.0.1, In Review - Short Term Apr 18, 2015

pdurbin added a commit that referenced this issue May 29, 2015

SWORD: expose dataset entity id #1837

639d8c3

pdurbin added a commit to IQSS/dataverse-apitester that referenced this issue Jun 1, 2015

add (disabled) test for IQSS/dataverse#2222

48713e4

Disabled because we still need a way to find a dataset id based on a DOI: IQSS/dataverse#1837

pdurbin mentioned this issue Jun 29, 2015

Optimize Permissions Lookup #2122

Closed

pdurbin mentioned this issue Jul 15, 2015

API: Add a way to discover file ids from a dataset to enable use in download API. #1795

Closed

pdurbin added the Feature: API label Jul 15, 2015

pdurbin mentioned this issue Oct 10, 2015

Add ability to instantiate a dataset without instantiating a dataverse IQSS/dataverse-client-python#28

Open

michbarsinai self-assigned this Dec 16, 2015

michbarsinai added a commit that referenced this issue Dec 18, 2015

Datasets can now be accesses via the api using their persistent ids. …

6553d33

…Also updated the native API guide (#1837)

michbarsinai removed their assignment Dec 19, 2015

michbarsinai added the Status: QA label Dec 19, 2015

pdurbin modified the milestones: 4.2.3, Not Assigned to a Release Jan 4, 2016

pdurbin added Status: Dev and removed Status: QA labels Jan 4, 2016

pdurbin assigned scolapasta Jan 4, 2016

scolapasta modified the milestones: 4.3, 4.2.3 Jan 5, 2016

pdurbin mentioned this issue Jan 28, 2016

Issue #1837 implemented and ready to be merged. #2893

Merged

scolapasta removed in progress labels Jan 28, 2016

scolapasta added the Status: QA label Feb 26, 2016

scolapasta assigned kcondon and unassigned scolapasta Feb 26, 2016

scolapasta removed the Status: Dev label Feb 27, 2016

scolapasta added a commit that referenced this issue Mar 11, 2016

Merge pull request #2893 from IQSS/1837-persistent-id-in-dataset-api

e3f635e

Issue #1837 implemented and ready to be merged.

scolapasta closed this as completed Mar 16, 2016

leeper mentioned this issue Mar 16, 2016

Update direct data download IQSS/dataverse-client-r#8

Closed

pdurbin mentioned this issue Mar 16, 2016

Add integration test to download a file by filename IQSS/dataverse-client-r#2

Closed

yarikoptic mentioned this issue Apr 1, 2016

dataverse datalad/datalad#393

Closed

pdurbin mentioned this issue Dec 22, 2016

Native API: publish dataset endpoint doesn't support :persistentId #3547

Closed

pdurbin mentioned this issue Nov 21, 2017

Add persistent identifier parameter option for File Download API #4295

Closed

pdurbin mentioned this issue Mar 8, 2018

As a user of the Search API, I'd like to know the database ID for dataverses, datasets, and files returned by my query #4493

Closed

pdurbin mentioned this issue Jul 23, 2018

Better error handling #4882

Closed

pdurbin mentioned this issue Feb 10, 2023

Update native-api.rst #9378

Closed
