When I create a dataset, I want to use an existing DOI #6425

Closed
tcoupin opened this issue Dec 2, 2019 · 17 comments

Comments

@tcoupin
Member

tcoupin commented Dec 2, 2019

I'm creating this issue to continue the PR "Feat existing doi on creation" #5105.

On our Dataverse installation, https://dataverse.ird.fr, we allow users to provide an existing DOI.
[Screenshot 2019-12-02 at 12 48 26]

This DOI may or may not have the same prefix as the one set in :Authority. If the DOI has the matching prefix, Dataverse updates it on publication and modification, like a standard DOI. If not, no update is made.
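To make the rule above concrete, here is a minimal sketch of the prefix check. It is not the code from the branch or from Dataverse itself: the function names `doi_prefix` and `is_locally_managed` are hypothetical, the `authority` argument stands in for the value of the :Authority setting, and the DOI suffixes in the usage lines are made up for illustration.

```python
def doi_prefix(doi: str) -> str:
    """Return the prefix of a DOI, e.g. '10.7910' for 'doi:10.7910/DVN/EXAMPLE'."""
    value = doi.strip()
    # Accept bare DOIs, 'doi:' identifiers, and doi.org resolver URLs.
    for wrapper in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(wrapper):
            value = value[len(wrapper):]
            break
    prefix, sep, suffix = value.partition("/")
    # A DOI is '<prefix>/<suffix>' and every DOI prefix starts with '10.'.
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix


def is_locally_managed(doi: str, authority: str) -> bool:
    """True if the DOI uses the installation's own prefix (assumed to be
    the :Authority value), meaning it would be updated on publication."""
    return doi_prefix(doi) == authority.strip()


# Hypothetical DOIs; 10.7910 and 10.18738 are prefixes mentioned in this thread.
print(is_locally_managed("doi:10.7910/DVN/EXAMPLE", "10.7910"))   # True
print(is_locally_managed("doi:10.18738/T8/EXAMPLE", "10.7910"))   # False
```

With a check like this, a DOI that returns `False` would be stored as-is and never pushed to the DOI provider.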

This feature allows us to use Dataverse as the main reference repository for our institution: external data can be referenced in this central point, especially data from warehouses that do not support OAI-PMH.

I think this feature could be useful to others, and I can create a new PR with it. If you want, this feature can be disabled by default and enabled with a new database setting.

File changes: v4.17...tcoupin:v4.17-IRD3
Files:

  • src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
  • src/main/java/edu/harvard/iq/dataverse/GlobalId.java
  • src/main/java/edu/harvard/iq/dataverse/engine/command/impl/AbstractDatasetCommand.java
  • src/main/java/edu/harvard/iq/dataverse/engine/command/impl/CreateNewDatasetCommand.java
  • src/main/java/edu/harvard/iq/dataverse/engine/command/impl/FinalizeDatasetPublicationCommand.java
  • src/main/java/edu/harvard/iq/dataverse/engine/command/impl/PublishDatasetCommand.java
  • src/main/webapp/dataset.xhtml (starting at line 607)
@pdurbin
Member

pdurbin commented Dec 2, 2019

@tcoupin thanks for the explanation, the screenshot, and the offer to make a pull request! Let's let the community discuss here or elsewhere and see how much interest there is!

This "use an existing DOI" feature reminds me a bit of harvested datasets because it's a way for DOIs that aren't under the installation's main DOI authority/prefix to get into an installation of Dataverse. For example, the main DOI authority/prefix for Harvard Dataverse is 10.7910 but in the screenshot below, I'm showing harvested datasets with a DOI authority/prefix of 10.18738 and 10.6141:

[Screen Shot 2019-12-02 at 8 04 21 AM]

The other thing I'm reminded of is how sometimes installations of Dataverse are torn about including licensed data in Dataverse. Dataverse will assign all datasets a DOI. Sometimes there is a desire to not assign a DOI to licensed datasets. This was mentioned in passing in "Best Practices for Research Data Management" at https://www.youtube.com/watch?v=sIm9CZipNYo

Finally, there was some good discussion about scope (only one DOI authority/prefix) in the original "DataCite DOI Support Functional Requirements Document" years ago that feels somewhat related: https://docs.google.com/document/d/1DAiQ80-69EUW1so-qY3HVfcjiy4zaObg4RDfz18da_k/edit?usp=sharing

@djbrooke
Contributor

djbrooke commented Dec 2, 2019

Hi @tcoupin, I don't think we'd want to support this for similar reasons that we didn't want to merge #5105/#5104 and we would not accept a PR. The workflow of data in other systems can currently be supported through OAI-PMH and in the future will be supported by TRSAs (http://cyberimpact.us/dataverse-trusted-remote-storage-agent-update/). We don't want to implement the ability to point to external objects outside of these two methods.

@tcoupin
Member Author

tcoupin commented Dec 3, 2019

I followed the recommendations of #5105 (comment)

@RightInTwo
Contributor

RightInTwo commented Dec 3, 2019

@djbrooke @pdurbin Hi, I hope you're doing fantastic! As I understand TRSA, the PIDs will still be minted in Dataverse. Am I mistaken? Can that mechanism really be used to reference external PIDs?

@tcoupin Could you please check out #5402 to see if your issue is covered by that one? If this does not find its way into core, I'd be very interested in your branch!

@pdurbin
Member

pdurbin commented Dec 3, 2019

@RightInTwo good question. Can you please leave a comment on #6423 with your thoughts on how a data citation should look?

@tcoupin
Member Author

tcoupin commented Dec 3, 2019

> @tcoupin Could you please check out #5402 to see if your issue is covered by that one? If this will not find a way to core, I'd be very interested in your branch!

I understand your use case as "I want to feed a Dataverse installation by creating datasets based on the metadata associated with a DOI". It's very interesting!
My PR only allows creating datasets one by one, and it does not harvest the existing metadata (bad!).

The only thing I want is to not create a new DOI for a dataset that already has a DOI.

@RightInTwo
Contributor

@tcoupin Indeed. Fetching the metadata based on the DOI would be one step further, but the prerequisite is the same: facilitate externally managed DOIs.

@pdurbin I know I've caused my fair share of disruption by creating and cross-referencing all kinds of issues. Maybe I can help clean up? Contact me :)

@tcoupin
Member Author

tcoupin commented Jan 17, 2020

We plan to remove our "New dataset with existing DOI" feature from Datasud and replace it with "Create dataset by harvesting a DOI". Similar to the OAI client feature, it will create remotely hosted datasets.
It remains to be determined whether this function will live in the Dataverse kernel or in a separate microservice.

> Hi @tcoupin, I don't think we'd want to support this for similar reasons that we didn't want to merge #5105/#5104 and we would not accept a PR. The workflow of data in other systems can currently be supported through OAI-PMH and in the future will be supported by TRSAs (http://cyberimpact.us/dataverse-trusted-remote-storage-agent-update/). We don't want to implement the ability to point to external objects outside of these two methods.

So not in the dataverse kernel ^^

So a microservice!
An important condition is that this development will be donated to the Dataverse community. Do you think this service could be of significant interest to other users? My development resources are extremely limited, and I will need a little help to ensure maintenance. Maybe I can have something functional by the end of the first semester, in PHP with slimframework.

@pdurbin
Member

pdurbin commented Jan 17, 2020

@tcoupin I'm not sure I completely understand what you're proposing but let's keep talking about it! 😄

@RightInTwo
Contributor

@tcoupin Did you see the code I posted in #5402 (post of 16 Dec 2019)? I think that "Create dataset by harvesting a DOI" would fit very well into that issue.

@tcoupin
Member Author

tcoupin commented Jan 21, 2020

@RightInTwo Yes. This code creates a real dataset, which can lead to the metadata in Dataverse getting out of sync with the metadata associated with the DOI. I plan to set up an OAI-PMH service that will serve the metadata for a list of DOIs. In Dataverse, I will set up an OAI-PMH client to harvest this service.
The code will be shared ;)
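
As a rough illustration of the idea (not the code referred to above), the sketch below builds an OAI-PMH `ListIdentifiers` response for a list of DOIs using only the Python standard library. The function name, the `doi:` form of the record identifiers, the example base URL, and the example DOIs are all assumptions made for this sketch; a real service would also need to answer the other OAI-PMH verbs (such as `Identify`, `ListMetadataFormats`, `GetRecord`, and `ListRecords`) and fetch the actual metadata for each DOI.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_identifiers_response(dois, base_url, datestamp, metadata_prefix="oai_dc"):
    """Return an OAI-PMH ListIdentifiers response (as a string) for the given DOIs."""
    ET.register_namespace("", OAI_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")

    # Every OAI-PMH response carries the response time (UTC) and echoes the request.
    response_date = ET.SubElement(root, f"{{{OAI_NS}}}responseDate")
    response_date.text = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    request = ET.SubElement(
        root,
        f"{{{OAI_NS}}}request",
        {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix},
    )
    request.text = base_url

    # One <header> per DOI; using 'doi:<DOI>' as the identifier is an assumption.
    listing = ET.SubElement(root, f"{{{OAI_NS}}}ListIdentifiers")
    for doi in dois:
        header = ET.SubElement(listing, f"{{{OAI_NS}}}header")
        ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = f"doi:{doi}"
        ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = datestamp
    return ET.tostring(root, encoding="unicode")


# Hypothetical DOIs and endpoint, for illustration only.
xml_text = list_identifiers_response(
    ["10.18738/T8/EXAMPLE", "10.6141/EXAMPLE"],
    base_url="https://example.org/oai",
    datestamp="2020-01-21",
)
print(xml_text)
```

Because the harvested records stay owned by the remote service, a re-harvest refreshes the metadata, which is what avoids the out-of-sync problem described above.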

@RightInTwo
Contributor

@tcoupin I thought we could maybe merge these two issues, as they serve the same use case. Of course, avoiding the sync problem is quite important, so I'd be happy to support you with that solution and drop my makeshift code!

@djbrooke
Contributor

Hi @tcoupin @RightInTwo, like @pdurbin, I'm not sure that I understand exactly what you're planning here. Is there a diagram that could be included about the proposal?

@RightInTwo
Contributor

@tcoupin @djbrooke @pdurbin I posted a very simple diagram in #5402.

@pdurbin
Member

pdurbin commented Jan 21, 2020

@RightInTwo nice diagram. I'm copying it here too:

[Image: diagram copied from #5402]

@djbrooke
Contributor

Thanks @tcoupin @RightInTwo @pdurbin. Since we'll be handling this with no changes to the Dataverse code, I'll close this out.

@pdurbin
Member

pdurbin commented Jan 24, 2020

To follow along from here, please see https://github.com/IQSS/doi2pmh-server, the repo we created as part of #5402.
