Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crossref as a DOI provider (experimental) #10806

Merged
merged 12 commits into from
Sep 17, 2024
Merged

Conversation

stevenwinship
Copy link
Contributor

@stevenwinship stevenwinship commented Aug 27, 2024

What this PR does / why we need it:
This pr adds new doi provider - CrossRef. Implemented logic was mostly inspired by other providers such as DataCite and PermaLink.

Which issue(s) this PR closes: 8581

Special notes for your reviewer:

Suggestions on how to test this:
In order to test this you need to have test crossref account and set all required properties, described in doc/sphinx-guides/source/installation/config.rst file
Update: CrossRef does not create test accounts. This may need to be tested by @JacekChudzik

Does this PR introduce a user interface change? If mockups are available, please link/include them here: None

Is there a release notes update needed for this change?: included

Additional documentation: Installation configuration

https://dataverse-guide--10806.org.readthedocs.build/en/10806/installation/config.html#crossref-specific-settings

@stevenwinship stevenwinship self-assigned this Aug 27, 2024
@stevenwinship stevenwinship added Type: Feature a feature request Size: 10 A percentage of a sprint. 7 hours. FY25 Sprint 4 FY25 Sprint 4 labels Aug 27, 2024
@coveralls
Copy link

coveralls commented Aug 27, 2024

Coverage Status

coverage: 20.714% (-0.05%) from 20.764%
when pulling 8dfd5f3 on 8581-crossref-provider2
into 34bfb1b on develop.

This comment has been minimized.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is close to working but I think right now it mixes the way legacy providers and new multiple pid providers work w.r.t. settings. I've tried to leave comments about how to disentangle that, but am happy to talk through it as well. I also commented on where I think some test changes/additions are needed.

I also noted that we'll need to decide if the updates to DOI metadata coming in other PRs should also apply to CrossRef - if so we probably need a new issue once everything is merged.



// PROVIDER CROSSREF
CROSSREF_URL(SCOPE_PID_PROVIDER, "url"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a SCOPE_PID_CROSSREF(SCOPE_PID_PROVIDER, "crossref"),
entry and then all of the rest should use that scope rather than SCOPE_PID_PROVIDER. The effect of that is to let you define jvm options of the form dataverse.pid.<id>.crossref.url for each of the potentially many crossref providers (rather than having only one global setting for all crossref providers).

Copy link
Contributor Author

@stevenwinship stevenwinship Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I had that but removed it to have 'dataverse.pid.< id >.url' without the .crossref.
I'll make that change


.. _dataverse.pid.crossref.password:

dataverse.pid.crossref.password
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you edit this to move it to the multiple provide section, please add a note like the one on DataCite suggesting that more secure options should be used (versus having the password as a plain text jvm-option)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR was made before multiPid was supported/before docs were improved with that, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will modify the doc

@@ -617,6 +617,15 @@ this provider.
- :ref:`dataverse.pid.ezid.username`
- :ref:`dataverse.pid.ezid.password`

**JVM Options for CrossRef:**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation, and some of the code, is for 'legacy' single pid configuration but I think we could/should only support handling the new multipid way of doing config. Although I think you've added it, I think it's reasonable to not support the legacy style at all for cross ref (since the idea of legacy was to support people who had already deployed with the old single provider jvm options and no one could have used CrossRef before now (except the developer's group).

To change the docs - see the DataCite section in the guide and add the info there (probably right after DataCite/before Handle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will modify the doc

@@ -188,6 +188,9 @@ private void loadProviders() {
legacy = new FakeDOIProvider("legacy", "legacy", authority, shoulder, identifierGenerationStyle,
dataFilePidFormat, "", "");
break;
case "CrossRef":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should drop this. If you want to support legacy still, this section should be doing like the DataCite legacy section above to read the legacy-style options and feed them into the provider directly (creating a new CrossRefDOIProvider rather than calling a factory method). Again - probably better to drop it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove

import java.util.logging.Level;
import java.util.logging.Logger;

public class CrossRefDOIPidProvider extends AbstractDOIProvider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other DOI providers drop the Pid part of the name (or use DOI instead of Pid since it is the type of Pid they support), so this would be more consistent as CrossRefDOIProvider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming

}
}

class CrossRefMetadataTemplate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope perhaps for now - we should coordinate and see if the enhancements in #10615/#10632 should be applied to CrossRef as well (as part of that PR or a new one once they're merged). It's possible that CrossRef doesn't accept all the new metadata that gets added in those, but if it does, I assume we'd want to send it to CrossRef as well. @konradperlowski - any thoughts?

@JvmSetting(key = JvmSettings.PID_PROVIDER_LABEL, value = "CrossRef 1", varArgs = "crossref1")
@JvmSetting(key = JvmSettings.PID_PROVIDER_TYPE, value = CrossRefDOIPidProvider.TYPE, varArgs = "crossref1")
@JvmSetting(key = JvmSettings.PID_PROVIDER_AUTHORITY, value = "10.5072", varArgs = "crossref1")
@JvmSetting(key = JvmSettings.PID_PROVIDER_SHOULDER, value = "FK2/", varArgs = "crossref1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are very close to the values for the provider ez1 above - perhaps changing to 10.5075 and/or to FK5/ to make a few more characters different would make it clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will use sample CrossRef values

@JvmSetting(key = JvmSettings.PID_PROVIDER_TYPE, value = CrossRefDOIPidProvider.TYPE, varArgs = "crossref1")
@JvmSetting(key = JvmSettings.PID_PROVIDER_AUTHORITY, value = "10.5072", varArgs = "crossref1")
@JvmSetting(key = JvmSettings.PID_PROVIDER_SHOULDER, value = "FK2/", varArgs = "crossref1")
@JvmSetting(key = JvmSettings.PID_PROVIDER_MANAGED_LIST, value = "cr:20.20.20/FK2ABCDEF", varArgs ="crossref1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values in the managed list for this provider need to be DOIs - of the form doi:10.5076/FK7ABCDEF, etc. (The hdl example above uses the standard Handle form.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this an empty list

@@ -133,6 +145,7 @@ public static void setUpClass() throws Exception {
pidProviderFactoryMap.put(HandlePidProvider.TYPE, new HandleProviderFactory());
pidProviderFactoryMap.put(FakeDOIProvider.TYPE, new FakeProviderFactory());
pidProviderFactoryMap.put(EZIdDOIProvider.TYPE, new EZIdProviderFactory());
pidProviderFactoryMap.put(CrossRefDOIPidProvider.TYPE, new CrossRefDOIProviderFactory());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that the test changes don't yet do anything because the code above doesn't look for jvm options with crossref1 in them. Once those changes are made, I think adding to the tests here to assure that a crossref doi - one with authority 10.5072 and shoulder FK2/ as currently defined, is recognized by a provider with the id crossref1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add that to JvmSettingsTest.java

@cmbz cmbz added the FY25 Sprint 5 FY25 sprint 5 label Aug 28, 2024
@pdurbin pdurbin changed the title 8581 crossref provider2 Crossref as a DOI provider Aug 28, 2024

This comment has been minimized.

@stevenwinship stevenwinship removed their assignment Aug 29, 2024
@qqmyers qqmyers self-assigned this Sep 4, 2024
CrossRef has two APIs that are used in Dataverse:

The base URL of the `CrossRef <https://api.crossref.org>`_,
used to mint and manage DOIs. Current valid values for ``dataverse.pid.*.datacite.mds-api-url`` are "https://mds.datacite.org" (production) and "https://mds.test.datacite.org" (testing, the default).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about DataCite values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified the doc

@@ -2841,6 +2871,50 @@ should delete the old JVM option and the wrapped password alias, then recreate
as shown for :ref:`dataverse.pid.datacite.password` but with the EZID alias
name.

.. _dataverse.pid.crossref.url:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are about legacy support which is no longer implemented - they can all be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

CrossRefRESTfullClient client = getClient();
doiExists = client.testDOIExists(identifier.substring(identifier.indexOf(":") + 1));
} catch (Exception e) {
logger.log(Level.INFO, identifier, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a warning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called a lot just to see if the DOI exists. It's just a check before it creates one.

logger.log(Level.FINE, jsonMetadata);
metadata.put("_status", mappedJson.get("status").toString());
} catch (RuntimeException e) {
logger.log(Level.INFO, identifier, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same - a warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the code from DOIDataCiteRegisterService which logs info level

}

public String reserveIdentifier(String identifier, DvObject dvObject) throws IOException {
logger.info("Crossref reserveIdentifier");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and probably fine for all of these that just tell you that you used the method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


class CrossRefFileUtil {

public static void close(InputStream in) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is just used with readAndClose where the stream comin in is in a try-with-resources statement. Do we need any of this (the finally/close parts)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed try catch finally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So close() is no longer used anywhere?

String data = EntityUtils.toString(response.getEntity(), encoding);
if (response.getStatusLine().getStatusCode() != 200) {
String errMsg = "Response from getMetadata: " + response.getStatusLine().getStatusCode() + ", " + data;
logger.info(errMsg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more that should be .warning

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@@ -133,6 +145,7 @@ public static void setUpClass() throws Exception {
pidProviderFactoryMap.put(HandlePidProvider.TYPE, new HandleProviderFactory());
pidProviderFactoryMap.put(FakeDOIProvider.TYPE, new FakeProviderFactory());
pidProviderFactoryMap.put(EZIdDOIProvider.TYPE, new EZIdProviderFactory());
pidProviderFactoryMap.put(CrossRefDOIProvider.TYPE, new CrossRefDOIProviderFactory());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add to the end of the testDOIParsing() test:

        String pid7String = "doi:10.11111/DVN/ABCDEF";
        GlobalId pid7 = PidUtil.parseAsGlobalID(pid7String);
        assertEquals(pid7String, pid7.asString());
        assertEquals("crossref1", pid7.getProviderId());

That at least tests that the a CrossRef provider gets created by the factory and the authority/shoulder params are being set correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -21,11 +21,17 @@ void lookupSetting() {
void lookupPidProviderSetting() {
assertEquals("test", JvmSettings.DATACITE_USERNAME.lookup("datacite"));
}


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this adds anything/suggest removing. The DataCite example is nominally just a test of the lookup(param) call and this just repeats that with another setting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I suggested a little cleanup and a new test. I did look through the 'original' code that interacts with the CrossRef service and creates the required metadata, but did not dig into details of whether the API calls work/etc. (I assume it was working code when it came to us.)

Beyond that, I think it looks OK to move forward. It appears that we are not going to be able to acquire CrossRef credentials to test the code in the short term, which is basically like the EzID provider where we don't have creds to test (afaik), so we'll have to rely on users to report bugs.

@qqmyers qqmyers assigned stevenwinship and unassigned qqmyers Sep 4, 2024

This comment has been minimized.

1 similar comment

This comment has been minimized.

This comment has been minimized.

@qqmyers qqmyers removed their assignment Sep 11, 2024
@cmbz cmbz added the FY25 Sprint 6 FY25 Sprint 6 label Sep 11, 2024
@pdurbin pdurbin changed the title Crossref as a DOI provider Crossref as a DOI provider (experimental) Sep 17, 2024
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:8581-crossref-provider2
ghcr.io/gdcc/configbaker:8581-crossref-provider2

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@pdurbin pdurbin self-assigned this Sep 17, 2024
@pdurbin pdurbin merged commit ec882e3 into develop Sep 17, 2024
12 checks passed
@pdurbin pdurbin deleted the 8581-crossref-provider2 branch September 17, 2024 17:51
@pdurbin pdurbin added this to the 6.4 milestone Sep 17, 2024
@pdurbin
Copy link
Member

pdurbin commented Sep 17, 2024

Merged. As discussed at standup (and mentioned above) this support for Crossref as a DOI provider is experimental, especially because we were unable to obtain and credentials to test with. Community, please report bugs and create pull requests to fix them as our hands somewhat tied.

I did run the code locally and was able to publish a dataset using the FAKE provider. I also confirmed that API tests are passing.

@pdurbin pdurbin removed their assignment Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FY25 Sprint 4 FY25 Sprint 4 FY25 Sprint 5 FY25 sprint 5 FY25 Sprint 6 FY25 Sprint 6 Size: 10 A percentage of a sprint. 7 hours. Type: Feature a feature request
Projects
Status: Done 🧹
Development

Successfully merging this pull request may close these issues.

Adding Crossref as a DOI provider
6 participants