Error setting up harvesting client for ICPSR on UNC Dataverse and Demo Dataverse #7497
Comments
@jggautier I'm able to set up the client via dataverse5.odum.unc.edu but harvesting doesn't go well, throwing one of two errors for each identifier:
Identifier confusion

The error "Failed to find a global identifier in the OAI_DC XML record" reminds me of the issue in #5050. In that issue, @JingMa87 found that Dataverse wants only identifiers that are DOIs or HDLs, and when the oai_dc record has two identifiers, Dataverse looks only at the first identifier. ICPSR oai_dc records have two dc:identifier elements, and the first is not the record's DOI.

(This issue wasn't resolved in #5050 because the scope was about Zenodo, which doesn't have this problem. The problem that we decided to resolve was about how restrictive Dataverse was when figuring out if the first identifier is a DOI or HDL.)

I would expect Dataverse to throw this "Failed to find a global identifier" error for all of these ICPSR records, since I think they all have two dc:identifier elements and the first element contains just a number. But some records aren't getting this "Failed to find a global identifier" error?

In cases where a record has more than one identifier, is there a way to have Dataverse look for a DOI or HDL first, maybe the first one it comes across, then try to use that when importing the metadata?

java.lang.NullPointerException

Maybe @scolapasta or @landreev can help with this? Not sure what it means and why it happens for some records and not all.

Which metadata format and "Archive Type" to use

Looks like you're using oai_dc. Which "Archive type" are you using? I know oai_dc should always work, but I've been encouraged to use and have been testing with the "ICPSR" Archive Type (and the oai_ddi25 metadata format since that should yield more metadata from each record). (Although this seems to have its own problems: #7498.)
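To make the suggestion above concrete, here is a minimal sketch of scanning every dc:identifier in an oai_dc record and returning the first one that looks like a DOI or Handle, instead of checking only the first element. It is not Dataverse's import code: the function name, the regular expression, and the sample record (including its DOI) are assumptions made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed patterns for persistent identifiers; Dataverse's real checks may differ.
PID_PATTERN = re.compile(
    r"^(doi:|hdl:|https?://(dx\.)?doi\.org/|https?://hdl\.handle\.net/)\S+",
    re.IGNORECASE,
)


def first_persistent_identifier(oai_dc_xml):
    """Return the first dc:identifier that looks like a DOI or Handle, else None."""
    root = ET.fromstring(oai_dc_xml)
    for element in root.iter(f"{{{DC_NS}}}identifier"):
        value = (element.text or "").strip()
        if PID_PATTERN.match(value):
            return value
    return None


# Hypothetical record shaped like the ICPSR case: a bare number first, then a DOI.
SAMPLE_RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>12345</dc:identifier>
  <dc:identifier>doi:10.9999/EXAMPLE12345</dc:identifier>
</oai_dc:dc>"""

print(first_persistent_identifier(SAMPLE_RECORD))
```

With this approach the bare number is skipped and the DOI is used; a record with no DOI or Handle at all still yields None, which is where a "Failed to find a global identifier" style error would belong.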
@jggautier On your not being able to add the client on demo.dataverse.org - is |
Hey @donsizemore. I see dataverse.siteUrl referenced in the installation and configuration guides, but I still don't know what it means and don't think I can check. @kcondon, would you know?
@jggautier siteUrl is a JVM option that specifies the URL used to access this Dataverse installation from the outside world, e.g. https://dataverse.harvard.edu versus a local machine name, http://machine1.harvard.edu. Over time, some functionality has come to rely on this setting and will not work properly when it is not configured. It can be confusing because there is another, similar option, fqdn, that is just the full hostname accessible from the outside world rather than the URL, e.g. dataverse.harvard.edu. It does not include the protocol (https) or the optional port number (443). For convenience, though maybe not ease of reading, there is a syntax that allows you to define one in terms of the other: In this case:
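As an aside on how the two values relate, the sketch below derives a hostname from a site URL and rebuilds a URL from a hostname. It does not show the JVM option syntax referred to in the comment above, and it is not how Dataverse reads these settings; the helper names are hypothetical and the hostnames are the examples from the comment.

```python
from urllib.parse import urlsplit


def fqdn_from_site_url(site_url):
    """Drop the protocol and any port, leaving only the hostname."""
    return urlsplit(site_url).hostname


def site_url_from_fqdn(fqdn, scheme="https", port=None):
    """Add a protocol, and optionally a port, to a bare hostname."""
    if port is None:
        return f"{scheme}://{fqdn}"
    return f"{scheme}://{fqdn}:{port}"


print(fqdn_from_site_url("https://dataverse.harvard.edu:443"))
print(site_url_from_fqdn("dataverse.harvard.edu"))
```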
Thanks for explaining! I found a Dataverse support ticket with a text file attached that lists a Dataverse installation's JVM options, including its dataverse.siteUrl. How would I see what Demo Dataverse's JVM options are? (I'm assuming I don't have access to them since I've never needed to.)
@jggautier I've put the two options from demo mentioned in the previous comment. |
Sometime between the last comment in this issue and today (maybe "2021-03"), a notice was added to the top of the ICPSR documentation page stating that ICPSR is retiring its OAI-PMH service. At the end of this month (June 2021), it won't be available. (The notice also reads that they are "exploring an API-focused solution that will involve delivering metadata using the DCAT-US schema", but I think that should be addressed outside of this GitHub issue.)

There are some IQSS grant-funded projects in the planning phases for improving the Dataverse software's harvesting capabilities, but that work will start sometime after this month. I think this means that the OAI-PMH harvesting problems described in this and related GitHub issues won't be resolved in time for Dataverse repositories to harvest ICPSR's metadata before the OAI-PMH service is retired.

I think this and the related GitHub issues should be closed and we should follow up with ICPSR. The notice reads that they don't know when the API-focused solution will be completed, but I'm curious why they're retiring the OAI-PMH service.
Thu-Mai at Odum/UNC let us know today (see RT support email) that UNC's Dataverse-based repository shows the following error during the first step of creating a harvesting client using the server URL https://www.icpsr.umich.edu/icpsrweb/neutral/oai/studies:
Demo Dataverse reports the same error when I try to create a harvesting client using https://www.icpsr.umich.edu/icpsrweb/neutral/oai/studies or ICPSR's "citations" Server URL (https://www.icpsr.umich.edu/icpsrweb/neutral/oai/citations).
UNC is running Dataverse version 4.16. Demo Dataverse is running 5.3.
Documentation of the two ICPSR OAI-PMH feeds is at https://www.icpsr.umich.edu/web/pages/membership/or/metdata/oai.html.
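For anyone who wants to check a Server URL outside of Dataverse, the sketch below builds standard OAI-PMH request URLs (the Identify and ListRecords verbs come from the OAI-PMH 2.0 specification). Whether the first step of creating a harvesting client sends exactly these requests is an assumption, and the helper name is hypothetical. The code only builds the URLs; it does not contact the server.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a server URL, a verb, and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


STUDIES_URL = "https://www.icpsr.umich.edu/icpsrweb/neutral/oai/studies"

# Identify is the simplest request for checking that an OAI-PMH server responds.
print(oai_request_url(STUDIES_URL, "Identify"))

# ListRecords with the oai_dc format, which every OAI-PMH server must support.
print(oai_request_url(STUDIES_URL, "ListRecords", metadataPrefix="oai_dc"))
```

Opening the printed URLs in a browser or with curl from the affected server can show whether the failure comes from the network path or from the response itself.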
Harvard Dataverse, running 5.3, and Dataverse instances I create on AWS, do not show this error. I'm able to get through all four steps for creating a harvesting client for ICPSR.
I originally reported this bug in Harvard Dataverse's GitHub repo at IQSS/dataverse.harvard.edu#63, but this issue isn't really specific to Harvard Dataverse, so I moved it here.