Unable to create harvesting client for SRDA repository #7624
Comments
2023/12/19: Prioritized during meeting on 2023/12/18. Added to Needs Sizing.
2023/12/19: @jggautier and @landreev will retest to see if the problem still exists, then determine next steps. Also, the issue should be moved to dataverse.harvard.edu.
So, we have a configured SRDA harvesting client in prod. (harvesting one specific set). There appears to be some content successfully harvested via this client relatively recently.
Their current working OAI endpoint appears to be https://srda.sinica.edu.tw/oai_pmh/oai (not /oai_pmh/oai2.php). However, you will notice that redirects to the remote locations are NOT working. I will open a new dev. issue for this. But creating a client for this repo is working just fine now, after all these years.
(To be precise, there are two different problems, not one, that prevent the redirects from working.)
Opened the dev. issue for the redirect issues (linked above).
(This one itself is a non-dev. issue, with no PR associated with it, so I dragged it into "In Review" directly, asking @jggautier to take a look before we close it.)
Should this issue still be moved to the Harvard Dataverse repo? It sounds like we should let the SRDA folks know that when we try to harvest their srda set, it fails with the noSetHierarchy response from their server. Is that right? I'd be happy to email them to let them know that this is preventing us from harvesting from that set and ask if they can look into it. Since we're able to harvest from them when we don't specify a set, I wonder if we can do that instead. I can also ask them if we can do that.
Since we are closing this issue, I don't know if it's worth moving it to the local repo, but it's up to you. Yes, we should just harvest from them without specifying the set. That's what their server supports. The only minor problem on their end is that their server is, for whatever reason, advertising this unsupported set under ListSets. I mentioned it just to warn you not to select it when configuring the client. You may want to let them know. But no, it isn't preventing us from harvesting from them.
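The noSetHierarchy failure mentioned above is a standard OAI-PMH error code, returned in an `<error>` element of the response. A minimal Python sketch for spotting it when checking a set-restricted request by hand; the helper `oai_error_codes` is hypothetical (not part of Dataverse), and the sample response is hand-written, not SRDA's actual output:

```python
import xml.etree.ElementTree as ET

# OAI-PMH 2.0 XML namespace, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def oai_error_codes(response_xml):
    """Return the code attribute of every <error> element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [error.get("code") for error in root.iter(f"{OAI_NS}error")]


# Hand-written sample response (hypothetical), not captured from SRDA's server.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2023-12-20T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc" set="srda">https://srda.sinica.edu.tw/oai_pmh/oai</request>
  <error code="noSetHierarchy">This repository does not support sets.</error>
</OAI-PMH>"""

codes = oai_error_codes(SAMPLE_RESPONSE)
if "noSetHierarchy" in codes:
    print("Set not supported; harvest without specifying a set.")
```

An empty list means the response carried no OAI-PMH errors, which is what a request without a `set` argument would be expected to return from a server that supports harvesting everything.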
BTW, why was it important to harvest from them directly, as opposed to harvesting their records from DataCite, as set up in prod.? Unfortunately, the records harvested in prod. via that client are not properly redirecting at the moment, but that's because of the bug that I opened #10254 for (and I'm really hoping to fix it asap). Is the content expected to differ between what we get from their own OAI and from DataCite? (It looks like the two serve different numbers of records.)
I assumed that they created their own harvesting server and emailed us to avoid the issue(s) that Dataverse used to have with harvesting sets from DataCite. In our emails with them I pointed out that issue, but I didn't ask them explicitly why they want us to harvest from their own OAI. I also wondered if they wanted us to harvest from their own OAI because they wanted more control over what we harvested. I've been planning to email them again with our progress. Want me to ask them why exactly they'd like us to harvest from their OAI instead of from DataCite?
I was just curious, really. It doesn't really matter. It's up to you, but maybe we should wait for it to be fixed before contacting them? That way we could show them harvested and working records, even if only on one of our test servers. Otherwise it doesn't really feel like "progress" when the harvested records are all broken.
Yeah, I agree. I could email them to let them know that Harvard Dataverse isn't able to harvest from the set they asked us to harvest from, and ask them why they'd like us to harvest from that set, as opposed to harvesting everything in the repository by not specifying a set, which will work once #10254 is addressed.
I emailed the folks at SRDA to let them know that a problem on their side prevents Harvard Dataverse from harvesting from their srda set, to let them know that Harvard Dataverse is able to harvest all of their metadata without specifying a set, and to ask whether they would like Harvard Dataverse to harvest from the srda set or without specifying a set.
I'd like to close this issue (for accounting purposes; I will also resize it to 10, since we have put more work into it this week). Would you mind creating a new issue in the local repo, something like "Harvest metadata from SRDA", to keep track of the remaining effort? (Or I can create it there.)
Great, closing this issue sounds okay to me, since the SRDA folks let us know yesterday that it's fine for Harvard Dataverse to harvest from them without specifying a set and you wrote that Harvard Dataverse is able to do that. I'll close this issue with this comment, adjust the harvesting client for SRDA, and start the re-harvesting today. There's more info in our email thread with the folks at SRDA about what's going on with that srda set and why they're recommending using their OAI instead of harvesting from DataCite.
Great, thanks.
Ah okay, although I already edited the harvesting client and told Harvard Dataverse to re-harvest. Why would you wait until 6.1 is in prod? Is it because, until #10254 is addressed, clicking on the dataset titles won't lead users to the datasets, and that fix won't be applied to metadata harvested before #10254 is addressed?
Yes, I'm going to include a quick patch in 6.1, as deployed here, that will fix the redirects.
Ah okay. So I can create a new issue in the Harvard Dataverse repo, like you suggested, to keep track of what's left to do to harvest SRDA's metadata.
I'm unable to create harvesting clients in the Harvard Dataverse Repository and the Demo Dataverse repository using SRDA's own OAI-PMH feed. Its base URL is https://srda.sinica.edu.tw/oai_pmh/oai2.php.
Identifying it works (https://srda.sinica.edu.tw/oai_pmh/oai2.php?verb=Identify), and so does listing records (https://srda.sinica.edu.tw/oai_pmh/oai2.php?verb=ListRecords&metadataPrefix=oai_dc).
But when trying to create a client, Harvard Dataverse Repository and Demo Dataverse show an error saying the base URL is invalid: "Invalid URL. Failed to establish connection and receive a valid server response."
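Both of these checks are plain OAI-PMH GET requests: a `verb` argument plus verb-specific arguments such as `metadataPrefix`, with an optional `set` argument for set-restricted harvesting. As a minimal sketch (Python standard library only; the helper name `build_oai_url` is hypothetical and not a Dataverse API, and `"srda"` as the setSpec is an assumption), the request URLs can be built like this:

```python
from urllib.parse import urlencode


# Hypothetical helper, not a Dataverse API: builds an OAI-PMH GET request URL.
def build_oai_url(base_url, verb, **args):
    query = {"verb": verb}
    # Skip arguments passed as None, e.g. when no set is specified.
    query.update({key: value for key, value in args.items() if value is not None})
    return f"{base_url}?{urlencode(query)}"


BASE_URL = "https://srda.sinica.edu.tw/oai_pmh/oai2.php"

print(build_oai_url(BASE_URL, "Identify"))
print(build_oai_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
# Restricting the harvest to a set; "srda" as the setSpec is an assumption.
print(build_oai_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc", set="srda"))
```

The first two calls reproduce the Identify and ListRecords URLs quoted above. Since SRDA has changed its base URL, swapping in the current endpoint only means changing `BASE_URL`.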
Harvard Dataverse Repository is harvesting SRDA's records into https://dataverse.harvard.edu/dataverse/srda_harvested using DataCite's OAI-PMH feed. SRDA's admins created their own feed and emailed the repository's support to ask that Harvard Dataverse Repository use that feed instead of the records that DataCite has.
The SRDA repository's admins are troubleshooting and leaving updates in the support email thread at https://help.hmdc.harvard.edu/Ticket/Display.html?id=287243. They have already changed the base URL and may change it again, so whoever investigates this should check that email thread for the latest info.