Are Dataverse installations' harvesting sets exporting regularly? #5392
Comments
This seems related to #5345 and the refactoring of timers.
The number of records that Harvard Dataverse has harvested also doesn't match the number of records in the following installations' OAI sets. I ran the harvests manually just to make sure.
(I checked the total number of records in each set using a Python script at https://gist.github.com/rlskoeser/880a6f9f20bbaf9202fb)
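Counting the records in an OAI-PMH set, as the linked gist does, amounts to paging through `ListIdentifiers` responses and following `resumptionToken`s. A minimal standard-library sketch that approximates (but is not) the gist's script; the endpoint and set name are whatever installation you are checking:

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def parse_page(xml_text):
    """Parse one ListIdentifiers response page; return
    (number of record headers, resumptionToken text or None)."""
    root = ET.fromstring(xml_text)
    headers = root.findall(f".//{OAI_NS}header")
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # The final page carries an empty <resumptionToken/> (or none at all).
    token = token_el.text if token_el is not None and token_el.text else None
    return len(headers), token

def count_records(base_url, set_spec, metadata_prefix="oai_dc"):
    """Count all records in an OAI set, following resumptionTokens."""
    url = (f"{base_url}?verb=ListIdentifiers"
           f"&metadataPrefix={metadata_prefix}&set={set_spec}")
    total = 0
    while url:
        with urllib.request.urlopen(url) as resp:
            count, token = parse_page(resp.read())
        total += count
        url = (f"{base_url}?verb=ListIdentifiers&resumptionToken={token}"
               if token else None)
    return total

# e.g. count_records("https://dataverse.tdl.org/oai", "tdr")
```

Comparing this count against the "datasets harvested" total in the harvesting client's dashboard is the same check described in this thread.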
Hi Julian,
If you are harvesting all e-cienciaDatos records, Harvard Dataverse has
missed five records, but if you have harvested the openaire_data set, the
number of records in Harvard Dataverse is the right number.
Best Regards,
Juan
On Thu, Dec 13, 2018 at 00:11, Julian Gautier (<notifications@github.com>) wrote:
… Harvard Dataverse also isn't able to harvest all records in the OAI sets
of these other Dataverse installations. I ran the harvests manually just to
make sure.
- DataverseNL, https://dataverse.nl/oai, set:tilburg_oai
- Consorcio Madroño, https://edatos.consorciomadrono.es/oai,
set:openaire_data
- Texas Data Repository, https://dataverse.tdl.org/oai, set:tdr
(I checked the total number of records in each set using a python script
at https://gist.github.com/rlskoeser/880a6f9f20bbaf9202fb)
@jggautier - I think we were going to check in about this in order to scope it, but I lost track of it. Let's figure out what we're trying to estimate here so that we can work on it.
Definitely. Harvard Dataverse is harvesting the 48 data projects (datasets) in QDR's "qdr_whole" OAI-PMH set. Since this issue was opened, QDR has published two more data projects. All QDR data projects should be included in the "qdr_whole" set, so the set is missing the 49th and 50th data projects. Just in case there's something happening other than an export timer problem, @djbrooke asked me to look into whether there's anything different about the metadata of these two data projects (https://doi.org/10.5064/F65BVECY and https://doi.org/10.5064/F6RL3PS2) compared to the ones that are in the set. I haven't noticed anything yet.
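Checking which expected datasets are absent from a harvested set reduces to a set difference over normalized identifiers. `find_missing` is a hypothetical helper, with the two DOIs from this thread used only as sample data:

```python
def find_missing(expected_dois, harvested_identifiers):
    """Return the expected DOIs that do not appear among the harvested
    identifiers, normalizing case and an optional 'doi:' prefix."""
    def norm(s):
        s = s.strip().lower()
        return s[4:] if s.startswith("doi:") else s

    harvested = {norm(i) for i in harvested_identifiers}
    return sorted(d for d in expected_dois if norm(d) not in harvested)

# Sample usage with the two DOIs mentioned above:
expected = ["doi:10.5064/F65BVECY", "doi:10.5064/F6RL3PS2"]
harvested = ["DOI:10.5064/F6RL3PS2"]
print(find_missing(expected, harvested))  # ['doi:10.5064/F65BVECY']
```

Feeding this the full list of published DOIs and the identifiers returned by `ListIdentifiers` pinpoints exactly which records never made it into the OAI set.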
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
Harvard Dataverse harvests datasets from QDR's Dataverse installation. An installation admin, @adam3smith, noticed that only 31 of QDR's 48 published datasets were harvested to Harvard Dataverse. He wanted to know why, and what could be done so that Harvard Dataverse harvests all of QDR's datasets. QDR's OAI set includes all of its published datasets.
My dashboard on Harvard Dataverse showed that the last harvest was the most recent Sunday, so it looks like scheduled harvesting to import datasets from QDR is running on the schedule I set, every Sunday. I tried a manual harvest, pushing the "Run harvesting" button, and the dashboard reported another successful harvest, but 0 datasets were harvested, and the total was still 31.
Sebastian then tried a manual export of QDR's OAI set from his installation's dashboard. Then when I ran a manual harvest again, 17 datasets were harvested, so the total was 48. Many of these 17 datasets were published months ago. Only three datasets were published this December, and none were published in November.
Is the harvesting server exporting OAI sets on a regular schedule? Do installation admins need to export their OAI sets themselves (e.g. by pushing the "Run harvesting" button in their dashboard) so that other systems harvesting those sets can import the latest dataset metadata?
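The symptom described above (a "successful" scheduled harvest that imports 0 datasets until the remote admin runs a manual export) is consistent with incremental OAI-PMH harvesting by datestamp: a record that was published but never exported has no OAI datestamp, so no incremental harvest ever sees it. A toy simulation of that hypothesis, not Dataverse's actual implementation; all names are made up:

```python
from datetime import date

def incremental_harvest(records, last_harvest):
    """Toy incremental harvest: return ids of records whose export
    datestamp is after last_harvest. Records never exported
    (exported=None) are invisible to the harvester entirely."""
    return [r["id"] for r in records
            if r["exported"] is not None and r["exported"] > last_harvest]

records = [
    {"id": "ds1", "published": date(2018, 6, 1), "exported": date(2018, 6, 1)},
    # Published months ago but never exported: invisible to harvesters,
    # no matter how often the scheduled harvest runs.
    {"id": "ds2", "published": date(2018, 8, 1), "exported": None},
]
print(incremental_harvest(records, date(2018, 12, 1)))  # []

# After the source admin runs a manual export, the record gets a fresh
# datestamp, and the next harvest finally picks it up.
records[1]["exported"] = date(2018, 12, 13)
print(incremental_harvest(records, date(2018, 12, 1)))  # ['ds2']
```

This would also explain why the 17 datasets that appeared after the manual export included records published months earlier: their visibility was gated on the export, not on their publication dates.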