GREI 3: HDV Task - Improve OAI-PMH Harvesting #171

Open
43 of 57 tasks
cmbz opened this issue Jan 22, 2024 · 12 comments
Labels: Dataverse Project, Feature: Harvesting, GREI 3 Search and Browse, Harvard Dataverse, Project: NIH GREI

Comments


cmbz commented Jan 22, 2024

Overview

"Our proposed project will significantly improve the widely-used Harvard Dataverse repository to better support NIH-funded research. A critical measure of the GREI program’s success is to standardize the discoverability across generalist repositories.

To help with this, we propose to improve the existing harvesting functionality in the Dataverse software based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard, and coordinate with other repository packaging standards to share or move metadata and data. Dataverse already supports the Bags as defined by the Research Data Alliance (RDA) Research Data Repository Interoperability Working Group.

Here we proposed to improve the support for Bags, test it for NIH-funded datasets, and explore and define the appropriate standard to use to move the metadata and data across generalist repositories. This will help with a sustainable and succession plan - if one repository cannot support anymore a specific dataset, it will allow to easily move the dataset to another repository without losing any information about the dataset." (Source)

Issues

New Features

Support harvesting from DataCite's OAI-PMH API

Spikes

Harvesting Issues (Year 3)

In Progress

Complete

Pending

These issues will be sized and prioritized once the In Progress issues are closed.

Additional Harvesting Issues (Year 4)

These issues will be prioritized during GREI Year 4 Planning.

Related

Resources

cmbz added the GREI 3 Search and Browse label Jan 22, 2024
cmbz self-assigned this Jan 22, 2024
cmbz added the Project: NIH GREI, Harvard Dataverse, and Dataverse Project labels Jan 22, 2024

cmbz commented Feb 22, 2024


cmbz commented Mar 12, 2024

Status: March 2024

Meetings

A meeting was held on 2024/03/13 to:

  • Discuss the current GREI harvesting issues listed in this issue (#171).
  • Discuss larger issues and spikes that require substantive design work, refactoring, and coordination.
  • Define a plan of action for GREI Year 3 (and ideally, Year 4) to complete the known remaining harvesting issues and any new, in-scope issues that may arise.

Action Items

The following action items were identified:

Other Work Towards Improving Harvesting

Completed


cmbz commented Mar 12, 2024

Status: April 2024

  • @jggautier continued the March 2024 action item of re-creating harvesting clients using the most appropriate metadata format, in order to harvest as much metadata as possible from each client. He stopped because technical issues that affect harvesting are being investigated; they are described in Investigate Solr performance issues (again) dataverse#10469.


cmbz commented Mar 20, 2024

Status: June 2024

  • @jggautier emailed the contacts of all the repositories we've been corresponding with about Harvard Dataverse harvesting from their repositories. In most cases he let them know that we continue to work on the challenges that prevent Harvard Dataverse from harvesting from them. In a couple of cases, such as the one described in #243, he asked about the status of related work they are doing that would make it possible for Harvard Dataverse to harvest from them.


jggautier commented Jun 25, 2024

@landreev, @cmbz and @scolapasta, today or sometime this week I'm thinking of following up with contacts of the repositories who've emailed our support email addresses about harvesting, to let them know that we're continuing to work on improvements related to how Dataverse indexes records, and some of those improvements may affect harvesting. Most of the emails sitting in my RT queue are harvesting related:

[Image attachment]


cmbz commented Jun 25, 2024

@jggautier Great, thanks! When you do, please update the June status comment too: #171 (comment)


cmbz commented Jul 11, 2024


cmbz commented Aug 14, 2024

Status: August 2024

Updates

Harvesting Issues


cmbz commented Sep 25, 2024


cmbz commented Oct 27, 2024

Status: October 2024
