Feature Request/Idea: Support compliance with the Open Science Horizon Europe Model Grant Agreement (OS HE MGA) Requirements #10196

Open
philippconzett opened this issue Dec 20, 2023 · 18 comments
Labels
Type: Feature a feature request

Comments

@philippconzett
Contributor

philippconzett commented Dec 20, 2023

Overview of the Feature Request
The European Research Council (ERC) is currently running a survey to collect data for updating the Study on the Readiness of Research Data and Literature Repositories to Facilitate compliance with the Open Science Horizon Europe Model Grant Agreement (OS HE MGA) Requirements. A PDF version of the survey, which was commissioned from a group of independent experts by the European Research Council Executive Agency (ERCEA), is attached to this issue.

This is an umbrella issue to cover the following features needed to achieve full Dataverse support for compliance with the Open Science Horizon Europe Model Grant Agreement (OS HE MGA) Requirements:

  1. Dedicated and separated metadata field for Grant project acronym (see survey question 4.1)
  2. Dedicated and separated metadata field for PID(s) for the author(s)’ organisation/affiliation (e.g. ROR ID) (see survey question 4.1)
  3. Dedicated and separated metadata field for Grant project PID(s) (e.g. Grant DOI) (see survey question 4.1)
  4. Dedicated and separated metadata field to indicate "Horizon Europe" (see survey question 4.2)

What kind of user is the feature intended for?
API User, Depositor, Guest

What inspired the request?
The ERC survey mentioned above.

What existing behavior do you want changed?
Add the metadata fields mentioned above.

Any brand new behavior do you want to add to Dataverse?
No, not brand new, but extending metadata capturing and exposing for harvesting.

Any open or closed issues related to this feature request?

@philippconzett philippconzett added the Type: Feature a feature request label Dec 20, 2023
@DS-INRAE
Member

Thanks for the issue, Philipp. We created a Project field for some of these needs, and we are currently planning to merge it back into funding, so a discussion on the topic would be interesting. Here is the current state of our project block (in citation):
image
We've very recently also been asked to add a field for the deliverable, in addition to the WP and Task fields we have.

@philippconzett
Contributor Author

Thanks, Dimitri. It would be good to add these fields to the main distribution of Dataverse.

@pdurbin
Member

pdurbin commented Jan 5, 2024

Dedicated and separated metadata field to indicate "Horizon Europe" (see survey question 4.2)

Here's 4.2:

Screenshot 2024-01-05 at 8 59 27 AM

Does Horizon Europe have a PID? The PID for the NIH in the US, for example, is http://dx.doi.org/10.13039/100000002

@DS-INRAE
Member

DS-INRAE commented Jan 8, 2024

Interesting question, Horizon Europe is the funding programme and not the agency, but maybe they also have identifiers and would need an additional field for programme identifier.

@DS-INRAE
Member

DS-INRAE commented Jan 8, 2024

Another question: should we take the work on these fields as an opportunity to change the current block name "grantNumber"?

@jggautier
Contributor

jggautier commented Jan 8, 2024

Hey all. I've been researching how people describe who funds the research data they deposit in order to help improve how Dataverse collects and distributes that metadata, and updating the GitHub issue at #4859, whose scope has broadened, beyond what the issue title suggests, to account for what we've learned so far about how folks are using certain metadata fields.

So I'm very interested in this issue, too, and have questions.

@DS-INRA, why'd you mention that Horizon Europe is the funding programme and not the agency? Is it because NIH, which @pdurbin mentioned, is an agency? Or because there's a field that ships with Dataverse called Funding Information Agency?

I'm wondering if the distinction between an agency and funding programme is important and why. Although the label and tooltip text for the Funding Information Agency field that ships with Dataverse, in the Citation metadata block, has the word "Agency", we don't mean to limit the types of funders to "agencies", and I don't think the DataCite and DDI metadata standards that informed the design of the fields mean to limit funder types to "agencies" either.

@philippconzett, when you wrote that Dataverse should add these two metadata fields that you mentioned:

  • Dedicated and separated metadata field for Grant project PID(s) (eg. Grant DOI) (see survey question 4.1)
  • Dedicated and separated metadata field to indicate "Horizon Europe" (see survey question 4.2)

... are you saying that the fields that ship with Dataverse don't include dedicated and separated metadata fields for this metadata?

Why couldn't people use the current Funding Information fields for this? Such as:

Screenshot 2024-01-08 at 1 38 49 PM

Lastly, on Demo Dataverse, when people use the Funding Information Agency field, they're able to choose organization names suggested from the Crossref Funder Registry, and several things show up when I enter "Horizon Europe". I haven't looked too closely at what appears, but I wonder if Horizon Europe does have an entry in the Crossref Funder Registry.

@pdurbin
Member

pdurbin commented Jan 8, 2024

Yes, what @jggautier said. I'm also wondering if people can just type "Horizon Europe" under the existing field.
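
For context, here is a minimal Python sketch of how a "Horizon Europe" entry in the existing Funding Information field might look in Dataverse's native dataset JSON. It assumes the Citation block's compound field `grantNumber` with the child fields `grantNumberAgency` and `grantNumberValue`; the helper name `funding_information_field` and the grant identifier are hypothetical placeholders.

```python
import json


def funding_information_field(agency: str, identifier: str) -> dict:
    """Build one Funding Information entry in the native JSON shape (sketch)."""

    def primitive(type_name: str, value: str) -> dict:
        # Child fields of a compound field are single-valued primitives.
        return {
            "typeName": type_name,
            "multiple": False,
            "typeClass": "primitive",
            "value": value,
        }

    return {
        "typeName": "grantNumber",
        "multiple": True,
        "typeClass": "compound",
        "value": [
            {
                "grantNumberAgency": primitive("grantNumberAgency", agency),
                "grantNumberValue": primitive("grantNumberValue", identifier),
            }
        ],
    }


# "EXAMPLE-0000" is a made-up identifier, used only for illustration.
field = funding_information_field("Horizon Europe", "EXAMPLE-0000")
print(json.dumps(field, indent=2))
```

Entered this way, "Horizon Europe" sits in the same child field that is otherwise used for funder names, so nothing in the record itself marks it as a funding programme.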

@jggautier
Contributor

jggautier commented Jan 9, 2024

On the other hand, the study's final report, ERC Study on repositories - final report.pdf at https://zenodo.org/records/7728016 - makes me wonder if another field is needed for repositories that need to be able to comply with these requirements.

Before I read the study, I thought "Horizon Europe" was the name of the funder. And I thought that the terms "Funding Stream"; grant or funding number; and Grant or funding PID were all describing the same concept.

But on page 32 of the study, they write more about "Funding Streams":

"We considered the Funding Stream per definition of OpenAIRE, where information regarding the Funding programme (FP7, H2020, Horizon Europe) is provided. Also, we included the information about the Funder, as repositories typically need to report funders other than the European Commission."

And on the table on that page, they describe "Horizon Europe" as a Funding Stream:

Screenshot 2024-01-09 at 10 18 41 AM

So it seems like it's a separate concept. That is:

  • A Funder, like the European Union, can have one or more Funding Streams
  • A Funding Stream, like Horizon Europe, can have one or more Grants
  • And of course each Grant can have a Grant Number and a persistent identifier

And a dedicated field for Funding Stream makes more sense. Neither of Dataverse's fields for funding metadata, "Funding Information" and "Contributor Name," includes a dedicated field for "Funding Streams". So people have entered "European Union" and "Horizon Europe" in Dataverse's Funding Information Agency field (such as https://doi.org/10.34810/data686), and others have entered "European Union’s Horizon 2020 research and innovation programme" or "European Union‘s Horizon 2020" in one of Dataverse's funder fields: either the Funding Information Agency field or the Contributor Name field (where they chose Funder as the Contributor Type).

I'm confused about the Project Name concept, which the study's authors mention earlier in their report (page 21) as a requirement of Horizon Europe MGA. At https://openscience.cuni.cz/OSCIEN-90.html about the metadata that Horizon Europe beneficiaries should include, there's no mention of Project Names, but they do mention "Grant project name, acronym and number".

It's also interesting that the report's authors write that "OpenAIRE compliance for the repositories included in the study was derived from the OpenAIRE website" and that their definition of Funding Stream comes from OpenAIRE. But OpenAIRE's metadata guidelines don't include a way to record "Funding Streams" as distinct from Funder Names. Even later versions of the DataCite standard, which remove the "Funder" type from the list of Contributor types and add a Funding References field and child fields, don't include a field (or property) for a Funding Stream.

When the community was designing Dataverse's OpenAIRE metadata export, we wrote that we need to be "able to share metadata about data in the way OpenAIRE is requiring, by using OAI-PMH to harvest OpenAIRE-compliant metadata", and that we'd make design decisions based on a more recent version of DataCite, version 4.1, believing that later versions of OpenAIRE's recommendations would include changes that DataCite made to its standard. More discussion about this is at https://groups.google.com/g/dataverse-community/c/OALTzINxkX0/m/v_WwJ4cvAwAJ, #4257 (comment), and #5889.

That change in the DataCite standard affected how DataCite would like funding metadata to be recorded. So Dataverse has been adding funding metadata in its OpenAIRE export, which is available for OAI-PMH harvesting, to the funderName child field (or subproperty) of the fundingReference field (or property). For example, here's what some funding metadata looks like when included in Dataverse's OpenAIRE export:

Screenshot 2024-01-09 at 10 23 51 AM
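
Since the screenshot isn't reproduced as text, here is a rough Python sketch of the general shape of a DataCite `fundingReference` element. It is not a copy of Dataverse's actual export: the element names `funderName`, `awardNumber` (with its `awardURI` attribute) and `awardTitle` come from the DataCite 4.x schema, the function name is hypothetical, and the values are those of the EOSC-Pillar grant discussed elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

DATACITE_NS = "http://datacite.org/schema/kernel-4"


def build_funding_references(funder_name, award_number, award_uri, award_title):
    """Return a DataCite-style <fundingReferences> element (sketch only)."""

    def q(tag):
        # Qualify a tag name with the DataCite namespace.
        return f"{{{DATACITE_NS}}}{tag}"

    refs = ET.Element(q("fundingReferences"))
    ref = ET.SubElement(refs, q("fundingReference"))
    ET.SubElement(ref, q("funderName")).text = funder_name
    award = ET.SubElement(ref, q("awardNumber"), {"awardURI": award_uri})
    award.text = award_number
    ET.SubElement(ref, q("awardTitle")).text = award_title
    return refs


# Serialize with DataCite as the default namespace (no prefix on tags).
ET.register_namespace("", DATACITE_NS)
funding_xml = ET.tostring(
    build_funding_references(
        funder_name="European Union",
        award_number="857650",
        award_uri="https://doi.org/10.3030/857650",
        award_title="EOSC-Pillar",
    ),
    encoding="unicode",
)
print(funding_xml)
```

None of these subproperties carries the funding stream ("Horizon 2020" in this example), which is the gap described above.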

This doesn't seem to comply with OpenAIRE's guidelines, which expect the funder metadata in the Contributor field, although that seems even more inadequate for the type of metadata requirements being written about in the "ERC Study on repositories" report.

So I think my questions are:

  • In the survey results, how accurate is the data about the repositories that collect "Funding Stream" metadata? Among the 220 repositories they reviewed, we can see survey results for at least 9 repositories that use the Dataverse software, but the study authors wrote that the survey results might be inaccurate either because the repositories who self-reported might have interpreted the survey questions differently or the study's authors might have interpreted things differently when they needed to get the information themselves. The survey results are in "ANNEX 3 - Study curated data.xlsx" in the dataset at https://doi.org/10.5281/zenodo.7728016, and those 9 Dataverse repositories I could see are (1) Australian Data Archive, (2) Data Station Archaeology, (3) DataRepositóriUM, (4) DataverseNO, (5) Harvard Dataverse, (6) KU Leuven RDR, (7) Qualitative Data Repository, (8) Tilburg University Dataverse, and (9) UNC Dataverse. Should we contact the study's authors to ask? @philippconzett, would you be able to, or would you mind if I did?

  • Is how I've described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs right?

  • How do the concepts "Project Name" and "Project Acronym" fit into how I described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs? And how do the "Project Information" fields that @DS-INRA mentioned relate to all of this?

  • Are we recommending that funding metadata be organized as part of a Project Information field that would be added to the metadata fields that ship with the Dataverse software? And if so, do all datasets have Project Names? Would this work for most repositories using Dataverse?

  • How are 3 of the 9 repositories in compliance with OpenAIRE, according to the survey results, despite how those repositories export funder metadata in their OpenAIRE exports in a way that doesn't follow OpenAIRE's current metadata requirements? The three repositories are DataRepositóriUM, DataverseNO, and KU Leuven RDR. I'm thinking of asking the folks who work on OpenAIRE's metadata requirements.

@DS-INRAE
Member

DS-INRAE commented Jan 9, 2024

So it seems like it's a separate concept. That is:

  • A Funder, like the European Union, can have one or more Funding Streams
  • A Funding Stream, like Horizon Europe, can have one or more Grants
  • And of course each Grant can have a Grant Number and a persistent identifier

That is correct. For European projects, the EU is typically the funder, with several distinct streams (e.g. Horizon 2020, Horizon Europe, ...) which have a determined number of grants for which projects are made.

How do the concepts "Project Name" and "Project Acronym" fit into how I described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs? And how does the "Project Information" fields that @DS-INRA mentioned relate to all of this?

They are additional information; only the identifier (here Grant Number) is the same for the grant/funding/project.
Here is a concrete example for a European project (see https://doi.org/10.3030/857650):

  • Funder : European Union
  • Funding Stream : Horizon 2020
  • Grant Number : 857650
  • Grant PID : https://doi.org/10.3030/857650
  • Project Acronym : EOSC-Pillar
  • Project Name : Coordination and Harmonisation of National Inititiatives, Infrastructures and Data services in Central and Western Europe
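
The Funder, Funding Stream and Grant hierarchy behind this example can be sketched as a small Python data model. The class names, attribute names and the `flatten` helper are illustrative assumptions, not existing Dataverse fields; the values are the ones listed above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Grant:
    number: str
    pid: Optional[str] = None
    project_acronym: Optional[str] = None
    project_name: Optional[str] = None


@dataclass
class FundingStream:
    name: str
    grants: List[Grant] = field(default_factory=list)


@dataclass
class Funder:
    name: str
    streams: List[FundingStream] = field(default_factory=list)


def flatten(funder: Funder) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one flat record per grant, with one key per piece of information."""
    for stream in funder.streams:
        for grant in stream.grants:
            yield {
                "funder": funder.name,
                "funding_stream": stream.name,
                "grant_number": grant.number,
                "grant_pid": grant.pid,
                "project_acronym": grant.project_acronym,
                "project_name": grant.project_name,
            }


eosc_pillar = Grant(
    number="857650",
    pid="https://doi.org/10.3030/857650",
    project_acronym="EOSC-Pillar",
    project_name=(
        "Coordination and Harmonisation of National Inititiatives, "
        "Infrastructures and Data services in Central and Western Europe"
    ),
)
eu = Funder("European Union", [FundingStream("Horizon 2020", [eosc_pillar])])
records = list(flatten(eu))
print(records[0])
```

Modelling the optional parts as `Optional` reflects that not every funded dataset has a project name or acronym.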

Are we recommending that funding metadata be organized as part of a Project Information field that would be added to the metadata fields that ship with the Dataverse software?

It would be best to have the funding and project merged since, as seen previously, they overlap. We should discuss the labels, though, as not all funding might be called "projects".

And if so, do all datasets have Project Names?

In our case, no, even for some datasets that may have funding from other sources (e.g. from administrative regions).

@DS-INRAE
Member

DS-INRAE commented Jan 9, 2024

How are three of the nine repositories in compliance with OpenAIRE, according to the survey results, despite how funder metadata is currently included in Dataverse's OpenAIRE export in a way that doesn't follow OpenAIRE's requirements? The three repositories are DataRepositóriUM, DataverseNO, and KU Leuven RDR. I'm thinking of asking the folks who work on OpenAIRE's metadata requirements.

You can contact Pedro Principe in the community (I don't know his GH account), who is involved in both DataRepositóriUM and OpenAIRE PROVIDE :)

@philippconzett
Contributor Author

@jggautier Thanks for your thorough and informative follow-up on this issue!

Should we contact the study's authors to ask? @philippconzett, would you be able to, or would you mind if I did?

That's exactly what came to my mind when I was reading your comment. Please go ahead and contact them. I'm sure they'll be more than happy to discuss these issues with you.

Is how I've described the differences among the terms "Funders", "Funding Streams", "Grant Numbers", and Grant PIDs right?

To my understanding, yes.

As for your other questions, I think they could be discussed in a common meeting with the authors of the report mentioned above, members of the OpenAIRE team and members of the Dataverse community.

@jggautier
Contributor

jggautier commented Jan 9, 2024

Thanks @DS-INRA and @philippconzett

@philippconzett, about your second point about "a dedicated and separated metadata field for PID(s) for the author(s)’ organisation/affiliation (eg. ROR ID)", the Author Affiliation field on Demo Dataverse has been changed so that we could evaluate how well the design of this implementation of the "external controlled vocabulary" functionality helps people add their affiliation and helps repositories collect and distribute persistent IDs of author affiliations. That work is being described at #9151, although it's also related to how we collect funding metadata, since we also want to collect and distribute persistent IDs of funders, and the thinking so far has been to use the same "external controlled vocabulary" functionality for this, using the persistent IDs and other metadata from the Crossref Funder Registry, although that's being deprecated and the folks at ROR are working to make sure they can be a good replacement.

The requirement for a "dedicated and separated metadata field" is interesting, too, and maybe worth clarifying. The way the Author Affiliation field is designed on the deposit form on Demo Dataverse, there is no separate field for the persistent ID of the author's affiliation. But if the depositor chooses an organization that the field suggests, Dataverse records that persistent ID. So by "dedicated and separated metadata field", I'm assuming they mean to discourage repositories from letting depositors put this information in something like a "catch-all" field, like Description or Notes, which right now would make the metadata less machine-readable and less interoperable.

I'll email the study's authors and @pedroprincipe to ask about the differences between what the study is evaluating and OpenAIRE's metadata guidelines.

I'll need to think more about how to help organize a common meeting that's effective and timely 🤔

@philippconzett
Contributor Author

philippconzett commented Jan 18, 2024

Here are my notes from our meeting earlier today (thank you to Anna Pelagotti, Dagmar Meyer, and Emma Lazzeri for feedback):

Requirements from European Commission for Horizon Europe projects (including European Research Council)

  1. Indicate "Horizon Europe"
  2. Metadata have to be machine-actionable ---> implies one metadata field for each piece of information. For reference:
  3. OpenAIRE uses the term "Funding Stream"; see OpenAIRE guidelines for repositories and OpenAIRE Graph documentation on Projects: https://graph.openaire.eu/docs/data-model/entities/project/
  4. Displayed on OpenAIRE EXPLORE, e.g., https://explore.openaire.eu/search/project?projectId=corda__h2020::2f32bdaa3c76066bd8267f3ac90ba898
  5. OpenAIRE creates overviews of research outputs with information about funding, including information about specific ERC grants.

How to comply

  1. Have a separate field for each of these:
    Funder: European Union
    Funding Stream: Horizon 2020
    Grant Number: 857650
    Grant PID: https://doi.org/10.3030/857650
    Project Acronym: EOSC-Pillar
    Project Name: Coordination and Harmonisation of National Inititiatives, Infrastructures and Data services in Central and Western Europe
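
A minimal Python sketch of what "a separate field for each of these" could mean when checking a record. The key names in `REQUIRED_FIELDS` and the function `missing_fields` are hypothetical, chosen only to mirror the six items above.

```python
REQUIRED_FIELDS = (
    "funder",
    "funding_stream",
    "grant_number",
    "grant_pid",
    "project_acronym",
    "project_name",
)


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank in the record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


# Everything in one free-text field: readable for humans, opaque to machines.
catch_all = {
    "description": "Funded by the European Union, Horizon 2020, grant 857650."
}

# One field per piece of information.
separate = {
    "funder": "European Union",
    "funding_stream": "Horizon 2020",
    "grant_number": "857650",
    "grant_pid": "https://doi.org/10.3030/857650",
    "project_acronym": "EOSC-Pillar",
    "project_name": (
        "Coordination and Harmonisation of National Inititiatives, "
        "Infrastructures and Data services in Central and Western Europe"
    ),
}

print(missing_fields(catch_all))
print(missing_fields(separate))
```

The first record mentions the funding in its description but still fails the check, which is the distinction the machine-actionability requirement draws.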

How to reduce errors

  1. Use controlled vocabularies / PIDs where possible; e.g. Zenodo uses a direct link to the EU's list of projects. [1] FundRef has DOIs for different ERC funding programs. [2] FundRef is merging with ROR.
  2. Comply with OpenAIRE Guidelines for Data Archives. Need to reach out to OpenAIRE community to make sure OpenAIRE guidelines already have been aligned with the new European Commission Horizon Europe requirements.
  3. Dataverse repositories need to validate their metadata compliance with OpenAIRE guidelines.

How to proceed with the ERC survey

  1. ERC will send clarifications of things in this GitHub issue.
  2. Repository managers in Dataverse community will discuss how to align responses to ERC survey.
  3. ERC will share preliminary survey results with repository managers and ask for clarifications/amendments if needed.

[1] https://cordis.europa.eu/projects/en
[2] PIDs for Horizon Europe etc.: https://data.crossref.org/fundingdata/funder/10.13039/100018693; https://data.crossref.org/fundingdata/funder/10.13039/100019188; https://data.crossref.org/fundingdata/funder/10.13039/100019180; https://data.crossref.org/fundingdata/funder/10.13039/100010663; https://data.crossref.org/fundingdata/funder/10.13039/100011199; https://data.crossref.org/fundingdata/funder/10.13039/501100000781; https://ror.org/0472cxd90
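
If these links were used as a controlled vocabulary, the funder DOI could be pulled out of each Crossref URL with something like the Python sketch below. The helper name `funder_doi` is hypothetical, and the only assumption made about the URLs is the `/fundingdata/funder/` path visible in the links above.

```python
from typing import Optional
from urllib.parse import urlparse

FUNDER_PATH_PREFIX = "/fundingdata/funder/"


def funder_doi(url: str) -> Optional[str]:
    """Return the funder DOI from a Crossref funding-data URL, else None."""
    parsed = urlparse(url)
    if parsed.netloc == "data.crossref.org" and parsed.path.startswith(
        FUNDER_PATH_PREFIX
    ):
        return parsed.path[len(FUNDER_PATH_PREFIX):]
    return None


urls = [
    "https://data.crossref.org/fundingdata/funder/10.13039/100018693",
    "https://data.crossref.org/fundingdata/funder/10.13039/501100000781",
    "https://ror.org/0472cxd90",  # a ROR ID, not a Crossref funder DOI
]
for url in urls:
    print(url, "->", funder_doi(url))
```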

@DS-INRAE
Member

Hi, just to give an update: we are still planning to contribute to this. We should start on the first steps (imo adding the fields) in our April sprint.

@jggautier
Contributor

Hi all. After the January 18 meeting that @philippconzett mentioned, I emailed the folks who joined the meeting and some others with follow up questions, next steps and other things we need to consider, and thought I'd include those in this GitHub issue.

My take was that we all agreed that a major goal here is to make it easier for the folks from the European Commission to be able to track outputs of the research they fund, particularly by using OpenAIRE's infrastructure.

So it's important that we're able to connect with folks from the European Commission who track research outputs and with grantees who need to make sure that their funders are aware of the data and code they publish, so that our understanding of their experiences can inform the changes we make to Dataverse and so that we're able to evaluate how their experiences are changed (and hopefully improved!) by those changes to Dataverse.

As @philippconzett mentioned, we need to learn from folks at OpenAIRE about their metadata guidelines, such as how they hope the existing guidelines or their changes to the guidelines will ensure that their systems can make it easy for funders and grantees to report and track research outputs.

It would be helpful to understand how Zenodo sends to OpenAIRE the funder metadata they collect, given Zenodo's close association with the European Commission and OpenAIRE.

And we need to make sure we're aware of similar efforts being discussed to help stakeholders with similar goals, like folks from NIH funding groups and folks who manage other Dataverse repositories. This includes discussions in the GitHub issue at #4859 and several GitHub issues about using ROR, like #6640.

So @DS-INRA before any metadata fields are added, I'm recommending these next steps:

  • Connecting with people who use OpenAIRE's platform for tracking the outputs of the research they fund (under the funding stream "Horizon Europe") to understand what their experiences are like now and how those experiences change as we make changes to Dataverse
  • Connecting with grantees who have or need to publish data and acknowledge funding from the European Commission, to understand how they use Dataverse repositories and other systems, especially OpenAIRE's platform where they can augment the metadata of their datasets to ensure that the European Commission is aware of their published research outputs
  • Connecting with folks from OpenAIRE to learn if and how they plan to update their metadata guidelines for data repositories (as @philippconzett wrote) and learn how they index metadata from Zenodo
  • Connecting with folks from Zenodo to learn how OpenAIRE's platform retrieves and indexes the metadata of the outputs of research funded by the European Commission

I also plan to update the GitHub issue at #4859 with next steps, and I'm connecting with folks from the NIH so that we can get a better understanding of how they track research outputs and so that we can rely on those connections later to evaluate how their experiences are changed by the changes we make to Dataverse.

It would be really helpful if we could continue discussing these steps and how we might collaborate on them. Being able to scale our UX research is the goal of that UX working group I've been exploring, and this effort seems like a good way to see how we might leverage more of the Dataverse community's resources.

@CB-HAL

CB-HAL commented Sep 11, 2024

To fulfil the HE MGA requirements, we have now also integrated new metadata fields in our Dataverse (DV), e.g. https://data.aussda.at/dataset.xhtml?persistentId=doi:10.11587/D3PZEA

image

Nonetheless, an official solution would be a good idea. In addition, the Organization PID issue is still open:

  • Feature Request/Idea: Allow ORCID and ROR to be used together in author field
  • Support Research Organization Registry (ROR) IDs #6640

I also added a new issue for the 2024 requirement “separate embargo field”:

  • Feature Request: Metadata field for embargoed datasets #10833

Requirements HE MGA survey 2024:
In order to reach the “Exemplary Readiness Level”

  • The repository metadata structure contains a separate "License" field, i.e. column X value is “Yes”
  • The repository assigns persistent unique identifiers to contents, i.e. column Y value is “Yes”
  • The repository metadata are machine-actionable, i.e. column Z value is “Yes”,
  • The repository metadata is standardised, i.e. column AA value is “Yes”
  • The repository metadata are open under a Creative Commons Public Domain Dedication (CC 0) or equivalent, i.e. column AE value is “Yes”
  • The repository metadata allow reference to "Horizon Europe", i.e. column AL value is “Yes, through a separate field”
  • Linked resources PID(s), i.e. either column AC or AD value is “Yes, through a separate field”;
  • Author(s), i.e. column AF value is “Yes, through a separate field”;
  • Description, i.e. column AG value is “Yes, through a separate field”;
  • Date of publication/deposit, i.e. column AI value is “Yes, through a separate field”;
  • Embargo, i.e. column AK value is “Yes, through a separate field”;
  • Grant Project name, i.e. column AM value is “Yes, through a separate field”;
  • Grant project acronym, i.e. column AN value is “Yes, through a separate field”;
  • Grant project number, i.e. column AO value is “Yes, through a separate field”;
  • Record PID, i.e. column AP value is “Yes, through a separate field”;
  • Author(s) PID, i.e. column AQ value is “Yes, through a separate field”;
  • Organisation PID, i.e. column AR value is “Yes, through a separate field”;
  • Grant PID, i.e. column AS value is “Yes, through a separate field”.
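
As a rough illustration, the checklist above could be applied to one repository's row of the survey spreadsheet like this. The column letters and expected values are taken from the list; representing a row as a dict keyed by column letter, and the name `unmet_criteria`, are assumptions made for this Python sketch.

```python
SEPARATE = "Yes, through a separate field"

# Columns whose value must be exactly "Yes".
YES_COLUMNS = ["X", "Y", "Z", "AA", "AE"]

# Columns whose value must be "Yes, through a separate field".
SEPARATE_FIELD_COLUMNS = [
    "AL", "AF", "AG", "AI", "AK", "AM",
    "AN", "AO", "AP", "AQ", "AR", "AS",
]

# Linked resources PID(s): either of these columns is enough.
EITHER_COLUMNS = ("AC", "AD")


def unmet_criteria(row: dict) -> list:
    """Return the columns that keep a row from the Exemplary Readiness Level."""
    unmet = [col for col in YES_COLUMNS if row.get(col) != "Yes"]
    unmet += [col for col in SEPARATE_FIELD_COLUMNS if row.get(col) != SEPARATE]
    if not any(row.get(col) == SEPARATE for col in EITHER_COLUMNS):
        unmet.append("AC or AD")
    return unmet


# A made-up row that meets every criterion.
exemplary_row = {col: "Yes" for col in YES_COLUMNS}
exemplary_row.update({col: SEPARATE for col in SEPARATE_FIELD_COLUMNS})
exemplary_row["AD"] = SEPARATE

# The same row without a separate embargo field (column AK).
no_embargo_row = dict(exemplary_row, AK="No")

print(unmet_criteria(exemplary_row))
print(unmet_criteria(no_embargo_row))
```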

@pdurbin
Member

pdurbin commented Sep 11, 2024

Interesting. #4859 was opened because there are two ways to enter funding information. Now there are three, in the solution above. 😄

Yes, I agree that some sort of official solution would be nice. 🤔 Meanwhile, it looks like it's working for you! It looks like that field is under the citation block, though. As it's a custom solution it might be better under a custom metadata block.

These are just some idle thoughts as I catch up on GitHub comments. Thanks for pushing the envelope!

@CB-HAL

CB-HAL commented Sep 12, 2024

I didn't do it as a custom block because the Funding Information and Grant Project Information fields together make up the HE MGA requirements. I know it's not a beautiful solution. What's more, I also put a ROR PID field into the author block. This is all temporary until there is an official solution. At the moment we have only 11 datasets with these requirements. Our funders demanded a solution for the HE MGA requirements.

Projects
Status: Interested
Status: High priority
Status: 🕒 Planned Development
Development

No branches or pull requests

5 participants