Spike: Investigate how Dataverse stakeholders and users need to collect and use funder metadata #4859

Closed
jggautier opened this issue Jul 17, 2018 · 30 comments
Labels:
  • Feature: Harvesting
  • NIH OTA: 1.5.1 (collection: 5 | 1.5.1 | Standardize download metrics for the Harvard Dataverse repository...)
  • pm.GREI-d-1.5.1 (NIH, yr1, aim5, task1: Standardize download metrics)
  • pm.GREI-d-1.5.2 (NIH, yr1, aim5, task2: WG with other repositories to follow Make Data Count recommendations)
  • pm.GREI-d-2.5.1B (year 2 continuation of 1.5.1)
  • Size: 10 (A percentage of a sprint. 7 hours.)
  • Status: Needs Input (Applied to issues in need of input from someone currently unavailable)
  • Type: Suggestion (an idea)

Comments

@jggautier
Contributor

jggautier commented Jul 17, 2018

In the Dataverse 4.x Citation metadata block, the Contributor metadata field has a "Funder" contributor type:

[Screenshot: the Contributor Type options, including "Funder"]

The "Funder" type comes from DataCite's list of contributor types, added in their 3.x schema. I think we should remove the contributor type "Funder" because:

  1. It's a duplicate of Dataverse's "Funding Information Agency" field....

    [Screenshot: the "Funding Information Agency" field]

    ...and it's probably confusing depositors.

    • The "Funding Information Agency" fields are used more often across the known Dataverse installations, but there are cases where depositors entered the same funder names in both fields, cases where depositors entered a name in the Contributor field and not the "Funding Information Agency" field, and cases where depositors entered a name in the Contributor field, nothing in the Funding Information Agency field, and something in the Funding Information Identifier field:

    [Screenshot: examples of these combinations of values in the Contributor and Funding Information fields]

  2. Having both complicates metadata exporting and makes it harder to find data based on who funded the research. For example, if we send funding metadata to DataCite, it won't accept metadata that includes "Funder" as a Contributor Type. Newer versions of the DataCite schema don't include a "Funder" contributor type. (It was deprecated when the FundingReference property was added, so that more information about funding could be included in FundingReference's subproperties.)

Definition of done:

  • Dataverse team speaks with depositors who've used combinations of these fields to learn why.
    • The current design might be serving some user need we weren't aware of but may need to support. For example, one depositor entered a person's name in the Contributor field and nothing in the Funding Information Agency field. Was this because it didn't seem appropriate to put a person's name in the Funding Information Agency field? The field's watermark reads "Organization XYZ". Where should depositors enter the names of people, not organizations, who've funded the research?
    • User research can uncover more insights and unsupported needs
  • Dataverse team plans for how Dataverse repositories should move metadata in the Contributor Name field (when Contributor Type is Funder) to the Funding Information Agency field:
    • The community's doing something similar for the "multiple license" update. Following that example, we could
      • Recommend that installations check for cases where metadata exists in both fields
      • Provide database queries that move the values from the Contributor fields (when Contributor Type is Funder) to the Funding Information Agency field, except in cases where the same values exist in both fields, such as the dataset at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/064X5M&version=1.3
      • When Contributor Type "Funder" was chosen but nothing was entered in the Contributor Name field, remove Contributor Type "Funder" from the dataset version's metadata
      • Once no dataset versions have a Contributor Name where Contributor Type is "Funder", and no dataset versions have Contributor Type "Funder" without a Contributor Name, remove "Funder" from the list of Contributor Types in the citation.tsv and citation.properties files
  • The crosswalk for the SWORD API is updated. The funder name entered in the Funding Information Agency field should be mapped to dcterms:contributor.
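The migration rules in the definition of done can be sketched in Python to make the three cases concrete. This is a minimal, hypothetical sketch: the `Contributor`, `Funding` and `DatasetVersion` classes and the `migrate_funders` function are illustrative stand-ins, not Dataverse's database schema or API, and a real migration would be written as the database queries described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FUNDER = "Funder"


@dataclass
class Contributor:  # hypothetical stand-in for one Contributor compound value
    type: Optional[str]
    name: Optional[str]


@dataclass
class Funding:  # hypothetical stand-in for one Funding Information compound value
    agency: Optional[str]
    identifier: Optional[str] = None


@dataclass
class DatasetVersion:
    contributors: List[Contributor] = field(default_factory=list)
    funding: List[Funding] = field(default_factory=list)


def _key(name: str) -> str:
    # Compare names ignoring case and repeated whitespace.
    return " ".join(name.split()).casefold()


def migrate_funders(version: DatasetVersion) -> List[str]:
    """Move Contributor "Funder" names to Funding Information Agency, in place.

    Returns the names left in place because they already appear as an agency,
    so that a curator can review them.
    """
    agencies = {_key(f.agency) for f in version.funding if f.agency and f.agency.strip()}
    kept: List[Contributor] = []
    flagged: List[str] = []
    for c in version.contributors:
        if c.type != FUNDER:
            kept.append(c)  # other contributor types are untouched
            continue
        name = (c.name or "").strip()
        if not name:
            continue  # "Funder" chosen but no name entered: drop the entry
        if _key(name) in agencies:
            kept.append(c)  # same value already in the agency field: leave for review
            flagged.append(name)
            continue
        version.funding.append(Funding(agency=name))  # the actual move
        agencies.add(_key(name))
    version.contributors = kept
    return flagged
```

The sketch treats names that differ only in case or spacing as duplicates and leaves them for review rather than deleting them, since moving these values shouldn't be done blindly. It also doesn't try to pair a moved name with an existing Funding Information Identifier that has no agency; that case would need a curator's judgment too.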
@jggautier
Contributor Author

jggautier commented May 9, 2022

While updating the crosswalk, I saw that when you use an Atom entry (XML) to create a dataset, funder metadata is mapped to the Contributor field, with Contributor Type set to "Funder". So if we removed Contributor Type "Funder", the mapping used when an Atom entry creates a dataset would need to change so that funding info is mapped to the "Funding Information" field.

@mreekie mreekie added the NIH OTA: 1.5.1 label Oct 6, 2022
@mreekie

mreekie commented Nov 8, 2022

@pdurbin @jggautier There seems to be a question of where this should go. This issue is in the deliverable backlog but under 1.5.1. I'm not an expert here. Is it better addressed here or under NIH OTA 1.2.1?

@jggautier
Contributor Author

jggautier commented Nov 8, 2022

I think this should be worked on as part of any effort to improve how Dataverse collects and exports funding metadata about datasets. I think that's a goal of NIH OTA 1.2.1, so it should be addressed there.

@pdurbin
Member

pdurbin commented Nov 18, 2022

No strong opinion. It could be worked on under either.

@pdurbin
Member

pdurbin commented Jan 18, 2023

mapped to the "Grant Information" field (which is being renamed "Funding Information")

Just a note that yes, this has been renamed. From Harvard Dataverse running 5.12.1:

[Screenshot: the "Funding Information" field in Harvard Dataverse 5.12.1]

@jggautier
Contributor Author

Thanks @pdurbin. I updated the original comment.

@mreekie mreekie added the pm.GREI-d-1.5.1 and pm.GREI-d-1.5.2 labels Mar 20, 2023
@jggautier
Contributor Author

I'm talking with the depositors in the Harvard repository who have most often used both fields in the same dataset, to learn why. I spoke with a manager from the WorldFish repository. They've used both fields only because, when they create datasets that include funder names in the metadata, they often use a different platform instead of entering metadata in the Dataverse deposit form. In that platform's deposit page, the metadata field for funder names, called "Donor", is mapped to Dataverse's "Contributor" field and given the "Funder" contributor type. We spoke about how they should update their platform so that it uses the Funding Information field instead, and we'll schedule another meeting, hopefully including developers of their platform, to review the changes being made to the Funding Information field (#9150).

They said it's fine if we move the funder names in their datasets' Contributor field to the Funding Information field.

This week I'll be meeting with the manager of another collection that most often adds funding metadata to their datasets to learn why they've used both fields.

@jggautier jggautier changed the title Remove the Contributor type "Funder" in Dataverse's citation metadatablock Remove the Contributor type "Funder" in Dataverse's citation metadatablock, move that metadata to the Funder Information Agency field Mar 21, 2023
@jggautier
Contributor Author

jggautier commented Mar 21, 2023

I reviewed the metadata I collected from most known Dataverse installations in October 2022 (https://doi.org/10.7910/DVN/DCDKZQ, version 12) to learn which datasets have values in both fields. While Harvard Dataverse had the most such datasets at the time (250), other Dataverse installations also have datasets with funder names in both fields. Here's a CSV file listing the datasets, the installation each is published in, and the funder names entered in both fields:
duplicateFundingFieldsInAllInstallations.csv

I've emailed the Dataverse Google Group to try to learn, from as many Dataverse installations as possible, why both fields were used and what we should consider when moving the funder names in the Contributor fields to the Funding Information fields. See this GitHub issue's original post, which I've been updating with a more detailed proposal.

@shlake
Contributor

shlake commented Mar 22, 2023

All datasets in UVa Dataverse have "Funder" information in the grantNumber block. Examples:
https://doi.org/10.18130/V3/FRZYXV
https://doi.org/10.18130/V3/VJUZSH
https://doi.org/10.18130/V3/YWTLHC

But we've set this block's "displayoncreate" to TRUE.
With this block showing on dataset creation, I hoped this would prevent the information from going into some other field.

And, to make what goes in that field clearer (to us), I changed the titles and descriptions:

name			title				description
grantNumber		Grant/Funding Information	Grant or Funding Information
grantNumberAgency	Grant/Funding Agency		Funding Agency
grantNumberValue	Grant Number			The grant or contract number of the project that sponsored the effort.

@jggautier
Contributor Author

I just realized that this was removed from a Dataverse_Funded_Deliverables list last month, but I'm not sure what that means. @mreekie, could you explain?

I'm wondering if this will be addressed as part of efforts, such as #9150, to improve the quality of funding metadata in Dataverse repositories. I think it should be; I'm trying not to let it fall through the cracks.

@cmbz

cmbz commented Apr 24, 2023

I will follow up with @mreekie to find out where this issue should be moved (e.g., back into the Global Backlog related to the NIH deliverable).

@pdurbin
Member

pdurbin commented Apr 24, 2023

@mreekie mreekie added the pm.GREI-d-2.5.1B label Apr 24, 2023
@mreekie

mreekie commented May 2, 2023

sizing:

  • This will likely involve database changes.
  • The changes may be similar to the changes done for the multiple license update where values had to be moved from one table to another.
  • There are 2 fields where a funder name can be entered. We will be moving values between these fields, but this cannot be done blindly. The data needs to be reviewed.
  • Would be best to get input on sizing from @qqmyers and @pdurbin and @jggautier
  • metadata exports will be impacted. Some exports have to look in two different places for the same information.

@cmbz

cmbz commented May 17, 2023

  • Requires investigation and planning of technical work
  • May involve manual steps and automated steps
    • Example: Could develop a script that automatically identifies cases in your installation that might be affected
  • Will discuss during Tech Hours (Julian provides examples, includes previous examples from other installations)
  • Sizing will follow Tech Hours discussion
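A script that identifies the cases in an installation that might be affected, as suggested above, could start by sorting each dataset version's funder metadata into the cases described in the original post. This is a minimal Python sketch that assumes the relevant values have already been extracted from the Citation metadata block into plain lists; the record layout, function names and category labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def _names(values: Iterable[str]) -> Set[str]:
    # Normalized (case- and whitespace-insensitive), non-empty values only.
    return {" ".join(v.split()).casefold() for v in values if v and v.strip()}


def classify(contributor_funders: List[str], agencies: List[str], identifiers: List[str]) -> str:
    """Return the primary category for one dataset version.

    contributor_funders holds the Contributor Name of every contributor whose
    type is "Funder" ("" when the name was left empty).
    """
    in_contrib = _names(contributor_funders)
    in_agency = _names(agencies)
    if in_contrib and in_agency:
        if in_contrib & in_agency:
            return "same name in both fields"
        return "different names in both fields"
    if in_contrib:
        if _names(identifiers):
            return "contributor only, identifier present"
        return "contributor only"
    if contributor_funders:
        return "Funder type with no name"
    if in_agency:
        return "agency only"
    return "no funder metadata"


# (dataset_id, contributor_funders, agencies, identifiers)
Record = Tuple[str, List[str], List[str], List[str]]


def summarize(records: Iterable[Record]) -> Counter:
    # Count how many dataset versions fall into each category.
    return Counter(classify(c, a, i) for _dataset_id, c, a, i in records)
```

Each version gets only one primary category, so, for example, a version with an empty "Funder" contributor and a filled-in agency field is counted under "Funder type with no name". The counts would only tell curators where to look; the individual records still need review.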

@cmbz cmbz added this to the 6.1 milestone Sep 18, 2023
@sekmiller sekmiller self-assigned this Oct 5, 2023
@jggautier
Contributor Author

jggautier commented Oct 10, 2023

Just an update about the meeting @cmbz mentioned. This Friday @cmbz, @scolapasta and I will be talking about this while planning for how Dataverse should follow a set of metadata recommendations from NIH GREI.

@scolapasta scolapasta added the Status: Needs Input label Oct 10, 2023
@cmbz

cmbz commented Oct 16, 2023

2023/10/16

  • As per the conversation between @jggautier, @scolapasta, and @cmbz, this issue has been moved to Sprint Ready in a waiting state and has been added to the deliverables associated with the proposal "GREI Harvard Dataverse Repository and Dataverse Software Metadata Support"
  • The issue has also been reassigned to @jggautier
  • @jggautier anticipates that coming to a resolution on this issue will require a great deal of coordination and discussion with Community stakeholders.

@cmbz cmbz assigned jggautier and unassigned sekmiller Oct 16, 2023
@scolapasta scolapasta removed this from the 6.1 milestone Oct 26, 2023
@pdurbin pdurbin added the Type: Suggestion label Nov 16, 2023
@jggautier
Contributor Author

jggautier commented Dec 4, 2023

Just an update:

  • I reached out to Steve McEachern to find time this week or next week to chat so we can better understand ADA's needs.
  • And I've started re-reviewing funding metadata from other Dataverse installations to see who else I should reach out to in order to understand their needs and how the changes we're discussing in this GitHub issue might affect them.

@jggautier
Contributor Author

jggautier commented Dec 7, 2023

More updates:

  • I've started talking with managers from Borealis, one of the handful of installations with datasets whose depositors or curators have used the Contributor Name field more often than the Funding Information Name field for recording funder names.
  • Other known Dataverse installations that most often use the Contributor Name field or both fields are also managed by CGIAR centers. I've reached out to them, through a listserv I was asked to use, to learn more. They asked me to follow up next year too, since many of the managers will be busy with end of year work and may not be able to discuss how they've used these fields.
  • Regarding datasets in Harvard Dataverse, back in March of this year I spoke with managers of collections whose datasets have used the Contributor Name field more often or have used both fields. Those collections are also managed by CGIAR centers. I left a summary of what we learned in this GitHub issue.

@jggautier
Contributor Author

jggautier commented Dec 16, 2023

More updates:

While the CGIAR and Borealis folks are discussing, I emailed Steve McEachern to share what I learned from a review of funder metadata in ADA's repository and wrote that I'd share it in this GitHub issue.

To reiterate and expand on the great points Steve made in June:

  • ADA would like to make sure that funders are always considered contributors, so they've prioritized adding funder names to the Contributor Name field, using the "Funder" Contributor Type. Steve's been a big proponent of improving how contributors are described in dataset metadata, for example by advocating for the use of CRediT (see #8213, "Feature Request/Idea: Incorporate CRediT vocabulary for author/contributor roles").

  • ADA also more often adds funder names to the Contributor Name field, using the "Funder" Contributor Type, so that when they need to see all types of contributors, funders are more easily included. If funder names were only in the Grant Information Agency field, as I had first proposed in this GitHub issue, having to include that field as well would complicate how ADA finds all contributors.

When looking at ADA's and other installations' funder metadata, I also noticed a few things I want to at least acknowledge:

  • Depositors and curators at ADA sometimes need to indicate that two or more funders are responsible for the same grant. To do this, users at ADA and users of other Dataverse installations have often entered the names of multiple funders in one Grant Information Agency field, separated by things like commas or "and", and included the single funding ID in the Grant Information Identifier field. For example, see the first and third lines in the "Grant Information" metadata in the dataset at http://dx.doi.org/10.26193/KTL5YE:

    [Screenshot: that dataset's Grant Information metadata]

    This can hurt searching and filtering by funder names within Dataverse and in other systems that rely on the metadata that Dataverse exports, which, for example, can wind up incorrectly telling those systems that "Commonwealth Government Agency, Safe Work Australia, Australian Research Council Discovery Grant" is the name of a single funder instead of three funders:

    [Screenshot: the combined value treated as a single funder name]

    This is a common challenge with other Dataverse fields (such as the keyword fields; see #4035, "Multiple values being entered into metadata fields that expect only one value (e.g. authors and keywords)") and with online forms in general. For example, the folks at OpenICPSR had to add a note asking depositors not to enter multiple keywords into one field:

    [Screenshot: OpenICPSR's note asking depositors not to enter multiple keywords into one field]

    Though I think it's too soon to start proposing solutions, I just want to note that using Dataverse's external vocab support to help people more easily add funder names in a consistent way might help if it also lets people add multiple funder names to the same field to associate with a single funder ID, and if Dataverse knows this when it indexes the funder names for searching, filtering and creating metadata exports that other systems use.

  • Similar to that first point above, depositors and curators at ADA and other repositories sometimes need to indicate that the same funder is responsible for multiple grants. For example, see the metadata in the dataset at https://doi.org/10.7910/DVN/VO0UNV:

    [Screenshot: that dataset's funding metadata]

    The way the current design expects users to do this, which results in more machine-readable metadata right now but more work on the user's part, is to create two or more instances of the compound field, one for each funder name, and add the same funder ID in each compound field's Grant Information Identifier field.

    For example, see the Grant Information metadata in the dataset at http://dx.doi.org/10.26193/RAR7TH:

    [Screenshot: that dataset's Grant Information metadata]

    This design pattern, where depositors are expected to create multiple instances of a compound field and enter the same information in one field and the different information in the other field, seems like an attempt to mirror how people organize this information in metadata standards and formats, and is a good example of why we should be wary of form designs created, sometimes "automatically," based more on decisions that are made to improve how systems exchange this information (e.g. in XML and JSON documents) and not necessarily how people think about the information.

    The GitHub issues #377 ("Metadata: allow multiples of the same child within a single compound/parent") and #9200 ("Feature Request/Idea: Nested compound fields") are also related. They describe, more broadly, the challenge of helping depositors add metadata in cases where the connections between fields aren't one-to-one but one-to-many, like one author with many affiliations.

  • Similar to funder names, users can add the names of data collectors in two fields. Steve mentioned this in June, too, and I noted it in #6720 ("Investigate simplifying metadata fields for "Data Collector"").
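Because several funder names typed into one Grant Information Agency value, as in the dataset at http://dx.doi.org/10.26193/KTL5YE, end up exported as a single funder, curators might want to find such values first. This is a minimal Python sketch of one possible heuristic; the separator list and the function names are assumptions, not existing Dataverse functionality.

```python
import re
from typing import List

# Hypothetical heuristic: treat commas, semicolons and a standalone "and"
# as possible separators between funder names.
_SEPARATORS = re.compile(r"\s*[,;]\s*|\s+and\s+", re.IGNORECASE)


def candidate_funder_names(agency_value: str) -> List[str]:
    """Split one Grant Information Agency value into possible funder names."""
    return [part.strip() for part in _SEPARATORS.split(agency_value) if part.strip()]


def may_list_several_funders(agency_value: str) -> bool:
    # Only flags a value for human review; it never splits it automatically.
    return len(candidate_funder_names(agency_value)) > 1
```

A name like "Social Sciences and Humanities Research Council" is also flagged, which is why a heuristic like this can only point curators at values to check, and why external vocabulary support that knows the individual funder names may be the more reliable approach.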

@jggautier
Contributor Author

jggautier commented Dec 18, 2023

This GitHub issue is in Dataverse SODHA's "Santa's watching" list, so I emailed those folks to learn about their interests in this issue. Our regular contacts for this installation, Benjamin Peuch and Youssef Ouahalou, are no longer working on this installation, so I emailed the installation's general email address as Youssef suggested.

As with Borealis and CGIAR, I'm waiting to hear back from them, and I'll follow up in the first week of January after the winter break.

@jggautier
Contributor Author

Just an update on progress so far. Most of the discussion happening in the GitHub issue at #10196 involves user goals we'll need to consider when we're thinking about a redesign of how Dataverse collects and distributes funding metadata.

I'm also trying to find time to chat with @stevenmce about what he wrote about ADA's needs and goals.
And I've been emailing with @amberleahey about Borealis's use of funding metadata fields and with folks from CGIAR.

@cmbz

cmbz commented Jul 10, 2024

2024/07/10

@jggautier
Contributor Author

jggautier commented Jul 11, 2024

Yes definitely! I'm going to close this issue, since the discovery research that the community was doing last winter effectively ended in April when the UX WG started planning and executing the design sprint (being tracked in IQSS/dataverse-pm#127 and GitHub issues listed in that issue).

We've been using what we learned in this spike GitHub issue and in #10196 as we plan how to evaluate the success of a redesign of the Citation metadata block and of the use of the external controlled vocabulary functionality. We're also using it as we consider design ideas for the goals driving the work recorded in this GitHub issue: improving the experience of adding funding metadata and making it easier to find datasets by funder. That includes resolving the issues caused by having two places on the dataset deposit form where people can record who funded the deposit, while keeping in mind what we learned from @stevenmce about the value of thinking of a funder as a kind of contributor.

More broadly, I hope that the design sprint idea we're working on can help us do research like this spike more effectively, by timeboxing the research and setting clearer expectations about how many resources we'll need from stakeholders who are vital to our shared understanding of goals.

Projects
Status: Done
Development

No branches or pull requests

9 participants