7844 codemeta schema #7877

poikilotherm · 2021-05-17T13:49:56Z

What this PR does / why we need it:
This adds the CodeMeta schema as a default, out-of-the-box schema for (new) installations.
This pull request is a first step. Please see the discussion points below for your review. We need to be careful about the scope of this first step to keep compatibility in mind. (There is no schema migration present in the Dataverse application, so when changing data types etc, we need to write SQL database migrations manually!)

TODOs

Test TSV, make screenshots
Add release note
Sort out remaining questions (see below)

Which issue(s) this PR closes:

Closes #7844

Special notes for your reviewer:

With the new feature of "metadata block facets" per collection should we use a different displayName for the block? (It currently is "Software Metadata (CodeMeta 2.0)")
Should we use the W3C proposed vocabulary for applicationCategory?
- Should we go ahead and add ResearchApplication to this list and reach out to schema.org and CodeMeta people to push for adding it to the list? (Maybe Google, too?)
- Should we go ahead and reach out to CodeMeta about a field on scientific method used in the software? (Not covered by subject field, which is very coarse anyway)
Should we make the *Requirements fields use integer values of byte? kilobyte? megabyte? (or similar for CPU) instead of arbitrary text values?
Do we want to add docs about the crosswalk of "Dataverse Metadata" to "CodeMeta" to the guides?
What other docs do we want to include?
Do we want to add https://github.com/SoftwareUnderstanding/software_types (which would extend this beyond pure CodeMeta)?
Do we want to add a field to allow documenting computational methods in use?
- There is no standard, vocabulary, schema or ontology for this yet, we'd be on our own.
- This might as well be done via Citation Blocks Keywords
- We could leave this for a later extension of the block
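
For the question above about storing the *Requirements fields as integers rather than free text, here is a minimal sketch of what normalizing a value such as "4 GB" to a byte count could look like. The function name `to_bytes`, the unit table, and the choice of decimal (kB, MB, GB) next to binary (KiB, MiB, GiB) factors are assumptions for illustration only; neither CodeMeta nor this PR defines such a conversion.

```python
import re

# Hypothetical unit table: decimal SI prefixes plus binary IEC prefixes.
_UNITS = {
    "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9,
    "kib": 2**10, "mib": 2**20, "gib": 2**30,
}


def to_bytes(text: str) -> int:
    """Convert a string like '4 GB' or '512MiB' to an integer number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"not a '<number> <unit>' value: {text!r}")
    number, unit = match.groups()
    factor = _UNITS.get(unit.lower())
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(number) * factor)
```

Free-text values that do not follow the `<number> <unit>` pattern raise a `ValueError`, which is the trade-off of a typed field: existing arbitrary text would have to be cleaned up or rejected.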

Suggestions on how to test this:

Load the TSV via the usual API call.
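
As a rough sketch of that call from Python: the endpoint path `/api/admin/datasetfield/load` and the `text/tab-separated-values` content type follow the Dataverse admin documentation for loading metadata blocks, while the base URL `http://localhost:8080`, the file name `codemeta.tsv`, and the helper names are assumptions for a local test installation.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def build_load_request(tsv_bytes: bytes, base_url: str = "http://localhost:8080") -> Request:
    """Build (but do not send) the POST request that uploads a metadata block TSV."""
    return Request(
        url=f"{base_url}/api/admin/datasetfield/load",
        data=tsv_bytes,
        headers={"Content-type": "text/tab-separated-values"},
        method="POST",
    )


def load_block(tsv_path: str, base_url: str = "http://localhost:8080") -> str:
    """Send the TSV file to the installation and return the raw response body."""
    request = build_load_request(Path(tsv_path).read_bytes(), base_url)
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `load_block("codemeta.tsv")` against a running installation would perform the upload; the admin API is usually only reachable from localhost, so this is meant to be run on the server itself.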

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

On dataset creation:

On dataset editing:

As JSON-LD export:

Is there a release notes update needed for this change?:

Yet to be done, as review/extension/discussion needed.

Additional documentation:

Block as TSV: https://docs.google.com/spreadsheets/d/1MsJifbLeRYCdFUXPAOh-KIphgTaxPsc6ICxtmEj_sb4/edit#gid=1781064623
Mapping CodeMeta to Dataverse software metadata fields: https://docs.google.com/spreadsheets/d/1zcOm1PX2_HTMacgc-sn8aSKTIQBtoaSAAKEPXmKVt3A/edit#gid=0

Tagging @doigl @atrisovic @4tikhonov @jggautier @djbrooke @pdurbin (I don't know the GH names of the other WG members)

coveralls · 2021-05-17T13:59:45Z

Coverage remained the same at 19.326% when pulling fcc36d0 on poikilotherm:7844-codemeta-schema into 2dbf9b7 on IQSS:develop.

jggautier · 2021-05-17T15:29:18Z

This is great! I'd like to point out two issues that I think are most pressing and I hope could be resolved before this is merged:

I think there's a stray tab in line 11, applicationCategory, that's splitting its displayName into two columns:
There are a few different ways that the Citation metadatablock's fields are still designed to describe data as opposed to software. It looks like we'll be tackling these issues in future work (such as how metadata is exported), but I hope some of the issues that users will see when depositing software can be resolved:
- The tooltips for most of the fields, even fields that make sense for describing software, such as Title, include the word "Dataset".
  
  One solution might be to generalize the text in the tooltips of the fields in the Citation metadatablock, for example by replacing the word "Dataset" with "deposit".
- Some of the fields wouldn't make sense for describing software at all, such as "Series", "Date of collection" and "Type of data". If someone is depositing software, I would think they wouldn't need to see these fields.
  
  To prevent depositors of software from seeing fields that they wouldn't need, one solution might be to recommend that repositories use a Dataverse collection only for software deposits. When setting up that collection, they should hide the fields in the Citation metadatablock that describe data (e.g. "Date of collection" and "Type of data") and enable the Software metadatablock for that Dataverse collection. Repositories, especially "self curated" ones, should then not have users deposit a mix of datasets and software into the same Dataverse collection, because the Dataverse software has no way of knowing whether the user is depositing data or software and so cannot show only the relevant metadata fields.
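
A stray tab like the one reported for applicationCategory above can be caught mechanically before review. The sketch below assumes the usual layout of a metadata block TSV, where each section starts with a header row beginning with `#` and the rows that follow should not have more cells than that header; the function name `find_overlong_rows` is hypothetical and not part of Dataverse.

```python
from typing import List, Tuple


def find_overlong_rows(tsv_text: str) -> List[Tuple[int, int, int]]:
    """Return (line_number, cells_found, cells_expected) for rows wider than their section header."""
    problems = []
    expected = None  # width of the most recent '#...' header row
    for lineno, line in enumerate(tsv_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines and lines made only of tabs
        # Ignore trailing empty cells, which spreadsheet exports often add.
        cells = line.rstrip("\t").split("\t")
        if line.startswith("#"):
            expected = len(cells)
            continue
        if expected is not None and len(cells) > expected:
            problems.append((lineno, len(cells), expected))
    return problems
```

A row flagged here has at least one more tab than its header allows, which is what splits a value such as a displayName across two columns.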

poikilotherm · 2021-05-17T17:05:01Z

I once had the idea to actually make the citation block pluggable, non-mandatory. I know this requires A LOT, but maybe it's a way to go, if we don't want other architectural changes like abstracting the concept of sets.

However, this seems beyond scope. Thanks for the pointer for the description issue, I'll fix that right away.

poikilotherm · 2021-05-18T11:03:50Z

@doigl has some data about the candidates for displayOnCreate from DaRuS:

I agree on all of those, except for applicationCategory, which has been used not with the vocabulary from W3C but free text. I still think we should not do that.

poikilotherm · 2021-05-18T11:28:17Z

There is a list of programming languages in WikiData, containing ~1500 entries. (Via https://en.wikiversity.org/wiki/Research_in_programming_Wikidata/Programming_languages)

There is an extensive list of operating systems (not names alone) in WikiData with ~1100 entries. (Via https://en.wikiversity.org/wiki/Research_in_programming_Wikidata/Operating_systems)

We might wanna play with the OS query to select only instances that are not a subclass of another OS and not "based on" to gain the top level ones only.

poikilotherm · 2021-05-20T10:14:06Z

I checked on the autocomplete/filtering support for controlled vocabulary fields in compound fields. Here's what I found:

1. For primitive fields using a CV, we use a filter input in the dropdown for "check multiple" metadata fields, but not for single values. This was done as part of issue #6000 ("UI proposal: as a Dataverse user, I want autocompletion for (long) controlled vocab metadata") and PR #6339 ("Subjects disappear when clicked in metadata editing"). (I knew there was an old issue for this... 😄 )
2. This change has not been introduced for "check multiple" in compound fields. No idea why. Tagging @mheppler here.
3. The remaining issue of single value fields has never been addressed, but @TaniaSchlatter mentioned a few thoughts.

I guess adding the filter functionality to the "check multiple" fields in compound fields is an easy way forward. As this seems like a good discussion for Dataverse software decoupled from this issue about CodeMeta, I'm going to create that little issue now. ↪️#7888

After revisiting the schema, I see that the field operatingSystem is "allow multiple", but the (potential) CV field for the OS name would - of course - not be "check multiple". So we still need a solution for number 3 above, if we want this. ↪️#7889

mfenner · 2021-05-27T04:06:12Z

@poikilotherm as Codemeta is close to version 3.0 (https://blog.datacite.org/codemeta-we-need-your-feedback/), applicationCategory and scientific method are good topics to discuss now. Would the Dataverse community want them to become part of Codemeta?

And what is the timing for this pull request with regards to Codemeta 2.0 vs. Codemeta 3.0 (which is still a few months away)?

poikilotherm · 2021-05-27T11:51:27Z

> @poikilotherm as Codemeta is close to version 3.0 (https://blog.datacite.org/codemeta-we-need-your-feedback/), applicationCategory and scientific method are good topics to discuss now. Would the Dataverse community want them to become part of Codemeta?

@mfenner I think there is a high demand for these fields not only within the boundaries of the Dataverse community. I know that @sdruskat is also looking into this matter for his PhD thesis.

Are you aware of any existing, reusable controlled vocabularies, preferably as RDF/SKOS/JSON-LD/sth. with a PID, we could reuse for a field like scientificMethod? Dataverse soonish will have support to use those kinds of sources within the UI (#7712).

> And what is the timing for this pull request with regards to Codemeta 2.0 vs. Codemeta 3.0 (which is still a few months away)?

I'm not so sure about this. Maybe it would be a good start to create a schema for 2.0 now and upgrade to 3.0 later on. It's a rather low hanging fruit. It might become necessary to introduce a migration method in Dataverse, but this seems like a good addition beyond the CodeMeta use case.

… watermark helptext IQSS#7844

- Add missing displayOrder values
- Fix missing type for software requirements
- Avoid splitting up compound fields too much, otherwise data is not exportable to schema.org or CodeMeta JSON-LD without special handling (IQSS#7856)
- Tweak order
- Tweak descriptions and examples
- Fix whitespaces and line endings

pdurbin · 2022-07-21T19:11:19Z

@poikilotherm I couldn't get this tsv to load without making a few changes. I put them in a pull request for you to review and perhaps merge: poikilotherm#553

poikilotherm · 2022-07-21T19:15:51Z

Thanks @pdurbin!

Just today I picked up working on this again (not yet pushed).

There's lots of stuff to be moved around, which will also incorporate your changes😉

poikilotherm · 2022-12-13T17:26:25Z

@pdurbin @mreekie I just pushed the necessary changes to revert the addition to the schema. Also updated to latest develop. Dunno why the RTD CI fails, but seems unrelated.

poikilotherm · 2022-12-13T19:55:19Z

> (We'll make a PR to back out the schema.xml change for computational workflow as well, for consistency.)

Chop chop here we go #9225

mreekie · 2022-12-14T21:17:31Z

added to sprint Dec 15, 2022

pdurbin

Just a quick review. I haven't loaded up the block.

pdurbin · 2022-12-16T19:03:02Z

src/main/java/propertyFiles/codeMeta20.properties

+datasetfieldtype.softwareHelp.title=Software Help/Documentation
+datasetfieldtype.softwareHelp.description=Link to help texts or documentation
+datasetfieldtype.softwareHelp.watermark=e.g. https://user.github.io/project/docs
+datasetfieldtype.readme.title=Readme


Isn't "README" a little more standard? (Instead of "Readme".) If others agree, we should change the tsv as well.

I agree the filename should be sth with README. But do we want an all caps field name in the UI?

Yes, I was trying to suggest all caps README in the UI. That's what you have in the description ("Link to the README of the project") and the watermark ("e.g. https://github.com/user/project/blob/main/README.md"), both of which appear in the UI, so it should probably be consistent, right?

It's weird, Codemeta itself has "link to software Readme file" as a description at https://codemeta.github.io/terms/ but before codemeta/codemeta@0818c31 it was all caps README:

before: "A URL for the software README file"

after: "link to software Readme file"

I haven't committed and pushed this, but here's how more consistent all uppercase README would look (name, description, and watermark):

src/main/java/propertyFiles/codeMeta20.properties

pdurbin

I played around with this locally and it's looking good!

I'm sending it to QA but I'll make a few observations:

A lot of these fields would benefit from a picklist (programming languages, etc.) so I hope that we'll see a pull request to add some external controlled vocabularies.
For the tooltips, there is some inconsistency of final periods being present or absent.
It's weird that SVN is listed before Git, but that's because of CodeMeta and Schema.org.
I'm slightly weirded out by the inconsistency between Readme (title) and README (tooltip and watermark).
For some fields, it would be nice to have units (memory requirements, for example) but this is feedback to give upstream to CodeMeta and Schema.org, I imagine.
I find "Target Product" to be a bit odd. Again, this is feedback to send upstream. I think the idea is that if, for example, you're creating a plugin for WordPress, you can put WordPress as the target product.
There is a failing API test (FilesIT.test_008_ReplaceFileAlreadyDeleted) but I'm sure it has nothing to do with this metadata block, which isn't even loaded.
It seems like Oliver would like more feedback earlier in the process. He posted about this at https://groups.google.com/g/dataverse-community/c/heNotzADbaQ/m/DJItrFjFBAAJ but in practice, developers like me don't take a serious look until the work (a PR in this case) makes it into a sprint. So maybe we could improve our process here.

jggautier · 2023-01-12T17:38:26Z

I was pinged a while back but thought I should reply now that I finally found the time to answer after the winter break.

> We'd like to back out the schema.xml change.

> (We'll make a PR to back out the schema.xml change for computational workflow as well, for consistency.)

> Seems like @jggautier has given his blessing, especially since it's experimental.

I'm not sure what the schema.xml change was and how that's related to this being experimental. Is that what I gave my blessing to? Is the effect of the schema.xml change that this won't be a default metadatablock in future Dataverse installations? Does that mean that experimental, as it's been used for this and the workflow metadatablock, means that it'll be included in a release but the feature won't be turned on by default in Dataverse installations?

I agree about more feedback earlier in the process (and @poikilotherm has been using many opportunities over the years to encourage feedback), and I'd like to add that I think it's important to plan, as early in the process as possible, for evaluating solutions after they've been merged, too, even more so if we're so uncertain about a solution that we label it experimental.

pdurbin · 2023-01-13T20:05:16Z

@jggautier you probably missed the discussion but to sum up, only changes to non-experimental blocks should result in a change to schema.xml.

That is, schema.xml contains fields for all the blocks that we ship. All these blocks are enabled by default and will "just work" because schema.xml has the fields already.

I hope this helps. This whole experimental blocks concept is quite new, of course!

jggautier · 2023-01-13T20:49:33Z

Ah thanks. That's how I understood it. Experimental metadatablocks shouldn't be enabled in installations by default when those installations use the version of the software that includes that experimental metadatablock. Those installations will need to take extra steps to enable it.

It's just not clear to me how a metadatablock becomes not experimental.

pdurbin · 2023-01-13T20:55:55Z

It hasn't happened yet! 😄 I hope we find out with CodeMeta!

poikilotherm added Feature: Metadata Working Group: SWC labels May 17, 2021

poikilotherm requested review from jggautier and 4tikhonov May 17, 2021 13:55

This was referenced May 20, 2021

Enable filtering / autocomplete for controlled vocabularies inside compound fields (check multiple use case) #7888

Closed

Enable filtering for long lists in CV fields (check one use case) #7889

Closed

poikilotherm mentioned this pull request Mar 29, 2022

Feature Request/Idea: Add new static facet to show the metadata blocks types that are populated. #8536

Closed

proycon mentioned this pull request Jun 22, 2022

Collaboration between CLARIAH Tool discovery and related initiatives CLARIAH/clariah-plus#128

Open

poikilotherm mentioned this pull request Jul 19, 2022

Added Computational Workflow metadata and related documentation #8812

Merged

poikilotherm added 6 commits July 21, 2022 13:38

feat(metadata): add metadata block for CodeMeta IQSS#7844

98f21ea

docs(metadata): add CodeMeta reference to user guide

f9f9cbd

feat(metadata): load CodeMeta by default in new installations.

ed485df

fix(metadata): fix wrong tab in CodeMeta and rephrase softwareVersion watermark helptext IQSS#7844

3c497a1

fix(metadata): add standard name to Codemeta MDB displayName. IQSS#7844

492491e

feat(metadata): add i18n properties for CodeMeta IQSS#7844

1e8567d

poikilotherm force-pushed the 7844-codemeta-schema branch from fcc36d0 to 1e8567d Compare July 22, 2022 08:01

poikilotherm mentioned this pull request Jul 22, 2022

get codemeta.tsv to load (displayOrder, mostly) #7844 poikilotherm/dataverse#553

Closed

Merge branch 'develop' into 7844-codemeta-schema

114a25a

This was referenced Dec 13, 2022

Revert adding workflow metadata block field to Solr Schema #9224

Closed

9224 - revert workflow metadata block in Solr schema #9225

Merged

mreekie added the NIH OTA: 1.3.1 3 | 1.3.1 | Support software metadata | 5 prdOwn label Dec 15, 2022

pdurbin reviewed Dec 19, 2022

View reviewed changes

pdurbin assigned pdurbin and poikilotherm Dec 20, 2022

poikilotherm added 2 commits December 21, 2022 08:05

Merge branch 'develop' into 7844-codemeta-schema

182abed

fix(metadata): remove typos from CodeMeta files IQSS#7844

f1a84b4

pdurbin approved these changes Dec 21, 2022

View reviewed changes

pdurbin unassigned pdurbin and poikilotherm Dec 21, 2022

kcondon merged commit ee019ab into IQSS:develop Dec 21, 2022

kcondon self-assigned this Dec 21, 2022

pdurbin added this to the 5.13 milestone Dec 21, 2022

mreekie mentioned this pull request Jan 4, 2023

Include CodeMeta schema out of the box #7844

Closed

poikilotherm deleted the 7844-codemeta-schema branch January 16, 2023 12:00

mreekie added pm.GREI-d-1.3.1 NIH, yr1, aim3, task1: Support software metadata pm.GREI-d-1.3.2 NIH, yr1, aim3, task2: R & D phase biomedical workflows support labels Mar 20, 2023
