As a researcher, I want more dataset metadata in schema.org exports so that my data is more discoverable #4371
Comments
After a discussion in today's Community Call about sending DataCite file-level metadata that includes the file checksum, @mercecrosas added in this Google Groups comment a table recommending ways to map more dataset metadata to schema.org. The table includes more metadata fields that have been added. An open question, which might deserve its own GitHub issue, is whether Dataverse should produce schema.org metadata at the file level.
Let's exclude File Download URL for now. It can follow on in a separate issue.
Thanks to @jggautier I was able to track down some tools for validating schema.org JSON-LD. https://github.com/scrapinghub/extruct can retrieve the JSON-LD from the generated html/xhtml, and https://github.com/jessedc/ajv-cli can then be used from the command line to validate that JSON against a schema.
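For anyone who wants to try the extract-then-validate idea without installing anything, here is a rough stdlib-only sketch. It does not use extruct or ajv-cli: it swaps in Python's `html.parser` for the extraction step and a simple required-key check for the schema step. The `REQUIRED_KEYS` tuple and the sample HTML are assumptions made up for illustration, not what Dataverse emits or what a real JSON Schema would require.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect every <script type="application/ld+json"> element as parsed JSON."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list of JSON-LD objects embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical minimal requirements; a real check would use a full JSON Schema.
REQUIRED_KEYS = ("@context", "@type", "name")


def missing_keys(block, required=REQUIRED_KEYS):
    """List the required keys that are absent from one JSON-LD object."""
    return [key for key in required if key not in block]


# Made-up page standing in for a generated dataset page.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body></body></html>
"""

blocks = extract_jsonld(SAMPLE_HTML)
problems = [missing_keys(block) for block in blocks]
```

An empty list in `problems` means that block passed this toy check; it says nothing about whether Google Dataset Search will accept the markup.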
"I think Google Dataset Search is ignoring author and prefers creator"
from 41.16% to 42.94% for DatasetVersion
Use the installation brand name instead.
I just made the changes @jggautier @scolapasta @djbrooke @kcondon and I agreed to after standup:
So, aside from internal code restructuring, this PR:
Issues/questions:
Discussed above with Julian and he will complete review. Will discuss with Julian and Phil to see what needs to be addressed.
Another issue:
(contentUrl should appear only when the installation indicates that they want download URLs appearing in their schema.org exports.)
I looked at the schema.org export and all four issues are resolved! @kcondon noticed that contentUrl isn't showing up in the schema.org export of a test dataset, although we expect it to. (It's the dataset titled "Test Schema Org Julian 5 Schema" on the "internal" test instance.)
For the record, as discussed with @kcondon and @jggautier, the FileUtil.isPubliclyDownloadable logic is used to decide whether contentUrl is shown, and contentUrl wasn't being shown because the dataset had terms of use. The logic also checks for guestbooks. Both of these require a popup to agree to or fill out in the UI.
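To make the rule above concrete, here is a small sketch of the gating as described in this thread. It is not the actual FileUtil.isPubliclyDownloadable code (which is Java and may check more conditions); `FileContext`, its fields, and `should_export_content_url` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class FileContext:
    """Hypothetical stand-in for the facts the check needs; not a Dataverse class."""
    download_urls_enabled: bool  # installation opted in to download URLs in exports
    has_terms_of_use: bool       # dataset requires agreeing to terms in a popup
    has_guestbook: bool          # dataset requires filling out a guestbook popup


def should_export_content_url(ctx: FileContext) -> bool:
    """Emit contentUrl only if the installation opted in and no popup is required."""
    if not ctx.download_urls_enabled:
        return False
    return not (ctx.has_terms_of_use or ctx.has_guestbook)
```

Under this reading, the test dataset with terms of use would get no contentUrl even when the installation has download URLs turned on, which matches what was observed.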
In issue #2243, some metadata fields important for dataset discovery were excluded from mapping to Schema.org. We said we'd include them in a later issue. This is that issue, and these are those fields (dun dun):
- @id property
- url property

We'll also need to fix:
- author: I think Google Dataset Search is ignoring author and prefers creator. (See this comment on "Improving Dataverse's Schema.org JSON-LD schema to enable author names display in Google Dataset Search's" #5029)
- provider: in this property we hardcode "Dataverse" and put the installation name in the DataCatalog name property, but Dataset Search is displaying a "Data provided by" field and is using what's in the provider property.

Which fields are added to the Schema.org metadata template (draft) and how they're mapped will probably be adjusted after community discussion (within Dataverse community and hopefully with a proposed RDA group focused on ways to make data more discoverable by search engines).
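As a rough illustration of the author/creator and provider points, the following sketch builds a JSON-LD object that carries the same people under both creator and author and uses the installation name for provider. The function name, the sample values, the choice to keep author alongside creator, and the use of one URL for both @id and url are all assumptions for discussion, not the agreed mapping.

```python
import json


def build_dataset_jsonld(title, author_names, installation_name, dataset_url):
    """Build an illustrative schema.org Dataset object (hypothetical mapping)."""
    people = [{"@type": "Person", "name": name} for name in author_names]
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "@id": dataset_url,
        "url": dataset_url,
        "name": title,
        # Same people under both keys, since Dataset Search seems to prefer creator.
        "creator": people,
        "author": people,
        # Installation name rather than a hardcoded "Dataverse".
        "provider": {"@type": "Organization", "name": installation_name},
        "includedInDataCatalog": {"@type": "DataCatalog", "name": installation_name},
    }


# Made-up values for illustration only.
example = build_dataset_jsonld(
    title="Example dataset",
    author_names=["Doe, Jane"],
    installation_name="Example Dataverse",
    dataset_url="https://example.org/dataset/123",
)
example_json = json.dumps(example, indent=2)
```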
@scolapasta asked me to add to the definition of done that we should make sure that the methods used to pull metadata values from different fields into different exports (DDI, DC, DataCite, Schema.org, native JSON (?)) are consistent.