Add schema.org markup to Dataset pages #2243
Related to: #1393 |
I don't know whether having this, or using the meta tags as in #1393 would have a greater impact. In theory this isn't a big task and it is invisible to users in the browser (unless some browser plugin detects the markup and acts on it, of course). In terms of semantics: this is one vocabulary for expressing metadata. For the citation metadata (block), the fields map very well to either Schema.org or DC Terms/Elements. This mapping would be implicit business logic if you were to just go ahead and make changes to the UI without making the connection between metadata fields in Dataverse and ontology properties like the ones in DC Terms. The idea I'm trying to get at is similar to what I mentioned in comments on #947, but for the field names instead of field values. Let me create a new issue for this. I can't believe I didn't do so yet :) |
@bencomp I think both the meta tags and tagging in the markup should be implemented. ICPSR and other data archives are already doing the markup for the basic sets of fields. If I understand your comment, the idea is to provide a mapping for custom fields added to a dataset? |
Just came across this article about SEO for libraries, including adding Schema.org to pages: https://journal.lib.uoguelph.ca/index.php/perj/article/view/3328/0 @borsna the ideas outlined in #2357 concern "custom" fields added to an installation of Dataverse - I'm actually not sure you can add fields to a single dataset only. These Dataverse-wide fields actually come from existing ontologies, like DDI and ISA-Tab, but the only way to read the definitions of the fields is to parse the text files in the source code. |
Interesting article, thanks for the link :) @bencomp okay, was not thinking about unique custom fields for a single dataset, rather configured fields for a dataverse installation or similar. |
This was posted two days ago: https://research.googleblog.com/2017/01/facilitating-discovery-of-public.html . Thanks for pointing it out, @eugene-barsky |
Related to @pdurbin 's previous comment: Google has recently published new guidelines for describing scientific datasets using Schema.org vocabulary: https://developers.google.com/search/docs/data-types/datasets These guidelines refer to Schema.org's list of markup properties related to datasets: http://schema.org/Dataset Schema.org represents a collaboration between all major search engine companies (Google, Microsoft, Yahoo, and Yandex) and has been developed to support each of these search engines. As such, marking up dataset pages using Google's recommended Schema.org metadata fields would likely improve each of these search engines' ability to display relevant results from Dataverse. It would also likely increase the pagerank of our dataset pages, meaning Dataverse datasets would appear more frequently and more visibly in search results. This would help our datasets be more discoverable to the public. Libraries and data repositories frequently make use of Schema.org markup for these reasons. Viewing the page source of a Mendeley Data dataset page provides a solid example of how Schema.org markup can be implemented by a data repository. It's also worth noting that Schema.org can incorporate Dublin Core (AKA DC) terms by using a "dc" prefix, though this does not conform to Google's recommendations for marking up datasets. |
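For readers who have not seen this kind of markup, here is a minimal sketch of a Schema.org `Dataset` description serialized as JSON-LD. The class name `DatasetJsonLdSketch`, the helper names, and the sample values are all hypothetical and are not taken from the Dataverse code base; the string building is hand-rolled only to keep the example dependency-free, and a real implementation would more likely use a JSON library.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not Dataverse's actual implementation. */
final class DatasetJsonLdSketch {

    /** Wraps a value in double quotes, escaping backslashes and double quotes.
     *  Control characters are NOT handled; a JSON library would cover those. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a small JSON-LD object using a few Schema.org properties of Dataset. */
    static String toJsonLd(String name, String description, String url, List<String> keywords) {
        String keywordArray = keywords.stream()
                .map(DatasetJsonLdSketch::quote)
                .collect(Collectors.joining(",", "[", "]"));
        return "{"
                + "\"@context\":\"http://schema.org\","
                + "\"@type\":\"Dataset\","
                + "\"name\":" + quote(name) + ","
                + "\"description\":" + quote(description) + ","
                + "\"url\":" + quote(url) + ","
                + "\"keywords\":" + keywordArray
                + "}";
    }

    public static void main(String[] args) {
        // Placeholder values, not a real dataset.
        System.out.println(toJsonLd(
                "Example survey data",
                "Placeholder description of the dataset.",
                "https://example.org/dataset/1",
                List.of("survey", "example")));
    }
}
```

The resulting string can be pasted into a structured-data validator to see which recommended properties are still missing.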
This is also one of the 11 recommendations made in A Data Citation Roadmap for Scholarly Data Repositories (https://doi.org/10.1101/097196). I started mapping the Schema.org elements Google recommends to elements in Dataverse's citation metadata block. (Each tab on that spreadsheet is a different metadata block.) We can also expose file and variable level metadata with Schema.org. I'm thinking those mappings (when Schema.org terms exist for it) can be recorded in that spreadsheet's other tabs and possibly here, where ingested tabular file metadata is listed and mapped to other standards. I'm not sure how accurate this last spreadsheet is. |
Right now Google's recommended Schema.org properties don't include dataset persistent IDs, but persistent IDs should be embedded in dataset landing pages in json-ld as well. |
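One possible way to embed the persistent ID, sketched under assumptions: the DOI is turned into a resolvable `https://doi.org/` URL and emitted both as the JSON-LD `@id` and as the Schema.org `identifier` property. The class and method names are hypothetical, the `doi:` prefix handling is an assumption about how the ID is stored, and `10.5072/FK2/EXAMPLE` is a placeholder.

```java
/** Hypothetical helper for illustration only; not Dataverse's actual implementation. */
final class PersistentIdJsonLdSketch {

    /** Turns "doi:10.5072/FK2/EXAMPLE" (or the bare DOI) into a resolvable URL. */
    static String doiUrl(String persistentId) {
        String doi = persistentId.startsWith("doi:")
                ? persistentId.substring("doi:".length())
                : persistentId;
        return "https://doi.org/" + doi;
    }

    /** Returns the two JSON-LD members that carry the persistent ID.
     *  Assumes the DOI contains no characters that need JSON escaping. */
    static String identifierFragment(String persistentId) {
        String url = doiUrl(persistentId);
        return "\"@id\":\"" + url + "\",\"identifier\":\"" + url + "\"";
    }

    public static void main(String[] args) {
        System.out.println("{" + identifierFragment("doi:10.5072/FK2/EXAMPLE") + "}");
    }
}
```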
@jggautier (and others) you might be interested in @csarven saying, "Virtually nothing in particular is consuming granular citations in Linked Data." More at https://gitter.im/linkedresearch/chat?at=58f5f3acad849bcf42962e56 |
Interesting conversation. Thanks @pdurbin! |
Related: #3700 |
OK, checked in the code to address the items from the latest checklist; |
The only thing I had to add, to what's specified above - the "author" entry needs to have an additional "@type: Person" attribute for the whole thing to be valid. (I've updated the Googledoc to reflect this) |
I've checked in the last (I hope) change, that makes the ld json fragment appear in the LATEST published version ONLY. |
Thanks for checking @landreev. I spoke with Natasha about this issue, and we agreed it's okay to ignore the warning that Google's tool gives when @type is "Thing" (which it defaults to when there's no @type) and an affiliation is included. The less-preferred alternatives are (1) saying that every author is a person, which isn't true, and Dataverse has no way of knowing which author is a person and which is an organization (the other @type), or (2) not including an affiliation. |
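To make the agreed behaviour concrete, here is a rough sketch of an author entry that deliberately carries no `@type` (so consumers fall back to "Thing") and only includes an affiliation when one is present. The class and method names are hypothetical and the sample values are placeholders; this is not the code that was checked in.

```java
/** Hypothetical helper for illustration only; not Dataverse's actual implementation. */
final class AuthorJsonLdSketch {

    /** Wraps a value in double quotes, escaping backslashes and double quotes only. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds one author object without "@type", since the author may be a
     *  person or an organization and the two cannot be told apart reliably. */
    static String authorEntry(String name, String affiliation) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"name\":").append(quote(name));
        // Guard against a missing affiliation instead of emitting null or "".
        if (affiliation != null && !affiliation.isEmpty()) {
            json.append(",\"affiliation\":").append(quote(affiliation));
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(authorEntry("Smith, Jane", "Example University"));
        System.out.println(authorEntry("Example Research Group", null));
    }
}
```

With this shape the validator warning about an affiliation on a "Thing" still appears, which is the trade-off described above.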
OK, I reversed the type=person change. |
Conflicts (just imports: src/main/java/edu/harvard/iq/dataverse/DatasetPage.java
commit e19a346 Author: Ruben Andreassen <rubean85@gmail.com> Date: Mon Dec 4 12:20:54 2017 +0100 Forgot username commit 0d478a7 Merge: 45288aa 8aa4150 Author: Ruben Andreassen <rubean85@gmail.com> Date: Mon Dec 4 10:56:10 2017 +0100 Merge dataporten into 4334-oauth-dataporten commit 45288aa Merge: caf6371 4648b6a Author: Ruben <rubean85@gmail.com> Date: Fri Dec 1 14:45:44 2017 +0100 Merge pull request #1 from IQSS/develop test commit 4648b6a Merge: 0f36aa0 fff836c Author: kcondon <kcondon@hmdc.harvard.edu> Date: Thu Nov 30 18:44:35 2017 -0500 Merge pull request IQSS#4331 from IQSS/4330-no-affiliation add null check for datasetAuthor.getAffiliation() IQSS#4330 commit fff836c Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 30 16:39:26 2017 -0500 add null check for datasetAuthor.getAffiliation() IQSS#4330 commit 0f36aa0 Merge: e2878ce fad8669 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Thu Nov 30 15:07:54 2017 -0500 Merge pull request IQSS#4325 from IQSS/4324-header-padding Fixed padding layout issue with dataverse name text link in header IQSS#4324 commit fad8669 Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Thu Nov 30 10:14:53 2017 -0500 Fixed padding layout issue with dataverse name text link in header. [ref IQSS#4324] commit e2878ce Merge: d785c5c cb9647f Author: kcondon <kcondon@hmdc.harvard.edu> Date: Wed Nov 29 18:22:53 2017 -0500 Merge pull request IQSS#4305 from IQSS/4304-navbar-search use "?" 
(`&IQSS#63;`) rather than "&" (`&IQSS#38;`) before "q" IQSS#4304 commit d785c5c Merge: a881f36 3cc02d0 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Wed Nov 29 18:19:25 2017 -0500 Merge pull request IQSS#4302 from IQSS/3700-export-schema.org implement export of schema.org JSON-LD IQSS#3700 commit 3cc02d0 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:53:04 2017 -0500 have dataset page get cached JSON-LD, if available IQSS#3700 commit 84224bd Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:45:53 2017 -0500 guard against null terms.getTermsOfUse() IQSS#3700 commit ba9c6bd Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:28:16 2017 -0500 API: document "schema.org" as a supported export format IQSS#3700 commit e5c2528 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 12:11:17 2017 -0500 capitalize Schema.org in guides IQSS#3700 commit 086824d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 29 10:57:32 2017 -0500 note that we know "affliation" throws a warning IQSS#3700 commit a881f36 Merge: b20ab14 23b865c Author: kcondon <kcondon@hmdc.harvard.edu> Date: Tue Nov 28 16:28:04 2017 -0500 Merge pull request IQSS#4312 from IQSS/4197-bundle-error Fixed bundle reference to "parent" dataverse for Theme + Widget pg IQSS#4197 commit 34859e7 Merge: 2f278cc b20ab14 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 28 16:24:56 2017 -0500 Merge branch 'develop' into 3700-export-schema.org IQSS#3700 commit 23b865c Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Tue Nov 28 14:42:12 2017 -0500 Fixed bundle reference to "parent" dataverse for Theme + Widget pg. 
[ref IQSS#4197] commit b20ab14 Merge: caf6371 8e6354a Author: kcondon <kcondon@hmdc.harvard.edu> Date: Tue Nov 28 14:01:39 2017 -0500 Merge pull request IQSS#4277 from IQSS/4197-dv-header 4197 dv header commit 8e6354a Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Tue Nov 28 13:23:15 2017 -0500 Changed references from "customization" to "theme" in Theme + Widgets pg. [ref IQSS#4197] commit c312a85 Author: Derek Murphy <dlmurphy@g.harvard.edu> Date: Tue Nov 28 13:05:39 2017 -0500 Doc rewrites [IQSS#4197] Rewrote some text on the config page for clarity, changed terminology usage in dataverse management page to make it more consistent commit f68b81d Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Tue Nov 28 12:15:40 2017 -0500 Removed commented out theme logic found in QA. [ref IQSS#4197] commit 624922f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 28 11:09:26 2017 -0500 when adding row to dataversetheme, use white instead of gray IQSS#4197 commit cb9647f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 27 10:27:30 2017 -0500 use "?" 
(&IQSS#63;) rather than "&" (&IQSS#38;) before "q" IQSS#4304 commit d8028f1 Merge: 36d9228 caf6371 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 27 09:33:03 2017 -0500 Merge branch 'develop' into 4197-dv-header IQSS#4197 commit 2f278cc Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 22 12:33:56 2017 -0500 cleanup IQSS#3700 commit b00d4d6 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 22 12:28:25 2017 -0500 capitalize "Schema.org" IQSS#3700 commit 8f52663 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 22 11:06:41 2017 -0500 implement export of schema.org JSON-LD IQSS#3700 commit caf6371 Merge: c67a39f d80b9d1 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Tue Nov 21 16:29:07 2017 -0500 Merge pull request IQSS#4297 from IQSS/orcid_v21 orcid v2.1 changes (mainly https for profile page link) commit c67a39f Merge: 0918fae a756751 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Mon Nov 20 15:48:37 2017 -0500 Merge pull request IQSS#4252 from IQSS/2243-schema.org-json-ld 2243 schema.org json ld commit d80b9d1 Author: Pete Meyer <pameyer@crystal.harvard.edu> Date: Mon Nov 20 14:32:09 2017 -0500 orcid v2.1 changes (mainly https for profile page link) commit 0918fae Merge: 3013c0d dcfcbaf Author: kcondon <kcondon@hmdc.harvard.edu> Date: Mon Nov 20 14:31:41 2017 -0500 Merge pull request IQSS#4276 from IQSS/4250-ingest-failed make it clear that file upload is complete IQSS#4250 commit 3013c0d Merge: b4cea62 3f0f7e8 Author: kcondon <kcondon@hmdc.harvard.edu> Date: Mon Nov 20 14:21:37 2017 -0500 Merge pull request IQSS#4275 from IQSS/4262-describe-method move `describe` from EjbDataverseEngine to Command interface IQSS#4262 commit 36d9228 Merge: d612189 b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 16:38:34 2017 -0500 Merge branch 'develop' into 4197-dv-header IQSS#4197 commit dcfcbaf Merge: 268c3dc b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 
16:36:21 2017 -0500 Merge branch 'develop' into 4250-ingest-failed IQSS#4250 commit 3f0f7e8 Merge: 633a19d b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 16:33:37 2017 -0500 Merge branch 'develop' into 4262-describe-method IQSS#4262 commit a756751 Merge: eec1163 b4cea62 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 17 16:32:43 2017 -0500 Merge branch 'develop' into 2243-schema.org-json-ld IQSS#2243 Conflicts (just imports: src/main/java/edu/harvard/iq/dataverse/DatasetPage.java commit eec1163 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Fri Nov 17 15:58:38 2017 -0500 Per conversation with jgautier stipped the '@type="person"' attribute in the author fragment; since it can be a person or an organization; this results in a warning from google validation tool (because "Thing" is not supposed to have an affiliation) but it appears to be ok to live with it. commit 0801d56 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Fri Nov 17 15:36:04 2017 -0500 ldjson should will only be embedded into the page if this is the LATEST PUBLISHED version (IQSS#2243) commit a2742c5 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Fri Nov 17 15:08:40 2017 -0500 latest changest to ld json formatting, making the fragment pass the google validation tool test. (IQSS#2243) commit d612189 Author: Derek Murphy <dlmurphy@g.harvard.edu> Date: Fri Nov 17 13:01:55 2017 -0500 Docs: extremely nitpicky word change [IQSS#4197] Changed a couple words in the config page. commit d277669 Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Thu Nov 16 16:21:29 2017 -0500 Added tip to Installation Guide > Configuration > Custom Header related to disable root theme. 
[ref IQSS#4197] commit 80219c5 Author: Derek Murphy <dlmurphy@g.harvard.edu> Date: Thu Nov 16 11:43:59 2017 -0500 Syntax + typo fix Small edit, fixed a typo and a syntax error in (ironically) a header in the docs commit e0399c1 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Wed Nov 15 19:50:54 2017 -0500 ...and a quick fix for the "temporalCoverage" entry (IQSS#2243) commit 67882ff Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Wed Nov 15 19:41:05 2017 -0500 the ld json fragment should now be structured as specified in the issue IQSS#2243. commit 8b8391f Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Wed Nov 15 13:24:22 2017 -0500 added topicClassifications and kewords to JSONLD. (IQSS#2243) commit 28f705c Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 12:58:11 2017 -0500 implement :DisableRootDataverseTheme db setting IQSS#4197 commit 268c3dc Author: Michael Heppler <mheppler@hmdc.harvard.edu> Date: Wed Nov 15 12:54:50 2017 -0500 Revised ingest error popover message text. Fixed icon spacing issue. [ref IQSS#4250] commit 7cd2fea Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 12:01:57 2017 -0500 Revert "stub out UI for disabling root dataverse theme IQSS#4197 " This reverts commit b9c3c56. We're going to use a database setting instead. commit b9c3c56 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 08:53:36 2017 -0500 stub out UI for disabling root dataverse theme IQSS#4197 commit 1f938e9 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 15 08:18:25 2017 -0500 Revert "only show header for non-root dataverses IQSS#4197 " This reverts commit 8eccacd. 
commit 633a19d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 14 19:02:10 2017 -0500 affectedDvObjects is a better name for this field IQSS#4262 commit 9a3f4a3 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 14 17:10:06 2017 -0500 add the role to the message IQSS#4262 commit 7cfc8ba Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 14 10:09:18 2017 -0500 override `describe` in AssignRoleCommand IQSS#4262 commit 023cb8f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 16:09:43 2017 -0500 remove parameters since the Command has them IQSS#4262 commit 8eccacd Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 15:52:37 2017 -0500 only show header for non-root dataverses IQSS#4197 commit 7795e70 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 15:22:08 2017 -0500 change header background from gray to white IQSS#4197 commit e434dd0 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 14:28:23 2017 -0500 make it clear that file upload is complete IQSS#4250 commit 26eb11d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Mon Nov 13 14:18:57 2017 -0500 move `describe` from EjbDataverseEngine to Command interface IQSS#4262 commit 7d03e70 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Nov 7 16:21:37 2017 -0500 consistency between DC.subject and JSON-LD keywords IQSS#2243 commit 9f1d057 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Mon Nov 6 21:58:32 2017 -0500 one more addition for IQSS#2243 - added temporalCoverage. 
commit 8c74e37 Author: Leonid Andreev <leonid@hmdc.harvard.edu> Date: Mon Nov 6 21:28:06 2017 -0500 A few quick fixes for getJsonLd() (and the corresponding test in DatasetVersionTest()); (ref IQSS#2243) commit c941781 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 12:21:12 2017 -0400 explain why ui:insert lines are in the template IQSS#2243 commit 1aa323a Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 12:20:52 2017 -0400 remove unused imports used in this branch IQSS#2243 commit f8ca59f Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 12:13:05 2017 -0400 add tests for getJsonLd and getPublicationDateAsString IQSS#2243 commit b1db8ee Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 11:26:37 2017 -0400 rename to publicationDateAsString and improve javadoc IQSS#2243 commit 8f3083c Author: Philip Durbin <philip_durbin@harvard.edu> Date: Fri Nov 3 11:14:13 2017 -0400 delete cruft (unused method) IQSS#2243 commit 6c5f044 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:41:12 2017 -0400 use dateModified and proper schemaVersion URL IQSS#2243 commit 171c8f3 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:29:35 2017 -0400 move getJsonLd method to DatasetVersion entity IQSS#2243 commit 485a5ca Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:25:37 2017 -0400 don't even try to figure out if the author is a person or not IQSS#2243 commit 80b5a88 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:19:49 2017 -0400 limit to non-published, not just non-drafts IQSS#2243 Also add helper method. 
commit ad71c6a Author: Philip Durbin <philip_durbin@harvard.edu> Date: Thu Nov 2 15:17:32 2017 -0400 use same date format as meta name="DC.date" IQSS#2243 commit 2cc958d Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 1 13:30:15 2017 -0400 fix a number of issues (listed below) IQSS#3793 IQSS#2243 - only show published versions - show URL to DOI dynamically (was hard coded) - show publication date - show correct publisher - show correct provider commit 5ad88fc Author: Philip Durbin <philip_durbin@harvard.edu> Date: Wed Nov 1 13:15:00 2017 -0400 better author name parsing (could be an org!) IQSS#3793 IQSS#2243 commit 1b62596 Author: Philip Durbin <philip_durbin@harvard.edu> Date: Tue Oct 31 14:57:01 2017 -0400 stub out dataset in json-ld format IQSS#3793
Regarding how DataCite tries to determine whether an author is a person or an organization: we spent a lot of effort on this, and have gone through multiple iterations. The code is here: https://github.com/datacite/bolognese/blob/master/lib/bolognese/author_utils.rb#L72-L87. We assume the author is a person if
The above gives us a > 90% accuracy. The reason we need this is not so much that it is required for schema.org, but that we need this to do proper citation formatting and bibtex export. |
Also, while |
@mfenner thanks. This is helpful and interesting. |
Using a dictionary of given names worked really well for us. False negatives were mainly names from China and India, false positives the rare organization where the name starts with a given name, e.g. Because this is so painful, the DataCite Schema 4.1 released in September 2017 added an attribute to The simplest solution is obviously to use givenName and familyName from the start. |
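As a rough illustration of the given-name dictionary idea, and explicitly not a port of the DataCite code linked above, a lookup could be sketched like this. The class name, the four-entry dictionary, and the assumption that names arrive either as "Family, Given" or "Given Family" are all made up for the example; a usable dictionary would need many thousands of entries.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a given-name lookup; not the DataCite implementation. */
final class GivenNameHeuristicSketch {

    // Tiny illustrative dictionary (assumption); real lists are far larger.
    static final Set<String> GIVEN_NAMES = Set.of("jane", "john", "maria", "wei");

    /** Guesses "person" when the first given-name token is in the dictionary. */
    static boolean looksLikePerson(String authorName) {
        String trimmed = authorName.trim();
        int comma = trimmed.indexOf(',');
        // "Family, Given" puts the given name after the comma;
        // without a comma, assume the given name comes first.
        String givenPart = comma >= 0 ? trimmed.substring(comma + 1).trim() : trimmed;
        if (givenPart.isEmpty()) {
            return false;
        }
        String firstToken = givenPart.split("\\s+")[0].toLowerCase(Locale.ROOT);
        return GIVEN_NAMES.contains(firstToken);
    }

    public static void main(String[] args) {
        System.out.println(looksLikePerson("Smith, Jane"));
        System.out.println(looksLikePerson("Example University"));
        // An organization whose name starts with a given name is misclassified.
        System.out.println(looksLikePerson("Jane Doe Foundation"));
    }
}
```

The last call shows the false-positive case mentioned above: a lookup alone cannot distinguish an organization named after a person.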
Thanks, that |
This would make it easier for search engines to parse information about the title, author, time period, etc.
Relevant types to do markup for:
http://schema.org/Dataset and http://schema.org/DataCatalog
Validation and testing of markup can be done on this page:
https://developers.google.com/structured-data/testing-tool/
The markup can be done directly in the html template.
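If the markup is emitted as JSON-LD rather than as inline attributes, one way to place it in the page is a `script` element of type `application/ld+json`. The sketch below is an assumption about how that could look, with a hypothetical class name; the one detail worth noting is that any `</` inside the JSON is rewritten to `<\/` (still valid JSON) so the content cannot close the script element early.

```java
/** Hypothetical helper for illustration only; not Dataverse's actual template code. */
final class JsonLdScriptTagSketch {

    /** Wraps a JSON-LD string in a script element for embedding in an HTML page. */
    static String scriptTag(String jsonLd) {
        // "\/" is a legal JSON escape for "/", so the data is unchanged,
        // but a literal "</script>" inside a value can no longer end the element.
        String safe = jsonLd.replace("</", "<\\/");
        return "<script type=\"application/ld+json\">" + safe + "</script>";
    }

    public static void main(String[] args) {
        System.out.println(scriptTag("{\"@context\":\"http://schema.org\",\"@type\":\"Dataset\"}"));
    }
}
```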