core(jsonld): add structured data validation #6750
Merged
Changes from 31 commits

33 commits:

- `5f02a63` core(jsonld): add structured data validation (patrickhulce)
- `2883d90` yarn fixes (patrickhulce)
- `4d27cd0` condense generated files to one line (patrickhulce)
- `e9daf3d` stash error (patrickhulce)
- `2371450` eslint ignores (patrickhulce)
- `179796a` update jsonld (patrickhulce)
- `a21b921` cleanup (patrickhulce)
- `6ac3092` Merge branch 'master' into structued_data_pkg (patrickhulce)
- `e8714ce` update jsonld to non-native (patrickhulce)
- `12ff45f` add extensions to require statements (patrickhulce)
- `6a981d9` keep ts happy (patrickhulce)
- `21ec416` eslint (patrickhulce)
- `75b7a51` update jsonlint-mod dep (patrickhulce)
- `eca612a` feedback pt 1 (patrickhulce)
- `e345a5e` feedback pt 2 (patrickhulce)
- `ca2443b` feedback pt3 (patrickhulce)
- `5ae8632` feedback pt 4 (patrickhulce)
- `a6d2aaa` prettify (patrickhulce)
- `965541d` add script for updating jsonldcontext (patrickhulce)
- `dddba0f` Update sd-validation/jsonld.js (davidlehn)
- `1fc3d13` more feedback (patrickhulce)
- `e4ba275` more types (patrickhulce)
- `d29678b` move tests (patrickhulce)
- `9b7abd3` comment (patrickhulce)
- `1bd1f9e` Merge branch 'master' into structued_data_pkg (patrickhulce)
- `a4200aa` Merge branch 'master' into structued_data_pkg (patrickhulce)
- `fedd5cb` revert viewerg (patrickhulce)
- `064c104` Merge branch 'master' into structued_data_pkg (patrickhulce)
- `4ebdd17` move folder to lib (patrickhulce)
- `df921da` feedback (patrickhulce)
- `ce535cf` forEach -> map (patrickhulce)
- `002a663` more feedback (patrickhulce)
- `70028c9` last feedback! (patrickhulce)
lighthouse-core/lib/sd-validation/assets/jsonldcontext.json (6,995 additions, 0 deletions; large diff not rendered)
lighthouse-core/lib/sd-validation/assets/schema-tree.json (9,900 additions, 0 deletions; large diff not rendered)
lighthouse-core/lib/sd-validation/helpers/walk-object.js (30 additions, 0 deletions)

```javascript
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

/**
 * Recursively (DFS) traverses an object and calls the provided function for each field.
 *
 * @param {*} obj
 * @param {function(string, any, Array<string>, any): void} callback Called with (fieldName, fieldValue, path to the field, containing object).
 * @param {Array<string>} [path] Path segments leading to `obj`; defaults to [].
 */
module.exports = function walkObject(obj, callback, path = []) {
  if (obj === null) {
    return;
  }

  Object.entries(obj).forEach(([fieldName, fieldValue]) => {
    const newPath = Array.from(path);
    newPath.push(fieldName);

    callback(fieldName, fieldValue, newPath, obj);

    if (typeof fieldValue === 'object') {
      walkObject(fieldValue, callback, newPath);
    }
  });
};
```
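To make the callback contract concrete, here is a self-contained copy of `walkObject` run on a small, made-up object, collecting every path it visits. It shows that fields are visited parent-first (pre-order DFS) and that array indices appear as string path segments, because `Object.entries` returns string keys. The spread in place of `Array.from` plus `push` is an equivalent rewrite for brevity.

```javascript
'use strict';

// Self-contained copy of walkObject from the diff, so this snippet runs on its own.
function walkObject(obj, callback, path = []) {
  if (obj === null) {
    return;
  }

  Object.entries(obj).forEach(([fieldName, fieldValue]) => {
    const newPath = [...path, fieldName];
    callback(fieldName, fieldValue, newPath, obj);

    // typeof null === 'object', so null values recurse once and return immediately.
    if (typeof fieldValue === 'object') {
      walkObject(fieldValue, callback, newPath);
    }
  });
}

// Hypothetical sample input.
const sample = {
  '@type': 'Person',
  'author': [{'@id': 'a1'}],
  'missing': null,
};

/** @type {Array<string>} */
const visitedPaths = [];
walkObject(sample, (name, value, path) => visitedPaths.push(path.join('/')));

console.log(visitedPaths.join(', '));
```

The array element is reported as `author/0` before its own field `author/0/@id`, which is why later validators can join `path` with `/` to build readable error locations.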
```javascript
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const parseJSON = require('./json-linter.js');
const validateJsonLD = require('./jsonld-keyword-validator.js');
const expandAsync = require('./json-expander.js');
const validateSchemaOrg = require('./schema-validator.js');

/** @typedef {'json'|'json-ld'|'json-ld-expand'|'schema-org'} ValidatorType */

/**
 * Validates JSON-LD input. Returns an array of error objects, which is empty if the input
 * passed every step. Validation stops at the first step that reports errors.
 *
 * @param {string} textInput
 * @returns {Promise<Array<{path: ?string, validator: ValidatorType, message: string}>>}
 */
module.exports = async function validate(textInput) {
  // STEP 1: VALIDATE JSON
  const parseError = parseJSON(textInput);

  if (parseError) {
    return [{
      validator: 'json',
      path: parseError.lineNumber,
      message: parseError.message,
    }];
  }

  const inputObject = JSON.parse(textInput);

  // STEP 2: VALIDATE JSONLD
  const jsonLdErrors = validateJsonLD(inputObject);

  if (jsonLdErrors.length) {
    return jsonLdErrors.map(error => {
      return {
        validator: /** @type {ValidatorType} */ ('json-ld'),
        path: error.path,
        message: error.message,
      };
    });
  }

  // STEP 3: EXPAND
  /** @type {LH.StructuredData.ExpandedSchemaRepresentation|null} */
  let expandedObj = null;
  try {
    expandedObj = await expandAsync(inputObject);
  } catch (error) {
    return [{
      validator: 'json-ld-expand',
      path: null,
      message: error.message,
    }];
  }

  // STEP 4: VALIDATE SCHEMA
  const schemaOrgErrors = validateSchemaOrg(expandedObj);

  if (schemaOrgErrors.length) {
    return schemaOrgErrors.map(error => {
      return {
        validator: /** @type {ValidatorType} */ ('schema-org'),
        path: error.path,
        message: error.message,
      };
    });
  }

  return [];
};
```
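The four steps above run in order, each feeding the next (text, parsed object, expanded object), and the function returns as soon as one step reports errors. One consequence is that a JSON syntax error hides any schema.org problems in the same input. A minimal sketch of that control flow follows; `runSteps` and the two stub steps are hypothetical stand-ins, not code from the PR, and only the JSON step does real work.

```javascript
'use strict';

/**
 * Hypothetical helper: runs steps in order, passing each step's `value` to the next,
 * and stops at the first step that reports errors, tagging them with the step's name.
 * @param {Array<{validator: string, run: (value: any) => Promise<{errors: Array<{path: ?string, message: string}>, value?: any}>}>} steps
 * @param {any} input
 */
async function runSteps(steps, input) {
  let value = input;
  for (const step of steps) {
    const result = await step.run(value);
    if (result.errors.length) {
      return result.errors.map(({path, message}) => ({validator: step.validator, path, message}));
    }
    value = result.value;
  }
  return [];
}

// Stub steps for illustration only.
const steps = [
  {
    validator: 'json',
    run: async text => {
      try {
        return {errors: [], value: JSON.parse(text)};
      } catch (err) {
        return {errors: [{path: null, message: err.message}]};
      }
    },
  },
  {
    validator: 'schema-org',
    // Placeholder that always fails, so we can see whether this step is ever reached.
    run: async obj => ({errors: [{path: '/', message: 'stub schema error'}], value: obj}),
  },
];

// Invalid JSON: only the 'json' step reports; the schema step is never reached.
runSteps(steps, '{"@type": ').then(errors => console.log(errors.map(e => e.validator)));
```

Returning early keeps every error list attributable to a single `validator`, at the cost of surfacing problems one layer at a time.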
lighthouse-core/lib/sd-validation/json-expander.js (56 additions, 0 deletions)

```javascript
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const {URL} = require('../url-shim.js');
const jsonld = require('jsonld');
const schemaOrgContext = require('./assets/jsonldcontext.json');
const SCHEMA_ORG_HOST = 'schema.org';

/**
 * Custom loader that prevents network calls and allows us to return a local version of the
 * schema.org document.
 * @param {string} schemaUrl
 * @param {(err: null|Error, value?: any) => void} callback
 */
function documentLoader(schemaUrl, callback) {
  let urlObj = null;

  try {
    // Give a dummy base URL so relative URLs will be considered valid.
    urlObj = new URL(schemaUrl, 'http://example.com');
  } catch (e) {
    return callback(new Error('Error parsing URL: ' + schemaUrl), undefined);
  }

  if (urlObj.host === SCHEMA_ORG_HOST && urlObj.pathname === '/') {
    callback(null, {
      document: schemaOrgContext,
    });
  } else {
    // We only process schema.org; for other schemas we return an empty object.
    callback(null, {
      document: {},
    });
  }
}

/**
 * Takes a JSON-LD object and normalizes it by following the expansion algorithm
 * (https://json-ld.org/spec/latest/json-ld-api/#expansion).
 *
 * @param {any} inputObject
 * @returns {Promise<LH.StructuredData.ExpandedSchemaRepresentation|null>}
 */
module.exports = async function expand(inputObject) {
  try {
    return await jsonld.expand(inputObject, {documentLoader});
  } catch (err) {
    // jsonld wraps real errors in a bunch of junk, so see if we have an underlying error first
    if (err.details && err.details.cause) throw err.details.cause;
    throw err;
  }
};
```
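The loader's URL check only recognizes the schema.org root: any scheme is accepted, but a non-root path or a different host (including `www.schema.org`) gets the empty document. The dummy base also means a bare `schema.org` is treated as a relative path on `example.com` rather than as a host. The sketch below isolates that check; `isSchemaOrgRoot` is a hypothetical name, and it uses Node's global WHATWG `URL` instead of the PR's `../url-shim.js`.

```javascript
'use strict';

const SCHEMA_ORG_HOST = 'schema.org';

/**
 * Hypothetical helper mirroring documentLoader's branch: true when the bundled
 * schema.org context would be returned, false when an empty document would be.
 * @param {string} schemaUrl
 * @returns {boolean}
 */
function isSchemaOrgRoot(schemaUrl) {
  // Same dummy base as the PR, so relative URLs parse instead of throwing.
  const urlObj = new URL(schemaUrl, 'http://example.com');
  return urlObj.host === SCHEMA_ORG_HOST && urlObj.pathname === '/';
}

const candidates = [
  'http://schema.org/',
  'https://schema.org',
  'http://schema.org/Person',
  'schema.org',
  'https://www.schema.org/',
];

for (const url of candidates) {
  console.log(url, '->', isSchemaOrgRoot(url));
}
```

`https://schema.org` without a trailing slash still matches because URL parsing normalizes an empty path to `/` for http(s) URLs.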
lighthouse-core/lib/sd-validation/json-linter.js (49 additions, 0 deletions)

```javascript
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const jsonlint = require('jsonlint-mod');

/**
 * Lints `input` as JSON. Returns null if it parses; otherwise returns the error message
 * and, when it can be determined, the line number.
 * @param {string} input
 * @returns {{message: string, lineNumber: string|null}|null}
 */
module.exports = function parseJSON(input) {
  try {
    jsonlint.parse(input);
  } catch (error) {
    /** @type {string|null} */
    let line = error.at;

    // extract line number from message
    if (!line) {
      const regexLineResult = error.message.match(/Parse error on line (\d+)/);

      if (regexLineResult) {
        line = regexLineResult[1];
      }
    }

    // jsonlint error message points to a specific character, but we just want the message.
    // Example:
    // ---------^
    // Unexpected character {
    let message = /** @type {string} */ (error.message);
    const regexMessageResult = error.message.match(/-+\^\n(.+)$/);

    if (regexMessageResult) {
      message = regexMessageResult[1];
    }

    return {
      message,
      lineNumber: line,
    };
  }

  return null;
};
```
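The two regular expressions assume a jsonlint-style message: a `Parse error on line N:` header, a source excerpt, a caret line, and the actual error text on the final line. The sketch below applies the same regexes to a hand-written message in that shape; the sample text is illustrative, not captured from `jsonlint-mod`, and `extractLintInfo` is a hypothetical name.

```javascript
'use strict';

/**
 * Hypothetical helper applying the same two regexes as parseJSON above.
 * @param {string} rawMessage
 * @returns {{lineNumber: string|null, message: string}}
 */
function extractLintInfo(rawMessage) {
  const lineMatch = rawMessage.match(/Parse error on line (\d+)/);
  // Without the `m` flag, `$` anchors to the end of the whole string, so only the
  // final line following the caret line is captured.
  const messageMatch = rawMessage.match(/-+\^\n(.+)$/);

  return {
    lineNumber: lineMatch ? lineMatch[1] : null,
    message: messageMatch ? messageMatch[1] : rawMessage,
  };
}

// Illustrative message in the jsonlint shape described above.
const sample = [
  'Parse error on line 3:',
  '..."name": "Jane",}',
  '-------------------^',
  "Expecting 'STRING', got '}'",
].join('\n');

console.log(extractLintInfo(sample));
```

When neither pattern matches, the whole message is kept and the line number stays unknown, which matches the fallback behavior of `parseJSON`.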
lighthouse-core/lib/sd-validation/jsonld-keyword-validator.js (50 additions, 0 deletions)

```javascript
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const walkObject = require('./helpers/walk-object.js');

// This list comes from the JSON-LD 1.1 editors draft:
// https://w3c.github.io/json-ld-syntax/#syntax-tokens-and-keywords
const VALID_KEYWORDS = new Set([
  '@base',
  '@container',
  '@context',
  '@graph',
  '@id',
  '@index',
  '@language',
  '@list',
  '@nest',
  '@none',
  '@prefix',
  '@reverse',
  '@set',
  '@type',
  '@value',
  '@version',
  '@vocab',
]);

/**
 * @param {*} json
 * @return {Array<{path: string, message: string}>}
 */
module.exports = function validateJsonLD(json) {
  /** @type {Array<{path: string, message: string}>} */
  const errors = [];

  walkObject(json, (name, value, path) => {
    if (name.startsWith('@') && !VALID_KEYWORDS.has(name)) {
      errors.push({
        path: path.join('/'),
        message: 'Unknown keyword',
      });
    }
  });

  return errors;
};
```
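Because the check runs on every key that `walkObject` visits, an unknown keyword is reported wherever it is nested, with its path joined by `/`. The sketch below reuses the PR's keyword list but replaces `walkObject` plus the callback with a compact recursive function, `findUnknownKeywords` (a hypothetical name); the sample input, including the misspelled `@typ`, is made up.

```javascript
'use strict';

// Same keyword list as VALID_KEYWORDS in the PR.
const VALID_KEYWORDS = new Set([
  '@base', '@container', '@context', '@graph', '@id', '@index', '@language', '@list',
  '@nest', '@none', '@prefix', '@reverse', '@set', '@type', '@value', '@version', '@vocab',
]);

/**
 * Hypothetical compact equivalent of walkObject + validateJsonLD: collects every key that
 * starts with '@' but is not a known keyword, together with its '/'-joined path.
 * @param {*} value
 * @param {Array<string>} [path]
 * @param {Array<{path: string, message: string}>} [errors]
 */
function findUnknownKeywords(value, path = [], errors = []) {
  if (value === null || typeof value !== 'object') return errors;

  for (const [name, child] of Object.entries(value)) {
    const childPath = [...path, name];
    if (name.startsWith('@') && !VALID_KEYWORDS.has(name)) {
      errors.push({path: childPath.join('/'), message: 'Unknown keyword'});
    }
    findUnknownKeywords(child, childPath, errors);
  }
  return errors;
}

// Hypothetical input with a typo nested inside an array.
const input = {
  '@context': 'http://schema.org',
  '@type': 'Recipe',
  'author': [{'@typ': 'Person', 'name': 'Jane'}],
};

console.log(findUnknownKeywords(input));
```

Only keys are checked; a value such as `'@typ'` appearing as a string would not be flagged, matching the PR's use of the field name.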
lighthouse-core/lib/sd-validation/schema-validator.js (131 additions, 0 deletions)

```javascript
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const walkObject = require('./helpers/walk-object.js');
const schemaStructure = require('./assets/schema-tree.json');
const TYPE_KEYWORD = '@type';
const SCHEMA_ORG_URL_REGEX = /https?:\/\/schema\.org\//;

/**
 * @param {string} uri
 * @returns {string}
 */
function cleanName(uri) {
  return uri.replace(SCHEMA_ORG_URL_REGEX, '');
}

/**
 * Returns the properties of `type` plus all properties inherited from its parent types.
 * @param {string} type
 * @returns {Array<string>}
 */
function getPropsForType(type) {
  const cleanType = cleanName(type);
  const props = schemaStructure.properties
    .filter(prop => prop.parent.includes(cleanType))
    .map(prop => prop.name);
  const foundType = findType(type);
  if (!foundType) throw new Error(`Unable to get props for missing type "${type}"`);
  const parentTypes = foundType.parent;

  return parentTypes.reduce(
    (allProps, parentType) => allProps.concat(getPropsForType(parentType)),
    props
  );
}

/**
 * @param {string} type
 * @returns {{name: string, parent: Array<string>}|undefined}
 */
function findType(type) {
  const cleanType = cleanName(type);

  return schemaStructure.types.find(typeObj => typeObj.name === cleanType);
}

/**
 * Validates the keys of the given object based on its type(s). Returns an array of error messages.
 *
 * @param {string|Array<string>} typeOrTypes
 * @param {Array<string>} keys
 * @returns {Array<string>}
 */
function validateObjectKeys(typeOrTypes, keys) {
  /** @type {Array<string>} */
  let types = [];

  if (typeof typeOrTypes === 'string') {
    types = [typeOrTypes];
  } else if (Array.isArray(typeOrTypes)) {
    types = typeOrTypes;
  } else {
    return ['Unknown value type'];
  }

  const unknownTypes = types.filter(t => !findType(t));

  if (unknownTypes.length) {
    return unknownTypes
      .filter(type => SCHEMA_ORG_URL_REGEX.test(type))
      .map(type => `Unrecognized schema.org type ${type}`);
  }

  /** @type {Set<string>} */
  const allKnownProps = new Set();

  types.forEach(type => {
    const knownProps = getPropsForType(type);

    knownProps.forEach(key => allKnownProps.add(key));
  });

  const cleanKeys = keys
    // Skip JSON-LD keywords (including invalid ones, as they were already flagged by the JSON-LD keyword validator)
    .filter(key => key.indexOf('@') !== 0)
    .map(key => cleanName(key));

  return cleanKeys
    // Remove schema.org input/output constraints: http://schema.org/docs/actions.html#part-4
    .map(key => key.replace(/-(input|output)$/, ''))
    .filter(key => !allKnownProps.has(key))
    .map(key => `Unexpected property "${key}"`);
}

/**
 * @param {LH.StructuredData.ExpandedSchemaRepresentation|null} expandedObj Valid JSON-LD object in expanded form
 * @return {Array<{path: string, message: string}>}
 */
module.exports = function validateSchemaOrg(expandedObj) {
  /** @type {Array<{path: string, message: string}>} */
  const errors = [];

  if (expandedObj === null) {
    return errors;
  }

  if (Array.isArray(expandedObj) && expandedObj.length === 1) {
    expandedObj = expandedObj[0];
  }

  walkObject(expandedObj, (name, value, path, obj) => {
    if (name === TYPE_KEYWORD) {
      const keyErrorMessages = validateObjectKeys(value, Object.keys(obj));

      keyErrorMessages.forEach(message =>
        errors.push({
          // Drop the last path segment (@type), since it is the same for every error.
          path:
            '/' +
            path
              .slice(0, -1)
              .map(cleanName)
              .join('/'),
          message,
        })
      );
    }
  });

  return errors;
};
```
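`getPropsForType` follows each type's `parent` list recursively, so a type accepts its own properties plus everything inherited from its ancestors, and any remaining key of an expanded node is reported as unexpected. The sketch below reruns that lookup and check against a tiny hand-written tree. The tree's shape (`types` and `properties` entries with a `name` and an array `parent`) is inferred from how the code reads `schema-tree.json`; its contents, and the property `notARealProperty`, are invented for illustration.

```javascript
'use strict';

const SCHEMA_ORG_URL_REGEX = /https?:\/\/schema\.org\//;

// Hypothetical miniature of schema-tree.json (the real file is far larger).
const schemaStructure = {
  types: [
    {name: 'Thing', parent: []},
    {name: 'Person', parent: ['Thing']},
  ],
  properties: [
    {name: 'name', parent: ['Thing']},
    {name: 'givenName', parent: ['Person']},
  ],
};

const cleanName = uri => uri.replace(SCHEMA_ORG_URL_REGEX, '');
const findType = type => schemaStructure.types.find(t => t.name === cleanName(type));

// Own properties first, then inherited ones (same order as the PR's reduce).
function getPropsForType(type) {
  const cleanType = cleanName(type);
  const props = schemaStructure.properties
    .filter(prop => prop.parent.includes(cleanType))
    .map(prop => prop.name);
  const foundType = findType(type);
  if (!foundType) throw new Error(`Unable to get props for missing type "${type}"`);
  return foundType.parent.reduce((all, parentType) => all.concat(getPropsForType(parentType)), props);
}

// One node in expanded JSON-LD form, as validateSchemaOrg would see it.
const node = {
  '@type': ['http://schema.org/Person'],
  'http://schema.org/name': [{'@value': 'Jane'}],
  'http://schema.org/notARealProperty': [{'@value': 'x'}],
};

const known = new Set(node['@type'].flatMap(getPropsForType));
const unexpected = Object.keys(node)
  .filter(key => !key.startsWith('@'))
  .map(cleanName)
  .map(key => key.replace(/-(input|output)$/, ''))
  .filter(key => !known.has(key));

console.log(getPropsForType('http://schema.org/Person'));
console.log(unexpected.map(key => `Unexpected property "${key}"`));
```

`name` is accepted on `Person` only because it is inherited from `Thing`; without the recursive parent walk it would be reported as unexpected.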
Review comment: rename to `sd-validator.js` (or whatever) so we don't add another `index.js` file :)