core(jsonld): add structured data validation #6750

Merged: 33 commits, Apr 8, 2019
Changes from 31 commits
Commits
5f02a63  core(jsonld): add structured data validation  (patrickhulce, Dec 8, 2018)
2883d90  yarn fixes  (patrickhulce, Dec 8, 2018)
4d27cd0  condense generated files to one line  (patrickhulce, Dec 8, 2018)
e9daf3d  stash error  (patrickhulce, Dec 8, 2018)
2371450  eslint ignores  (patrickhulce, Dec 10, 2018)
179796a  update jsonld  (patrickhulce, Dec 11, 2018)
a21b921  cleanup  (patrickhulce, Dec 12, 2018)
6ac3092  Merge branch 'master' into structued_data_pkg  (patrickhulce, Jan 23, 2019)
e8714ce  update jsonld to non-native  (patrickhulce, Jan 24, 2019)
12ff45f  add extensions to require statements  (patrickhulce, Jan 24, 2019)
6a981d9  keep ts happy  (patrickhulce, Jan 24, 2019)
21ec416  eslint  (patrickhulce, Jan 24, 2019)
75b7a51  update jsonlint-mod dep  (patrickhulce, Feb 11, 2019)
eca612a  feedback pt 1  (patrickhulce, Feb 21, 2019)
e345a5e  feedback pt 2  (patrickhulce, Feb 21, 2019)
ca2443b  feedback pt3  (patrickhulce, Feb 21, 2019)
5ae8632  feedback pt 4  (patrickhulce, Feb 21, 2019)
a6d2aaa  prettify  (patrickhulce, Feb 21, 2019)
965541d  add script for updating jsonldcontext  (patrickhulce, Feb 21, 2019)
dddba0f  Update sd-validation/jsonld.js  (davidlehn, Feb 21, 2019)
1fc3d13  more feedback  (patrickhulce, Mar 6, 2019)
e4ba275  more types  (patrickhulce, Mar 6, 2019)
d29678b  move tests  (patrickhulce, Mar 6, 2019)
9b7abd3  comment  (patrickhulce, Mar 6, 2019)
1bd1f9e  Merge branch 'master' into structued_data_pkg  (patrickhulce, Mar 12, 2019)
a4200aa  Merge branch 'master' into structued_data_pkg  (patrickhulce, Mar 18, 2019)
fedd5cb  revert viewerg  (patrickhulce, Mar 21, 2019)
064c104  Merge branch 'master' into structued_data_pkg  (patrickhulce, Mar 26, 2019)
4ebdd17  move folder to lib  (patrickhulce, Mar 28, 2019)
df921da  feedback  (patrickhulce, Mar 28, 2019)
ce535cf  forEach -> map  (patrickhulce, Mar 28, 2019)
002a663  more feedback  (patrickhulce, Apr 2, 2019)
70028c9  last feedback!  (patrickhulce, Apr 8, 2019)
6,995 changes: 6,995 additions & 0 deletions lighthouse-core/lib/sd-validation/assets/jsonldcontext.json

Large diffs are not rendered by default.

9,900 changes: 9,900 additions & 0 deletions lighthouse-core/lib/sd-validation/assets/schema-tree.json

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions lighthouse-core/lib/sd-validation/helpers/walk-object.js
@@ -0,0 +1,30 @@
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

/**
 * Recursively (DFS) traverses an object and calls provided function for each field.
 *
 * @param {*} obj
 * @param {function(string, any, Array<string>, any): void} callback
 * @param {Array<string>} path
 */
module.exports = function walkObject(obj, callback, path = []) {
  if (obj === null) {
    return;
  }

  Object.entries(obj).forEach(([fieldName, fieldValue]) => {
    const newPath = Array.from(path);
    newPath.push(fieldName);

    callback(fieldName, fieldValue, newPath, obj);

    if (typeof fieldValue === 'object') {
      walkObject(fieldValue, callback, newPath);
    }
  });
};
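
Note on traversal order: the helper in walk-object.js visits each field before descending into it, and array elements are reported under their index. A minimal standalone sketch of that behavior follows; the function body is a local copy of the code in this diff so the snippet runs without the module, and the `sample` object is a made-up input, not taken from the PR's tests.

```javascript
// Local copy of walkObject from this diff, so the sketch runs on its own.
function walkObject(obj, callback, path = []) {
  if (obj === null) return;

  Object.entries(obj).forEach(([fieldName, fieldValue]) => {
    const newPath = [...path, fieldName];
    callback(fieldName, fieldValue, newPath, obj);
    if (typeof fieldValue === 'object') {
      walkObject(fieldValue, callback, newPath);
    }
  });
}

// Hypothetical input: a nested object with an array value.
const sample = {
  '@type': 'Person',
  'address': {'@type': 'PostalAddress', 'tags': ['home']},
};

// Record the path of every visited field, in visiting order.
const visited = [];
walkObject(sample, (name, value, path) => visited.push(path.join('/')));

console.log(visited);
// [ '@type', 'address', 'address/@type', 'address/tags', 'address/tags/0' ]
```

The fourth callback argument (the parent object) is what schema-validator.js uses to read the sibling keys of an `@type` field.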
75 changes: 75 additions & 0 deletions lighthouse-core/lib/sd-validation/index.js
@@ -0,0 +1,75 @@
Review comment (Member): rename sd-validator.js (or whatever) so we don't add another index.js file :)

/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const parseJSON = require('./json-linter.js');
const validateJsonLD = require('./jsonld-keyword-validator.js');
const expandAsync = require('./json-expander.js');
const validateSchemaOrg = require('./schema-validator.js');

/** @typedef {'json'|'json-ld'|'json-ld-expand'|'schema-org'} ValidatorType */

/**
 * Validates JSON-LD input. Returns array of error objects.
 *
 * @param {string} textInput
 * @returns {Promise<Array<{path: ?string, validator: ValidatorType, message: string}>>}
 */
module.exports = async function validate(textInput) {
  // STEP 1: VALIDATE JSON
  const parseError = parseJSON(textInput);

  if (parseError) {
    return [{
      validator: 'json',
      path: parseError.lineNumber,
      message: parseError.message,
    }];
  }

  const inputObject = JSON.parse(textInput);

  // STEP 2: VALIDATE JSONLD
  const jsonLdErrors = validateJsonLD(inputObject);

  if (jsonLdErrors.length) {
    return jsonLdErrors.map(error => {
      return {
        validator: /** @type {ValidatorType} */ ('json-ld'),
        path: error.path,
        message: error.message,
      };
    });
  }

  // STEP 3: EXPAND
  /** @type {LH.StructuredData.ExpandedSchemaRepresentation|null} */
  let expandedObj = null;
  try {
    expandedObj = await expandAsync(inputObject);
  } catch (error) {
    return [{
      validator: 'json-ld-expand',
      path: null,
      message: error.message,
    }];
  }

  // STEP 4: VALIDATE SCHEMA
  const schemaOrgErrors = validateSchemaOrg(expandedObj);

  if (schemaOrgErrors.length) {
    return schemaOrgErrors.map(error => {
      return {
        validator: /** @type {ValidatorType} */ ('schema-org'),
        path: error.path,
        message: error.message,
      };
    });
  }

  return [];
};
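
Note on control flow: `validate` runs its four steps in order and returns as soon as one step reports errors, so a caller only ever sees errors from a single validator. The sketch below illustrates that short-circuit pattern only. `parseStage`, `keywordStage`, and `validateSketch` are hypothetical stand-ins (plain `JSON.parse` and a deliberately tiny keyword list), and the sketch is synchronous for brevity, whereas the real function is async because of the expansion step.

```javascript
// Stand-in for step 1 (assumption: plain JSON.parse instead of jsonlint-mod).
function parseStage(text) {
  try {
    JSON.parse(text);
    return [];
  } catch (err) {
    return [{validator: 'json', path: null, message: err.message}];
  }
}

// Stand-in for step 2 (assumption: top-level keys only, three known keywords).
function keywordStage(text) {
  const known = new Set(['@context', '@type', '@id']);
  return Object.keys(JSON.parse(text))
    .filter(key => key.startsWith('@') && !known.has(key))
    .map(key => ({validator: 'json-ld', path: key, message: 'Unknown keyword'}));
}

// Run the stages in order and stop at the first one that reports errors.
function validateSketch(text, stages = [parseStage, keywordStage]) {
  for (const stage of stages) {
    const errors = stage(text);
    if (errors.length) return errors;
  }
  return [];
}

console.log(validateSketch('{').map(e => e.validator)); // [ 'json' ]
console.log(validateSketch('{"@typo": "Person"}').map(e => e.validator)); // [ 'json-ld' ]
console.log(validateSketch('{"@type": "Person"}')); // []
```

One consequence of this design is that later stages can assume the earlier ones passed, which is why step 2 can call `JSON.parse` without a try/catch.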
56 changes: 56 additions & 0 deletions lighthouse-core/lib/sd-validation/json-expander.js
@@ -0,0 +1,56 @@
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const {URL} = require('../url-shim.js');
const jsonld = require('jsonld');
const schemaOrgContext = require('./assets/jsonldcontext.json');
const SCHEMA_ORG_HOST = 'schema.org';

/**
 * Custom loader that prevents network calls and allows us to return local version of the
 * schema.org document
 * @param {string} schemaUrl
 * @param {(err: null|Error, value?: any) => void} callback
 */
function documentLoader(schemaUrl, callback) {
  let urlObj = null;

  try {
    // Give a dummy base URL so relative URLs will be considered valid.
    urlObj = new URL(schemaUrl, 'http://example.com');
  } catch (e) {
    return callback(new Error('Error parsing URL: ' + schemaUrl), undefined);
  }

  if (urlObj.host === SCHEMA_ORG_HOST && urlObj.pathname === '/') {
    callback(null, {
      document: schemaOrgContext,
    });
  } else {
    // We only process schema.org, for other schemas we return an empty object
    callback(null, {
      document: {},
    });
  }
}

/**
 * Takes JSON-LD object and normalizes it by following the expansion algorithm
 * (https://json-ld.org/spec/latest/json-ld-api/#expansion).
 *
 * @param {any} inputObject
 * @returns {Promise<LH.StructuredData.ExpandedSchemaRepresentation|null>}
 */
module.exports = async function expand(inputObject) {
  try {
    return await jsonld.expand(inputObject, {documentLoader});
  } catch (err) {
    // jsonld wraps real errors in a bunch of junk, so see if we have an underlying error first
    if (err.details && err.details.cause) throw err.details.cause;
    throw err;
  }
};
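
Note on the loader's matching rule: only a URL whose host is `schema.org` and whose path is exactly `/` receives the bundled context; everything else, including relative URLs resolved against the dummy base, gets an empty document. The sketch below reproduces just that decision so it can be run without `jsonld`. `loadDocument` and `localContext` are hypothetical names, `localContext` is a placeholder rather than the real jsonldcontext.json, and the global `URL` is assumed in place of the url-shim.

```javascript
const SCHEMA_ORG_HOST = 'schema.org';

// Placeholder for the bundled context (the real one is assets/jsonldcontext.json).
const localContext = {'@context': {'@vocab': 'http://schema.org/'}};

// Same decision logic as documentLoader in this diff, using the global URL class.
function loadDocument(schemaUrl, callback) {
  let urlObj;
  try {
    // The dummy base makes relative URLs parse instead of throwing.
    urlObj = new URL(schemaUrl, 'http://example.com');
  } catch (e) {
    return callback(new Error('Error parsing URL: ' + schemaUrl));
  }

  const isSchemaOrgRoot = urlObj.host === SCHEMA_ORG_HOST && urlObj.pathname === '/';
  callback(null, {document: isSchemaOrgRoot ? localContext : {}});
}

// Which of these URLs are served the local schema.org context?
const urls = [
  'http://schema.org',
  'https://schema.org/',
  'http://schema.org/Person',
  '/relative',
  'http://example.org/ctx',
];
const results = [];
for (const url of urls) {
  loadDocument(url, (err, res) => results.push(res.document === localContext));
}

console.log(results); // [ true, true, false, false, false ]
```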
49 changes: 49 additions & 0 deletions lighthouse-core/lib/sd-validation/json-linter.js
@@ -0,0 +1,49 @@
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const jsonlint = require('jsonlint-mod');

/**
 * @param {string} input
 * @returns {{message: string, lineNumber: string|null}|null}
 */
module.exports = function parseJSON(input) {
  try {
    jsonlint.parse(input);
  } catch (error) {
    /** @type {string|null} */
    let line = error.at;

    // extract line number from message
    if (!line) {
      const regexLineResult = error.message.match(/Parse error on line (\d+)/);

      if (regexLineResult) {
        line = regexLineResult[1];
      }
    }


    // jsonlint error message points to a specific character, but we just want the message.
    // Example:
    // ---------^
    // Unexpected character {
    let message = /** @type {string} */ (error.message);
    const regexMessageResult = error.message.match(/-+\^\n(.+)$/);

    if (regexMessageResult) {
      message = regexMessageResult[1];
    }

    return {
      message,
      lineNumber: line,
    };
  }

  return null;
};
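
Note on the two regular expressions: the first pulls the line number out of the message, and the second keeps only the text after the `----^` pointer line. The sketch below applies the same two patterns to a hand-written message. `extractLineAndMessage` is a hypothetical helper, and `sampleMessage` is an illustrative string in the shape the patterns expect, not captured jsonlint-mod output.

```javascript
// Applies the same two patterns used in json-linter.js to a raw error message.
function extractLineAndMessage(rawMessage) {
  const lineMatch = rawMessage.match(/Parse error on line (\d+)/);
  const messageMatch = rawMessage.match(/-+\^\n(.+)$/);

  return {
    lineNumber: lineMatch ? lineMatch[1] : null,
    // Fall back to the full message when there is no pointer line.
    message: messageMatch ? messageMatch[1] : rawMessage,
  };
}

// Illustrative message (assumption: header line, source excerpt, pointer, reason).
const sampleMessage = [
  'Parse error on line 2:',
  '{"name": "x",}',
  '-------------^',
  "Expecting 'STRING', got '}'",
].join('\n');

const extracted = extractLineAndMessage(sampleMessage);
console.log(extracted.lineNumber); // 2
console.log(extracted.message); // Expecting 'STRING', got '}'
```

Note that the captured line number is a string, which matches the `string|null` type declared for `lineNumber`.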
50 changes: 50 additions & 0 deletions lighthouse-core/lib/sd-validation/jsonld-keyword-validator.js
@@ -0,0 +1,50 @@
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const walkObject = require('./helpers/walk-object.js');

// This list comes from the JSON-LD 1.1 editors draft:
// https://w3c.github.io/json-ld-syntax/#syntax-tokens-and-keywords
const VALID_KEYWORDS = new Set([
  '@base',
  '@container',
  '@context',
  '@graph',
  '@id',
  '@index',
  '@language',
  '@list',
  '@nest',
  '@none',
  '@prefix',
  '@reverse',
  '@set',
  '@type',
  '@value',
  '@version',
  '@vocab',
]);

/**
 * @param {*} json
 * @return {Array<{path: string, message: string}>}
 */
module.exports = function validateJsonLD(json) {
  /** @type {Array<{path: string, message: string}>} */
  const errors = [];

  walkObject(json, (name, value, path) => {
    if (name.startsWith('@') && !VALID_KEYWORDS.has(name)) {
      errors.push({
        path: path.join('/'),
        message: 'Unknown keyword',
      });
    }
  });

  return errors;
};
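
Note on what gets flagged: any key starting with `@` that is not in `VALID_KEYWORDS` is reported with its slash-joined path, at any nesting depth. The sketch below shows the result for a misspelled `@type`. `findUnknownKeywords` is a hypothetical self-contained rewrite (it recurses directly instead of using walkObject), and `doc` is a made-up input.

```javascript
// Keyword list copied from jsonld-keyword-validator.js in this diff.
const VALID_KEYWORDS = new Set([
  '@base', '@container', '@context', '@graph', '@id', '@index', '@language',
  '@list', '@nest', '@none', '@prefix', '@reverse', '@set', '@type', '@value',
  '@version', '@vocab',
]);

// Recursively collects every '@'-prefixed key that is not a known keyword.
function findUnknownKeywords(node, path = []) {
  if (node === null || typeof node !== 'object') return [];

  return Object.entries(node).flatMap(([key, value]) => {
    const keyPath = [...path, key];
    const own = key.startsWith('@') && !VALID_KEYWORDS.has(key)
      ? [{path: keyPath.join('/'), message: 'Unknown keyword'}]
      : [];
    return own.concat(findUnknownKeywords(value, keyPath));
  });
}

// Hypothetical input with a misspelled keyword in a nested object.
const doc = {
  '@context': 'http://schema.org',
  '@type': 'Article',
  'author': {'@typo': 'Person', 'name': 'Ada'},
};

const unknown = findUnknownKeywords(doc);
console.log(unknown);
// [ { path: 'author/@typo', message: 'Unknown keyword' } ]
```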
131 changes: 131 additions & 0 deletions lighthouse-core/lib/sd-validation/schema-validator.js
@@ -0,0 +1,131 @@
/**
 * @license Copyright 2018 Google Inc. All Rights Reserved.
 * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
 */
'use strict';

const walkObject = require('./helpers/walk-object.js');
const schemaStructure = require('./assets/schema-tree.json');
const TYPE_KEYWORD = '@type';
const SCHEMA_ORG_URL_REGEX = /https?:\/\/schema\.org\//;

/**
 * @param {string} uri
 * @returns {string}
 */
function cleanName(uri) {
  return uri.replace(SCHEMA_ORG_URL_REGEX, '');
}

/**
 * @param {string} type
 * @returns {Array<string>}
 */
function getPropsForType(type) {
  const cleanType = cleanName(type);
  const props = schemaStructure.properties
    .filter(prop => prop.parent.includes(cleanType))
    .map(prop => prop.name);
  const foundType = findType(type);
  if (!foundType) throw new Error(`Unable to get props for missing type "${type}"`);
  const parentTypes = foundType.parent;

  return parentTypes.reduce((allProps, type) => allProps.concat(getPropsForType(type)), props);
}

/**
 * @param {string} type
 * @returns {{name: string, parent: Array<string>}|undefined}
 */
function findType(type) {
  const cleanType = cleanName(type);

  return schemaStructure.types.find(typeObj => typeObj.name === cleanType);
}

/**
 * Validates keys of given object based on its type(s). Returns an array of error messages.
 *
 * @param {string|Array<string>} typeOrTypes
 * @param {Array<string>} keys
 * @returns {Array<string>}
 */
function validateObjectKeys(typeOrTypes, keys) {
  /** @type {Array<string>} */
  let types = [];

  if (typeof typeOrTypes === 'string') {
    types = [typeOrTypes];
  } else if (Array.isArray(typeOrTypes)) {
    types = typeOrTypes;
  } else {
    return ['Unknown value type'];
  }

  const unknownTypes = types.filter(t => !findType(t));

  if (unknownTypes.length) {
    return unknownTypes
      .filter(type => SCHEMA_ORG_URL_REGEX.test(type))
      .map(type => `Unrecognized schema.org type ${type}`);
  }

  /** @type {Set<string>} */
  const allKnownProps = new Set();

  types.forEach(type => {
    const knownProps = getPropsForType(type);

    knownProps.forEach(key => allKnownProps.add(key));
  });

  const cleanKeys = keys
    // Skip JSON-LD keywords (including invalid ones as they were already flagged in the json-ld validator)
    .filter(key => key.indexOf('@') !== 0)
    .map(key => cleanName(key));

  return cleanKeys
    // remove Schema.org input/output constraints http://schema.org/docs/actions.html#part-4
    .map(key => key.replace(/-(input|output)$/, ''))
    .filter(key => !allKnownProps.has(key))
    .map(key => `Unexpected property "${key}"`);
}

/**
 * @param {LH.StructuredData.ExpandedSchemaRepresentation|null} expandedObj Valid JSON-LD object in expanded form
 * @return {Array<{path: string, message: string}>}
 */
module.exports = function validateSchemaOrg(expandedObj) {
  /** @type {Array<{path: string, message: string}>} */
  const errors = [];

  if (expandedObj === null) {
    return errors;
  }

  if (Array.isArray(expandedObj) && expandedObj.length === 1) {
    expandedObj = expandedObj[0];
  }

  walkObject(expandedObj, (name, value, path, obj) => {
    if (name === TYPE_KEYWORD) {
      const keyErrorMessages = validateObjectKeys(value, Object.keys(obj));

      keyErrorMessages.forEach(message =>
        errors.push({
          // get rid of the last chunk (/@type) as it's the same for all errors
          path:
            '/' +
            path
              .slice(0, -1)
              .map(cleanName)
              .join('/'),
          message,
        })
      );
    }
  });

  return errors;
};
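
Note on property lookup: a type's allowed properties are its own plus those of all its ancestors, collected by walking `parent` links, and keys are compared after stripping the schema.org URL prefix and any `-input`/`-output` suffix. The sketch below shows this on a two-type tree. `tinyTree` is a hypothetical miniature whose shape is inferred from how the code reads schema-tree.json, and `propsForType` and `unexpectedKeys` are simplified stand-ins for `getPropsForType` and `validateObjectKeys`.

```javascript
const SCHEMA_ORG_URL_REGEX = /https?:\/\/schema\.org\//;

// Hypothetical miniature of assets/schema-tree.json (shape inferred from the code).
const tinyTree = {
  types: [
    {name: 'Thing', parent: []},
    {name: 'Person', parent: ['Thing']},
  ],
  properties: [
    {name: 'name', parent: ['Thing']},
    {name: 'birthDate', parent: ['Person']},
  ],
};

const cleanName = uri => uri.replace(SCHEMA_ORG_URL_REGEX, '');

// Own properties first, then those inherited from each parent type, recursively.
function propsForType(type) {
  const typeName = cleanName(type);
  const found = tinyTree.types.find(t => t.name === typeName);
  if (!found) throw new Error(`Unable to get props for missing type "${type}"`);

  const own = tinyTree.properties
    .filter(prop => prop.parent.includes(typeName))
    .map(prop => prop.name);
  return found.parent.reduce((all, parentType) => all.concat(propsForType(parentType)), own);
}

// Keys that are neither JSON-LD keywords nor known properties of the type.
function unexpectedKeys(type, keys) {
  const known = new Set(propsForType(type));
  return keys
    .filter(key => !key.startsWith('@'))
    .map(key => cleanName(key).replace(/-(input|output)$/, ''))
    .filter(key => !known.has(key));
}

const personProps = propsForType('http://schema.org/Person');
console.log(personProps); // [ 'birthDate', 'name' ]

const unexpected = unexpectedKeys('http://schema.org/Person', [
  '@type',
  'http://schema.org/name',
  'http://schema.org/birthDate',
  'http://schema.org/nmae',
  'http://schema.org/name-input',
]);
console.log(unexpected); // [ 'nmae' ]
```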