-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new_audit(seo): JSON-LD validation (not included in UI for now) #5446
Conversation
sd-validation/schema.js
Outdated
// @ts-ignore | ||
const schemaStructure = new Map(require('./assets/schema_google')); | ||
const TYPE_KEYWORD = '@type'; | ||
const SCHEMA_ORG_URL = 'http://schema.org/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both http and https versions could be used in defining schema types so later in cleanName we risk to have an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we just normalize all http -> https
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch @AymenLoukil !
Maybe just add this sentence to the description:
We can omit the link until we have a page in the LH docs. cc @kaycebasques @ekharvey
Sounds good. Also, see my note in #4359 about removing the SDTT scraping from the audit for now until we can get a more reliable solution. |
artifacts.JsonLD.map(async (jsonLD, idx) => { | ||
const errors = await validateJsonLD(jsonLD); | ||
|
||
errors.forEach(({message, path}) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an idea. wdyt about including all discovered top-level types in the results? I'm trying to debug why some of the errors are being flaky on http://www.lefigaro.fr/ and getting confirmation that it's actually finding the json+ld would be really nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 sounds like a very good idea, I'll add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will keep this in mind for the new result display logic.
sd-validation/expand.js
Outdated
|
||
jsonld.expand(inputObject, { | ||
// custom loader prevents network calls and alows us to return local version of the schema.org document | ||
documentLoader: ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract this fn to a fn declaration? will make this call into .expand a bit easier to parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
sd-validation/expand.js
Outdated
jsonld.expand(inputObject, { | ||
// custom loader prevents network calls and alows us to return local version of the schema.org document | ||
documentLoader: ( | ||
/** @type {string} **/url, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this url is the url of a schema? can we call it schemaUrl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by Konrad.
sd-validation/jsonld.js
Outdated
|
||
const walkObject = require('./helpers/walkObject'); | ||
|
||
const CONTEXT = '@context'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why tho?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, good catch, I used it for an additional check that was here before but I ended up removing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by Konrad.
const walkObject = require('./helpers/walkObject'); | ||
|
||
const CONTEXT = '@context'; | ||
const KEYWORDS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comment for where this list is specified? aka how we maintain it
https://json-ld.org/spec/latest/json-ld/#syntax-tokens-and-keywords ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I'll leave a comment and yeah you got the right link 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by Konrad.
"publisher": "Cat Magazine" | ||
}`); | ||
|
||
assert.equal(errors.length, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the other one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed by Konrad, only one error now.
"image": "https://cats.rock/cat.bmp", | ||
"publisher": "Cat Magazine", | ||
"mainEntityOfPage": "https://cats.rock/magazine.html", | ||
"controversial": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VERY much so. 😡
🤣
assert.equal(errors[0].message, 'Unexpected property "controversial"'); | ||
}); | ||
|
||
it('passes if non-schema.org context', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like at the audit-level we'd consider this situation to be equivalent to zero json-ld on the page === not-applicable
. right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, only this one validator is not applicable, the whole audit still makes sense because we have json, json-ld and expansion validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true!
return errors; | ||
} | ||
|
||
// STEP 3: EXPAND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly does this expansion represent? i never really understood that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as presented on the meeting - it gives us normalized form of the json-ld
https://json-ld.org/spec/latest/json-ld-api/#expansion
https://json-ld.org/playground/
sd-validation/index.js
Outdated
expandedObj = await promiseExpand(inputObject); | ||
} catch (error) { | ||
errors.push({ | ||
validator: 'json-ld', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be the same validator? i'm fine with 'yes', just asking..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be worth to differentiate between those two, good call (note that this field is not used by LH - I added it because sd-validation aspires to be a separate package, and this info can be useful for someone else)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done already. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great job on this. this is a big area and it feels quite approachable due to how you structured the problem here. nice work!
|
||
const headings = [ | ||
{key: 'idx', itemType: 'text', text: 'Index'}, | ||
{key: 'path', itemType: 'text', text: 'Line/Path'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm wondering if we should also have a CSS selector to help point to where this was slurped up. i guess it'd be just script[type="application/ld+json" i]
with a :nth-child(N)
.. okay maybe we just communicate that in the description/docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, all of those json-ld's are scripts, and all of them are in <head>
, so it's hard to provide useful debugging info. IMO you idea with exposing top-level type will be great here (although we can't extract top-level type if json/json-ld/expansion fails :( )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- review comments addressed (thank you Paul, these were great!)
- validation of object properties recommended/required by SDTT removed (now we are using only schema.org data)
- audit description adjusted
sd-validation/jsonld.js
Outdated
|
||
const walkObject = require('./helpers/walkObject'); | ||
|
||
const CONTEXT = '@context'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, good catch, I used it for an additional check that was here before but I ended up removing
const walkObject = require('./helpers/walkObject'); | ||
|
||
const CONTEXT = '@context'; | ||
const KEYWORDS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I'll leave a comment and yeah you got the right link 👍
sd-validation/jsonld.js
Outdated
* @param {string} name | ||
* @returns {string | null} error | ||
*/ | ||
function validateField(name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I changed it to "validateKey"
sd-validation/schema.js
Outdated
// @ts-ignore | ||
const schemaStructure = new Map(require('./assets/schema_google')); | ||
const TYPE_KEYWORD = '@type'; | ||
const SCHEMA_ORG_URL = 'http://schema.org/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch @AymenLoukil !
|
||
const cleanKeys = keys | ||
// skip JSON-LD keywords | ||
.filter(key => key.indexOf('@') !== 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, but then all invalid keys starting with '@' would not be removed here, thus producing a Unexpected property "${key}"
error.
Since these keys were already caught by the previous validator (json-ld) and reported ('Unknown keyword'
) IMO there is no need to report them again.
assert.equal(errors[0].message, 'Unexpected property "controversial"'); | ||
}); | ||
|
||
it('passes if non-schema.org context', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, only this one validator is not applicable, the whole audit still makes sense because we have json, json-ld and expansion validation.
artifacts.JsonLD.map(async (jsonLD, idx) => { | ||
const errors = await validateJsonLD(jsonLD); | ||
|
||
errors.forEach(({message, path}) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 sounds like a very good idea, I'll add it
|
||
const headings = [ | ||
{key: 'idx', itemType: 'text', text: 'Index'}, | ||
{key: 'path', itemType: 'text', text: 'Line/Path'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, all of those json-ld's are scripts, and all of them are in <head>
, so it's hard to provide useful debugging info. IMO you idea with exposing top-level type will be great here (although we can't extract top-level type if json/json-ld/expansion fails :( )
return errors; | ||
} | ||
|
||
// STEP 3: EXPAND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as presented on the meeting - it gives us normalized form of the json-ld
https://json-ld.org/spec/latest/json-ld-api/#expansion
https://json-ld.org/playground/
sd-validation/index.js
Outdated
expandedObj = await promiseExpand(inputObject); | ||
} catch (error) { | ||
errors.push({ | ||
validator: 'json-ld', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be worth to differentiate between those two, good call (note that this field is not used by LH - I added it because sd-validation aspires to be a separate package, and this info can be useful for someone else)
*/ | ||
|
||
/** | ||
* This script can be used to generate schema-tree.json file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this live in a /scripts/ folder?
How is it used? (maybe something for the readme? though i can't tell..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, I moved it and changed it to a nodejs script that fetches the schema.org jsonld and writes result to assets/
.
@@ -31,31 +31,6 @@ describe('schema.org validation', () => { | |||
assert.equal(errors[0].message, 'Unrecognized schema.org type http://schema.org/Dog'); | |||
}); | |||
|
|||
it('reports missing required fields', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't totally grok whats happening here swapping out the google schema to the other one but...
i'm kinda surprised we that 'required' fields isn't a thing anymore. I guess schema.org just doesn't have an opinion on whats required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah schema.org doesn't care, but Google requires some properties to enable rich snippets.
Will bring it up with @rviscomi if we want to add extra validation for this in the future.
sd-validation/expand.js
Outdated
@@ -11,6 +11,33 @@ const jsonld = require('jsonld'); | |||
const schemaOrgContext = require('./assets/jsonldcontext'); | |||
const SCHEMA_ORG_HOST = 'schema.org'; | |||
|
|||
/** | |||
* Custom loader that prevents network calls and alows us to return local version of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allows ;)
*/ | ||
'use strict'; | ||
|
||
// load data from https://github.com/schemaorg/schemaorg/blob/master/data/releases/3.4/schema.jsonld |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Via https://schema.org/docs/developers.html#defs it looks like this URL will redirect and work: https://schema.org/version/latest/schema.jsonld
This is a bit trickier than I thought because there are three different forms that the JSON-LD takes:
There are 4 types of failures:
Current status: Notes on the errors in the screenshot:
Instead of using a JSON parser I overwrite the property value at the given path with a random key, and then use the line number of the random key in the stringified JSON. |
Just throwing this out there, what do we think about splitting up this PR and trying to land some pieces of it like the standalone validation folder? I think Matt's got a decent hold on the usage of everything that it seems like most of the changes will likely be in core now, is that right @mattzeunert? |
@patrickhulce There'll still be a few changes in the validator to get it to provide line numbers. Other than that it's just adding new rendering logic. |
Ah, ok gotcha. Well maybe once the API solidifies for those we can try to break it up? |
Sure – is the main purpose to simplify the review? |
Yeah and ideally land pieces earlier so there's less that needs to keep being rebased, etc. I know most of it is generated files, but 13k LOC is a dousy :) Easier to tell if things are missing when each PR is focused. |
@patrickhulce @rviscomi Is there any reason not to merge the PR without the new result rendering logic? Especially since it's already 90% reviewed. Update: seems more like 98% reviewed 🙂. There are 4 small non-merge commits from me on the branch. I think we should get this merged, and possibly disable the audit for now if we don't want to it to get released. Update2: The plan is to disable the audit for now and merge this PR. |
14f4b03
to
910da3b
Compare
@@ -479,7 +479,7 @@ const defaultConfig = { | |||
{id: 'canonical', weight: 1, group: 'seo-content'}, | |||
{id: 'font-size', weight: 1, group: 'seo-mobile'}, | |||
{id: 'plugins', weight: 1, group: 'seo-content'}, | |||
{id: 'structured-data-automatic', weight: 1, group: 'seo-content'}, | |||
// {id: 'structured-data-automatic', weight: 1, group: 'seo-content'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there's a better way to disable the audit for now than what I'm doing in this commit?
@patrickhulce Removed the audit from the SEO category like you suggested. Much nicer solution than commenting out all the tests! I'm guessing AppVeyor is just flaky? |
Been dabbling around with the rendering logic a bit more. Can we remove the "Show all" button and just show all when the user clicks on the snippet? We can indicate that it's clickable when the user hovers over it – but maybe that's not discoverable enough? Should there be some kind of title for each JSON-LD item? Or at least a clearer separation. @rviscomi Do you have a strong concept of what we're going for? Should we ask a designer for help? Or just keep iterating ourselves? Different cases we need to handle:
|
It's inspired by the GitHub code diff UI, eg: @paulirish do you know if there's anything else we need to do to make sure this audit's results render properly on downstream services like web.dev?
Yeah I'm not sure if it'll be obvious that it's clickable.
Similar to the GitHub example, instead of a file name could we show the DOM address of the script block? When in devtools and clicked it should reveal it in the Elements panel.
Let's put any of these kinds of errors on line 1.
If there's 1 error, show that error. If there are 2 or more errors on the same line, show "X errors:" then an unordered list with each error.
+1. The type may not always be descriptive (or exist at all) so maybe just show the first ~10 lines with the "Show all" expander. |
Closing in favor of #4359 and upcoming PRs. We'll keep the branch around in case someone needs it <3 |
Summary
Fixes #4359
Depends on #5377(merged)Preview:
Results form running validation against the main pages of the top 1500 domains (that had JSON-LD): https://gist.github.com/kdzwinel/9c9e209e3b1bb239e01920f4aec10108
Questions:
Notes:
TODO:
if there are only warnings (e.g. only "recommended" fields are missing form the JSON-LD object) - don't fail the audit, mark it as a warning