Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linter for MDN URL structure #5273

Closed
wants to merge 2 commits into from
Closed

Linter for MDN URL structure #5273

wants to merge 2 commits into from

Conversation

bershanskiy
Copy link
Contributor

Summary

This is linter from #5201. Currently, it is very experimental because I would like to fix as many actual bugs before committing to a particular linter implementation.

Questions for bike-shedding:

  1. Should linter enforce capitalization?
    MDN uses case-insensitive URLs (serves the same content regardless of capitalization), but that would weaken the linter.
  2. ...

TODO list:

  • Linter is still incomplete (it's in the middle of rewrite from first iteration to the second iteration).
  • Need to find all special cases when an API links to a "Using [something]" instead of the dry "[parent].[child]" docs pages
  • Avoid multiple linter runs for the same data
  • Need to resolve all PRs in Linter for mdn_url #5201 before linter can be finalized.

CC @vinyldarkscratch @sideshowbarker

A checklist to help your pull request get merged faster:

  • Summarize your changes
  • Data: link to resources that verify support information (such as browser's docs, changelogs, source control, bug trackers, and tests)
  • Data: if you tested something, describe how you tested with details like browser and version
  • Review the results of the linter and fix problems reported (If you need help, please ask in a comment!)
  • Link to related issues or pull requests, if any

@ghost ghost added the linter 🏡 Issues or pull requests regarding the tests / linter of the JSON files. label Dec 5, 2019
@bershanskiy bershanskiy mentioned this pull request Dec 5, 2019
@bershanskiy
Copy link
Contributor Author

@vinyldarkscratch @sideshowbarker
FYI, this is in a very early stage (that's why I made PR only after @vinyldarkscratch asked for it) and there are very many false positives which I'm aware of. This is intentional because I wanted to fix all actual bugs first before making any test cases for data that is subject to change.

I don't think we should merge any linter until all URL issues are fixed, and I think #2537 won't be fixed this year.

@queengooborg queengooborg added the not ready ⛔ This is not yet ready to be merged. It's pending a decision, other PR, or a prerequisite action. label Dec 5, 2019
@queengooborg
Copy link
Collaborator

Thanks for creating this PR, and for taking initiative on this! This has already revealed so many issues along the way, primarily due to copy-pastes of existing data that weren't fully updated. To answer your question regarding capitalization enforcement, since MDN is case-insensitive I don't think it really matters much, but I'd say that there's no reason not to enforce it.

I'm wondering if this new test file should also go into test-links.js, since it's...well, testing links. However, before that, I'm hoping to get #5187 merged (to prepare for a little project I have to add automatic link fixing.)

Since we're still working on fixing up the URLs, and this linter is still experimental, I'm gonna go ahead and add the "not ready" label to let the other maintainers know. 😉

@bershanskiy
Copy link
Contributor Author

bershanskiy commented Dec 5, 2019

To answer your question regarding capitalization enforcement, ... there's no reason not to enforce it.

That's my thought process exactly.

I'm wondering if this new test file should also go into test-links.js, since it's...well, testing links.

I created a separate file because it does not share any logic with that file. test-links.js reads in the data as a sting and puts it through processLinks() function that finds long URL string in that string and suggests short URLs. As such, test-links.js does not need the concept of data hierarchy, so uses strings for efficiency. On the other hand, this test uses actual objects and heavily relies on data hierarchy. This test generates regular expressions for acceptable URLs for each level in the data tree and matches mdn_urls against those. Therefore, the overlap is rather minimal.

FYI, I do care about performance too and try to avoid unnecessary costly (complex) regular expressions. I'll share final benchmark data later when patch is more complete.

Edit: typo.

@sideshowbarker
Copy link
Collaborator

I wonder if this linter should have support for checking if each mdn_url is a 404 or not — and expose that support as a non-default option. (I’m not familiar in general with the existing linter code here, so I don’t know if any of the existing linting features have optional behaviors that can be enabled…)

If such 404-checking were added, I think it would be need to be as a non-default option, due to the fact it’s going to be making a network request for every single mdn_url value in every JSON file in the repo, and waiting for each response, and that’s gonna take a long time to complete…

@sideshowbarker
Copy link
Collaborator

  1. Should linter enforce capitalization?

Yes, I think it would be best if it did

MDN uses case-insensitive URLs (serves the same content regardless of capitalization), but that would weaken the linter.

So yeah then what the linter would be helping to enforce in this case is consistency in styling of the URLs (beyond just strictly checking for complete mismatches). But I think doing that level of style checking would be consistent with the purpose of the linter (and with linting in general).

Copy link
Collaborator

@sideshowbarker sideshowbarker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this (initial code) and tested it and found it works as expected (great!). I have no changes to suggest at this point, but I will also happily further updates/refinements

@bershanskiy
Copy link
Contributor Author

bershanskiy commented Dec 6, 2019

I wonder if this linter should have support for checking if each mdn_url is a 404 or not

I agree that URL checker would be nice (not only 404 but any non-200 code, especially 301 and 302). However, I don't think we should add it in this PR because it'll already be pretty large (300+ lines of code).

I’m not familiar in general with the existing linter code here

As far as I know, #5228 introduces broken URL check for shortened URLs (aka URLs to bug reports most of which are in the notes to the compat data). I don't really know how it'll work out, so for now I just plan to wait it out.

If such 404-checking were added, I think it would be need to be as a non-default option, due to the fact it’s going to be making a network request for every single mdn_url value in every JSON file in the repo, and waiting for each response, and that’s gonna take a long time to complete…

It definitely won't make a lot of requests by default. However, I do want to lint all new URLs, so I want to add "incremental" linting -- lint only files that were changed in the PR and check only URLs that were changed in the PR. Specifically:

  1. Ask git for names of files that differ between the current PR source HEAD and mdn/master
  2. Lint only those files (with all linters)
  3. For each file that changed, extract URLs from the mdn/master version (which are presumed to be good) and the current HEAD version; lint only the new URLs, if there are any.

This approach reduces lint time from ~8 seconds to less than a second for most commits I work on. (I like this because I'm really impatient).

Of course, once in a while (e.g., before each release or once a day) someone/something should do a complete (non-incremental) lint to sanity-check.

Edit: tyop.

@bershanskiy
Copy link
Contributor Author

I reviewed this (initial code) and tested it and found it works as expected (great!). I have no changes to suggest at this point, but I will also happily further updates/refinements

I'm pushing all the most recent stuff to https://github.com/bershanskiy/browser-compat-data/tree/linter-mdn_url2 because I need to pull/merge/rebase from mdn/master all the time to incorporate the latest fixes. I'll push all the good stuff to this branch once it's in a presentable state.

@queengooborg
Copy link
Collaborator

queengooborg commented Dec 6, 2019

I wonder if this linter should have support for checking if each mdn_url is a 404 or not

I agree that URL checker would be nice (not only 404 but any non-200 code, especially 301 and 302). However, I don't think we should add it in this PR because it'll already be pretty large (300+ lines of code).

I’m not familiar in general with the existing linter code here

As far as I know, #5228 introduces broken URL check for shortened URLs (aka URLs to bug reports most of which are in the notes to the compat data). I don't really know how it'll work out, so for now I just plan to wait it out.

Don't worry, #5228 covers all of that. It checks every URL within BCD (mdn_url, spec, notes...) for a non-2xx code, and reports back. 😉

It definitely won't make a lot of requests by default. However, I do want to lint all new URLs, so I want to add "incremental" linting -- lint only files that were changed in the PR and check only URLs that were changed in the PR.

This certainly brings up an interesting way to optimize linter time! I'd say that this is something we should probably implement for not just this linter, but all of them. Let's get a discussion going for this -- can you open up an issue so we can get that going?

@queengooborg queengooborg changed the title Linter for mdn_url Linter for MDN URL structure Feb 15, 2020
/** @type {Identifier} */
// TODO(bershanskiy): remove require, if applicable
// See https://github.com/mdn/browser-compat-data/pull/5795
const data = require(filename);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for #5795.

@bershanskiy
Copy link
Contributor Author

I have updated the linter and rebased the PR to pull in changes from master. This PR is still work-in-progress and should wait for:

@ddbeck
Copy link
Collaborator

ddbeck commented Mar 9, 2020

Thanks for this summary @bershanskiy. That makes it a lot easier for me to prioritize reviews.

@bershanskiy bershanskiy mentioned this pull request Mar 9, 2020
5 tasks
@bershanskiy
Copy link
Contributor Author

Tracking list of URLs that might need to be updated:
(I'll update this list as I work on linter.)

LocalFileSystem and LocalFileSystemSync

These two pages contain documentation of their methods (as a section of the pages) instead of having dedicated pages for each method. I believe it's better to move each method to a separate page, following the pattern established by the rest of MDN. Thoughts?

Linter output for api\LocalFileSystem.json and api\LocalFileSystemSync.json:

× api\LocalFileSystem.json
              Actual: "https://developer.mozilla.org/docs/Web/API/LocalFileSystem#resolveLocalFileSystemURL"
              Expected: "https://developer.mozilla.org/docs/Web/API/LocalFileSystem/resolveLocalFileSystemURL"
              Actual: "https://developer.mozilla.org/docs/Web/API/LocalFileSystem#requestFileSystem"
              Expected: "https://developer.mozilla.org/docs/Web/API/LocalFileSystem/requestFileSystem"
× api\LocalFileSystemSync.json
              Actual: "https://developer.mozilla.org/docs/Web/API/LocalFileSystemSync#resolveLocalFileSystemSyncURL()"
              Expected: "https://developer.mozilla.org/docs/Web/API/LocalFileSystemSync/resolveLocalFileSystemSyncURL"
              Actual: "https://developer.mozilla.org/docs/Web/API/LocalFileSystemSync#requestFileSystem"
              Expected: "https://developer.mozilla.org/docs/Web/API/LocalFileSystemSync/requestFileSystemSync"

WebExtensions devtools.* APIs

All WebExtension DevTools APIs have patterns with slugs devtools.something instead of devtools/something.

E.g.: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/devtools.panels
Also, there is no dedicated page like https://developer.chrome.com/extensions/devtools, only https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/user_interface/devtools_panels

I propose:

  1. Create https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/devtools
  2. Make all applicable pages children of it, that us move all devtools.something to devtools/something.

@bershanskiy
Copy link
Contributor Author

This PR is stale, I'll open a new one once I update the patch.

@Elchi3 Elchi3 removed their request for review August 26, 2020 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linter 🏡 Issues or pull requests regarding the tests / linter of the JSON files. not ready ⛔ This is not yet ready to be merged. It's pending a decision, other PR, or a prerequisite action.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants