Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve link checker #6797

Merged
merged 16 commits into from
Nov 8, 2021
Merged

improve link checker #6797

merged 16 commits into from
Nov 8, 2021

Conversation

jrhizor
Copy link
Contributor

@jrhizor jrhizor commented Oct 5, 2021

A few changes:

  • output will be less verbose
  • the link checker will run on changed files (compared to master) on prs
    • this should be blocking for merging connector prs, docs, etc.
  • Instead of being very permissive around 401/403 codes (which can hide actual errors), I switched to specifying specific exclusions for those.
    • All maven repo links were giving 403 even when valid so I added a domain exclusion.
    • Please let me know if you'd prefer all the specific exclusions to go into dock-link-check.json instead.

The only error I'm still seeing is an all-too-typical Gitbook transient error. This extension only allows retrying 429 errors, not this weird 0 error from Gitbook. We could try adding 0 as an allowed response code maybe, but that feels wrong when Gitbook is likely being actually inconsistent at serving pages.

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Oct 5, 2021
@tuliren
Copy link
Contributor

tuliren commented Oct 5, 2021

this should be blocking for merging connector prs, docs, etc.

This can be a problem for new connector. For example, the connector doc won't be available until the PR is merged. By activating the doc link check, people are forced to have two PRs to add a new connector. This was why the check was turned off on PR in the first place. GL engineers were affected by this the most.

@jrhizor
Copy link
Contributor Author

jrhizor commented Oct 5, 2021

@tuliren do you have an example of a type of link that would fail? A relative markdown link shouldn't, and this doesn't check things like docurls for definitions.

@jrhizor
Copy link
Contributor Author

jrhizor commented Oct 5, 2021

Just to be clear: my # 1 priority here is making the master build doesn't get broken when PRs are "passing" which is the current (and very common) case. Happy to change the checks or advocate for process changes if necessary.

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem with running it on PRs is that if we're pointing to a new doc introduced in that PR via its URL e.g: https:// then it's less obvious if I'm about to break master. This being said we can always check the reason for failure and make sure it's ok.

You should make connector teams aware of this change since they complained about the previous state where this ran on PRs

@tuliren
Copy link
Contributor

tuliren commented Oct 6, 2021

@tuliren do you have an example of a type of link that would fail? A relative markdown link shouldn't, and this doesn't check things like docurls for definitions.

For example, this README file in the destination connector template links to the connector documentation in gitbook, which requires the PR to be merged first:

https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connector-templates/destination-java/README.md.hbs#L4

There may be other similar cases.

@tuliren
Copy link
Contributor

tuliren commented Oct 6, 2021

Just to be clear: my # 1 priority here is making the master build doesn't get broken when PRs are "passing" which is the current (and very common) case. Happy to change the checks or advocate for process changes if necessary.

I definitely agree with you. Originally the check was set up for both master and PR. Had to take it out of PR because of complaints from the connector team.

@tuliren
Copy link
Contributor

tuliren commented Oct 6, 2021

You should make connector teams aware of this change since they complained about the previous state where this ran on PRs

I think we can add a reminder in the PR template saying that if the PR link check fails, make sure the failure can be resolved after the PR is merged.

@cgardens
Copy link
Contributor

cgardens commented Oct 6, 2021

We could try adding 0 as an allowed response code maybe, but that feels wrong when Gitbook is likely being actually inconsistent at serving pages.

FWIW, this might be worth trying. If when we receive an error code of 0 there is nothing actionable for the developer to do (which is sounds like there isn't)then I don't think we should block.

Copy link
Contributor

@tuliren tuliren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does 401 / 403 hide actual errors? To minimize false positive, I'd suggest keep 401 / 403 and also add 0 to the aliveStatusCodes list.

Can you add the following TODOs to the PR template?

- If there is a link checker failure, please make sure:
  - [ ] If the failure is caused by http status 401 or 403, please add `<!-- markdown-link-check-disable-next-line -->` before the failed broken link.
  - [ ] If the failure is caused by http status 0, it is usually because it takes a long time for the website to return a response. It is safe to ignore.
  - [ ] If the failure is caused by documentation that will be published by merging the current PR, it's safe to ignore.
  - [ ] For all the other failed links, please make sure they are fixed.

If we ignore 401 / 403 / 0, we can remove the first two bullets in the above TODO.

Please read [How to find your API key](https://support.iterable.com/hc/en-us/articles/360043464871-API-Keys-#creating-api-keys).
<!-- markdown-link-check-enable -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use <!-- markdown-link-check-disable-next-line --> by the way, which only requires one line of change.

@tuliren
Copy link
Contributor

tuliren commented Nov 3, 2021

We should probably merge this PR soon. There are tons of broken links because there is no link check in PR.

@jrhizor, do you want to present this in the next dev meeting?

@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 06:49 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 06:51 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 8, 2021 06:53 Inactive
@jrhizor
Copy link
Contributor Author

jrhizor commented Nov 8, 2021

I made some minor modifications to only have these fixes run on master instead of adding it to branches, so this shouldn't impact any existing workflows.

Short term (with Gitbook), I just don't think it makes sense to handle it as a build failure. After merging this PR, I'd like to remove the build on master pushes and switch it to a time-based run that doesn't show up on the master Github view. Instead, it could push to a channel on Github that either someone on the community team or the OC would be responsible for fixing. Then our build won't look broken all of the time if a relatively non-urgent linking problem exists and we'll have a clear assignee for fixing.

Then long term (when we're off of Gitbook and onto a better platform like Docusaurus or similar that allows testing locally against any branch) we'll want to revisit. At that point, we can make sure that people working on a PR are responsible for getting their links working before merging.

@jrhizor jrhizor merged commit 154ecce into master Nov 8, 2021
@jrhizor jrhizor deleted the jrhizor/improve-link-checker branch November 8, 2021 07:01
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* improve link checker

* use ref instead of base_ref

* remove base, always compare to master for modified

* add failing to test

* don't do quiet for testing

* switch error to 404 not 403

* yes to both

* turn off verbose mode

* fix

* actually check things

* fix outstanding link problems

* revet change to run for everything

* use new format

* ignore gitbook failures

* switch back to only running on master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants