Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#25] Redirect links with configuration rules #250

Merged
merged 2 commits into from
Dec 30, 2022

Conversation

aeqz
Copy link
Contributor

@aeqz aeqz commented Dec 19, 2022

Description

Problem: We previously changed the default behaviour of Xrefcheck when following link redirects, but did not provide a way to configure it.

Solution: We are adding a new field in the configuration file to allow writing a list of redirect rules that will be applied to links that match them.

Related issue(s)

Relates #25

✅ Checklist for your Pull Request

Ideally a PR has all of the checkmarks set.

If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).

Related changes (conditional)

  • Tests

    • If I added new functionality, I added tests covering it.
    • If I fixed a bug, I added a regression test to prevent the bug from
      silently reappearing again.
  • Documentation

    • I checked whether I should update the docs and did so if necessary:
  • Public contracts

    • Any modifications of public contracts comply with the Evolution
      of Public Contracts
      policy.
    • I added an entry to the changelog if my changes are visible to the users
      and
    • provided a migration guide for breaking changes if possible

Stylistic guide (mandatory)

@aeqz
Copy link
Contributor Author

aeqz commented Dec 19, 2022

I have created this draft PR to discuss some things. This work is the continuation of #233 for allowing to define redirect rules in the configuration file.

-- HTTP code.
redirectRule :: Text -> Int -> RedirectConfig -> Maybe RedirectRuleOutcome
redirectRule link code rules =
rrOutcome <$> find (matchRule link code) rules
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #25, we were discussing about if the rule that applies to some link after receiving a redirect response should be the first one or the last one in the list. Here I have implemented it as the first one because it was easier, but I have been thinking about it and I have no preference.

Do you prefer any of the two options more than the other @YuriRomanowski ?

For example, having to write

externalRefsRedirects:
  - to: *
    outcome: valid
  - to: gitlab.com
    outcome: invalid

vs

externalRefsRedirects:
  - to: gitlab.com
    outcome: invalid
  - to: *
    outcome: valid

RedirectCycle _ ->
[int||
Cycle found in the following redirect chain:
TODO print chain
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about how to show a redirect chain of links. How about this?

Cycle found in the following redirect chain:
  -| http://github.com/some
  -> https://github.com/some
  -> https://github.com/other

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, cool. I like it.

@aeqz aeqz marked this pull request as ready for review December 21, 2022 15:27
README.md Outdated
* Permanent redirects (i.e. 301 and 308) are reported as errors.
* Temporary redirects (i.e. 302, 303 and 307) are assumed to be valid.
* The redirect rules can be specified with the `externalRefRedirects` within `networking`, which accepts an array of
rules like `{to: "^http://.*", on: temporary, outcome: follow }`. The rule applied is the first one that matches with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It remains to decide if the rule that applies is the first or the last that matches.

@@ -56,6 +56,16 @@ networking:
# a "429 Too Many Requests" response.
maxRetries: 3

# Rules to override the redirect behavior for external references that
# match.
externalRefRedirects:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved the default behaviour from code to default rules.


isTemporaryRedirectCode :: Int -> Bool
isTemporaryRedirectCode = flip elem [302, 303, 307]
reqRes <- catch (liftIO (timeout maxTime $ reqLink $> RRDone)) $ \httpErr ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, instead of Maybe (), the result is Maybe ResponseResult, where Nothing means timeout and Just ResponseResult tells "Done" or "Follow this redirect".

@aeqz aeqz force-pushed the aeqz/#25-redirect-links-config-rules branch from c6ebfc3 to eaae9e9 Compare December 22, 2022 16:27
Copy link
Member

@Martoon-00 Martoon-00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's very decent work 👍

Most progress on the redirects is already behind.

I left numerous comments, and one more thing I wanted to suggest: please create a ticket for displaying redirects in the progress bar.

src/Xrefcheck/Verify.hs Show resolved Hide resolved
src/Xrefcheck/Config.hs Show resolved Hide resolved
src/Xrefcheck/Config.hs Outdated Show resolved Hide resolved
src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
@@ -54,8 +54,86 @@ data NetworkingConfig' f = NetworkingConfig
, ncMaxRetries :: Field f Int
-- ^ How many attempts to retry an external link after getting
-- a "429 Too Many Requests" response.
, ncExternalRefRedirects :: RedirectConfig
-- ^ Rules to override the redirect behavior for external references.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually make it optional and treat it as if the user said: I don't care about redirects, handle them somehow. And I would default this to [].

And later when we see a redirect that matched no rule, we would treat that as OK, since we have to be optimistic.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then we would be changing the default behavior that we previously implemented. Right? It is now provided with these two default rules:

- to: .*
  on: permanent
  outcome: invalid
- to: .*
  on: temporary
  outcome: valid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I think we really want two things:

  1. By default, suggest to the user those defaults you implemented in the previous PR: temporary=ok, permanent=failure.
  2. Give the user a simple way to say "don't bother me with redirects".

And I think, making ncExternalRefRedirects optional and defaulting it to [] that would then be interpreted as "always ok" is the best way to handle the second point.

And IMO insisting on the old behaviour is not so important, different users may find it weird depending on the use cases they face in real life. It is easy to obtain our recommended defaults for config, and I think that's enough.

A made a little detour with my reasoning for this, but this is how I see it now. There are other options to approach all this, but to me [] default yet seems to be the winning one.

Copy link
Contributor

@YuriRomanowski YuriRomanowski Dec 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use [] for the defaults, but add our previous behaviour as an example in the default config comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something. If we default ncExternalRefRedirects to [] (in the default config I guess) then, how do we suggest the previously implemented defaults? With a section in the README file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe we could write it as a comment in the default config, as our "recommendation" :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under "using [] as default" I understood - if ncExternalRefRedirects is missing in the config, parse it as if it was [] (it really required clarification). And our recommendation for this field will be in the default config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered that option. It sounds good 👍

@Martoon-00
Copy link
Member

Also, this PR does not fix #25 completely, there is at least 304 code that will need handling later. So I suggest changing PR description respectively.

src/Xrefcheck/Config.hs Outdated Show resolved Hide resolved
src/Xrefcheck/Verify.hs Show resolved Hide resolved
src/Xrefcheck/Config.hs Outdated Show resolved Hide resolved
src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
src/Xrefcheck/Data/Redirect.hs Show resolved Hide resolved
tests/Test/Xrefcheck/ConfigSpec.hs Show resolved Hide resolved
src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
@@ -54,8 +54,86 @@ data NetworkingConfig' f = NetworkingConfig
, ncMaxRetries :: Field f Int
-- ^ How many attempts to retry an external link after getting
-- a "429 Too Many Requests" response.
, ncExternalRefRedirects :: RedirectConfig
-- ^ Rules to override the redirect behavior for external references.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I think we really want two things:

  1. By default, suggest to the user those defaults you implemented in the previous PR: temporary=ok, permanent=failure.
  2. Give the user a simple way to say "don't bother me with redirects".

And I think, making ncExternalRefRedirects optional and defaulting it to [] that would then be interpreted as "always ok" is the best way to handle the second point.

And IMO insisting on the old behaviour is not so important, different users may find it weird depending on the use cases they face in real life. It is easy to obtain our recommended defaults for config, and I think that's enough.

A made a little detour with my reasoning for this, but this is how I see it now. There are other options to approach all this, but to me [] default yet seems to be the winning one.

README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
src/Xrefcheck/Config/Default.hs Outdated Show resolved Hide resolved
src/Xrefcheck/Data/Redirect.hs Show resolved Hide resolved

config :: Int -> Config
config limit = localConfig
& cNetworkingL . ncExternalRefRedirectsL .~ [RedirectRule Nothing Nothing Nothing RROFollow]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this may be the time to add some smart constructor that would initialize all filtering fields to Nothing, and update those fields via record update or something. But it's fine to leave as-is too.

src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
@@ -682,8 +689,11 @@ parseUri link = do
checkExternalResource :: [Text] -> Config -> Text -> IO (VerifyResult VerifyError)
checkExternalResource followed config@Config{..} link
| isIgnored = return mempty
| link `elem` followed = pure $
VerifyResult [RedirectChainCycle (link :| followed)]
| ncMaxRedirectFollows >= 0 && length followed > ncMaxRedirectFollows = pure $
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, makes sense.

That's a bit of C-style - to use a negative number to indicate "no restrictions" 😄

But okay, let's go with it.

src/Xrefcheck/Verify.hs Outdated Show resolved Hide resolved
# - 'on' accepts 'temporary', 'permanent' or a specific redirect HTTP code.
# Its absence also means that every response code matches.
# - 'outcome' accepts 'valid', 'invalid' or 'follow'. The last one follows
# the redirect by applying the same configuration rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, what is meant by "applying the same configuration rules"? That, when a rule with follow matches, we process the next link with same configured list of rules? But we apply them always, why should we mention it separately?

Copy link
Contributor Author

@aeqz aeqz Dec 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, redirects were automatically followed by the HTTP client, so none of our behaviours were applied to each step, but to the entire chain. For example:

  • Timeout was applied to the entire chain.
  • Retry logic was applied to the entire chain.
  • Ignore check was only applied to the initial link.

This new feature allows to explicitly follow redirects, and we have decided to apply the same configuration rules at each step of the chain. But we are still applying the retry logic to the entire chain, and we plan to review it in other issue (#260).

I mean, maybe it is something that appears to be obvious, but it has not been as straightforward and that is why the clarification. I can remove it if it sounds confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It may sound a bit confusing (due to how obvious it seems), but the behaviour should be clarified.

Probably if we add something like:

so for instance, exclusion rules would apply to the following links

it would sound more benign, more like a hint about a possible trick rather than an obvious statement, and would still make the exact behaviour clear.

@YuriRomanowski
Copy link
Contributor

Thank you @aeqz, that's cool. Suggested just minor style and naming changes.

README.md Outdated Show resolved Hide resolved
src/Xrefcheck/Config/Default.hs Show resolved Hide resolved
README.md Show resolved Hide resolved
@aeqz aeqz requested a review from Martoon-00 December 29, 2022 09:38
@aeqz aeqz requested a review from YuriRomanowski December 29, 2022 09:38
src/Xrefcheck/Config.hs Outdated Show resolved Hide resolved
@aeqz aeqz force-pushed the aeqz/#25-redirect-links-config-rules branch 2 times, most recently from ca565fc to a7caf56 Compare December 30, 2022 10:24
@aeqz
Copy link
Contributor Author

aeqz commented Dec 30, 2022

I have squashed fixup and temporary commits for the ease of rebasing this PR branch.

@@ -118,7 +128,8 @@ overrideConfig config

defScanners = cScanners $ defConfig flavor
defExclusions = cExclusions $ defConfig flavor
defNetworking = cNetworking $ defConfig flavor
defNetworking = cNetworking (defConfig flavor)
& ncExternalRefRedirectsL .~ []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

# This option is similar to `maxRetries`, the difference is that
# this `maxTimeoutRetries` option limits only the number of retries
# caused by timeouts, and `maxRetries` limits the number of retries
# caused both by 429s and timeouts.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@Martoon-00 Martoon-00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your work 👍

The last comment I have is that the second commit has some changes that seem to be unrelated to it, this may be worth a cleanup. But other than that, good, feel free to merge after fixing that.

aeqz added 2 commits December 30, 2022 17:11
Problem: We previously changed the default behaviour of Xrefcheck when
following link redirects, but did not provide a way to configure it.

Solution: We are adding a new field in the configuration file to allow
writing a list of redirect rules that will be applied to links that
match them.
@aeqz aeqz force-pushed the aeqz/#25-redirect-links-config-rules branch from a7caf56 to 05fe537 Compare December 30, 2022 16:16
@aeqz aeqz merged commit 20f3d05 into master Dec 30, 2022
@aeqz aeqz deleted the aeqz/#25-redirect-links-config-rules branch December 30, 2022 16:37
@int-index int-index mentioned this pull request Dec 27, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants