-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#25] Redirect links with configuration rules #250
Conversation
I have created this draft PR to discuss some things. This work is the continuation of #233 for allowing to define redirect rules in the configuration file. |
src/Xrefcheck/Config.hs
Outdated
-- HTTP code. | ||
redirectRule :: Text -> Int -> RedirectConfig -> Maybe RedirectRuleOutcome | ||
redirectRule link code rules = | ||
rrOutcome <$> find (matchRule link code) rules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #25, we were discussing about if the rule that applies to some link after receiving a redirect response should be the first one or the last one in the list. Here I have implemented it as the first one because it was easier, but I have been thinking about it and I have no preference.
Do you prefer any of the two options more than the other @YuriRomanowski ?
For example, having to write
externalRefsRedirects:
- to: *
outcome: valid
- to: gitlab.com
outcome: invalid
vs
externalRefsRedirects:
- to: gitlab.com
outcome: invalid
- to: *
outcome: valid
src/Xrefcheck/Verify.hs
Outdated
RedirectCycle _ -> | ||
[int|| | ||
Cycle found in the following redirect chain: | ||
TODO print chain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about how to show a redirect chain of links. How about this?
Cycle found in the following redirect chain:
-| http://github.com/some
-> https://github.com/some
-> https://github.com/other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, cool. I like it.
README.md
Outdated
* Permanent redirects (i.e. 301 and 308) are reported as errors. | ||
* Temporary redirects (i.e. 302, 303 and 307) are assumed to be valid. | ||
* The redirect rules can be specified with the `externalRefRedirects` within `networking`, which accepts an array of | ||
rules like `{to: "^http://.*", on: temporary, outcome: follow }`. The rule applied is the first one that matches with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It remains to decide if the rule that applies is the first or the last that matches.
@@ -56,6 +56,16 @@ networking: | |||
# a "429 Too Many Requests" response. | |||
maxRetries: 3 | |||
|
|||
# Rules to override the redirect behavior for external references that | |||
# match. | |||
externalRefRedirects: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved the default behaviour from code to default rules.
|
||
isTemporaryRedirectCode :: Int -> Bool | ||
isTemporaryRedirectCode = flip elem [302, 303, 307] | ||
reqRes <- catch (liftIO (timeout maxTime $ reqLink $> RRDone)) $ \httpErr -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, instead of Maybe ()
, the result is Maybe ResponseResult
, where Nothing
means timeout and Just ResponseResult
tells "Done" or "Follow this redirect".
c6ebfc3
to
eaae9e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's very decent work 👍
Most progress on the redirects is already behind.
I left numerous comments, and one more thing I wanted to suggest: please create a ticket for displaying redirects in the progress bar.
@@ -54,8 +54,86 @@ data NetworkingConfig' f = NetworkingConfig | |||
, ncMaxRetries :: Field f Int | |||
-- ^ How many attempts to retry an external link after getting | |||
-- a "429 Too Many Requests" response. | |||
, ncExternalRefRedirects :: RedirectConfig | |||
-- ^ Rules to override the redirect behavior for external references. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually make it optional and treat it as if the user said: I don't care about redirects, handle them somehow. And I would default this to []
.
And later when we see a redirect that matched no rule, we would treat that as OK, since we have to be optimistic.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then we would be changing the default behavior that we previously implemented. Right? It is now provided with these two default rules:
- to: .*
on: permanent
outcome: invalid
- to: .*
on: temporary
outcome: valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think we really want two things:
- By default, suggest to the user those defaults you implemented in the previous PR: temporary=ok, permanent=failure.
- Give the user a simple way to say "don't bother me with redirects".
And I think, making ncExternalRefRedirects
optional and defaulting it to []
that would then be interpreted as "always ok" is the best way to handle the second point.
And IMO insisting on the old behaviour is not so important, different users may find it weird depending on the use cases they face in real life. It is easy to obtain our recommended defaults for config, and I think that's enough.
A made a little detour with my reasoning for this, but this is how I see it now. There are other options to approach all this, but to me []
default yet seems to be the winning one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use []
for the defaults, but add our previous behaviour as an example in the default config comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something. If we default ncExternalRefRedirects
to []
(in the default config I guess) then, how do we suggest the previously implemented defaults? With a section in the README file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, maybe we could write it as a comment in the default config, as our "recommendation" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under "using []
as default" I understood - if ncExternalRefRedirects
is missing in the config, parse it as if it was []
(it really required clarification). And our recommendation for this field will be in the default config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't considered that option. It sounds good 👍
Also, this PR does not fix #25 completely, there is at least 304 code that will need handling later. So I suggest changing PR description respectively. |
@@ -54,8 +54,86 @@ data NetworkingConfig' f = NetworkingConfig | |||
, ncMaxRetries :: Field f Int | |||
-- ^ How many attempts to retry an external link after getting | |||
-- a "429 Too Many Requests" response. | |||
, ncExternalRefRedirects :: RedirectConfig | |||
-- ^ Rules to override the redirect behavior for external references. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I think we really want two things:
- By default, suggest to the user those defaults you implemented in the previous PR: temporary=ok, permanent=failure.
- Give the user a simple way to say "don't bother me with redirects".
And I think, making ncExternalRefRedirects
optional and defaulting it to []
that would then be interpreted as "always ok" is the best way to handle the second point.
And IMO insisting on the old behaviour is not so important, different users may find it weird depending on the use cases they face in real life. It is easy to obtain our recommended defaults for config, and I think that's enough.
A made a little detour with my reasoning for this, but this is how I see it now. There are other options to approach all this, but to me []
default yet seems to be the winning one.
|
||
config :: Int -> Config | ||
config limit = localConfig | ||
& cNetworkingL . ncExternalRefRedirectsL .~ [RedirectRule Nothing Nothing Nothing RROFollow] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: this may be the time to add some smart constructor that would initialize all filtering fields to Nothing
, and update those fields via record update or something. But it's fine to leave as-is too.
src/Xrefcheck/Verify.hs
Outdated
@@ -682,8 +689,11 @@ parseUri link = do | |||
checkExternalResource :: [Text] -> Config -> Text -> IO (VerifyResult VerifyError) | |||
checkExternalResource followed config@Config{..} link | |||
| isIgnored = return mempty | |||
| link `elem` followed = pure $ | |||
VerifyResult [RedirectChainCycle (link :| followed)] | |||
| ncMaxRedirectFollows >= 0 && length followed > ncMaxRedirectFollows = pure $ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right, makes sense.
That's a bit of C-style - to use a negative number to indicate "no restrictions" 😄
But okay, let's go with it.
src/Xrefcheck/Config/Default.hs
Outdated
# - 'on' accepts 'temporary', 'permanent' or a specific redirect HTTP code. | ||
# Its absence also means that every response code matches. | ||
# - 'outcome' accepts 'valid', 'invalid' or 'follow'. The last one follows | ||
# the redirect by applying the same configuration rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what is meant by "applying the same configuration rules"? That, when a rule with follow
matches, we process the next link with same configured list of rules? But we apply them always, why should we mention it separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, redirects were automatically followed by the HTTP client, so none of our behaviours were applied to each step, but to the entire chain. For example:
- Timeout was applied to the entire chain.
- Retry logic was applied to the entire chain.
- Ignore check was only applied to the initial link.
This new feature allows to explicitly follow redirects, and we have decided to apply the same configuration rules at each step of the chain. But we are still applying the retry logic to the entire chain, and we plan to review it in other issue (#260).
I mean, maybe it is something that appears to be obvious, but it has not been as straightforward and that is why the clarification. I can remove it if it sounds confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. It may sound a bit confusing (due to how obvious it seems), but the behaviour should be clarified.
Probably if we add something like:
so for instance, exclusion rules would apply to the following links
it would sound more benign, more like a hint about a possible trick rather than an obvious statement, and would still make the exact behaviour clear.
Thank you @aeqz, that's cool. Suggested just minor style and naming changes. |
ca565fc
to
a7caf56
Compare
I have squashed fixup and temporary commits for the ease of rebasing this PR branch. |
@@ -118,7 +128,8 @@ overrideConfig config | |||
|
|||
defScanners = cScanners $ defConfig flavor | |||
defExclusions = cExclusions $ defConfig flavor | |||
defNetworking = cNetworking $ defConfig flavor | |||
defNetworking = cNetworking (defConfig flavor) | |||
& ncExternalRefRedirectsL .~ [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
# This option is similar to `maxRetries`, the difference is that | ||
# this `maxTimeoutRetries` option limits only the number of retries | ||
# caused by timeouts, and `maxRetries` limits the number of retries | ||
# caused both by 429s and timeouts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your work 👍
The last comment I have is that the second commit has some changes that seem to be unrelated to it, this may be worth a cleanup. But other than that, good, feel free to merge after fixing that.
Problem: We previously changed the default behaviour of Xrefcheck when following link redirects, but did not provide a way to configure it. Solution: We are adding a new field in the configuration file to allow writing a list of redirect rules that will be applied to links that match them.
a7caf56
to
05fe537
Compare
Description
Problem: We previously changed the default behaviour of Xrefcheck when following link redirects, but did not provide a way to configure it.
Solution: We are adding a new field in the configuration file to allow writing a list of redirect rules that will be applied to links that match them.
Related issue(s)
Relates #25
✅ Checklist for your Pull Request
Ideally a PR has all of the checkmarks set.
If something in this list is irrelevant to your PR, you should still set this
checkmark indicating that you are sure it is dealt with (be that by irrelevance).
Related changes (conditional)
Tests
silently reappearing again.
Documentation
Public contracts
of Public Contracts policy.
and
Stylistic guide (mandatory)