Skip to content

Commit

Permalink
Merge pull request #250 from serokell/aeqz/#25-redirect-links-config-…
Browse files Browse the repository at this point in the history
…rules

[#25] Redirect links with configuration rules
  • Loading branch information
aeqz authored Dec 30, 2022
2 parents a4dc29b + 05fe537 commit 20f3d05
Show file tree
Hide file tree
Showing 29 changed files with 979 additions and 111 deletions.
28 changes: 16 additions & 12 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,8 +9,8 @@ Unreleased

* [#176](https://github.com/serokell/xrefcheck/pull/176)
+ Enabled `autolink` extension for `cmark-gfm`, so now we're finding strings
like `www.google.com` or `https://google.com`, treating them as links
and checking.
like `www.google.com` or `https://google.com`, treating them as links
and checking.
* [#175](https://github.com/serokell/xrefcheck/pull/175)
+ Reorganize top-level config keys.
* [#178](https://github.com/serokell/xrefcheck/pull/178)
Expand All @@ -19,13 +19,13 @@ Unreleased
+ Add support for image links.
* [#199](https://github.com/serokell/xrefcheck/pull/199)
+ Now annotation
`<!-- xrefcheck: ignore all -->` instead of `<!-- xrefcheck: ignore file -->`
should be used to disable checking for links in file, so it's clearer that
file itself is not ignored (and links can target it).
`<!-- xrefcheck: ignore all -->` instead of `<!-- xrefcheck: ignore file -->`
should be used to disable checking for links in file, so it's clearer that
file itself is not ignored (and links can target it).
* [#215](https://github.com/serokell/xrefcheck/pull/215)
+ Now we notify user when there are scannable files that were not added to Git
yet. Also added CLI option `--include-untracked` to scan such files and treat
as existing.
yet. Also added CLI option `--include-untracked` to scan such files and treat
as existing.
* [#191](https://github.com/serokell/xrefcheck/pull/191)
+ Now we consider slash `/` (and only it) as path separator in local links for all OS,
so xrefcheck's report is OS-independent
Expand All @@ -40,10 +40,14 @@ Unreleased
redirect responses (i.e. 301 and 308) and passes for temporary ones (i.e. 302, 303, 307).
* [#231](https://github.com/serokell/xrefcheck/pull/231)
+ Anchor analysis takes now into account the appropriate case-sensitivity depending on
the configured Markdown flavour.
the configured Markdown flavour.
* [#254](https://github.com/serokell/xrefcheck/pull/254)
+ Now the `dump-config` command does not overwrite a file unless explicitly told with a
`--force` flag. Also, a `--stdout` flag allows to print the config to stdout instead.
the configured Markdown flavour.
* [#250](https://github.com/serokell/xrefcheck/pull/250)
+ Now the redirect behavior for external references can be modified via rules in the
configuration file with the `externalRefRedirects` parameter.

0.2.2
==========
Expand Down Expand Up @@ -95,7 +99,7 @@ Unreleased
+ Make possible to specify whether ignore localhost links, use
`check-localhost` CLA argument (by default localhost links will not be checked).
+ Make possible to ignore auth failures (assume 'protected' links
valid), use `ignoreAuthFailures` parameter of config.
valid), use `ignoreAuthFailures` parameter of config.
* [#66](https://github.com/serokell/xrefcheck/pull/66)
+ Added support for ftp links.
* [#74](https://github.com/serokell/xrefcheck/pull/83)
Expand Down Expand Up @@ -144,10 +148,10 @@ Unreleased
+ Switch to lts-17.3.
* [#53](https://github.com/serokell/xrefcheck/pull/53)
+ Make possible to include a regular expression in
`ignoreRefs` parameter of config to ignore external
references.
`ignoreRefs` parameter of config to ignore external
references.
+ Add support of right in-place ignoring annotations
such as `ignore file`, `ignore paragraph` and `ignore link`.
such as `ignore file`, `ignore paragraph` and `ignore link`.

0.1.2
=======
Expand Down
32 changes: 30 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -141,8 +141,36 @@ There are several ways to fix this:
* This behavior can be disabled by setting `ignoreAuthFailures: false` in the config file.

1. How does `xrefcheck` handle redirects?
* Permanent redirects (i.e. 301 and 308) are reported as errors.
* Temporary redirects (i.e. 302, 303 and 307) are assumed to be valid.
* The rules from the default configuration are as follows:
* Permanent redirects (i.e. 301 and 308) are reported as errors.
* Temporary redirects (i.e. 302, 303 and 307) are assumed to be valid.
* Redirect rules can be specified with the `externalRefRedirects` parameter within `networking`, which accepts an array of
rules with keys `from`, `to`, `on` and `outcome`. The rule applied is the first one that matches with
the `from`, `to` and `on` fields, if any, where
* `from` is a regular expression, as in `ignoreExternalRefsTo`, for the source link in a single redirection step. Its absence means that
every link matches.
* `to` is a regular expression for the target link in a single redirection step. Its absence also means that every link matches.
* `on` accepts `temporary`, `permanent` or a specific redirect HTTP code. Its absence also means that
every response code matches.
* The `outcome` parameter accepts `valid`, `invalid` or `follow`. The last one follows the redirect by applying the
same configuration rules.

For example, this configuration forbids 307 redirects to a specific domain and makes redirections from HTTP to HTTPS to be followed:

```
externalRefRedirects:
- to: "https?://forbidden.com.*"
on: 307
outcome: invalid
- from: "http://.*"
to: "https://.*"
outcome: follow
```

The first one applies if both of them match.

* The number of redirects allowed in a single redirect chain is limited and can be configured with the
`maxRedirectFollows` parameter, also within `networking`. A number smaller than 0 disables the limit.

1. How does `xrefcheck` handle localhost links?
* By default, `xrefcheck` will ignore links to localhost.
Expand Down
12 changes: 6 additions & 6 deletions ftp-tests/Test/Xrefcheck/FtpLinks.hs
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ import Test.Tasty (TestTree, askOption, testGroup)
import Test.Tasty.HUnit (assertBool, assertFailure, testCase, (@?=))
import Test.Tasty.Options as Tasty (IsOption (..), OptionDescription (Option), safeRead)

import Xrefcheck.Config (Config, cExclusionsL, defConfig)
import Xrefcheck.Config
import Xrefcheck.Core (Flavor (GitHub))
import Xrefcheck.Scan (ecIgnoreExternalRefsToL)
import Xrefcheck.Verify (VerifyError (..), checkExternalResource)
Expand Down Expand Up @@ -48,27 +48,27 @@ test_FtpLinks = askOption $ \(FtpHostOpt host) -> do
testGroup "Ftp links handler"
[ testCase "handles correct link to file" $ do
let link = host <> "/pub/file_exists.txt"
result <- runExceptT $ checkExternalResource config link
result <- runExceptT $ checkExternalResource emptyChain config link
result @?= Right ()

, testCase "handles empty link (host only)" $ do
let link = host
result <- runExceptT $ checkExternalResource config link
result <- runExceptT $ checkExternalResource emptyChain config link
result @?= Right ()

, testCase "handles correct link to non empty directory" $ do
let link = host <> "/pub/"
result <- runExceptT $ checkExternalResource config link
result <- runExceptT $ checkExternalResource emptyChain config link
result @?= Right ()

, testCase "handles correct link to empty directory" $ do
let link = host <> "/empty/"
result <- runExceptT $ checkExternalResource config link
result <- runExceptT $ checkExternalResource emptyChain config link
result @?= Right ()

, testCase "throws exception when file not found" $ do
let link = host <> "/pub/file_does_not_exists.txt"
result <- runExceptT $ checkExternalResource config link
result <- runExceptT $ checkExternalResource emptyChain config link
case result of
Right () ->
assertFailure "No exception was raised, FtpEntryDoesNotExist expected"
Expand Down
17 changes: 15 additions & 2 deletions src/Xrefcheck/Config.hs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@

module Xrefcheck.Config
( module Xrefcheck.Config
, module Xrefcheck.Data.Redirect
, defConfigText
) where

Expand All @@ -15,11 +16,12 @@ import Universum
import Control.Lens (makeLensesWith)
import Data.Aeson (genericParseJSON)
import Data.Yaml (FromJSON (..), decodeEither', prettyPrintParseException, withText)

import Text.Regex.TDFA.Text ()
import Time (KnownRatName, Second, Time (..), unitsP)

import Xrefcheck.Config.Default
import Xrefcheck.Core
import Xrefcheck.Data.Redirect
import Xrefcheck.Scan
import Xrefcheck.Scanners.Markdown
import Xrefcheck.Util (Field, aesonConfigOption, postfixFields)
Expand Down Expand Up @@ -78,8 +80,16 @@ data NetworkingConfig' f = NetworkingConfig
-- this `maxTimeoutRetries` option limits only the number of retries
-- caused by timeouts, and `maxRetries` limits the number of retries
-- caused both by 429s and timeouts.
, ncMaxRedirectFollows :: Field f Int
-- ^ Maximum number of links that can be followed in a single redirect
-- chain.
, ncExternalRefRedirects :: Field f RedirectConfig
-- ^ Rules to override the redirect behavior for external references.
} deriving stock (Generic)

-- | A list of custom redirect rules.
type RedirectConfig = [RedirectRule]

-- | Type alias for ScannersConfig' with all required fields.
type ScannersConfig = ScannersConfig' Identity

Expand Down Expand Up @@ -118,7 +128,8 @@ overrideConfig config

defScanners = cScanners $ defConfig flavor
defExclusions = cExclusions $ defConfig flavor
defNetworking = cNetworking $ defConfig flavor
defNetworking = cNetworking (defConfig flavor)
& ncExternalRefRedirectsL .~ []

overrideExclusions exclusionConfig
= ExclusionConfig
Expand All @@ -138,6 +149,8 @@ overrideConfig config
, ncDefaultRetryAfter = overrideField ncDefaultRetryAfter
, ncMaxRetries = overrideField ncMaxRetries
, ncMaxTimeoutRetries = overrideField ncMaxTimeoutRetries
, ncMaxRedirectFollows = overrideField ncMaxRedirectFollows
, ncExternalRefRedirects = overrideField ncExternalRefRedirects
}
where
overrideField :: (forall f. NetworkingConfig' f -> Field f a) -> a
Expand Down
75 changes: 58 additions & 17 deletions src/Xrefcheck/Config/Default.hs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ module Xrefcheck.Config.Default where

import Universum

import Fmt (Builder)
import Text.Interpolation.Nyan

import Xrefcheck.Core
Expand Down Expand Up @@ -36,7 +37,7 @@ exclusions:
# List of POSIX extended regular expressions.
ignoreExternalRefsTo:
# Ignore localhost links by default
- ^(https?|ftps?)://(localhost|127\\.0\\.0\\.1).*
- (https?|ftps?)://(localhost|127\\.0\\.0\\.1).*

# Networking parameters.
networking:
Expand Down Expand Up @@ -65,22 +66,45 @@ networking:
# On other errors xrefcheck fails immediately, without retrying.
maxRetries: 3

# Querying a given domain that ever returned 429 before,
# this defines how many timeouts are allowed during retries.
#
# For such domains, timeouts likely mean hitting the rate limiter,
# and so xrefcheck considers timeouts in the same way as 429 errors.
#
# For other domains, a timeout results in a respective error, no retry
# attempts will be performed. Use `externalRefCheckTimeout` option
# to increase the time after which timeout is declared.
#
# This option is similar to `maxRetries`, the difference is that
# this `maxTimeoutRetries` option limits only the number of retries
# caused by timeouts, and `maxRetries` limits the number of retries
# caused both by 429s and timeouts.
# Querying a given domain that ever returned 429 before,
# this defines how many timeouts are allowed during retries.
#
# For such domains, timeouts likely mean hitting the rate limiter,
# and so xrefcheck considers timeouts in the same way as 429 errors.
#
# For other domains, a timeout results in a respective error, no retry
# attempts will be performed. Use `externalRefCheckTimeout` option
# to increase the time after which timeout is declared.
#
# This option is similar to `maxRetries`, the difference is that
# this `maxTimeoutRetries` option limits only the number of retries
# caused by timeouts, and `maxRetries` limits the number of retries
# caused both by 429s and timeouts.
maxTimeoutRetries: 1

# Maximum number of links that can be followed in a single redirect
# chain.
#
# The link is considered as invalid if the limit is exceeded.
maxRedirectFollows: 10

# Rules to override the redirect behavior for external references that
# match, where
# - 'from' is a regular expression for the source link in a single
# redirection step. Its absence means that every link matches.
# - 'to' is a regular expression for the target link in a single
# redirection step. Its absence also means that every link matches.
# - 'on' accepts 'temporary', 'permanent' or a specific redirect HTTP code.
# Its absence also means that every response code matches.
# - 'outcome' accepts 'valid', 'invalid' or 'follow'. The last one follows
# the redirect by applying the same configuration rules so, for instance,
# exclusion rules would also apply to the following links.
#
# The first one that matches is applied, and the link is considered
# as valid if none of them does match.
externalRefRedirects:
#{interpolateIndentF 4 externalRefRedirects}

# Parameters of scanners for various file types.
scanners:
# On 'anchor not found' error, how much similar anchors should be displayed as
Expand All @@ -95,7 +119,7 @@ scanners:
|]
where
ignoreLocalRefsFrom :: NonEmpty Text
ignoreLocalRefsFrom = fromList $ case flavor of
ignoreLocalRefsFrom = fromList $ case flavor of
GitHub ->
[ ".github/pull_request_template.md"
, ".github/issue_template.md"
Expand All @@ -108,7 +132,7 @@ scanners:
]

ignoreLocalRefsTo :: NonEmpty Text
ignoreLocalRefsTo = fromList $ case flavor of
ignoreLocalRefsTo = fromList $ case flavor of
GitHub ->
[ "../../../issues"
, "../../../issues/*"
Expand All @@ -121,3 +145,20 @@ scanners:
, "../../merge_requests"
, "../../merge_requests/*"
]

externalRefRedirects :: Builder
externalRefRedirects = case flavor of
GitHub ->
[int||
- on: permanent
outcome: invalid|]
GitLab ->
[int||
- on: permanent
outcome: invalid
# GitLab redirects non-existing files to the repository's main page
# with a 302 code instead of answering with a 404 response.
- from: https?://gitlab.com/.*/-/blob/.*
to: https?://gitlab.com/.*
on: 302
outcome: invalid|]
Loading

0 comments on commit 20f3d05

Please sign in to comment.