Skip to content

Commit

Permalink
Fix file exists checker (#44)
Browse files Browse the repository at this point in the history
The main problem was that the file exists checker uses go's built-in filepath globber, which doesn't support double-asterisk patterns (golang/go#11862).
Additionally, with patterns like `*.js` we need to search for all JS files not only on the first level but also in nested directories.

Based on made investigation this problem was fixed and additional test coverage was added to document that contract.

* Refactor code to remove lint issues
* Update Go min version to 1.14 on CI
  • Loading branch information
mszostok committed Oct 17, 2020
1 parent cbda10c commit 32b0e86
Show file tree
Hide file tree
Showing 18 changed files with 434 additions and 48 deletions.
14 changes: 7 additions & 7 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -17,30 +17,30 @@ jobs:
include:
- stage: test
name: "Build and unit-test with Go 1.12.x"
go: 1.12.x
go: 1.14.x
script: make test-unit
- stage: test
name: "Build and unit-test with Go 1.13.x"
go: 1.13.x
go: 1.15.x
script: make test-unit
- stage: test
name: "Hammer unit-test with Go 1.13.x"
go: 1.13.x
go: 1.15.x
script: make test-hammer
- stage: test
name: "Code Quality Analysis"
script: make test-lint
go: 1.13.x
go: 1.15.x
- stage: test-integration
name: "Integration testing on OSX"
script: make test-integration
os: osx
go: 1.13.x
go: 1.15.x
- stage: test-integration
name: "Integration testing on Linux"
script: make test-integration
os: linux
go: 1.13.x
go: 1.15.x
# Currently skipped as the new line encoding differs between linux and windows
# - stage: test-integration
# name: "Integration testing on Windows"
Expand All @@ -51,7 +51,7 @@ jobs:
# - echo $(pwd)
# - go test ./tests/integration/... -v -tags=integration
# os: windows
# go: 1.13.x
# go: 1.15.x

# push results to CodeCov
after_success:
Expand Down
6 changes: 4 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
@@ -1,12 +1,14 @@
.DEFAULT_GOAL = build
.DEFAULT_GOAL = all

# enable module support across all go commands.
export GO111MODULE = on
# enable consistent Go 1.12/1.13 GOPROXY behavior.
export GOPROXY = https://proxy.golang.org

# Build
all: build-race test-unit test-integration test-lint
.PHONY: all

# Build
build:
go build -o codeowners-validator ./main.go
.PHONY: build
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
// Always record the result of func execution to prevent
// the compiler eliminating the function call.
// Always store the result to a package level variable
// so the compiler cannot eliminate the Benchmark itself.
package file_exists_checker

import (
"fmt"
"log"
"os"
"path"
"testing"

"github.com/bmatcuk/doublestar/v2"
"github.com/mattn/go-zglob"
"github.com/yargevad/filepathx"
)

var pattern string
func init() {
curDir, err := os.Getwd()
if err != nil {
log.Fatal(err)
}
pattern = path.Join(curDir, "..", "..", "**", "*.md")
fmt.Println(pattern)
}

var pathx []string

func BenchmarkPathx(b *testing.B) {
var r []string
for n := 0; n < b.N; n++ {
r, _ = filepathx.Glob(pattern)
}
pathx = r
}

var zGlob []string

func BenchmarkZGlob(b *testing.B) {
var r []string
for n := 0; n < b.N; n++ {
r, _ = zglob.Glob(pattern)
}
zGlob = r
}

var double []string

func BenchmarkDoublestar(b *testing.B) {
var r []string
for n := 0; n < b.N; n++ {
r, _ = doublestar.Glob(pattern)
}
double = r
}
61 changes: 61 additions & 0 deletions docs/investigation/file_exists_checker/glob.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
## File exits checker

This document describes investigation about [`file exists`](../../../internal/check/file_exists.go) checker which needs to deal with the gitignore pattern syntax

### Problem

A [CODEOWNERS](https://docs.github.com/en/free-pro-team@latest/github/creating-cloning-and-archiving-repositories/about-code-owners#codeowners-syntax) file uses a pattern that follows the same rules used in [gitignore](https://git-scm.com/docs/gitignore#_pattern_format) files.
The gitignore files support two consecutive asterisks ("**") in patterns that match against the full path name. Unfortunately the core Go library `filepath.Glob` does not support [`**`](https://github.com/golang/go/issues/11862) at all.

This caused that for some patterns the [`file exists`](../../../internal/check/file_exists.go) checker didn't work properly, see [issue#22](https://github.com/mszostok/codeowners-validator/issues/22).

Additionally, we need to support a single asterisk at the beginning of the pattern. For example, `*.js` should check for all JS files in the whole git repository. To achieve that we need to detect that and change from `*.js` to `**/*.js`.

```go
pattern := "*.js"
if len(pattern) >= 2 && pattern[:1] == "*" && pattern[1:2] != "*" {
pattern = "**/" + pattern
}
```

### Investigation

Instead of creating a dedicated solution, I decided to search for a custom library that's supporting two consecutive asterisks.
There are a few libraries in open-source that can be used for that purpose. I selected three:
- https://github.com/bmatcuk/doublestar/v2
- https://github.com/mattn/go-zglob
- https://github.com/yargevad/filepathx

I've tested all libraries and all of them were supporting `**` pattern properly. As a final criterion, I created benchmark tests.

#### Benchmarks

Run benchmarks with 1 CPU for 5 seconds:

```bash
go test -bench=. -benchmem -cpu 1 -benchtime 5s ./file_matcher_libs_bench_test.go

goos: darwin
goarch: amd64
BenchmarkPathx 79 72276938 ns/op 7297258 B/op 40808 allocs/op
BenchmarkZGlob 126 47206545 ns/op 840973 B/op 10550 allocs/op
BenchmarkDoublestar 157 38041578 ns/op 3521379 B/op 22150 allocs/op
```

Run benchmarks with 12 CPU for 5 seconds:
```bash
go test -bench=. -benchmem -cpu 12 -benchtime 5s ./file_matcher_libs_bench_test.go

goos: darwin
goarch: amd64
BenchmarkPathx-12 78 73096386 ns/op 7297114 B/op 40807 allocs/op
BenchmarkZGlob-12 637 9234632 ns/op 914239 B/op 10564 allocs/op
BenchmarkDoublestar-12 151 38372922 ns/op 3522899 B/op 22151 allocs/op
```

#### Summary

With the 1 CPU , the `doublestar` library has the shortest time, but the allocated memory is higher than the `z-glob` library.
With the 12 CPU, the `z-glob` is a winner bot in time and memory allocation. The worst one in each test was the `filepathx` library.

> **NOTE:** The `z-glob` library has an issue with error handling. I've provided PR for fixing that problem: https://github.com/mattn/go-zglob/pull/37.
9 changes: 9 additions & 0 deletions docs/investigation/file_exists_checker/go.mod
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
module github.com/mszostok/codeowners-validator/docs/investigation/file_exists_checker

go 1.15

require (
github.com/bmatcuk/doublestar/v2 v2.0.1
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6
github.com/yargevad/filepathx v0.0.0-20161019152617-907099cb5a62
)
6 changes: 6 additions & 0 deletions docs/investigation/file_exists_checker/go.sum
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
github.com/bmatcuk/doublestar/v2 v2.0.1 h1:EFT91DmIMRcrUEcYUW7AqSAwKvNzP5+CoDmNVBbcQOU=
github.com/bmatcuk/doublestar/v2 v2.0.1/go.mod h1:QMmcs3H2AUQICWhfzLXz+IYln8lRQmTZRptLie8RgRw=
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6 h1:nw6OKTHiQIVOSaT4xJ5STrLfUFs3xlU5dc6H4pT5bVQ=
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6/go.mod h1:MxxjyoXXnMxfIpxTK2GAkw1w8glPsQILx3N5wrKakiY=
github.com/yargevad/filepathx v0.0.0-20161019152617-907099cb5a62 h1:pZlTNPEY1N9n4Frw+wiRy9goxBru/H5KaBxJ4bFt89w=
github.com/yargevad/filepathx v0.0.0-20161019152617-907099cb5a62/go.mod h1:VtdjfTSVslSOB39qCxkH9K3m2qUauaJk/6y+pNkvCQY=
1 change: 1 addition & 0 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ require (
github.com/google/go-github/v29 v29.0.3
github.com/hashicorp/go-multierror v1.0.0
github.com/kr/text v0.2.0 // indirect
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e // indirect
github.com/pkg/errors v0.9.1
github.com/sebdah/goldie/v2 v2.3.0
Expand Down
3 changes: 3 additions & 0 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,8 @@ github.com/mattn/go-colorable v0.1.4/go.mod h1:U0ppj6V5qS13XJ6of8GYAs25YV2eR4EVc
github.com/mattn/go-isatty v0.0.8/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
github.com/mattn/go-isatty v0.0.11 h1:FxPOTFNqGkuDUGi3H/qkUbQO4ZiBa2brKq5r0l8TGeM=
github.com/mattn/go-isatty v0.0.11/go.mod h1:PhnuNfih5lzO57/f3n+odYbM4JtupLOxQOAqxQCu2WE=
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6 h1:nw6OKTHiQIVOSaT4xJ5STrLfUFs3xlU5dc6H4pT5bVQ=
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6/go.mod h1:MxxjyoXXnMxfIpxTK2GAkw1w8glPsQILx3N5wrKakiY=
github.com/mitchellh/go-homedir v1.1.0 h1:lukF9ziXFxDFPkA1vsr5zpc1XuPDn/wFntq5mG+4E0Y=
github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/niemeyer/pretty v0.0.0-20200227124842-a10e7caefd8e h1:fD57ERR4JtEqsWbfPhv4DMiApHyliiK5xCTNVSPiaAs=
Expand Down Expand Up @@ -107,6 +109,7 @@ golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAG
golang.org/x/oauth2 v0.0.0-20190319182350-c85d3e98c914 h1:jIOcLT9BZzyJ9ce+IwwZ+aF9yeCqzrR+NrD68a/SHKw=
golang.org/x/oauth2 v0.0.0-20190319182350-c85d3e98c914/go.mod h1:gOpvHmFTYa4IltrdGE7lF6nIHvwfUNPOp7c8zoXwtLw=
golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190423024810-112230192c58 h1:8gQV6CLnAEikrhgkHFbMAEhagSSnXWGV915qUMm9mrU=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a h1:1BGLXjeY4akVXGgbC9HugT3Jv3hCI0z56oJR5vAMgBU=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
Expand Down
8 changes: 7 additions & 1 deletion hack/ci/run-lint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ set -E # needs to be set if we want the ERR trap

readonly CURRENT_DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )
readonly ROOT_PATH=${CURRENT_DIR}/../..
readonly GOLANGCI_LINT_VERSION="v1.23.8"
readonly GOLANGCI_LINT_VERSION="v1.31.0"

source "${CURRENT_DIR}/utilities.sh" || { echo 'Cannot load CI utilities.'; exit 1; }

Expand All @@ -19,6 +19,12 @@ golangci::install() {
}

golangci::run_checks() {
GOT_VER=$(golangci-lint version --format=short 2>&1)
if [[ "v${GOT_VER}" != "${GOLANGCI_LINT_VERSION}" ]]; then
echo -e "${RED}✗ golangci-lint version mismatch, expected ${GOLANGCI_LINT_VERSION}, available ${GOT_VER} ${NC}"
exit 1
fi

shout "Run golangci-lint checks"
LINTS=(
# default golangci-lint lints
Expand Down
34 changes: 21 additions & 13 deletions internal/check/api.go
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,9 @@ import (
"context"
"fmt"
"strings"
"sync"

"github.com/mszostok/codeowners-validator/internal/ptr"
"github.com/mszostok/codeowners-validator/pkg/codeowners"
)

Expand All @@ -22,13 +24,18 @@ type (
}

Input struct {
RepoDir string
CodeownerEntries []codeowners.Entry
RepoDir string
CodeownersEntries []codeowners.Entry
}

Output struct {
Issues []Issue
}

OutputBuilder struct {
mux sync.Mutex
issues []Issue
}
)

type ReportIssueOpt func(*Issue)
Expand All @@ -41,18 +48,13 @@ func WithSeverity(s SeverityType) ReportIssueOpt {

func WithEntry(e codeowners.Entry) ReportIssueOpt {
return func(i *Issue) {
i.LineNo = uint64Ptr(e.LineNo)
i.LineNo = ptr.Uint64Ptr(e.LineNo)
}
}

func uint64Ptr(u uint64) *uint64 {
c := u
return &c
}

func (out *Output) ReportIssue(msg string, opts ...ReportIssueOpt) Issue {
if out == nil { // TODO: error?
return Issue{}
func (bldr *OutputBuilder) ReportIssue(msg string, opts ...ReportIssueOpt) *OutputBuilder {
if bldr == nil { // TODO: error?
return nil
}

i := Issue{
Expand All @@ -64,9 +66,15 @@ func (out *Output) ReportIssue(msg string, opts ...ReportIssueOpt) Issue {
opt(&i)
}

out.Issues = append(out.Issues, i)
bldr.mux.Lock()
defer bldr.mux.Unlock()
bldr.issues = append(bldr.issues, i)

return bldr
}

return i
func (bldr *OutputBuilder) Output() Output {
return Output{Issues: bldr.issues}
}

type SeverityType int
Expand Down
10 changes: 5 additions & 5 deletions internal/check/duplicated_pattern.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,13 +20,13 @@ func NewDuplicatedPattern() *DuplicatedPattern {

// Check searches for doubles paths(patterns) in CODEOWNERS file.
func (d *DuplicatedPattern) Check(ctx context.Context, in Input) (Output, error) {
var output Output
var bldr OutputBuilder

// TODO(mszostok): decide if the `CodeownerEntries` entry by default should be
// TODO(mszostok): decide if the `CodeownersEntries` entry by default should be
// indexed by pattern (`map[string][]codeowners.Entry{}`)
// Required changes in pkg/codeowners/owners.go.
patterns := map[string][]codeowners.Entry{}
for _, entry := range in.CodeownerEntries {
for _, entry := range in.CodeownersEntries {
if ctxutil.ShouldExit(ctx) {
return Output{}, ctx.Err()
}
Expand All @@ -37,11 +37,11 @@ func (d *DuplicatedPattern) Check(ctx context.Context, in Input) (Output, error)
for name, entries := range patterns {
if len(entries) > 1 {
msg := fmt.Sprintf("Pattern %q is defined %d times in lines: \n%s", name, len(entries), d.listFormatFunc(entries))
output.ReportIssue(msg)
bldr.ReportIssue(msg)
}
}

return output, nil
return bldr.Output(), nil
}

// listFormatFunc is a basic formatter that outputs a bullet point list of the pattern.
Expand Down
Loading

0 comments on commit 32b0e86

Please sign in to comment.