cmd/go: go build errors that don't correspond to source file until build cache is cleared #69179

Open
sipsma opened this issue Aug 30, 2024 · 6 comments
Labels
GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@sipsma

sipsma commented Aug 30, 2024

Go version

go version go1.23.0 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='0'
GOMOD='/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2731747166=/tmp/go-build -gno-record-gcc-switches'

What did you do?

When attempting to build a valid go program, we are getting errors about imports on certain lines of source code that don't match the actual contents of the source file they are reported on. The errors go away and the program builds successfully once the go build cache at /root/.cache/go-build is deleted and nothing else changes.

The repro is quite complex and takes a long time to trigger, so the best I could do for now was create a docker image with the go toolchain, source code and go build cache in place that reproduces the problem. Commands for reproducing with that image are included below.

For more context, we are hitting this in Dagger, which is a container-based DAG execution engine that, among other things, does a lot of containerized building of Go code.

We specifically see this problem arise during integration tests, which will run, over the course of ~20min, many (probably 100+) go build executions in separate containers. The most relevant details I can think of:

  1. All containers are using the same go toolchain version (1.23.0 currently) and the same base image
  2. All containers have a shared bind mount for the go build cache (always mounted at /root/.cache/go-build) and the go mod cache (always mounted at /go/pkg/mod)
  3. Source code is always mounted at /src and built with the command go build -o /runtime . from within that /src directory
    • A lot of the source code will end up with similar and sometimes identical subpackages under /src/internal. They may also have the same go mod name at times.
  4. Builds can happen in parallel and in serial across the integration test suite
  5. The integration tests are quite heavy in terms of CPU usage and disk read/write bandwidth, so the hosts are often under quite a bit of load
  6. We don't do any manual fiddling around with the go build cache; we just run commands like go build, go mod tidy, etc. in containers

What did you see happen?

As mentioned above, the best I could do for now was capture the state of one of the containers hitting this error in a docker image. I pushed the image to dockerhub at eriksipsma/corrupt-cache:latest. Unfortunately it's a linux/amd64-only image, since that's what our CI runs on, and CI is the only place I can get this to happen consistently.

Trigger the go build error:

$ docker run --rm -it eriksipsma/corrupt-cache:latest sh -c '/usr/local/go/bin/go build -C /src .'
go: downloading go.opentelemetry.io/otel v1.27.0
go: downloading go.opentelemetry.io/otel/sdk v1.27.0
go: downloading go.opentelemetry.io/otel/trace v1.27.0
go: downloading github.com/99designs/gqlgen v0.17.49
go: downloading golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
go: downloading github.com/Khan/genqlient v0.7.0
go: downloading golang.org/x/sync v0.7.0
go: downloading github.com/vektah/gqlparser/v2 v2.5.16
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc v0.0.0-20240518090000-14441aefdf88
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0
go: downloading go.opentelemetry.io/otel/log v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.27.0
go: downloading go.opentelemetry.io/otel/sdk/log v0.3.0
go: downloading go.opentelemetry.io/proto/otlp v1.3.1
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.27.0
go: downloading google.golang.org/grpc v1.64.0
go: downloading github.com/go-logr/logr v1.4.1
go: downloading go.opentelemetry.io/otel/metric v1.27.0
go: downloading golang.org/x/sys v0.21.0
go: downloading google.golang.org/protobuf v1.34.1
go: downloading google.golang.org/genproto/googleapis/rpc v0.0.0-20240515191416-fc5f0ca64291
go: downloading github.com/go-logr/stdr v1.2.2
go: downloading github.com/cenkalti/backoff/v4 v4.3.0
go: downloading github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/sosodev/duration v1.3.1
go: downloading golang.org/x/net v0.26.0
go: downloading google.golang.org/genproto/googleapis/api v0.0.0-20240520151616-dc85e6b867a5
go: downloading golang.org/x/text v0.16.0
internal/dagger/dagger.gen.go:23:2: package dagger/test/internal/querybuilder is not in std (/usr/local/go/src/dagger/test/internal/querybuilder)
internal/dagger/dagger.gen.go:24:2: package dagger/test/internal/telemetry is not in std (/usr/local/go/src/dagger/test/internal/telemetry)

The errors refer to the source file at /src/internal/dagger/dagger.gen.go. However, the imports it's erroring on are not the actual imports in the source file:

$ docker run --rm -it eriksipsma/corrupt-cache:latest head -n25 /src/internal/dagger/dagger.gen.go
// Code generated by dagger. DO NOT EDIT.

package dagger

import (
        "context"
        "encoding/json"
        "errors"
        "fmt"
        "net"
        "net/http"
        "os"
        "reflect"
        "strconv"
        "strings"

        "github.com/Khan/genqlient/graphql"
        "github.com/vektah/gqlparser/v2/gqlerror"
        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/propagation"
        "go.opentelemetry.io/otel/trace"

        "dagger/bare/internal/querybuilder"
        "dagger/bare/internal/telemetry"
)
  • Note that the errors refer to dagger/test/internal/ but the actual imports in the source code are dagger/bare/internal
  • Also worth noting that other containers do build source code with similar package layouts and contents except the import is dagger/test/internal. So it seems like go build here is somehow finding something in the cache from a previous build and incorrectly using it for this one.
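
The comparison above (imports named in the error versus imports actually in the file) can also be done programmatically. The following is a minimal sketch, not part of the original report: the helper name importPaths and the embedded sample are made up for illustration. It lists the import paths of a Go file using the standard go/parser package, so the output can be compared against the paths in the error message.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"os"
	"strconv"
)

// sample mirrors a few of the imports shown in the file header above.
// It is only used when no file path is given on the command line.
const sample = `package dagger

import (
	"context"

	"dagger/bare/internal/querybuilder"
	"dagger/bare/internal/telemetry"
)
`

// importPaths returns the import paths declared in the Go source src,
// in source order. filename is only used in error positions.
func importPaths(filename string, src []byte) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing once the import declarations are read.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, spec := range f.Imports {
		// Path.Value is the quoted string literal; unquote it.
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	name, src := "sample.go", []byte(sample)
	if len(os.Args) > 1 {
		// For example: /src/internal/dagger/dagger.gen.go
		name = os.Args[1]
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		src = data
	}
	paths, err := importPaths(name, src)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Run inside the image against /src/internal/dagger/dagger.gen.go, this should print dagger/bare/internal/... paths if the file on disk matches the head output above, which is what makes the dagger/test/internal/... paths in the error stand out.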

The error goes away if you first clear the build cache and then run the same go build command:

$ docker run --rm -it eriksipsma/corrupt-cache:latest sh -c 'rm -rf /root/.cache/go-build && /usr/local/go/bin/go build -C /src .'
go: downloading go.opentelemetry.io/otel/sdk v1.27.0
go: downloading go.opentelemetry.io/otel/trace v1.27.0
go: downloading go.opentelemetry.io/otel v1.27.0
go: downloading github.com/99designs/gqlgen v0.17.49
go: downloading github.com/Khan/genqlient v0.7.0
go: downloading golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
go: downloading golang.org/x/sync v0.7.0
go: downloading github.com/vektah/gqlparser/v2 v2.5.16
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc v0.0.0-20240518090000-14441aefdf88
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.27.0
go: downloading go.opentelemetry.io/otel/log v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0
go: downloading go.opentelemetry.io/otel/sdk/log v0.3.0
go: downloading go.opentelemetry.io/proto/otlp v1.3.1
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.27.0
go: downloading google.golang.org/grpc v1.64.0
go: downloading github.com/go-logr/logr v1.4.1
go: downloading go.opentelemetry.io/otel/metric v1.27.0
go: downloading golang.org/x/sys v0.21.0
go: downloading google.golang.org/protobuf v1.34.1
go: downloading google.golang.org/genproto/googleapis/rpc v0.0.0-20240515191416-fc5f0ca64291
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/sosodev/duration v1.3.1
go: downloading github.com/go-logr/stdr v1.2.2
go: downloading github.com/cenkalti/backoff/v4 v4.3.0
go: downloading github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0
go: downloading golang.org/x/net v0.26.0
go: downloading google.golang.org/genproto/googleapis/api v0.0.0-20240520151616-dc85e6b867a5
go: downloading golang.org/x/text v0.16.0

What did you expect to see?

For go build to not report errors that don't correspond to the source contents, and for the go build cache to not need to be cleared in order to get rid of the errors.

@sipsma sipsma changed the title from "go build: errors that don't correspond to source file until build cache is cleared" to "cmd/go: go build errors that don't correspond to source file until build cache is cleared" on Aug 30, 2024
@seankhliao
Member

I think it's something to do with the module index?

Cache verify:

/src # export GODEBUG=gocacheverify=1
/src # /usr/local/go/bin/go build
panic: go: internal cache error: cache verify failed: id=e0ec2602ae240ba6cf7c4e5e75c7c3300aadf0258b376c72d3b612eadbf9e7c4 changed:<<<
	modroot /src
	package go1.23.0 go index v2 /src/internal/dagger
	file dagger.gen.go 2024-08-30 21:58:16.88202819 +0000 UTC 208312
	
	>>>
	old: cf4f84f1aa46a7c9ae121f6d99fa0c53ec025338cc6dcad292a986782e7254c1 815
	new: 704f06df319148867ac8bd9cbcd63e03491a3d078d3083719e85e7e89436f384 815

goroutine 21 [running]:
cmd/go/internal/cache.(*DiskCache).putIndexEntry(0xc0001325d0, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, 0xcf, ...}, ...)
	cmd/go/internal/cache/cache.go:444 +0x605
cmd/go/internal/cache.(*DiskCache).put(0xc0001325d0, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, 0xcf, ...}, ...)
	cmd/go/internal/cache/cache.go:524 +0x21a
cmd/go/internal/cache.(*DiskCache).Put(0x1?, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, 0xcf, ...}, ...)
	cmd/go/internal/cache/cache.go:494 +0x85
cmd/go/internal/cache.PutBytes({0xc19820, 0xc0001325d0}, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, ...}, ...)
	cmd/go/internal/cache/cache.go:529 +0xc7
cmd/go/internal/modindex.openIndexPackage.func1()
	cmd/go/internal/modindex/read.go:216 +0x1c7
cmd/go/internal/par.(*ErrCache[...]).Do.func1()
	cmd/go/internal/par/work.go:119 +0x13
cmd/go/internal/par.(*Cache[...]).Do(0xc1d7c0, {{0xc000014044, 0x4}, {0xc000000060, 0x14}}, 0xc000411848)
	cmd/go/internal/par/work.go:160 +0x11b
cmd/go/internal/par.(*ErrCache[...]).Do(0xc000146140?, {{0xc000014044, 0x4}, {0xc000000060, 0x14}}, 0x0?)
	cmd/go/internal/par/work.go:118 +0x46
cmd/go/internal/modindex.openIndexPackage({0xc000014044?, 0xc000146140?}, {0xc000000060?, 0xc000146140?})
	cmd/go/internal/modindex/read.go:205 +0x98
cmd/go/internal/modindex.GetPackage({0xc000014044, 0x4}, {0xc000000060, 0x14})
	cmd/go/internal/modindex/read.go:142 +0x14d
cmd/go/internal/modload.dirInModule.func2()
	cmd/go/internal/modload/import.go:723 +0x2f
cmd/go/internal/par.(*ErrCache[...]).Do.func1()
	cmd/go/internal/par/work.go:119 +0x13
cmd/go/internal/par.(*Cache[...]).Do(0xc1d440, {0xc000000060, 0x14}, 0xc000411a28)
	cmd/go/internal/par/work.go:160 +0xfb
cmd/go/internal/par.(*ErrCache[...]).Do(0xc1d3c0?, {0xc000000060?, 0xd?}, 0xc000411b08?)
	cmd/go/internal/par/work.go:118 +0x38
cmd/go/internal/modload.dirInModule({0xc0001d8d05, 0x1b}, {0xc0001d8d05?, 0xb}, {0xc000014044, 0x4}, 0x1)
	cmd/go/internal/modload/import.go:719 +0x271
cmd/go/internal/modload.importFromModules({0xc18e90, 0xfedba0}, {0xc0001d8d05, 0x1b}, 0xc00017c2d0, 0x0, 0x0)
	cmd/go/internal/modload/import.go:430 +0xcd9
cmd/go/internal/modload.(*loader).load(0xc0001a20e0, {0xc18e90, 0xfedba0}, 0xc000210c30)
	cmd/go/internal/modload/load.go:1852 +0xa5
cmd/go/internal/modload.(*loader).pkg.func1.1()
	cmd/go/internal/modload/load.go:1665 +0x25
cmd/go/internal/par.(*Queue).Add.func1()
	cmd/go/internal/par/queue.go:58 +0x5c
created by cmd/go/internal/par.(*Queue).Add in goroutine 19
	cmd/go/internal/par/queue.go:56 +0x1d3
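
The text between <<< and >>> in the verify output lists a module root, a package directory, and a file name with its modification time and size; no hash of the file's contents appears in it. The sketch below is a simplified, hypothetical stand-in for that kind of metadata-based key (the names fileMeta, metadataKey and demo are made up for illustration, and this is not cmd/go's code). It shows that two files that differ only in dagger/test versus dagger/bare, and that share a path, size, and modification time, would map to the same key. Whether that is what happened in the shared cache here is an assumption, not something confirmed in this thread.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// fileMeta holds the per-file fields that appear in the verify output.
type fileMeta struct {
	name    string
	modTime time.Time
	size    int64
}

// metadataKey is a hypothetical key derivation that hashes only paths and
// file metadata, mirroring the fields printed in the verify output.
// File contents are deliberately not part of the input.
func metadataKey(modroot, pkgdir string, files []fileMeta) [32]byte {
	h := sha256.New()
	fmt.Fprintf(h, "modroot %s\n", modroot)
	fmt.Fprintf(h, "package %s\n", pkgdir)
	for _, f := range files {
		fmt.Fprintf(h, "file %s %v %d\n", f.name, f.modTime, f.size)
	}
	var sum [32]byte
	copy(sum[:], h.Sum(nil))
	return sum
}

// demo builds two sources that differ only in the module name (same length),
// gives them identical metadata, and reports whether the keys and the
// contents match.
func demo() (sameKey, sameContent bool) {
	a := []byte("import \"dagger/test/internal/querybuilder\"\n")
	b := []byte("import \"dagger/bare/internal/querybuilder\"\n")
	// Assumed for illustration: both files end up with the same mtime.
	mt := time.Date(2024, time.August, 30, 21, 58, 16, 882028190, time.UTC)

	keyA := metadataKey("/src", "/src/internal/dagger",
		[]fileMeta{{"dagger.gen.go", mt, int64(len(a))}})
	keyB := metadataKey("/src", "/src/internal/dagger",
		[]fileMeta{{"dagger.gen.go", mt, int64(len(b))}})

	return keyA == keyB, sha256.Sum256(a) == sha256.Sum256(b)
}

func main() {
	sameKey, sameContent := demo()
	fmt.Println("same key:", sameKey)         // same key: true
	fmt.Println("same content:", sameContent) // same content: false
}
```

Note that "dagger/test" and "dagger/bare" have the same length, so swapping one for the other does not change the file size; only the modification time would also need to coincide for the metadata to be identical.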

With the module index on (default):

/src # export GODEBUG=goindex=1
/src # /usr/local/go/bin/go build
internal/dagger/dagger.gen.go:23:2: package dagger/test/internal/querybuilder is not in std (/usr/local/go/src/dagger/test/internal/querybuilder)
internal/dagger/dagger.gen.go:24:2: package dagger/test/internal/telemetry is not in std (/usr/local/go/src/dagger/test/internal/telemetry)

With module index off, build succeeds:

/src # export GODEBUG=goindex=0
/src # /usr/local/go/bin/go build

cc @matloob @samthanawalla

@seankhliao seankhliao added the NeedsInvestigation and GoCommand labels on Sep 1, 2024
@matloob matloob self-assigned this Sep 3, 2024
@matloob
Contributor

matloob commented Sep 3, 2024

I'll look into this

@sipsma
Author

sipsma commented Sep 16, 2024

Let me know if there's anything I can do to help debug further!

@sipsma
Author

sipsma commented Sep 18, 2024

@matloob @seankhliao Quick question pending the investigation on your end: is there any obvious reason it would be a bad idea to run with GODEBUG=goindex=0 (disable the module index) for now?

Asking because this bug is causing lots of problems in our CI test suite specifically, so I'm considering just disabling the mod index there while things get figured out upstream here. In some preliminary runs I didn't notice any performance impact for us when running with it disabled for the whole test suite (and no occurrences of the bug so far 🤞).

However, that goindex setting is undocumented as far as I can tell, so I want to double-check with you that disabling it isn't going to break go tooling in some other unobvious way or otherwise be a bad idea.

@matloob
Contributor

matloob commented Sep 19, 2024

I don't think disabling the module index should cause any correctness problems.

(edit: correctness, not performance)
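
For CI setups that launch go from a Go program, the GODEBUG=goindex=0 workaround discussed above could be applied with a small wrapper. This is a minimal sketch under assumptions: the helper name withGodebug is made up for illustration, and the wrapper simply forwards its arguments to whatever go binary is on PATH. It adds goindex=0 to GODEBUG while keeping any settings that are already present.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGodebug returns a copy of env in which GODEBUG includes setting
// (for example "goindex=0"). Existing GODEBUG settings are kept and the
// new setting is appended after them.
func withGodebug(env []string, setting string) []string {
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if v, ok := strings.CutPrefix(kv, "GODEBUG="); ok {
			found = true
			if v == "" {
				kv = "GODEBUG=" + setting
			} else {
				kv = "GODEBUG=" + v + "," + setting
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

func main() {
	env := withGodebug(os.Environ(), "goindex=0")
	if len(os.Args) < 2 {
		// No arguments: only show the GODEBUG value that would be used.
		for _, kv := range env {
			if strings.HasPrefix(kv, "GODEBUG=") {
				fmt.Println(kv)
			}
		}
		return
	}
	// Forward the arguments to the go command, e.g. "build -o /runtime .".
	cmd := exec.Command("go", os.Args[1:]...)
	cmd.Env = env
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Setting the variable in the container environment achieves the same thing; the wrapper is only useful when the build command is assembled in Go code and other GODEBUG settings need to be preserved.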
