Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go: make a module's dependencies available in input root during analysis #12771

Closed
tdyas opened this issue Sep 7, 2021 · 7 comments · Fixed by #13094
Closed

go: make a module's dependencies available in input root during analysis #12771

tdyas opened this issue Sep 7, 2021 · 7 comments · Fixed by #13094
Labels
backend: Go Go backend-related issues estimate: <1W

Comments

@tdyas
Copy link
Contributor

tdyas commented Sep 7, 2021

#12719 (comment) describes performance issues in the Go plugin. This issue is a potential solution.

The Go plugin takes an inordinate amount of time analyzing third party modules with go list. I verified with go list -x ./... and an empty Go module cache that Go will download third party dependencies. The theory is that Go is re-downloading items that have already been downloaded already.

Ideas for solution:

  1. Use an append-only cache for GOPATH (which is where the Go module cache is stored). This is the easier short-term solution.
  2. Populate the GOPATH with the output from other external module downloads. This probably plays better with remote execution where the remote environment may not have append-only caches.
@tdyas tdyas changed the title go: make downloaded transitive dependencies available in input root go: make downloaded transitive dependencies available in input root during analysis Sep 7, 2021
@tdyas tdyas changed the title go: make downloaded transitive dependencies available in input root during analysis go: make a module's dependencies available in input root during analysis Sep 7, 2021
@tdyas tdyas added the backend: Go Go backend-related issues label Sep 7, 2021
@stuhood
Copy link
Member

stuhood commented Sep 7, 2021

This probably plays better with remote execution where the remote environment may not have append-only caches.

While this is a concern, it's probably a lower priority concern. I expect that working with servers to implement the named_caches proposal would be easier than optimizing the capturing the relevant subsets of the GOPATH into and out of sandboxes (even with FUSE).

@stuhood
Copy link
Member

stuhood commented Sep 7, 2021

golang/go#43645 is half concerning and half reassuring: it sounds like the GOPATH was likely mounted strangely, but the developers seemed to take the issue seriously.

@tdyas
Copy link
Contributor Author

tdyas commented Sep 7, 2021

golang/go#43645 is half concerning and half reassuring: it sounds like the GOPATH was likely mounted strangely, but the developers seemed to take the issue seriously.

To clarify, the build cache is GOCACHE, the module cache is under GOPATH. Given the Pants plugin caches package archives in the CAS (and not in GOCACHE), this Go issue might not be a problem for the Pants plugin.

@stuhood
Copy link
Member

stuhood commented Sep 7, 2021

Given the Pants plugin caches package archives in the CAS (and not in GOCACHE), this Go issue might not be a problem for the Pants plugin.

The thrust of my comment above is that to the extent that it is possible, we should use the named/append-only caches when tools have well-behaved caches, and not put things in the CAS.

@tdyas
Copy link
Contributor Author

tdyas commented Sep 7, 2021

Given the Pants plugin caches package archives in the CAS (and not in GOCACHE), this Go issue might not be a problem for the Pants plugin.

The thrust of my comment above is that to the extent that it is possible, we should use the named/append-only caches when tools have well-behaved caches, and not put things in the CAS.

I believe Go's naming is confusing our discussion here. The "build cache" contains intermediate build outputs, unclear if package archives (i.e., .a files) are stored there. Moreover, it is an internal implementation detail of go build (and the plugin does not use go build but instead uses go tool compile directly).

My comment refers to the plugin storing package archives (which are the final build artifact for each package) in the CAS. These are in the CAS because they are the build artifacts that are used to produce executables. These should be in the CAS and should be stored in remote cache.

@stuhood
Copy link
Member

stuhood commented Sep 7, 2021

Ah, I see the confusion: I found that issue via a search for caching issues around the GOPATH, but the issue itself refers to the GOCACHE. Sorry about that.

@Eric-Arellano
Copy link
Contributor

Some notes from @tdyas, who iiuc has suggested solving this by populating the chroot with all downloaded deps and the go.sum, whereas now we are not so Go re-downloads things.

(Eric) to be clear, the idea is replacing DownloadedExternalModule with DownloadedExternalModules (plural) and that it downloads everything in the go.mod?

(Tom) well some thoughts:
for the actual download itself, you could split off downloading the external modules individually and then merge their Digests for the gopath subtree. so you could get some efficiency in not re-downloading things if just one dep changes.
code that needs external module source code should be able to request a (digest, path) combo to tell it where the module source is. whatever request/response types are needed for the rules I have no particular preference
but yeah a DownloadedExternalModules type sounds like it would satisfy that
but you can imagine also a DownloadExternalModuleWithDepRequest that just returns the Digest under the hood that had downloaded everything

Eric-Arellano added a commit that referenced this issue Oct 1, 2021
First part of #12771. We're re-downloading the same Go dependencies, which is substantially slowing down performance. This makes sense, when you say `go mod download module@version`, it will download all transitive deps. Because we are currently downloading one module at-a-time and not sharing a cache between those downloads, we will end up duplicating downloads.

Another problem is that we were not before respecting the `go.sum` on disk. When you say `go mod download module@version`, Go ignores the `go.mod` and `go.sum`. We need to respect the `go.sum` for security.

--

Now, we download all dependencies for each `go_mod` target all at once. (There is no reuse between `go_mod` targets.)

Followups will hook up `go_build_pkg.py` and our `ExternalModulePkgImportPaths` rule to use this.

In a followup, we can also teach Pants how to extract the relevant modules from the superset, like our Python "repository PEX" feature. This will allow us to keep precise cache keys.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Oct 1, 2021
#13070)

Next part of #12771 and builds off of #13068.

When generating targets, we use `go list` to determine what packages belong to each external module. Now, we use the result of `go mod download all`, rather than possibly downloading that module again. We can be confident no downloads are happening by setting `GOPROXY=off` and `-mod=readonly.`

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this issue Oct 1, 2021
…packages (#13076)

Like #13070, this makes more progress on #12771.

We do not yet use `DownloadedExternalModules` for the sources digest used in `go_build_pkg.py`, only for the analysis of the package. That will be fixed in a followup.

To facilitate landing this, we remove the `go-pkg-debug` goal. It seems unlikely we'd want that for production, and you can simulate it by adding logging. It can be added back if we decide it's valuable once there is less churn.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano referenced this issue Oct 3, 2021
…#13092)

Part of `https://github.com/pantsbuild/pants/issues/12771`. Now we consistently use `go mod download all`, and never `go mod download module@version`. See #13068 for the motivation.

A followup will fix this code to use the relevant subset of the downloaded digest.

[ci skip-rust]
[ci skip-build-wheels]
tdyas pushed a commit that referenced this issue Oct 4, 2021
Closes #12771.

This does not actually improve cache invalidation until we figure out #13093. But it does dramatically reduce the size of each sandbox, which #12719 identified is causing substantial slowdowns to set up.

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Go Go backend-related issues estimate: <1W
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants