Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gomod: Do not attempt to parse transitive dependencies #2880

Merged
merged 5 commits into from
Dec 22, 2020

Conversation

jurre
Copy link
Member

@jurre jurre commented Dec 16, 2020

Previously we would run go list -m -json all to generate a list of
dependencies to attempt to update.

The way that this is implemented, it causes every transitive
dependency to be listed, even if they are not required by this project.

This can happen when a module defines several packages, but only a
subset of those packages is used by the current repository.

It will also verify that each of those transitive dependencies is
available, and this causes issues when we rely on a dependency with
private transitive dependencies. We run into the scenario where we have
access to any of the transitive dependencies that are used, but since we
do not have access to an unused transitive dependency, we run into a
dependency_file_not_resolvable error.

This is especially unfortunate, since we won't attempt to update any
indirect dependencies for go_modules anyway.

By using go mod edit -json instead, we get an easier to parse list of
dependencies, and we grab just the direct ones.

In order to replicate the previous behavior of raising a
DependencyFileNotResolvable error when some of the packages are not
available, we now also run a go get, but it's debatable if we even
need to do this, since this error will be raised in subsequent steps (in
the FileUpdater), and not raising it here means that we can
consolidate our error handling for this scenario in one place.

Co-authored-by: Fatih Arslan fatih@github.com

@jurre jurre requested a review from a team as a code owner December 16, 2020 13:04
@@ -37,7 +37,7 @@
describe "parse" do
subject(:dependencies) { parser.parse }

its(:length) { is_expected.to eq(6) }
its(:length) { is_expected.to eq(5) }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer pull in rsc.io/sampler since it's not required


command = "go mod edit -print > /dev/null"
command += " && go list -m -json all"
command = "go get > /dev/null" # Verify the packages are installable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does add a bit of overhead, and as discussed in the PR body, I doubt that it's necessary. The upside is that we fail early in the sad path, the downside is we have to duplicate error handling between this class and the GoModUpdater, and it adds some overhead in the happy path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this slow down the pull request opening latency? I assume beause we're running this twice, this means there will be significant overhead as you meant. I'm in favor of removing it. In fact maye this method's only concern should be return the list of required packages. If those packages are installable or valid should be handled probably somewhere else. Thoughts on that ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this slow down the pull request opening latency?

A bit, it's hard to say how significant it would be, we definitely perform some heavier operations in the actual updating step, but I noticed our tests for this this class increased by ~40 seconds after adding this go get vs main, and our test go.mod files are typically very small. When we remove the go get, we speed up this step significantly as we do not need to make any network calls.

I'm in favor of removing it. In fact maye this method's only concern should be return the list of required packages. If those packages are installable or valid should be handled probably somewhere else. Thoughts on that ?

I tend to agree! 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah 👍 on removing go get from this step, seems like we're overloading the parser by doing resolvability that should be handled by the update checker/file updater.

Copy link
Contributor

@fatih fatih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏼 Just left a comment about go get behavior, otherwise I'm glad we changed this bit to more efficient. I think if we don't add go get and because we removed go list, this will also significantly improve the time we open pull requests for Go modules.

dependency = dependency_from_details(dep)
dependency_set << dependency if dependency
required_packages.each do |dep|
dependency_set << dependency_from_details(dep)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplification here 👍🏼

@@ -78,17 +69,18 @@ def module_info
end

File.write("go.mod", go_mod_content)
File.write("main.go", "package dummypkg\n func main() {}\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we add this here? I think it would be nice to add some comment maybe explaining the reasoning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good call, we add it so that we can run go get, we perform this step in a temp directory where we copy in just the go.mod and go.sum files, and without the main.go we get go get .: path is not a package in module rooted at.

I agree that we should clarify this if we indeed decide to keep it, but if we get rid of go get, we can also nuke this step


command = "go mod edit -print > /dev/null"
command += " && go list -m -json all"
command = "go get > /dev/null" # Verify the packages are installable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this slow down the pull request opening latency? I assume beause we're running this twice, this means there will be significant overhead as you meant. I'm in favor of removing it. In fact maye this method's only concern should be return the list of required packages. If those packages are installable or valid should be handled probably somewhere else. Thoughts on that ?

@hmarr
Copy link
Contributor

hmarr commented Dec 16, 2020

I just want to make sure I understand the problem this solves. Does the following situation capture it correctly?

Project a depends on package c in module b. Module b includes both packages c and d. Package d in module b has a dependency on a private module e, which isn't accessible (e.g. because it's on a corporate network). Currently this causes update jobs to fail, but after this change those jobs would succeed?

@jurre
Copy link
Member Author

jurre commented Dec 16, 2020

I just want to make sure I understand the problem this solves. Does the following situation capture it correctly?

Project a depends on package c in module b. Module b includes both packages c and d. Package d in module b has a dependency on a private module e, which isn't accessible (e.g. because it's on a corporate network). Currently this causes update jobs to fail, but after this change those jobs would succeed?

Yes, I think that's exactly the scenario 👍 with the caveat that package e is not required in project a.

@hmarr
Copy link
Contributor

hmarr commented Dec 16, 2020

Sorry for all the questions - I just want to clarify this bit:

The way that this is implemented, it causes every transitive dependency to be listed, even if they are not required by this project.

By "required", do you mean like "needed by the project", as opposed to required in the go.mod? If that's the case, it sounds like we're still expecting "necessary" (but not necessarily required) transitive dependencies to show up. I have a suspicion that this change removes all transitive dependencies from the list, perhaps other than the non-Go-modules // indirect ones, as they're listed in the top level go.mod. Is that what you're expecting?

I ask because we probably will want to support updating of transitive dependencies at some point. Though if it's causing problems right now, I'm happy to pull it out and we can re-add it down the line.

@jurre
Copy link
Member Author

jurre commented Dec 16, 2020

Sorry for all the questions - I just want to clarify this bit:

No worries!

The way that this is implemented, it causes every transitive dependency to be listed, even if they are not required by this project.

By "required", do you mean like "needed by the project", as opposed to required in the go.mod? If that's the case, it sounds like we're still expecting "necessary" (but not necessarily required) transitive dependencies to show up. I have a suspicion that this change removes all transitive dependencies from the list, perhaps other than the non-Go-modules // indirect ones, as they're listed in the top level go.mod. Is that what you're expecting?

I ask because we probably will want to support updating of transitive dependencies at some point. Though if it's causing problems right now, I'm happy to pull it out and we can re-add it down the line.

No, the scenario where this happens is actually a little different, apologies for not spotting that:

module a requires module b
  module b defines pkg c and pkg d
    pkg d requires module e

module a only requires pkg c from module b

In this case, module a has no transitive dependency on module e. It's not required in the go.mod and it is never imported anywhere in the code, it has an entry in the go.sum, but other than that it's not needed nor referenced.

We still grab all the // indirect dependencies by using go mod edit -json, even though we discard them in the next step

(have shared the actual dependency chain privately as they contain private repo's)

@jurre
Copy link
Member Author

jurre commented Dec 18, 2020

So, the downside of this approach is that we do not parse any regular transitive dependencies, only those listed in the go.mod file, but not regular transitive deps for those that have a go.mod file.

Example provided by @hmarr:

gmtest $ cat go.mod
module foo
go 1.14
require github.com/stretchr/testify v1.6.1
gmtest $ go mod edit -json
{
	"Module": {
		"Path": "foo"
	},
	"Go": "1.14",
	"Require": [
		{
			"Path": "github.com/stretchr/testify",
			"Version": "v1.6.1"
		}
	],
	"Exclude": null,
	"Replace": null
}
gmtest $ go list -m -json all
{
	"Path": "foo",
	"Main": true,
	"Dir": "/tmp/gmtest",
	"GoMod": "/tmp/gmtest/go.mod",
	"GoVersion": "1.14"
}
{
	"Path": "github.com/davecgh/go-spew",
	"Version": "v1.1.0",
	"Time": "2016-10-29T20:57:26Z",
	"Indirect": true,
	"Dir": "/Users/harry/pkg/mod/github.com/davecgh/go-spew@v1.1.0",
	"GoMod": "/Users/harry/pkg/mod/cache/download/github.com/davecgh/go-spew/@v/v1.1.0.mod"
}
{
	"Path": "github.com/pmezard/go-difflib",
	"Version": "v1.0.0",
	"Time": "2016-01-10T10:55:54Z",
	"Indirect": true,
	"Dir": "/Users/harry/pkg/mod/github.com/pmezard/go-difflib@v1.0.0",
	"GoMod": "/Users/harry/pkg/mod/cache/download/github.com/pmezard/go-difflib/@v/v1.0.0.mod"
}
{
	"Path": "github.com/stretchr/objx",
	"Version": "v0.1.0",
	"Time": "2018-01-06T01:13:53Z",
	"Indirect": true,
	"Dir": "/Users/harry/pkg/mod/github.com/stretchr/objx@v0.1.0",
	"GoMod": "/Users/harry/pkg/mod/cache/download/github.com/stretchr/objx/@v/v0.1.0.mod"
}
{
	"Path": "github.com/stretchr/testify",
	"Version": "v1.6.1",
	"Time": "2020-06-05T10:48:45Z",
	"Dir": "/Users/harry/pkg/mod/github.com/stretchr/testify@v1.6.1",
	"GoMod": "/Users/harry/pkg/mod/cache/download/github.com/stretchr/testify/@v/v1.6.1.mod",
	"GoVersion": "1.13"
}
{
	"Path": "gopkg.in/check.v1",
	"Version": "v0.0.0-20161208181325-20d25e280405",
	"Time": "2016-12-08T18:13:25Z",
	"Indirect": true,
	"Dir": "/Users/harry/pkg/mod/gopkg.in/check.v1@v0.0.0-20161208181325-20d25e280405",
	"GoMod": "/Users/harry/pkg/mod/cache/download/gopkg.in/check.v1/@v/v0.0.0-20161208181325-20d25e280405.mod"
}
{
	"Path": "gopkg.in/yaml.v3",
	"Version": "v3.0.0-20200313102051-9f266ea9e77c",
	"Time": "2020-03-13T10:20:51Z",
	"Indirect": true,
	"Dir": "/Users/harry/pkg/mod/gopkg.in/yaml.v3@v3.0.0-20200313102051-9f266ea9e77c",
	"GoMod": "/Users/harry/pkg/mod/cache/download/gopkg.in/yaml.v3/@v/v3.0.0-20200313102051-9f266ea9e77c.mod"
}

Since we do not update transitive dependencies for go modules, I think we should update the FileParser to omit them entirely, and also remove the go get "safeguard".

Alternatively we could parse the go.sum file for all listed dependencies with a native helper, something along the lines of this will do the trick:

package parser

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"path/filepath"
	"strings"

	"golang.org/x/mod/module"
)

type result struct {
	Packages []module.Version
}

func ParseSumfile() (interface{}, error) {
	data, err := ioutil.ReadFile("go.sum")
	if err != nil {
		return nil, err
	}

	file, err := filepath.Abs("go.sum")
	if err != nil {
		return nil, err
	}

	fmt.Println(file)

	m, err := readGoSum(file, data)

	res := result{Packages: m}

	p, err := json.Marshal(res)
	if err != nil {
		return nil, err
	}

	return string(p), nil
}

// emptyGoModHash is the hash of a 1-file tree containing a 0-length go.mod.
// A bug caused us to write these into go.sum files for non-modules.
// We detect and remove them.
const emptyGoModHash = "h1:G7mAYYxgmS0lVkHyy2hEOLQCFB0DlQFTMLWggykrydY="

// readGoSum parses data, which is the content of go.sum and returns the parsed module versions
func readGoSum(file string, data []byte) ([]module.Version, error) {
	dst := []module.Version{}

	lineno := 0
	for len(data) > 0 {
		var line []byte
		lineno++
		i := bytes.IndexByte(data, '\n')
		if i < 0 {
			line, data = data, nil
		} else {
			line, data = data[:i], data[i+1:]
		}
		f := strings.Fields(string(line))
		if len(f) == 0 {
			// blank line; skip it
			continue
		}
		if len(f) != 3 {
			return dst, fmt.Errorf("malformed go.sum:\n%s:%d: wrong number of fields %v", file, lineno, len(f))
		}
		if f[2] == emptyGoModHash {
			// Old bug; drop it.
			continue
		}
		mod := module.Version{Path: f[0], Version: f[1]}
		dst = append(dst, mod)
	}
	return dst, nil
}

But I would suggest we cross that bridge when we do decide to start updating transitive dependencies.

cc @fatih wdyt?

@fatih
Copy link
Contributor

fatih commented Dec 18, 2020

@jurre the current approach looks good for me if we don't want to return the transitive dependencies. It makes things easier in terms of readability as we only take care of listing the direct dependencies. As a bonus, we can later add a transitive_dependencies function that would return those if we're interested in those, which would make the code more readable in the future.

@hmarr
Copy link
Contributor

hmarr commented Dec 18, 2020

I'd rather get rid of all transitive dependencies than parse the go.sum. Particularly as go.sum files often contain old dependencies (if go mod tidy isn't run) and sometimes are missing altogether. Even when we do add support for transitive dependencies, I'd rather rely exclusively on the go.mod.

@jurre
Copy link
Member Author

jurre commented Dec 18, 2020

I'd rather get rid of all transitive dependencies than parse the go.sum. Particularly as go.sum files often contain old dependencies (if go mod tidy isn't run) and sometimes are missing altogether. Even when we do add support for transitive dependencies, I'd rather rely exclusively on the go.mod.

👍 I agree, that's what I've done in the latest commit FWIW, the Go code I posted was an alternative, but I agree it's probably problematic for the reasons you mentioned

@feelepxyz
Copy link
Contributor

@jurre if you rebase/force push this branch ci should kick in, accidentally turned off actions while testing their repo select UI last week 🤦

@jurre jurre force-pushed the jurre-fatih/stop-parsing-unrequired-indirect-go-deps branch from e39a7a6 to 7265bd9 Compare December 21, 2020 11:33
@jurre jurre changed the title Stop parsing unrequired indirect go packages gomod: Do not attempt to parse transitive dependencies Dec 21, 2020
@jurre
Copy link
Member Author

jurre commented Dec 21, 2020

@thepwagner I would love to get your thoughts on the trade-offs made in this PR. We're currently still running go get in the FileParser to verify the packages exist, but I wonder if it makes more sense to let the FileUpdater worry about this

jurre and others added 3 commits December 22, 2020 11:43
Previously we would run `go list -m -json all` to generate a list of
dependencies to attempt to update.

The way that this is implemented, it causes _every_ transitive
dependency to be listed, even if they are not required by this project.

This can happen when a module defines several packages, but only a
subset of those packages is used by the current repository.

It will also verify that each of those transitive dependencies is
available, and this causes issues when we rely on a dependency with
private transitive dependencies. We run into the scenario where we have
access to any of the transitive dependencies that are used, but since we
do not have access to an _unused_ transitive dependency, we run into a
`dependency_file_not_resolvable` error.

This is especially unfortunate, since we won't attempt to update _any_
indirect dependencies for go_modules anyway.

By using `go mod edit -json` instead, we get an easier to parse list of
packages that are _actually_ used in this repo.

In order to replicate the previous behavior of raising a
`DependencyFileNotResolvable` error when some of the packages are not
available, we now also run a `go get`, but it's debatable if we even
need to do this, since this error will be raised in subsequent steps (in
the `FileUpdater`), and not raising it here means that we can
consolidate our error handling for this scenario in one place.

Co-authored-by: Fatih Arslan <fatih@github.com>
Also cleans up any error handling that can only occur from running `go
get`, or any other command that actually downloads the modules.
@jurre jurre force-pushed the jurre-fatih/stop-parsing-unrequired-indirect-go-deps branch from 7265bd9 to 18d48b4 Compare December 22, 2020 10:56
Copy link
Contributor

@thepwagner thepwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dig the FileParser's reduction in scope to literally parsing the file.
We could probably do that without affecting transitive dependency support, but per discussion transitive dependencies will take out-of-scope follow-ups no matter what. This LGTM for what's supported today.

dependency = dependency_from_details(dep)
dependency_set << dependency if dependency
required_packages.each do |dep|
dependency_set << dependency_from_details(dep) unless dep["Indirect"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is skipping Indirect dependencies a requirement of this change?

I get that they're filtered out later, but is that behaviour correct?

I don't think we should scope creep into supporting indirect dependencies in this PR - more surprised that we're not currently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is skipping Indirect dependencies a requirement of this change?

It's more of a consequence, really. go list -m only includes transitive dependencies that are included in the go.mod file, which apparently are only transitive dependencies that aren't go modules.

We filter indirect dependencies out much more explicitly in the UpdateChecker though, and seems to be a deliberate limitation of dependabot currently.

go_modules/lib/dependabot/go_modules/file_parser.rb Outdated Show resolved Hide resolved
@jurre jurre merged commit 2ad98dd into main Dec 22, 2020
@jurre jurre deleted the jurre-fatih/stop-parsing-unrequired-indirect-go-deps branch December 22, 2020 16:24
@thepwagner thepwagner mentioned this pull request Jan 4, 2021
thepwagner added a commit that referenced this pull request Jan 5, 2021
Since #2880 , we no
longer detect unavailable git dependencies in the `FileParser`.

We didn't add the functionality to catch them downstream: in the
`UpdateChecker` and `FileUpdater`.
This is the former.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants