Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

Vendor pruning #120

Closed
sdboyer opened this issue Jan 23, 2017 · 40 comments
Closed

Vendor pruning #120

sdboyer opened this issue Jan 23, 2017 · 40 comments

Comments

@sdboyer
Copy link
Member

sdboyer commented Jan 23, 2017

NB: keep in mind that the we see vendor/ as a temporary solution, and expect to eventually move to some newer, better version of GOPATH that largely obviates the need for it.

Currently, dep naively copies dependent repositories into vendor/ in their entirety. This may produce a vendor/ directory with a lot of unnecessary files, and many projects relying on existing community tools use some technique to prune out unneeded files.

It is unlikely that we could develop a pruning solution that would be as aggressive as every user wants, while still being safe for all other users. However, we would like to establish baseline pruning behavior that is safe for everyone, so that dep can perform it unconditionally, or with a minimum of necessary user input. If nothing else, this has performance benefits: if dep does the pruning, then it's possible for us to implement optimizations to simply avoid copying files that would later be removed.

There are a few, not fully orthogonal, classes of pruning we might consider:

  • Removing nested vendor/ directories. This is required for correct builds, and dep will always do it.
  • Removing by build tag. Files with a particular build (or os/arch) tag could be pruned, if the user explicitly requests it in the manifest. (Note that this would not affect whether they show up in lock.json - only whether they're pruned from vendor.)
  • Removing unused packages. dep keeps track of the individual packages that are necessary to build (either imported by code, or required in the manifest), and could selectively remove those that are not needed. There are sub-considerations here:
    • Should packages explicitly ignored in the manifest be pruning or preserved?
    • Should packages/directories be preserved if they contain non-.go files? What types of files?
  • Removing unnecessary files. Similar to the previous, should certain types of files be pruned out of vendored dependencies?

There's some debate as to whether removing files may be a violation of some software licenses. Getting clarity on this question would be ideal; if we can't, then we'll likely have to make additional pruning behaviors optional in order to avoid having dep blacklisted by legal departments.

Finally, pruning is intimately linked to how we approach the problem of vendor verification (#121).

Note that the logic for writing out the dependency tree lives in gps, and is likely to remain there. This issue is to discuss what the behavior and options should be; implementation will happen over there.

@kardianos
Copy link

keep in mind that the committee sees vendor/ as a temporary solution, and fully expects to eventually move to some newer, better version of GOPATH that largely obviates the need for vendor/.

This is news to to me. How do you plan to support checking in deps for those who want this?

@kardianos
Copy link

There are a few, not fully orthogonal, classes of stripping we might consider:

In govendor a popular approach has been to specify build tags to ignore. If you don't have any ignore build tags then you can keep the repo as is for those who wants that. It also serves to ignore appengine special files or other unsupported platforms.

@sdboyer
Copy link
Member Author

sdboyer commented Jan 23, 2017

keep in mind that the committee sees vendor/ as a temporary solution, and fully expects to eventually move to some newer, better version of GOPATH that largely obviates the need for vendor/.

This is news to to me.

Yeah, this is a newish thing. It's something we discussed within the committee, but I'm making it explicit here after discussions with @rsc .

How do you plan to support checking in deps for those who want this?

I have another draft issue I'm working on where we can explore that. Hopefully I'll get it up today.
Let's discuss specifics like this over there.

@sdboyer
Copy link
Member Author

sdboyer commented Jan 24, 2017

In govendor a popular approach has been to specify build tags to ignore. If you don't have any ignore build tags then you can keep the repo as is for those who wants that. It also serves to ignore appengine special files or other unsupported platforms.

Yeah, that makes a ton of sense, and I think it's complementary to the other approaches. I'm now reminded that @freeformz has definitely brought this up in past discussions, too.

Updating the original issue to include this...

@sdboyer sdboyer changed the title Vendor stripping Vendor pruning Jan 24, 2017
@olekukonko
Copy link

olekukonko commented Jan 25, 2017

Removing by build tag. Files with a particular build (or os/arch) Will this not affect cross compilation?

@jessfraz
Copy link
Contributor

It would, I don't think that level (os/arch tags) of pruning should be on by default, but pruning of unused packages should.

@olekukonko
Copy link

@jessfraz I couldn’t agree more ...

@sdboyer
Copy link
Member Author

sdboyer commented Jan 25, 2017

@olekukonko for sure, and that's what that's intended to mean - if we add such a feature, then the user could choose to enable pruning out dependencies entailed solely by a particular build tag.

To be clear, though, this would only affect the deps that actually get written out to vendor/. It would not have any bearing on the way that dependency selection works - the plan there remains to always consider all os/arch/release/build tags, and to record the resultant set in lock.json.

@krisnova
Copy link
Contributor

krisnova commented Jan 25, 2017

So speaking of pruning.. 1 concern that I hit while trying to vendor kops out as a library (I know I am crazy.. don't ask..)

The project used go-bindata to generate a file that was clobbered by vendoring... It was a fairly big deal that the file was in the vendor directory to use the tool as a library..

Not advocating that this is necessarily a good idea/design by any means.. Mostly sharing a real experience from the wild that might come up later.. especially if we start pulling files out of vendor/

Thoughts?

@sdboyer
Copy link
Member Author

sdboyer commented Jan 25, 2017

@kris-nova oh, this is a great question! Though...i actually can't quite picture the exact workflow you're describing with this:

The project used go-bindata to generate a file that was clobbered by vendoring... It was a fairly big deal that the file was in the vendor directory to use the tool as a library..

i think the answer depends considerably on the details. Could you be a little more specific about how this worked, what "the project" refers to (your attempt to vendor kops, or some dep that kops had with this weird requirement), where these generated files were, etc.?

@krisnova
Copy link
Contributor

Ah sorry, so here are the steps that I observed:

Basically there was a .go file that contained the bin data that was generated at compile time. It lived in the kops source tree. The file obviously never made it into kops version control.

When using kops as a dependency, we could generate the file and it would live appropriately in vendor/k8s.io/kops/some/path/to/bin.go. The dependency then behaved as expected. My concern was around that file getting removed if we run a prune (or any dep command for that matter) because of the note above:

Removing unnecessary files. Similar to the previous, should certain types of files be pruned out of vendored dependencies?

An remembered my experience with losing the file.. so I figured I would call it out..

..on second glance it looks like there is discussion around manifests that could prevent this happening..

either imported by code, or required in the manifest

so this point very well could be moot 😄

@Dr-Terrible
Copy link

Dr-Terrible commented Jan 25, 2017

Removing unnecessary files. Similar to the previous, should certain types of files be pruned out of vendored dependencies?

Speaking of pruning unnecessary files, a while ago I was hit by annoying issues with dependency managers (specifically GB) that blindly trim everything except *.go files: there is a high change you wipe out files used by pkg/text/template.ParseFiles()¹ just because they don't have a *.go extension, thus breaking compilation or run-time execution due to missing files.

So far, I encountered only an handful of packages that were affected by the strict trimming, therefor it's not really a high priority issue and I solved it by manually copying the missing files when necessary. The real problem here isn't the manual copy per se, but the fact that a strict trimming could break automatic deployment pipelines too — such as Heroku's tools and a-like. You can't fix an automatic deployment pipeline by manually copy the missing files (it's an oxymoron), especially if you don't have any control over the library/package that breaks.

It would be nice for golang/dep to not fall into the "trim everything except *.go files" trap.

There's some debate as to whether removing files may be a violation of some software licenses. Getting clarity on this question would be ideal; if we can't, then we'll likely have to make additional pruning behaviors optional in order to avoid having dep blacklisted by legal departments.

Pruning unnecessary files is a nice idea, in theory. Especially if you want to guarantee reproducible compilations (which I care a lot about), so ideally an optional feature is the only viable solution at the moment — at least until there is a reasonable heuristic about what can be safely pruned and what not. When the pruning breaks stuff, let the user disable the pruning (doesn't matter if by CLI or by editing a conf file, provided the behaviour is documented).

¹- template.ParseFiles() is just an example, there is plenty of Go APIs where an unforgiving pruning can break stuff;

@kardianos
Copy link

@Dr-Terrible govendor normally only copies *.go files and README / LICENSE style files.

However you can also specify grab everything in the repo. I think normal trimming is important. I also think being able to grab everything is an important out. Concrete uses are C interop and out of file resources (html for a web site).

@pdalpra
Copy link

pdalpra commented Jan 27, 2017

IMHO, I think that what LK4D4/vndr cleans up is good enough, especially the pruning of _test.go files.

@erizocosmico
Copy link
Contributor

erizocosmico commented Jan 27, 2017

Maybe the manifest could have an option for telling what prunning policy do you want and by default just copies *.go and legal/license stuff. Since dependencies can have manifests themselves, they can take care of that. In the end, it would be the responsibility of the dependency's developers to ensure it can be correctly vendored.

TL;DR: IMHO, copying README/LICENSE/etc and all files that can be go or cgo source code (excluding tests) it's what I think should be on by default. Any other (reasonable) case should be possible to handle but in an opt-in way.

@sdboyer
Copy link
Member Author

sdboyer commented Jan 28, 2017

Maybe the manifest could have an option for telling what prunning policy do you want and by default just copies *.go and legal/license stuff. Since dependencies can have manifests themselves, they can take care of that. In the end, it would be the responsibility of the dependency's developers to ensure it can be correctly vendored.

I'm pretty certain it's the right thing to do to have the pruning policy set in the current project's manifest; otherwise, there's no way of knowing which subset of files to hash (assuming we do a subset). Plus, teams that commit vendor/ would be constantly annoyed at superfluous diffs in their vendor/ resulting from different people setting different pruning policies on the CLI.

I'm less sure about having each dependent project's manifest specify its own policy, though. For one, when I set the policy for my own project, my needs may be different than my dependees - different os/arch, and they probably don't need my tests. So, it might be useful to bubble policy suggestions up from deps' manifests to help the user set a policy in their own manifest, but I think directly reusing them might not work out well. It depends a lot on what constitutes a "policy."

(BTW, side note: the distinction between the powers afforded to the current/root project's manifest, vs. a dependent project's manifest, is encoded in gps.RootManifest vs. gps.Manifest).

@dlsniper
Copy link

dlsniper commented Feb 8, 2017

Hi,

First of all, I would like to start by saying that I'm baffled by this note:

NB: keep in mind that the we see vendor/ as a temporary solution, and expect to eventually move to some newer, better version of GOPATH that largely obviates the need for it.

We don't have a tool that works good, out of the box with the current solution but we already deprecate the one that we are building... It's unfortunate and confusing to say the least.

Now, back on track to my issue. I've described this in Slack but here's the description of the issue again.

My colleague was using one of the popular vendoring tools. That said tool decided stripped out unused packages / files from the packages. Now that my colleague needed to use the said part of the missing dependency he was: "but why do I need to update my dependencies? I've already said it to vendor this package and all packages it contains".

From what I can see the upcoming tool will continue with this bad UX. Why is it a bad UX? Because people really expect a thing from a package manager that's very basic: copy the package contents from remote to vendor/ by default. If they either it needs a flag or reruns to have this functionality then the issue is revealed.

As such, I would highly vote/recommend/voice/shout/add opinion/etc. that the Package Manager should not do anything smart unless so instructed by the user. The user has to be in control on how this works because then the user can replicate this in any environment the tool will be used, be it a team of one or or a team of 500.

In ending of this, I would like to share a talk which made me rethink the way I approach problems when it comes to UX: https://www.youtube.com/watch?v=5tg1ONG18H8 I hope it will be of help as well.

@sdboyer
Copy link
Member Author

sdboyer commented Feb 8, 2017

From what I can see the upcoming tool will continue with this bad UX.

The entire purpose of this issue is to discuss what is safe to do, starting with some structured possible suggestions. If your perception of the proposal is that it's already decided we're doing all these things, then I'd encourage you to reread the proposal. In particular, this line:

It is unlikely that we could develop a pruning solution that would be as aggressive as every user wants, while still being safe for all other users.

Maybe that latter half isn't explicit enough: my own guideline is that safety comes first.

Personally, I'm becoming less and less inclined towards pruning at all because of the annoyances it creates for verification.

Because people really expect a thing from a package manager that's very basic: copy the package contents from remote to vendor/ by default.

I'm not sure that's the case (but I am sure that's an assertion absent evidence). For other package managers (in other langs) that include, say, build/codegen steps, the expectations are more complex than "just copying."

The user has to be in control on how this works because then the user can replicate this in any environment the tool will be used, be it a team of one or or a team of 500.

I addressed this in my comment immediately before yours: #120 (comment)

We don't have a tool that works good, out of the box with the current solution but we already deprecate the one that we are building... It's unfortunate and confusing to say the least.

vendor is not a tool. It's an implementation detail that is used by tools to achieve desired outcomes.

Also, I find your objection here surprising, given that the impulse to minimize the contents of vendor is basically the sole reason anyone cares about pruning in the first place.

@gyuho
Copy link

gyuho commented Mar 4, 2017

Could dep support features in https://github.com/sgotti/glide-vc? Removing nested vendors, stripping test, example files with sub-dependencies are necessary for large projects.

e.g.

glide update --strip-vendor
glide vc --only-code --no-tests

https://github.com/coreos/etcd/blob/master/scripts/updatedep.sh

We are evaluating dep as an alternative to glide, but dep does not seem to strip out anything, so /vendor directory size becomes too large.

Is there any large project using dep successfully that we can learn from?

@dongsupark
Copy link

As @gyuho mentioned, I also think that "dep" needs to have a way of cleaning up test files as well as unnecessary license files. I suppose, anyone who has touched large vendor trees would probably share the same concerns. Although the "dep prune" command already removes unused packages, it doesn't seem to clean up tests and license files. So I'm thinking about adding new options for the clean-ups.

From my understanding, however, "dep prune" could be split out into a separate tool in the future. In that case, it will be better to wait until the new prune repo could be created, right?

What's a better approach? Directly add those options to the current form of "dep prune", or wait until the new prune tool could be created?
/cc @tomwilkie

@tomwilkie
Copy link
Contributor

I also think that "dep" needs to have a way of cleaning up test files

+1, and other non-go files.

as well as unnecessary license files.

hmmm, I don't know about this. ~Most licenses (ie Apache 2.0, which covers lots of my world right now) need you to redistribute the license, and I wouldn't want to try and guess which licenses are safe to remove TBH.

From my understanding, however, "dep prune" could be split out into a separate tool in the future. In that case, it will be better to wait until the new prune repo could be created, right?

@sdboyer whats you're current thinking on this? I'm personally keen to see this stay as part of the dep tool, as for the time being I'm always going to be committing my dependancies - I've had too many repos disappear on me.

I'd like to see it rolled into dep ensure (as a flag, and even as a committed config file param or something) as I've already had an occurrence of someone forgetting to run prune and committing many millions of LOC of dependancies.

@sdboyer
Copy link
Member Author

sdboyer commented May 4, 2017

@sdboyer whats you're current thinking on this? I'm personally keen to see this stay as part of the dep tool, as for the time being I'm always going to be committing my dependancies - I've had too many repos disappear on me.

Ugh. I'm still pretty torn between the different guarantees I'd like us to be able to provide. But this really can't/oughtn't wait much longer, so lemme try to take a stab. Here's me, thinking through it in bullets:

  1. Different projects will have different pruning needs. We can't expect that one thing will be right for all projects. We can offer a set of pruning flags that the user can toggle individually, and will be applied automatically by dep ensure when writing out vendor/.
    a. These should probably just be flags - allowing e.g. arbitrary pattern matching is probably something we would ultimately regret.
    b. The only non-optional pruning rule is the removal of nested vendor/ dirs.
  2. Pruning rules are a property for the manifest, not a CLI argument. This guarantees consistency across individual runs, and members of teams.
  3. Long-term sanity in vendor/ verification #121 probably demands that we have a single way of computing verification hashes. That means hashes will only be recorded for the flag-less case (all files, excluding vendor/).

I think that basic approach should more or less satisfy these competing requirements. It should allow us to have verifiable vendor dirs, while still having automatic pruning within dep ensure.

The catch will be that it won't be possible to verify a more-than-default pruned vendor/ without having the original source repositories present (possibly a problem for e.g. a network-isolated build server). But, I think that's a price we can afford to pay.

I realize this sketch is basically me drawing dots and leaving you to connect them - sorry 😄 i don't have time for a more detailed writeup just now. But it's a tentative "yes" to making pruning work as an automated part of ensure.

@tomwilkie
Copy link
Contributor

Thanks for the info @sdboyer; I'll go and read #121 and start a new issue to discuss plans.

@kardianos
Copy link

@sdboyer You may be interested in govendor smarter tag ignoring logic:

https://github.com/kardianos/govendor/blob/master/context/tags_test.go
https://github.com/kardianos/govendor/blob/master/context/tags.go

I could contribute to dep if you'd like.

@sdboyer
Copy link
Member Author

sdboyer commented May 11, 2017

@kardianos oh oh oh interesting! yesyes, that work looks like the shape of what we'll ultimately need for #291. i'd be absolutely thrilled if we could incorporate what you've already done. that area is a bit particular, and i've not fully absorbed what you've written there, but this:

// However in govendor all questions are reversed. Rather then asking
// "What should be built?" we ask "What should be ingored?".

strongly suggests that we're in the right ballpark.

@VojtechVitek
Copy link

@sdboyer

  1. Pruning rules are a property for the manifest, not a CLI argument. This guarantees consistency across individual runs, and members of teams.

Does this mean we'll have the default CLI arguments only for the init command and then the settings would live in the Gopkg.toml file?

$ dep init --prune-flags=...

$ cat Gopkg.toml
[[ settings.prune ]]
  tests = true
  unused_pkgs = true
  unnecessary_files = true
  build_tag = ""

Or what's your thinking here?

@VojtechVitek
Copy link

VojtechVitek commented Jun 6, 2017

For anyone interested, I'm using the following .gitignore rules to filter out unnecessary files:

# .gitignore

# Ignore everything in vendor/, except for *.go files,
# LICENSE and COPYING. Ignore Go tests.
vendor/**/*
!vendor/**/
!vendor/**/*.go
!vendor/**/LICENSE*
!vendor/**/COPYING*
vendor/**/*_test.go

@sdboyer
Copy link
Member Author

sdboyer commented Jun 6, 2017

Does this mean we'll have the default CLI arguments only for the init command and then the settings would live in the Gopkg.toml file?

I don't really see a need for flags at all - why add the complexity to init? Just manually add settings to Gopkg.toml (definitely something along the lines of your example), and subsequent dep ensure runs will apply it.

@VojtechVitek
Copy link

@sdboyer sounds good, as long as dep init writes the default values to the Gopkg.toml file, so I can tweak the settings right away without having to dig deep into golang/dep documentation.

@BernhardValenti
Copy link

@VojtechVitek should those settings for Gopkg.toml be working with a dep from current master?

@sdboyer
Copy link
Member Author

sdboyer commented Jul 18, 2017

@wedgeV sadly no, those are just a proposal. we still need someone to implement the guts, here.

@sdboyer
Copy link
Member Author

sdboyer commented Aug 4, 2017

Closed in favor of #944

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests