
gps for Implementors


NOTE - these docs are out of date. They're still mostly accurate at a high-level, but details have been rearranged, and function signatures changed. The diagrams and general concepts sections have probably aged the best.

gps is a library for folks interested in fully solving the package management problem. It's designed to provide a simple API that does the hard work for you, letting you focus on your particular tool's desired workflows and UX instead of the writhing complexities of package management. Specifically, gps answers this question for a tool:

Given a tree of Go code, the dependencies it transitively imports, and some constraints on which versions of those imported packages are acceptable, what versions of those dependencies should be used?

There's a lot there, and much of it turns on what "should" means. This documentation is written with the goal of explaining how to create tools using gps that let you control how this question gets answered.

Note that it might be worth skimming the general introduction, if you haven't already.


Minimum Viable Implementation

The absolute minimum required to get gps running looks something like this:

package main

// Derived from https://github.com/sdboyer/gps/blob/master/example.go (which compiles!)

import (
	"go/build"
	"os"
	"path/filepath"
	"strings"

	"github.com/Masterminds/semver" // needed for NaiveAnalyzer.Info below
	"github.com/sdboyer/gps"
)

func main() {
	// Assume the current directory is correctly placed on a GOPATH, and that
	// it's the root of the project.
	root, _ := os.Getwd()
	srcprefix := filepath.Join(build.Default.GOPATH, "src") + string(filepath.Separator)
	importroot := filepath.ToSlash(strings.TrimPrefix(root, srcprefix))

	params := gps.SolveParameters{
		RootDir: root,
	}
	// Perform static analysis on the current project to find all of its imports.
	params.RootPackageTree, _ = gps.ListPackages(root, importroot)

	// Set up a SourceManager. This manages interaction with sources (repositories).
	sourcemgr, _ := gps.NewSourceManager(NaiveAnalyzer{}, ".repocache", false)
	defer sourcemgr.Release()

	solver, _ := gps.Prepare(params, sourcemgr)
	solution, err := solver.Solve()
	if err == nil {
		// If no failure, blow away the vendor dir and write a new one out,
		// stripping nested vendor directories as we go.
		os.RemoveAll(filepath.Join(root, "vendor"))
		gps.WriteDepTree(filepath.Join(root, "vendor"), solution, sourcemgr, true)
	}
}

type NaiveAnalyzer struct{}

// DeriveManifestAndLock is called when the solver needs manifest/lock data
// for a particular dependency project (identified by the gps.ProjectRoot
// parameter) at a particular version. That version will be checked out in a
// directory rooted at path.
func (a NaiveAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return nil, nil, nil
}

// Reports the name and version of the analyzer. This is used internally as part
// of gps' hashing memoization scheme.
func (a NaiveAnalyzer) Info() (name string, version *semver.Version) {
	v, _ := semver.NewVersion("v0.0.1")
	return "example-analyzer", v
}

35 LoC! Hardly a robust or featureful implementation, but not too shabby for a working dependency manager that does the dep-fetching parts of what go get does (modulo arguments).

To explain how to implement gps in a somewhat more realistic tool, we're going to break this example down, building up from the simpler parts into the more complex. First up: SolveParameters.

Solver Inputs: SolveParameters

The SolveParameters struct holds all the arguments and options your tool provides to gps. Its properties determine both the data a solver will operate on and some of its internal solving behaviors.

However, the only two required properties are the ones we see here:

root, _ := os.Getwd()
params := gps.SolveParameters{
	RootDir: root,
}
params.RootPackageTree, _ = gps.ListPackages(root, importroot)
	
solver, _ := gps.Prepare(params, sourcemgr)

RootDir is the root filesystem path of the project on which you want gps to operate, so os.Getwd() makes at least some sense for a naive case. It's probably wise, though not currently required, that your tool ensure that RootDir points to a directory under an active $GOPATH.

RootPackageTree is a PackageTree describing the root project. ListPackages() creates this object by parsing import statements from the tree of all packages contained within the root project. RootPackageTree also has an ImportRoot property, of type ProjectRoot. ProjectRoot is just a type alias for string, and ListPackages() sets its second parameter as the ImportRoot for the PackageTree it returns.

The ProjectRoot type exists, despite being only a type alias for strings, because project roots are a crucial concept. gps relies heavily on the idea of "projects" - trees of packages, all of which are to be covered by a single vendor/ directory. A ProjectRoot is the import path at the root of that tree.

So, if we were to stop using os.Getwd() and explicitly set RootDir, our SolveParameters would be composed like this:

params := gps.SolveParameters{
	RootDir: "/home/sdboyer/go/src/github.com/sdboyer/example-project",
}
params.RootPackageTree, _ = gps.ListPackages(params.RootDir, "github.com/sdboyer/example-project")

solver, _ := gps.Prepare(params, sourcemgr)

Almost everywhere that ProjectRoot is used in gps, it must correspond not only to the root of a project, but also to the root of a repository. This restriction may be relaxed in the future, but we use it for now; assuming that project root == repository root makes many problems much easier.

The only exception to this rule is the present situation: when declaring the project for which we're solving, it's OK for it not to be at a repository root. But projects are only consumable as dependencies if they're at the repository root, so this exception is really only applicable for company monorepos or other non-public cases where you can guarantee no one will ever need to import your project.

Manifests and Locks

Now, the fun stuff!

gps aims to make as few assumptions as possible, but package management is hard, and some assumptions are unavoidable. One of gps' primary assumptions is that there are two types of metadata that describe a project/package tree:

  • Manifests are primarily for expressing version constraints on dependencies. They do so by providing lists of ProjectConstraints. Manifests are the single most important input to a gps solver.
  • RootManifests compose Manifest, but provide some additional information - overrides and ignores. This reflects the special privileges afforded to the root project.
  • Locks describe an exact, reproducible build. Locks are the Solution that a gps solver returns - its outputs (Solution composes Lock). However, locks can also act as supplemental inputs.

While gps does define interfaces that a tool must implement, it does not care where a tool gets the information from. That said, it's probably a good idea to represent these as files contained within the project tree; for example, glide's manifest is glide.yaml, and its lock is glide.lock; its Config and Lock types, which handle those files, implement gps' Manifest and Lock, respectively. (Or they will once gps is merged in)

RootManifest data

In the example, we included neither a RootManifest nor a Lock. This works - gps will simply treat every external import of the root project as having no version constraint - but for a real tool, not providing at least a RootManifest sorta misses the point of using gps.

Let's say the project we're operating on imports github.com/foo/bar, and we only want to accept the master branch for it. This manifest expresses that requirement:

m := gps.SimpleManifest{
	Deps: []gps.ProjectConstraint{
		{
			Ident: gps.ProjectIdentifier{
				// The project root to which the constraint applies
				ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
			},
			// The constraint itself - a branch named master
			Constraint: gps.NewBranch("master"),
		},
	},
}

Constraints themselves are mostly what they appear to be - they only allow versions of a certain type through. There's a little more to ProjectIdentifiers, though. They're another layer atop ProjectRoots.

ProjectIdentifier

ProjectIdentifiers allow you to designate an alternate network location from which a given root import path should be sourced:

gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
	NetworkName: "github.com/sdboyer/bar",
}

This tells gps that we want to fulfill import requirements from the tree of github.com/foo/bar by sourcing it from (presumably) a fork. You can also specify the exact URL:

gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
	NetworkName: "git@github.com:sdboyer/bar",
}

If NetworkName is not specified, then it is inferred from ProjectRoot. Thus, these are equivalent:

gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
}
gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
	NetworkName: "github.com/foo/bar", // specifying this is unnecessary and redundant
}

Manifests also allow you to express test-only dependency constraints. The mechanics are essentially the same - build up a []ProjectConstraint - but test imports and constraints are only incorporated if a flag explicitly indicates they should be. For now, that flag is not accessible to tools; gps hardcodes it to only be enabled for the root project's test dependencies. (That'll make more sense when we get to ProjectAnalyzer a bit later.)
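
As a sketch, assuming SimpleManifest exposes test constraints through a TestDeps field (the field name is an assumption about this era of the API, and github.com/quux/testhelper is a hypothetical project):

m := gps.SimpleManifest{
	Deps: []gps.ProjectConstraint{
		// regular dependency constraints, as above
	},
	// TestDeps is an assumed field name; the mechanics mirror Deps exactly.
	TestDeps: []gps.ProjectConstraint{
		{
			Ident: gps.ProjectIdentifier{
				ProjectRoot: gps.ProjectRoot("github.com/quux/testhelper"),
			},
			Constraint: gps.NewBranch("master"),
		},
	},
}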

Ignoring Packages

One of gps's goals is that solving should produce a solution that not only meets version constraints, but can actually compile. To that end, gps uses static analysis of the source to determine which packages must be present for a build to succeed. In the future, gps will also perform type compatibility analysis between importers and importees, and reject solutions that fail that check.

For the most part, this works well out of the box. gps knows how to ignore stdlib import paths, the old unqualified appengine paths, and some other common ones. But if your user's project has, either locally or transitively, import paths that gps needs to ignore, then RootManifest.IgnorePackages() is how you tell gps to do so. They're reported through this method on the expectation that most tools will want to provide the user a facility for defining ignores as part of their manifest, so grouping them together makes sense.

Do note that it's a method on RootManifest, not Manifest. Defining ignores is one of the special privileges given only to the root project.
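
A minimal sketch of how a tool might wire ignores into its RootManifest implementation - the map[string]bool return type and the embedding of SimpleManifest are assumptions here, so check them against your gps version:

type MyRootManifest struct {
	gps.SimpleManifest
	ignores map[string]bool
}

// IgnorePackages reports the set of import paths the solver should skip.
// The map[string]bool signature is an assumption about this era of gps.
func (m MyRootManifest) IgnorePackages() map[string]bool {
	return m.ignores
}

(RootManifest also requires Overrides(); a sketch of that appears in the next section.)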

When you tell gps to ignore an import path, it's important to remember that gps not only ignores the package at that import path, but it also ignores any new import paths introduced in that package. For example, let's say we have the following import graph: package root imports packages A and B, etc.:
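
Root ──> A ──> C
  └────> B ──> D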

If the solver were operating on this import graph and were told to ignore B, we might visualize it as so:
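
Root ──> A ──> C
  └╌╌╌╌> B ╌╌> D    (ignored: B never enters the graph, so D is never discovered)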

The import link from Root to B is ignored when looking at Root's import paths. B never makes it into the import graph, and as a result, D is never discovered at all: no version constraint checks are made against it, and the returned Solution (which we'll get to soon) will not include it. C, however, is still considered as normal, because it is reachable through A.

Now, gps could add a different type of 'ignore' mechanism that allows D to be discovered through B, but skips checks on B and omits it from the solution. A use case will have to present itself first, though, and that mechanism would be given a different name; ignores will continue to operate as-is.

Overrides

Overrides, like ignores, are a special privilege of the root project, and are reported through RootManifest.Overrides(). They allow the root to enforce that a ProjectRoot will always have a particular Constraint and/or NetworkName, superseding any ProjectConstraint reported from any dependency's (or the root's own) Manifest.GetDependencyConstraints().

Put another way, overrides make it impossible for certain types of conflicts to occur.

For example, imagine a simple depgraph:
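
Root ──> A ──> C
  └────> B ──> C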

Now, say that A wants to source C from Cfork instead. That's all fine for A when acting on its own, but for this project it's a problem, because B still wants C to come from C, not Cfork:
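
Root ──> A ──> C (from Cfork)
  └────> B ──> C (from C)      <- conflict: one ProjectRoot, two sources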

This disagreement between A and B on where C should be sourced from results in a conflict. With overrides, though, the root project has the power to step in and decree that Cfork should be used:
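
Root [override: source C from Cfork]
  ├───> A ──> C (from Cfork)
  └───> B ──> C (from Cfork)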

Of course, the RootManifest could also mandate that regular C be used, or some entirely different source.

The same idea applies with version constraints - if the RootManifest specifies a Constraint override for a given ProjectRoot, then that will be swapped in for all dependencies on that ProjectRoot.

It's worth noting that all other version constraint interactions involve narrowing what's acceptable, by computing constraint intersection. Overrides are the only way a constraint can widen beyond what any individual dependency demands.

Overrides are a powerful feature, and are tremendously useful for asserting control over an unruly depgraph and ecosystem. But they must be used with care. Because they are a special privilege of the root project, any "fixes" made via a project's overrides will have no effect if that project is pulled in as something else's dependency. Overuse of overrides has the potential to create an "arms race" in the ecosystem: the more people use overrides, the more other people have to use overrides in order to achieve a sane build.

Note that overrides are expressed a bit differently from normal constraints. Instead of a []ProjectConstraint, RootManifest.Overrides() must return a ProjectConstraints. They hold the same information, but are arranged in a map instead (though normal constraints are likely to become like overrides).
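
Continuing the MyRootManifest sketch from above - ProjectProperties as the map's value type, with NetworkName and Constraint fields, is an assumption about this era of the API:

func (m MyRootManifest) Overrides() gps.ProjectConstraints {
	return gps.ProjectConstraints{
		// Force every depender on github.com/foo/bar to use this source
		// and constraint, regardless of what their manifests say.
		gps.ProjectRoot("github.com/foo/bar"): gps.ProjectProperties{
			NetworkName: "github.com/sdboyer/bar",
			Constraint:  gps.NewBranch("master"),
		},
	}
}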

Lock data

While Locks are solver outputs, they do play some role as solver inputs, as well. When a root lock is present, the solver will attempt to preserve the exact versions specified in the lock. This is the main strategy gps relies on to minimize changes in the depgraph.

Let's say that we'd already solved for github.com/foo/bar, and thus had some Lock information available. Real implementations would likely have their own type, but we can use gps.SimpleLock for now:

// the root import path, again
rootimport := gps.ProjectRoot("github.com/foo/bar")
// the revision of it that we want
version := gps.Revision("af281525a8a371ca6929f63c88e569c1c62137ed")
// the network name (here, we're getting it from a fork)
url := "github.com/sdboyer/bar"
// a list of contained packages that are actually imported
pkgs := []string{"github.com/foo/bar"}

l := gps.SimpleLock{
	gps.NewLockedProject(rootimport, version, url, pkgs),
}

Note: LockedProject instances must be created by NewLockedProject(). This is done to ensure data consistency: passing a nil Version causes a panic.

We can specify any type of version for a LockedProject:

// v1.0.0 is a valid semver version; this creates a semver version
semv := gps.NewVersion("v1.0.0")
// "some-ol-tag" is not valid semver; this creates a 'plain' version
plainver := gps.NewVersion("some-ol-tag")
// "master" branch version
branch := gps.NewBranch("master")
// revision
revision := gps.Revision("af281525a8a371ca6929f63c88e569c1c62137ed")

The preference here is strongly for Revisions - being immutable, they provide the greatest assurance of a reproducible build. For gps to really properly avoid changes, though, both revision and version are needed. Given our example input manifest, that would look like this:

pair := gps.NewBranch("master").Is("af281525a8a371ca6929f63c88e569c1c62137ed")

Finally, wrapping the manifest and lock up together, we create our SolveParameters, then prepare and execute a solver run:

manifest := gps.SimpleManifest{
	Deps: []gps.ProjectConstraint{
		{
			Ident: gps.ProjectIdentifier{
				ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
			},
			Constraint: gps.NewBranch("master"),
		},
	},
}

lock := gps.SimpleLock{
	gps.NewLockedProject(
		gps.ProjectRoot("github.com/foo/bar"),
		gps.NewBranch("master").Is("af281525a8a371ca6929f63c88e569c1c62137ed"),
		"github.com/sdboyer/bar",
		[]string{"github.com/foo/bar"},
	),
}

params := gps.SolveParameters{
	RootDir:  "/home/sdboyer/go/src/github.com/sdboyer/example-project",
	Manifest: manifest,
	Lock:     lock,
}
params.RootPackageTree, _ = gps.ListPackages(params.RootDir, "github.com/sdboyer/example-project")

solver, _ := gps.Prepare(params, sourcemgr)
solution, err := solver.Solve()

Turns out, for this particular case, it is not possible for the solver to return a Solution in which the version of github.com/foo/bar changes. The exact reason why is complicated, but even if the master branch of github.com/foo/bar has new commits, and even if our example-project's source were importing thirty new projects, either the solver would return a Solution with rev af28152, or solving would fail (or gps has a bug).

That's a pretty strong guarantee, but an important one for users of your tool: locked versions change only if there's absolutely no other choice.

Unless, of course, the user wants them to change. That's up next!

ToChange, ChangeAll and Downgrade

The ToChange, ChangeAll, and Downgrade properties work in tandem with root Lock data to determine how much, and what kind, of change should be allowed in solving.

As a general rule, if your tool has Lock data available for the root project, you should always include it in the SolveParameters. This is true even when your user wants to update - they've run something like <yourtool> update github.com/foo/bar. To fulfill that user intent, pass that information along via SolveParameters.ToChange:

// Imagine we're building on the previous params
params.ToChange = []gps.ProjectRoot{
	"github.com/foo/bar",
}

solver, _ := gps.Prepare(params, sourcemgr)
solution, err := solver.Solve()

By putting the ProjectRoot that we want to unlock into the ToChange slice, we bypass the information from the lock and allow github.com/foo/bar to update to whatever its master branch points to at the moment we happened to check.

The other option is using the global setting:

params.ChangeAll = true

Solving here would have the same effect on our version of github.com/foo/bar, but is generally more appropriate to use when the user has issued a command equivalent to <tool> update --all or <tool> update (without any projects specified).

In most cases, setting ChangeAll to true has the same effect as passing a list of all ProjectRoots in the lock to ToChange. There are some subtle differences, however, and it's preferable - not to mention easier - to only use ChangeAll if the user has explicitly requested a global update.

One other note: ToChange and ChangeAll are named as they are, rather than ToUpgrade and UpgradeAll, because your tool can control the direction of change, up or down, with the Downgrade property. (It's Downgrade, rather than Upgrade, so that the zero value corresponds to the common case - upgrading.)

Keep in mind that this only applies when dependencies are tagged with valid semver tags and semver range constraints are applied - in such cases, gps will work from the bottom of the constrained range, rather than the top. The PHP community has found this capability useful in CI for ecosystem robustness; turning the flag on and quickly running tests can help users ensure their constraint ranges are honest.
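
For example, a constraint-honesty check in CI might request a global downgrade run like so:

// Unlock all projects and prefer the lowest versions that satisfy constraints.
params.ChangeAll = true
params.Downgrade = true

solver, _ := gps.Prepare(params, sourcemgr)
solution, err := solver.Solve()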

Trace and TraceLogger

gps provides a tracing facility, which generates explanatory output as the solver moves through each significant step in the solving process. You can enable it (and pass the output straight to stdout) like this:

params.Trace = true
params.TraceLogger = log.New(os.Stdout, "", 0)

and gps will spew forth as it solves. TraceLogger takes a *log.Logger; you can redirect that output however you like.

As part of gps' goal of making dependency management as not-hellish as possible, we invest considerable effort in making this trace output informative. In the future, gps may provide a machine-friendly version of it, but for now, it's meant for human eyes, looking at a terminal. It looks something like this:

✓ select (root)
| ? attempt github.com/foo/bar with 1 pkgs; 1 versions to try
| | try github.com/foo/bar@1.0.0
| ✓ select github.com/foo/bar@1.0.0 w/1 pkgs
| | ? attempt bitbucket.org/hinkle/crinkle with 1 pkgs; 4 versions to try
| | | try bitbucket.org/hinkle/crinkle@1.0.3
| | | ✗ bitbucket.org/hinkle/crinkle@1.0.3 not allowed by constraint <=1.0.1:
| | | |   <=1.0.2 from (root)
| | | |   <=1.0.1 from github.com/foo/bar@1.0.0
| | | try bitbucket.org/hinkle/crinkle@1.0.2
| | | ✗ bitbucket.org/hinkle/crinkle@1.0.2 not allowed by constraint <=1.0.1:
| | | |   <=1.0.1 from github.com/foo/bar@1.0.0
| | | try bitbucket.org/hinkle/crinkle@1.0.1
| | ✓ select bitbucket.org/hinkle/crinkle@1.0.1 w/1 pkgs
| | | ? attempt github.com/quark/quiggle with 1 pkgs; 1 versions to try
| | | | try github.com/quark/quiggle@1.0.0
| | | ✓ select github.com/quark/quiggle@1.0.0 w/1 pkgs
✓ found solution with 3 packages from 3 projects

In English: this trace describes a solve run that, after parsing the root project's code successfully, first looked for a version for github.com/foo/bar and found 1.0.0 acceptable; then it accepted bitbucket.org/hinkle/crinkle at 1.0.1 after rejecting 1.0.3 and 1.0.2 due to constraints from the root and github.com/foo/bar. Finally, github.com/quark/quiggle was attempted, and worked on the first version tried, 1.0.0.

Looking at traces for known solver inputs is probably the fastest way to get an intuitive handle for how gps solving works. gps includes a large test suite of different inputs with expected outputs. Running them with go test -v -short will include the trace output for each fixture. You can additionally isolate a specific fixture by passing its name, like so:

$ go test -v -short -gps.fix="simple dependency tree"

Trace and TraceLogger are the last of the SolveParameters, so we're ready to move on to the next major part.

Preparing a Solver

With our SolveParameters in hand, we're ready to Prepare() a Solver for a run:

solver, err := gps.Prepare(params, sm)
solution, err := solver.Solve()

Prepare() validates the provided SolveParameters. If it returns without error, then solving can proceed, initiated by calling Solve(). Solve() takes no parameters; the solver operates on the parameters originally passed to Prepare().

This is the main workhorse of gps, and where all the complexity lies, but its return values are pretty simple: either a Solution is returned, or a failure in the form of an error. But there are other things to explore first, so we'll come back to this later.

Solver.HashInputs()

There's another step that your tool may want to include prior to solving, assuming you have a Lock produced by a previous Solve() run handy: hashing the solver parameters.

lock := loadLock() // hypothetical helper that loads a previous run's lock data, aka solution

solver, _ := gps.Prepare(params, sm)
digest, _ := solver.HashInputs()
if !bytes.Equal(digest, lock.InputHash()) {
	solution, err := solver.Solve()
}

Every successfully prepared Solver can hash its inputs. The hash incorporates certain inputs from your SolveParameters, combined with the set of external packages imported by the root project.

If the old Lock's digest matches the one generated by the Solver, it guarantees that the solution already in the lock is a valid one for the solve run you're about to perform, possibly rendering the solve run unnecessary. Effectively, this is a long-term memoization technique, with the hash acting as the cache key.

This is very powerful, as it can allow a tool to avoid a lot of unnecessary work. If nothing else, all hash inputs can be computed locally - no network activity required. And, thanks to Go's fast static analysis, it can be done quite quickly; on a decent SSD, even a project the size of Kubernetes generally takes only a few seconds.

All that said, it's crucial to understand the limits of the guarantee.

Most importantly, matching hash digests absolutely do not guarantee that solving will produce the same solution as what's described in the lock. It merely guarantees that it's possible the solver would reach the same solution. That means there are cases (e.g., user requesting updates) where your tool might want to proceed to solve anyway, even if the digests match - or not even bother checking the digests at all.

Also...well, it's not actually a guarantee. There are failure modes outside of gps's control that would make the lock's solution invalid. All of them have a decidedly left-pad-ish flavor.

  • If the old lock contains a reference to an upstream project that was [re]moved, then it is necessarily not a valid solution. Realistically, solving is likely to fail, though it's possible it would find a solution without the missing project.
  • If the old lock pins a project to a (tag) version that has been [re]moved, then we're in a grey area. Strictly speaking, the lock is invalid, because a solve run without the lock could not reproduce the lock. However, in the interest of depgraph stability, if gps detects a situation like this, it will still try to honor the lock's version of reality - but that's not guaranteed to work.

There are various techniques your users can employ to defend themselves against these possibilities, the simplest of which is committing the generated vendor/ directory. Maintaining central mirrors of all dependencies is another possibility. gps may, in future, add a function to the SourceManager that checks a Lock against these kinds of upstream issues.

The SourceManager

The other thing needed to prepare a Solver is a SourceManager, which is responsible for negotiating all interactions with actual source repositories, both over the network and on disk.

While SourceManager is an interface, it's not really intended that tools implement their own. Rather, they should generally use the SourceMgr type provided by gps, created via NewSourceManager(). While the solver never explicitly type asserts that it is working with a *SourceMgr, it's still fairly tightly coupled to it in subtle ways. (The SourceManager interface was only really used to facilitate testing.)

The original example sets up a SourceMgr like so:

cache := ".repocache"
sourcemgr, _ := gps.NewSourceManager(NaiveAnalyzer{}, cache, false)
defer sourcemgr.Release()

We'll deal with NaiveAnalyzer in the next section. First, the second parameter: cache.

gps needs a lot of information from source repositories to do its work. Some of that information is VCS-level metadata, but the rest comes from static analysis of code. Ordinarily, these are things we'd extract from repositories kept on $GOPATH. Unfortunately, that won't work for gps.

As it runs, the solver will request information from the SourceManager that requires checking out different versions of code to disk. It has to do this for a lot of versions, including many that ultimately won't appear in the solution. This is fine to do in an isolated area that is solely dedicated to this purpose; we needn't care about preserving existing disk state, and we can handle inevitable errors by blowing away repositories and starting again.

Such recourse is not available to us on the $GOPATH, where mutating repositories is a no-no, and they may contain local-only state. Trying to avoid unintended consequences there is a bottomless snake pit. So instead, we use the cache.

Of course, all that repository mutation also makes it tricky for more than one SourceMgr to operate on a given cache dir at a time. To that end, NewSourceManager() takes a global lock on the entire cache tree, which is released by SourceMgr.Release(). You can pass true as NewSourceManager()'s final bool parameter to forcibly override this lock (if one exists).

While the SourceManager interface, and the SourceMgr implementation in particular, is primarily designed to meet the solver's needs, any tool that needs a Solver may well have other uses for it. It's worth exploring the interface to see what work it might be able to do for you. In particular, ListPackages() and ListVersions() may be helpful.
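
For instance, a tool could ask the SourceManager what versions exist for a project before constructing constraints - a sketch, assuming ListVersions() takes a ProjectIdentifier and returns a slice of Versions:

versions, err := sourcemgr.ListVersions(gps.ProjectIdentifier{
	ProjectRoot: gps.ProjectRoot("github.com/foo/bar"),
})
if err == nil {
	for _, v := range versions {
		// v may be a branch, semver tag, plain tag, or revision
		fmt.Println(v)
	}
}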

If your tool does make its own use of gps's SourceMgr, make sure to pass the same instance into Prepare() for some easy cache-based performance gains.

The ProjectAnalyzer

As we covered earlier, the root project has its Manifest and Lock explicitly declared up front as a component of SolveParameters. However, we need the same information from dependency projects - primarily the Manifest - but it's not possible to pass in that information up front. Instead, we collect it on the fly, via a ProjectAnalyzer.

SourceMgr relies on a ProjectAnalyzer in order to fulfill its GetManifestAndLock() method. SourceMgr takes care of making sure the requested Version of the repository in question is checked out, then calls the injected ProjectAnalyzer.DeriveManifestAndLock() method with the appropriate ProjectRoot (root import path) and the path to the checkout's root directory.

type NaiveAnalyzer struct{}

func (a NaiveAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return nil, nil, nil
}

Returning all nils from the method is basically saying, "no problems, but also, there are no constraints here - for whatever gps finds in this project's import graph, any version will do." Clearly not an optimal implementation, but it does work.

There are several things to keep in mind when implementing a ProjectAnalyzer:

  • In general, an error should only be returned if the tool detects an actual problem - e.g., an invalid manifest file. Simply not being able to find manifest or lock information should probably return nil, nil, nil. You could return an error, but doing so eliminates the version from consideration, and will likely cause solving to seize up and fail.
  • gps caches the results your ProjectAnalyzer returns from DeriveManifestAndLock() with respect to the revision being analyzed. Thus, ProjectAnalyzer implementations should be stateless: given the same input code tree, DeriveManifestAndLock() should always return the same results.
  • Tools should generally not be doing any static analysis of source code in DeriveManifestAndLock(); gps does all the necessary import graph analysis elsewhere.
  • Pursuant to the previous, $GOPATH does not matter here. The last few elements of the root path and ProjectRoot are unlikely to align in the way they would need to if $GOPATH were in play.
  • If you're building a tool for public consumption, it should probably interoperate with the full range of existing tooling: glide, gb, godep, gpm, etc. That means learning how to read their manifests and/or locks. For tools like godep that really only have a Lock-like concept (Godeps.json), return it as that: nil, lock, nil. Please, take care here. When popular tools fail to do this, it fractures the Go ecosystem.

glide's implementation is a good reference, particularly with respect to its interoperation with other tools. Now, there's just one last big thing to consider in your ProjectAnalyzer, and it relates to that nil, lock, nil return pattern.

Preferred Versions

Locks, being solver outputs, mostly don't matter to solving - though as we covered earlier, an exception is made for the root project's lock. That small exception allows the solver to keep versions stable across solves if possible (unless updates are requested).

gps has a similar exception for Locks that come from dependencies. Versions coming out of a dependency's lock are referred to as preferred versions. To explain how they work, we need an example.

Given that we're solving for Root on this import graph:
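
Root ──> B ──> D

(At minimum, those are the edges relevant here; the full graph may have more.)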

When it comes time for the solver to pick a version for D, then:

  • IF the SolveParameters contain no lock data (from Root), OR the lock's version for D fails to meet constraints
  • AND IF the ProjectAnalyzer's DeriveManifestAndLock() method reports a Lock for B
  • AND IF that Lock contains a version for D (this is B's preferred version of D)
  • AND IF that preferred version of D is acceptable according to all other constraints
  • THEN the solution will select D at B's preferred version.

This fairly abstruse path is important for the Go ecosystem because it builds a bridge to the way most Go dependency management tools (such as godep) have historically worked: simply locking to a revision. If your tool reads in a Godeps.json file and returns it as the Lock from DeriveManifestAndLock(), then the versions given therein will probably end up in the Solution. That'll be the case unless/until:

  • Some project, root or otherwise, expresses an incompatible version constraint
  • Another dependency's Lock expresses a preferred version for the same dep

In practice, this means gps-based tools can transparently act like godep for transitive dependencies until the user says otherwise, or the depgraph makes it impossible. We sometimes call this property transitive version stability.

Now, nothing blows up if two different deps express a preferred version for a third, but obviously, only one can ultimately win. Again, an example: say that the solver is trying to pick a version for C, on which both A and B depend. Currently, gps allows only one preferred version to be expressed, so the solver would pick one of A or B based on complex (but deterministic) criteria. At that point, the same rules apply - the picked preferred version will be used only if it meets all constraints, and the root lock doesn't offer an [acceptable] version.

A tool can implement preferred versions by returning a Lock from DeriveManifestAndLock():

type PreferredVersionsAnalyzer struct{}

func (a PreferredVersionsAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return findAManifestElseNil(), findSomeLockDataElseNil(), nil
}

Now, preferred versions are definitely sorta magical. If that's not acceptable for your tool, you can ensure they're never used by always returning a nil Lock:

type NoMagicAnalyzer struct{}

func (a NoMagicAnalyzer) DeriveManifestAndLock(path string, n gps.ProjectRoot) (gps.Manifest, gps.Lock, error) {
	return findAManifestElseNil(), nil, nil
}

Solutions and Failures

We've finally got everything prepped - it's time to solve!

solver, _ := gps.Prepare(params, sourcemgr)
solution, err := solver.Solve()
if err == nil {
	os.RemoveAll(filepath.Join(root, "vendor"))
	gps.WriteDepTree(filepath.Join(root, "vendor"), solution, sourcemgr, true)
}

If solving fails, the error returned will (or should - this is an area we're actively improving) describe what caused the failure. This could be something simple, or something complex. Either way, it's gps' goal that the error be digestible for users, so sending it back to them in some form is recommended.

If no error is returned, then the solver found a Solution. At this point, most real tools would probably want to persist the solution, typically in the form of a lock file.
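
As a sketch of persistence - Projects() and InputHash() come from the Lock interface that Solution composes, and the Ident()/Version() accessors on LockedProject plus the one-line-per-project format are assumptions:

f, err := os.Create(filepath.Join(root, "mytool.lock"))
if err == nil {
	defer f.Close()
	// Record the input hash so later runs can use the memoization check
	// described earlier.
	fmt.Fprintf(f, "hash %x\n", solution.InputHash())
	for _, lp := range solution.Projects() {
		fmt.Fprintf(f, "%s %s\n", lp.Ident().ProjectRoot, lp.Version())
	}
}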

Persistence or no, with the solution in hand, we can write out our dependency tree. The example implementation does so in-place after blowing away the existing vendor/ directory. That's extremely unsafe, of course; real tools will probably want to write out a tree first to a tmp dir, then swap it in for the existing vendor/ directory only if doing so caused no errors.
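
A sketch of that safer approach, staging into a temporary directory next to the project (so the final rename stays on one filesystem):

tmp, err := ioutil.TempDir(root, ".vendor-stage")
if err == nil {
	defer os.RemoveAll(tmp) // clean up the staging dir if we never swap it in
	if err = gps.WriteDepTree(tmp, solution, sourcemgr, true); err == nil {
		vendor := filepath.Join(root, "vendor")
		os.RemoveAll(vendor)
		err = os.Rename(tmp, vendor)
	}
}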

The final boolean parameter to WriteDepTree determines whether or not vendor/ directories nested within dependencies should be removed. This should generally be an aggressive "yes!" - if any code is present in nested vendor directories, not stripping it out will mean the user isn't actually getting the selected solution.

The only time vendor stripping should create any kind of problem is if dependencies are storing either modified or just plain not-vendor code under their vendor/ directory. There isn't a lot gps can do about this, really, which is why one of our foundational assumptions is that the vendor/ directory is for...uh, upstream vendored code. Nevertheless, this option is available if the use case demands no stripping, or if the tool has its own version stripping logic that it prefers to use.