v2: list dependencies from either import path or go binary #71

Closed. This PR wants to merge 29 commits.
Commits
* 7f8ce88 set up v2 folder (Bobgy, Jun 19, 2021)
* e020f62 v2: i2: list deps in go binary (Bobgy, Jun 19, 2021)
* 55b8538 fix (Bobgy, Jun 19, 2021)
* a571fd5 update (Bobgy, Jun 20, 2021)
* 3ea68bd fix (Bobgy, Jun 20, 2021)
* 2eafb6e cleanup (Bobgy, Jun 20, 2021)
* 682cc82 merge to package gocli (Bobgy, Jun 20, 2021)
* 5bd2c4a fix unit tests (Bobgy, Jun 20, 2021)
* b85f547 rename (Bobgy, Jun 20, 2021)
* acfe58a cleanup (Bobgy, Jun 22, 2021)
* 6f64046 fix (Bobgy, Jun 22, 2021)
* f885c2c extract more metadata from binary (Bobgy, Jun 22, 2021)
* b9da23c cleanup (Bobgy, Jun 22, 2021)
* 6973c71 cleanup (Bobgy, Jun 22, 2021)
* 517e811 address comments (Bobgy, Jun 26, 2021)
* 3cd4f5f clean up (Bobgy, Jun 26, 2021)
* 0749a9a update README with current status (Bobgy, Jul 24, 2021)
* cee56ce add comments clarifying what tests/modules are (Bobgy, Jul 24, 2021)
* 4f9aff3 test: simplify test module (Bobgy, Jul 31, 2021)
* 1f90f0e address feedback: add comments & clean up temp files (Bobgy, Aug 15, 2021)
* fff0db5 vendor go/runtime/debug (Bobgy, Aug 15, 2021)
* 89f12c3 test: add version to gocli.ExtractBinaryMetadata test (Bobgy, Aug 15, 2021)
* 8d1a5b8 modify go/runtime/debug to parse build info from go version -m comman… (Bobgy, Aug 15, 2021)
* 1fb170d rm third_party/uw-labs/lichen (Bobgy, Aug 15, 2021)
* 7b29d8f add example build info data (Bobgy, Aug 15, 2021)
* 6371c33 refactor: expose our own Module type that does not have a replace field (Bobgy, Aug 15, 2021)
* 2357868 add more comments (Bobgy, Aug 15, 2021)
* 2d930b0 add another approach to get go modules using packages.Visit (Bobgy, Aug 30, 2021)
* 801af76 trim +incompatible from version (Bobgy, Sep 4, 2021)
3 changes: 3 additions & 0 deletions v2/.gitignore
go-licenses
dist
.DS_Store
223 changes: 223 additions & 0 deletions v2/README.md
# go-licenses

## **THIS IS STILL UNDER DEVELOPMENT**

A tool to automate the license management workflow for a go module project's direct and transitive dependencies.

## Install

Download the released package and install it to your PATH (TODO: update the URL after release):

```bash
curl -LO download-url/go-licenses-linux.tar.gz
tar xvf go-licenses-linux.tar.gz
sudo mv go-licenses/* /usr/local/bin/
# or move the content to anywhere in PATH
```

## Output Example

<!-- TODO: update NOTICES folder of this repo. -->
<!-- [NOTICES folder](./NOTICES) is an example of generated NOTICES for go-licenses tool itself. -->

Examples used in Kubeflow Pipelines:

* [go-licenses.yaml (config file)](https://github.com/kubeflow/pipelines/blob/master/v2/go-licenses.yaml)
* [license_info.csv (generated)](https://github.com/kubeflow/pipelines/blob/master/v2/third_party/license_info.csv)
* [NOTICES/licenses.txt (generated)](https://github.com/kubeflow/pipelines/blob/master/v2/third_party/NOTICES/licenses.txt)

## Usage

### One-off License Update

1. Check out the version of the repo you need license info for:

   ```bash
   git clone <go-mod-repo-you-need-license-info>
   cd <go-mod-repo-you-need-license-info>
   git checkout <version>
   ```

1. Write a minimal `go-licenses.yaml` config file specifying your module name and the binary to analyze:

   ```yaml
   module:
     go:
       module: github.com/google/go-licenses/v2
       path: .
       binary:
         path: dist/linux/go-licenses
   ```

1. Get dependencies from go modules and generate a `license_info.csv` file of their licenses:

   ```bash
   go-licenses csv
   ```

   The CSV file has three columns: `dependency`, `license download url` and inferred `license type`.

   Note: the format is consistent with [google/go-licenses](https://github.com/google/go-licenses).

1. The tool may fail to identify:

   * the download URL of a license: it will be left out of the CSV.
   * the SPDX ID of a license: it will be named `Unknown` in the CSV.

   Check these entries manually and fix them in your `go-licenses.yaml` config (refer to [the example](./go-licenses.yaml)). After fixing the config, re-run the tool to regenerate the list:

   ```bash
   go-licenses csv
   ```

   Iterate until you have resolved all license issues.

1. Download notices, licenses and source folders that should be distributed along with the built binary:

   ```bash
   go-licenses save
   ```

   Notices and licenses will be concatenated into a single file called `NOTICES/license.txt`.
   Source code folders will be copied to `NOTICES/<module/import/path>`.

   The notices folder location can be configured; see [the go-licenses.yaml example](./go-licenses.yaml).

   Some licenses will be rejected based on their [license type](https://github.com/google/licenseclassifier/blob/df6aa8a2788bdf5ac382148c2453a407a29819b8/license_type.go#L341).

### Integrating in CI

Typically, I think `license_info.csv` should be checked into source control, and
license contents should be downloaded when releasing.

An early idea for CI is to run a simple script that:

1. clones the repo and runs `go-licenses csv`.
1. verifies that the generated `license_info.csv` is up to date with the version in the repo.

This might be flaky, because various dependencies could be temporarily
unavailable. A simpler idea is to let the script:

1. check whether `go.mod` has been updated but the license files have not.
1. if so, fail and say that you should update the license files.
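The simpler CI check can be sketched as a small Go function. This is a hypothetical helper, not an existing go-licenses command; it assumes the CI job supplies the list of changed paths (for example from `git diff --name-only <base>...HEAD`) and that all paths are relative to the repo root.

```go
package main

import "fmt"

// checkLicenseFilesUpdated is a hypothetical CI helper: given the paths changed
// in a pull request, it fails when goModFile changed but licenseFile did not.
func checkLicenseFilesUpdated(changedFiles []string, goModFile, licenseFile string) error {
	goModChanged, licenseChanged := false, false
	for _, f := range changedFiles {
		switch f {
		case goModFile:
			goModChanged = true
		case licenseFile:
			licenseChanged = true
		}
	}
	if goModChanged && !licenseChanged {
		return fmt.Errorf("%s was updated but %s was not; re-run `go-licenses csv` and commit the result", goModFile, licenseFile)
	}
	return nil
}

func main() {
	// Hypothetical input; a CI job could obtain it from `git diff --name-only <base>...HEAD`.
	changed := []string{"v2/go.mod", "v2/go.sum", "v2/main.go"}
	if err := checkLicenseFilesUpdated(changed, "v2/go.mod", "v2/third_party/license_info.csv"); err != nil {
		fmt.Println("license check failed:", err)
		return
	}
	fmt.Println("license check passed")
}
```

Because this check never contacts dependency hosts, it avoids the flakiness of regenerating the CSV, at the cost of trusting that the checked-in file was regenerated correctly.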

## Implementation Details

A rough outline of the steps in the two commands:

`go-licenses csv` does the following to generate `license_info.csv`:

1. Load the `go-licenses.yaml` config file, which can contain:
   * module name
   * built binary local path
   * module license overrides (path excludes, or directly assigning the resulting license)
1. List all direct and transitive dependencies with `go version -m <binary-path>`. When a binary is built with go modules, info about the modules it uses is embedded inside the binary, so we parse the go CLI output to get the full list.
1. Scan licenses and report problems:
   1. Use <github.com/google/licenseclassifier/v2> to detect licenses in all files of each dependency.
   1. Report an error if no license is found for a dependency, etc.
1. Get public license URLs:
   1. Get a dependency's GitHub repo by fetching meta info, like `curl 'https://k8s.io/client-go?go-get=1'`.
   1. Get the dependency's version info from go modules metadata.
   1. Combine the GitHub repo, version and license file path into a public GitHub URL to the license file.
1. Generate CSV output with module name, license URL and license type.
1. Report dependencies the tool failed to deal with during the process.
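Listing dependencies from `go version -m` output and turning a module version into a public license URL can be illustrated with the simplified Go sketch below. It is not the tool's implementation (this PR parses build info through vendored lichen code); all function names are hypothetical, `=>` replacement lines are ignored, and it assumes each module sits at the root of its repo and that non-pseudo versions correspond to git tags of the same name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// moduleRef is one dependency listed in a go binary's build info.
type moduleRef struct {
	Path    string
	Version string
}

// parseGoVersionM extracts the "dep" lines from `go version -m <binary>` output.
// Each such line looks like "\tdep\t<path>\t<version>\t<sum>". Replacement
// lines ("\t=>\t...") are ignored in this sketch.
func parseGoVersionM(output string) []moduleRef {
	var refs []moduleRef
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[0] == "dep" {
			refs = append(refs, moduleRef{Path: fields[1], Version: fields[2]})
		}
	}
	return refs
}

// goImportRE matches <meta name="go-import" content="prefix vcs repo-root">.
// Simplified: it assumes the name attribute comes before content.
var goImportRE = regexp.MustCompile(`<meta\s+name="go-import"\s+content="([^"]+)"`)

// repoRootFromGoGetHTML returns the repo root advertised by a ?go-get=1 page.
func repoRootFromGoGetHTML(html string) (string, bool) {
	m := goImportRE.FindStringSubmatch(html)
	if m == nil {
		return "", false
	}
	fields := strings.Fields(m[1]) // import-prefix, vcs, repo-root
	if len(fields) != 3 {
		return "", false
	}
	return fields[2], true
}

// pseudoVersionRE captures the 12-character commit hash at the end of a
// pseudo-version such as v0.0.0-20201021035429-f5854403a974.
var pseudoVersionRE = regexp.MustCompile(`[.-]\d{14}-([0-9a-f]{12})$`)

// gitRef maps a module version to a git ref usable in a GitHub URL.
func gitRef(version string) string {
	version = strings.TrimSuffix(version, "+incompatible")
	if m := pseudoVersionRE.FindStringSubmatch(version); m != nil {
		return m[1] // commit hash embedded in the pseudo-version
	}
	return version // assume a release tag with the same name exists
}

// licenseURL assumes the module sits at the root of the repo.
func licenseURL(repoRoot, version, licensePath string) string {
	return fmt.Sprintf("%s/blob/%s/%s", strings.TrimSuffix(repoRoot, "/"), gitRef(version), licensePath)
}

func main() {
	// Hypothetical, abbreviated `go version -m` output.
	output := "bin/app: go1.16.5\n" +
		"\tpath\texample.com/app\n" +
		"\tdep\tgithub.com/example/lib\tv1.2.3+incompatible\th1:abc=\n"
	// Hypothetical ?go-get=1 response for github.com/example/lib.
	html := `<meta name="go-import" content="github.com/example/lib git https://github.com/example/lib">`

	for _, ref := range parseGoVersionM(output) {
		repo, ok := repoRootFromGoGetHTML(html)
		if !ok {
			continue
		}
		fmt.Println(licenseURL(repo, ref.Version, "LICENSE"))
		// prints https://github.com/example/lib/blob/v1.2.3/LICENSE
	}
	fmt.Println(gitRef("v0.0.0-20201021035429-f5854403a974")) // prints f5854403a974
}
```

Using the commit hash embedded in a pseudo-version keeps the URL pinned to the exact source that was built, even when no tag exists for that commit.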

`go-licenses save` does the following:

1. Read `license_info.csv` generated by `go-licenses csv`.
1. Call [github.com/google/licenseclassifier](https://github.com/google/licenseclassifier) to get the license type.
1. React to the license type in one of three ways:
   * Download its notice and license, for all types.
   * Copy the source folder, for types that require redistribution of source code.
   * Reject it, according to <https://github.com/google/licenseclassifier/blob/df6aa8a2788bdf5ac382148c2453a407a29819b8/license_type.go#L341>.
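The three reactions to a license type can be sketched as a simple decision function. This is a hypothetical illustration: the category names follow google/licenseclassifier's license types, and treating `restricted` and `reciprocal` as the types that require redistributing source code (with any other unlisted type, such as `forbidden`, rejected) is an assumption modeled on google/go-licenses v1, not a statement of this tool's exact rules.

```go
package main

import "fmt"

// licenseType mirrors some categories from google/licenseclassifier's
// license_type.go (assumed names for this sketch).
type licenseType string

const (
	restricted   licenseType = "restricted"
	reciprocal   licenseType = "reciprocal"
	notice       licenseType = "notice"
	permissive   licenseType = "permissive"
	unencumbered licenseType = "unencumbered"
	forbidden    licenseType = "forbidden"
)

// saveAction describes what a save step would do for one dependency.
type saveAction struct {
	CopyNoticeAndLicense bool
	CopySourceCode       bool
}

// planSave is a hypothetical helper: restricted and reciprocal licenses also
// get their source copied; forbidden or unrecognized types are rejected.
func planSave(t licenseType) (saveAction, error) {
	switch t {
	case restricted, reciprocal:
		return saveAction{CopyNoticeAndLicense: true, CopySourceCode: true}, nil
	case notice, permissive, unencumbered:
		return saveAction{CopyNoticeAndLicense: true}, nil
	default:
		return saveAction{}, fmt.Errorf("license type %q is not allowed", t)
	}
}

func main() {
	for _, t := range []licenseType{notice, reciprocal, forbidden} {
		action, err := planSave(t)
		if err != nil {
			fmt.Println("reject:", err)
			continue
		}
		fmt.Printf("%s: %+v\n", t, action)
	}
}
```

Defaulting to rejection means a license category the tool does not recognize fails the save step instead of being shipped without review.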

## Credits

go-licenses/v2 is greatly inspired by:

* [github.com/google/go-licenses](https://github.com/google/go-licenses) for the commands and compliance workflow
* [github.com/mitchellh/golicense](https://github.com/mitchellh/golicense) for getting modules from a binary
* [github.com/uw-labs/lichen](https://github.com/uw-labs/lichen) for the vendored code to extract structured data from `go version -m` output.

## Comparison with similar tools

<!-- TODO(Bobgy): update this to a table -->

* go-licenses/v2 was greatly inspired by [github.com/google/go-licenses](https://github.com/google/go-licenses), with the following differences:
  * go-licenses/v2 works better with go modules.
  * There is no need to vendor dependencies.
  * go-licenses/v2 discovers versioned license URLs.
  * go-licenses/v2 scans all dependency files to find multiple licenses if any, while go-licenses detects licenses by file name heuristics in local source folders and only finds one license per dependency.
  * go-licenses/v2 supports a manually maintained config file, `go-licenses.yaml`, so that periodic license updates can reuse existing information.
* go-licenses/v2 was mostly written before I learned that [github.com/github/licensed](https://github.com/github/licensed) exists.
  * Similar to google/go-licenses, github/licensed only uses heuristics to find licenses and assumes one license per repo.
  * github/licensed uses a different library for detecting and classifying licenses.
* go-licenses/v2 is a rewrite of [kubeflow/testing/go-license-tools](https://github.com/kubeflow/testing/tree/master/py/kubeflow/testing/go-license-tools) in go, with many improvements:
  * a better and more robust GitHub repo resolution rate
  * a better license classification rate using google/licenseclassifier/v2 (it handles BSD-2-Clause and BSD-3-Clause significantly better than the GitHub license API)
  * automated handling of licenses that require distributing source code (copied from the local module source cache)
  * a simpler end-to-end process (instead of many intermediate steps and config files)
  * written in go, so the binary is easier to redistribute than a python tool

## Roadmap

General directions to improve this tool:

* Provide behavior that is backward compatible with google/go-licenses v1.
* Gather more usage and feedback, and improve the robustness of the tool.

## TODOs

### Features

#### P0

* [ ] Use cobra to support providing the same information via argument or config.
* [ ] Implement "check" command.
* [ ] Support the use case of one module folder with multiple binaries.
* [x] Support customizing allowed license types.
* [x] Support replace directives.
* [x] Support modules with +incompatible in their versions, ref: <https://golang.org/ref/mod#incompatible-versions>.

#### P1

* [ ] Support installation using go get.
* [ ] Refactor & improve test coverage.

#### P2

* [ ] Support auto inclusion of licenses in headers by recording start line and end line of a license detection.
* [ ] Check header licenses match their root license.
* [ ] Find better default locations of generated files.
* [ ] Improve logging format & consistency.
* [ ] Tutorial for integration in CI/CD.

## License Workflow Design Overview

This section introduces the full workflow to comply with open source licenses.
In each workflow stage, we list several options and which one this tool prefers.

1. List dependencies. Options:
   * (Preferred) List dependencies in a go binary
   * List all go module dependencies

1. Detect licenses for a dependency
   * Files to consider. Options:
     * (Preferred) Scan every file
     * Only look into common license file names like LICENSE, LICENSE.txt, COPYING, etc.
   * License classifier. Options:
     * (Preferred) [google/licenseclassifier/v2](https://github.com/google/licenseclassifier/tree/main/v2)
     * [licensee](https://github.com/licensee/licensee)
     * GitHub license API
     * many other options
   * Manual configs to overcome what we cannot automate:
     * (not supported yet) allowlist for licenses
     * (supported) override manually examined licenses
     * (supported) exclude self-owned proprietary dependencies
     * (supported) pin config to dependency version to avoid stale configs

1. Comply with license requirements by redistributing:
   * attribution/copyright notice
   * licenses in full text
   * dependency source code for licenses that require it
25 changes: 25 additions & 0 deletions v2/deps/deps.go
// Copyright 2021 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package deps

// GoModule describes a go module dependency and where its source code lives locally.
type GoModule struct {
	// go import path, example: github.com/google/licenseclassifier/v2
	ImportPath string
	// version, example: v1.2.3, v0.0.0-20201021035429-f5854403a974
	Version string
	// local directory of dependency's source code, example on macOS:
	// /Users/username/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.6.1
	SrcDir string
}
101 changes: 101 additions & 0 deletions v2/deps/go_binary.go
// Copyright 2021 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package deps

import (
	"context"
	"fmt"

	"github.com/google/go-licenses/v2/goutils"
	lichenmodule "github.com/google/go-licenses/v2/third_party/uw-labs/lichen/module"
)

type goModuleRef struct {
	// go import path, example: github.com/google/licenseclassifier/v2
	ImportPath string
	// version, example: v1.2.3, v0.0.0-20201021035429-f5854403a974
	Version string
}

// ListModulesInGoBinary parses dependencies from the build metadata in a go binary.
// Prerequisites:
//   - The go binary must be built with go modules, without any further modifications.
//   - The command must run in the same working directory that was used to build the
//     analyzed go binary, because we need the exact go modules info used to build it.
//
// Here, I am using [1] as a short-term solution. It runs `go version -m` [4] and
// parses the output. This is preferred over [2], because [2] is an alternative
// implementation of `go version -m`, and I expect better long-term compatibility
// from `go version -m` itself.
//
// Parsing command output is still a hack that is unfavorable in the long term. As
// discussed in [3], the golang community will move go version parsing into a
// separate module in golang.org/x. We can use that module once it is built.
//
// References to similar implementations or discussions:
//  1. https://github.com/uw-labs/lichen/blob/be9752894a5958f6ba7be9e05dc370b7a73b58db/internal/module/extract.go#L16
//  2. https://github.com/mitchellh/golicense/blob/8c09a94a11ac73299a72a68a7b41e3a737119f91/module/module.go#L27
//  3. https://github.com/golang/go/issues/39301
//  4. https://golang.org/pkg/cmd/go/internal/version/
func ListModulesInGoBinary(path string) (refs []goModuleRef, err error) {
	defer func() {
		if err != nil {
			err = fmt.Errorf("ListModulesInGoBinary(path=%q): %w", path, err)
		}
	}()
	depsBuildInfo, err := lichenmodule.Extract(context.Background(), path)
	if err != nil {
		return nil, err
	}
	// Only one binary path is passed in, so exactly one build info entry is expected.
	if len(depsBuildInfo) != 1 {
		return nil, fmt.Errorf("len(depsBuildInfo) should be 1, but found %v", len(depsBuildInfo))
	}
	refs = make([]goModuleRef, 0)
	for _, buildInfo := range depsBuildInfo {
		for _, ref := range buildInfo.ModuleRefs {
			refs = append(refs, goModuleRef{
				ImportPath: ref.Path,
				Version:    ref.Version,
			})
		}
	}
	return refs, nil
}

// JoinModuleRefWithLocalModules matches module refs found in a go binary with the
// go modules downloaded locally, so that their source code can be scanned.
// It must run in the working directory that was used to build the analyzed binary.
func JoinModuleRefWithLocalModules(refs []goModuleRef) (modules []GoModule, err error) {
	localModules, err := goutils.ListModules()
	if err != nil {
		return nil, err
	}
	localModulesDict := goutils.BuildModuleDict(localModules)

	for _, ref := range refs {
		localModule, ok := localModulesDict[ref.ImportPath]
		if !ok {
			return nil, fmt.Errorf("cannot find %v in current dir's go modules; are you running this tool from the working dir used to build the binary you are analyzing?", ref.ImportPath)
		}
		if localModule.Dir == "" {
			return nil, fmt.Errorf("module %v's local directory is empty; did you run go mod download?", ref.ImportPath)
		}
		if localModule.Version != ref.Version {
			return nil, fmt.Errorf("found %v %v in go binary, but %v is downloaded in go modules; are you running this tool from the working dir used to build the binary you are analyzing?", ref.ImportPath, ref.Version, localModule.Version)
		}
		modules = append(modules, GoModule{
			ImportPath: ref.ImportPath,
			Version:    ref.Version,
			SrcDir:     localModule.Dir,
		})
	}
	return modules, nil
}