Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvements #204

Merged
merged 4 commits into from
Aug 31, 2023
Merged

Performance Improvements #204

merged 4 commits into from
Aug 31, 2023

Conversation

inteon
Copy link
Contributor

@inteon inteon commented Apr 10, 2023

Greatly improves performance of the go-licenses tool by preventing a duplicate expensive operations and parallelising them.

The performance improvement for cert-manager:

# master
$ time make update-licenses
...
real    2m12.426s
user    1m0.442s
sys     0m2.430s
# this PR
$ time make update-licenses
...
real    0m13.249s
user    0m17.688s
sys     0m1.853s

@inteon inteon force-pushed the improvements branch 2 times, most recently from 67592db to 1d3504e Compare April 26, 2023 15:32
@Bobgy Bobgy mentioned this pull request Apr 26, 2023
7 tasks
licenses/classifier.go Outdated Show resolved Hide resolved
@inteon
Copy link
Contributor Author

inteon commented Apr 26, 2023

@Bobgy this PR is ready for review.

check.go Show resolved Hide resolved
@inteon
Copy link
Contributor Author

inteon commented Jun 21, 2023

@Bobgy Is there something I can do still to get this merged?

@Bobgy
Copy link
Collaborator

Bobgy commented Jun 22, 2023

@Bobgy Is there something I can do still to get this merged?

Sorry I am not spending enough time on this. I tried looking at it initially, but some part of the code was a bit hard to understand. I don't remember now.

I think the simplest way to give us enough confidence is by setting up a bigger e2e test example and confirm the result has no diffence for this PR.

@inteon
Copy link
Contributor Author

inteon commented Jul 8, 2023

@Bobgy I added a more complex e2e test in #215.
This PR also runs this test successfully, giving us more confidence in this PR.

@inteon inteon force-pushed the improvements branch 2 times, most recently from 949bfd3 to 3008ef3 Compare July 10, 2023 20:38
@Bobgy
Copy link
Collaborator

Bobgy commented Jul 15, 2023

Thank you! I will take another quick look

Copy link
Collaborator

@Bobgy Bobgy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks great! I need to read library.go again not on my phone.

licenses/find.go Show resolved Hide resolved
licenses/module.go Show resolved Hide resolved
inteon added 2 commits July 30, 2023 11:28
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@inteon inteon requested a review from Bobgy August 24, 2023 17:53
@inteon
Copy link
Contributor Author

inteon commented Aug 25, 2023

@Bobgy Is this PR ready to be merged?

@gwatts
Copy link
Contributor

gwatts commented Aug 25, 2023

Thought i'd try this against our production code ahead of merge.

It's certainly faster.. 5 seconds vs 27 seconds, however it initially failed to complete for our code as github.com/jmespath/go-jmespath now reports "Unknown" for the license instead of Apache 2 which tripped up my template (emits E0825 14:22:53.325752 1024985 report.go:143] Error identifying license in "Unknown": no license found in the log)

It also picked up a license which master didn't before for github.com/google/go-github .. we have that in our go.mod, but haven't yet determined if its actually in the build.

Will dig in further as time permits.. aside from the go-jmespath case and the addition of one license, the csv output is identical between master and this branch..

There seems to be a change in behaviour for templated output, however.. my template looks like this:

{{ range . }}
## {{ .Name }}

* Version: {{ .Version }}
* License: [{{ .LicenseName }}]({{ .LicenseURL }})

{{ if ne .LicensePath "Unknown"}}
```
{{ .LicenseText }}
```
{{ end }}
{{ end }}

This now seems to range over each license for each package, whereas previously it looped over each package once. This means that for packages that have three licenses, it will emit .LicenseText three times which is fine, but it seems to be showing the same .LicenseText each time

Will dig in further when i get a bit of time

@inteon
Copy link
Contributor Author

inteon commented Aug 25, 2023

Could you verify that the Error identifying license in "Unknown": no license found error also happens on master for github.com/jmespath/go-jmespath? I think this is due to the new classifier version that we are using.
Also, could you check the LICENSE file for the specific version of github.com/jmespath/go-jmespath that you are using? I think it might be a weirdly formatted LICENSE.

Also, on master, there has been a change that adds all licenses in a LICENSE file to the report. That might explain your second issue where it shows multiple entries for the same file? Please take a look at the contents of that LICENSE file and check if it contains multiple licenses.

@gwatts thanks for the testing!

@gwatts
Copy link
Contributor

gwatts commented Aug 25, 2023

I just created a new throwaway Go project here with a package that imports github.com/jmespath/go-jmespath and github.com/gwatts/rootcerts

With go-license 1.6.0 it outputs (using go-licenses report ./... --ignore github.com/gwatts/lictest:

github.com/gwatts/rootcerts,https://github.com/gwatts/rootcerts/blob/ff6b9d3bf583/LICENSE,MIT
github.com/jmespath/go-jmespath,https://github.com/jmespath/go-jmespath/blob/v0.4.0/LICENSE,Apache-2.0

As of commit e41525c though it doesn't output go-jmespath at all for this test case:

github.com/gwatts/rootcerts,https://github.com/gwatts/rootcerts/blob/ff6b9d3bf583/LICENSE,MIT

And for this PR branch:

E0825 19:37:39.480783 1029605 report.go:143] Error identifying license in "Unknown": no license found
github.com/gwatts/rootcerts,https://github.com/gwatts/rootcerts/blob/ff6b9d3bf583/LICENSE,MIT
github.com/jmespath/go-jmespath,Unknown,Unknown

@inteon
Copy link
Contributor Author

inteon commented Aug 25, 2023

@gwatts https://github.com/jmespath/go-jmespath/blob/v0.4.0/LICENSE not being recoginised is due to a change in the behavior of licenseclassifier/v2 vs licenseclassifier, and I don't think we can do much about it. It also looks like the particular LICENSE file has been updated to a more uniform format: jmespath/go-jmespath@b89213a

Thanks to pointing out that the Unknown,Unknown entry is missing on master right now. Also, there is no error on master.
This means that this code is not executed on master: https://github.com/google/go-licenses/pull/204/files#diff-fee12aa2a5b1e9be6ab94d44ddebc2acd1d3a79d5183340ceee62dd114d49157L112
This is due to the licensePath parameter for Identify being empty, which causes the function to return without error (https://github.com/google/go-licenses/pull/204/files#diff-0132dacd132fe94ab157cd20f25a9c70cdfc9186f6884c98d92baa53cd6f3787L52).

In this branch, I moved the "no licenses found" check out of the Identify function: https://github.com/google/go-licenses/pull/204/files#diff-fee12aa2a5b1e9be6ab94d44ddebc2acd1d3a79d5183340ceee62dd114d49157R142-R150

I think we should actually show an error and add the Unknown entry because we did not find a license. This PR by accident fixes the bug.

We should however improve the error message (in "Unknown" does not make sense).
@gwatts could you confirm that 0465735 gives you a clearer error message?

Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@Bobgy
Copy link
Collaborator

Bobgy commented Aug 31, 2023

Thank you @inteon for the awesome perf improvements! It's great to see the test numbers.

Let's merge and release v2 alpha first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants