Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Trusty integration #3277

Merged
merged 10 commits into from
May 15, 2024
Merged

Improve Trusty integration #3277

merged 10 commits into from
May 15, 2024

Conversation

puerco
Copy link
Contributor

@puerco puerco commented May 8, 2024

Summary

This PR surfaces the Trusty maliciousness data on the comment added by minder when analyzing dependencies. It also adds the required piping to add the data to the rest of the required low scored dependencies.

I've broken the Trusty evaluator to more scoped utility functions to make them more testable and added a few initial unit tests. It needs a little bit more mocking to write an integration test and the rest of the unit tests

I've simplified the comment template, we now have a single template instead of 3 and it is now pure markdown which is smaller. Here' is a screenshot of the output with malicious dependencies:

image

Demo PRs showing

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Added initial tests (up to 22% from 0 :) )

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@puerco puerco requested a review from a team as a code owner May 8, 2024 05:57
@puerco puerco changed the title Malicious deps Surface malicious dependencies from Trusty data May 8, 2024
@coveralls
Copy link

coveralls commented May 8, 2024

Coverage Status

coverage: 49.324% (+0.2%) from 49.135%
when pulling eeb2ada on puerco:malicious-deps
into daccbc1 on stacklok:main.

yrobla
yrobla previously approved these changes May 8, 2024
jhrozek
jhrozek previously approved these changes May 8, 2024
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor comment, but looks good.

// Classify all dependencies, tracking all that are malicious or scored low
for _, dep := range prDependencies.Deps {
if err := classifyDependency(ctx, &logger, e.client, ruleConfig, prSummaryHandler, dep); err != nil {
return fmt.Errorf("classifying dependency: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, but do you think we should error out the whole evaluation here? I wonder if we could classify a dependency as something like unknown instead.

Copy link
Contributor Author

@puerco puerco May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this logic is responsible for blocking PRs introducing malware into the codebase, I would rather build a more resilient client to make the requests more robust and indeed fail if we can't get the trusty score instead of just letting it through as an unknown. WDYT?

The only two cases where this might fail are when there is an error talking to trusty or due to a misconfiguration in the profile.

@lukehinds
Copy link
Contributor

There is more payload from trusty we could leverage here.

Provenance: sigstore or historical provenance. It might make sense to surface a threshold here for folks to set within the policy. This could be an float (I think?) between 1-10, or a simply bool cc: @therealnb , @yrobla

We also have deprecated or achieved available as fields to report in the same way as malicious

@puerco puerco dismissed stale reviews from jhrozek and yrobla via ad5ca38 May 9, 2024 03:18
Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Mixed Scripts Detected.

@puerco
Copy link
Contributor Author

puerco commented May 9, 2024

OK, I've surfaced the rest of the malicious data and the score components on the PR comment. The rule evaluator will now honor minimal scores for provenance and activity in the rule configuration.

image

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Mixed Scripts Detected.

@jhrozek
Copy link
Contributor

jhrozek commented May 10, 2024

I really like the new format, but it seems like a high-scoring package is getting flagged now as well, see e.g. jakubtestorg/bad-python#193

// summary score is used.
// If `evaluate_score` is set to something else (e.g. `provenance`)
// then that score is used, which comes from the details field.
EvaluateScore string `json:"evaluate_score" mapstructure:"evaluate_score"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this is probably why the "good" packages are now flagged as low scoring? Does it mean that everyone who deployed this profile with the old ruletype (before mindersec/minder-rules-and-profiles#111) would get all their deps flagged?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmh interesting. Let me check why that one is getting flagged. The removal of EvaluateScore should not affect this one in particular as it has a high score and also a high provenance component.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not reproduce it (see here) it is weird because the profile is the same and the previous EvaluateScore is not used anymore. I'll keep looking

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Mixed Scripts Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Mixed Scripts Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Mixed Scripts Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Mixed Scripts Detected.

@puerco puerco changed the title Surface malicious dependencies from Trusty data Improve Trusty integration May 13, 2024
@puerco
Copy link
Contributor Author

puerco commented May 13, 2024

This is how its looking now. I will investigate Jakubs bugfind next but we can merge as it is and iterate on smaller PRs.

image

@jhrozek
Copy link
Contributor

jhrozek commented May 14, 2024

This is how its looking now. I will investigate Jakubs bugfind next but we can merge as it is and iterate on smaller PRs.

Let me try to reproduce again so I don't send you down the wrong path

@jhrozek
Copy link
Contributor

jhrozek commented May 14, 2024

I think there are two issues: The evaluator has hardcoded default configuration which should include also the two new attributes and additionally it seems like the condition when evaluating the scores are reversed - shouldn't we check that the score of the package is higher than the config? Check out the diff below:

diff --git a/internal/engine/eval/trusty/config.go b/internal/engine/eval/trusty/config.go
index 8b673b494..534427641 100644
--- a/internal/engine/eval/trusty/config.go
+++ b/internal/engine/eval/trusty/config.go
@@ -60,16 +60,22 @@ func defaultConfig() *config {
 		Action: pr_actions.ActionSummary,
 		EcosystemConfig: []ecosystemConfig{
 			{
-				Name:  "npm",
-				Score: 5.0,
+				Name:       "npm",
+				Score:      5.0,
+				Provenance: 5.0,
+				Activity:   5.0,
 			},
 			{
-				Name:  "pypi",
-				Score: 5.0,
+				Name:       "pypi",
+				Score:      5.0,
+				Provenance: 5.0,
+				Activity:   5.0,
 			},
 			{
-				Name:  "go",
-				Score: 5.0,
+				Name:       "go",
+				Score:      5.0,
+				Provenance: 5.0,
+				Activity:   5.0,
 			},
 		},
 	}
diff --git a/internal/engine/eval/trusty/trusty.go b/internal/engine/eval/trusty/trusty.go
index 634fdb915..53b4dcf6c 100644
--- a/internal/engine/eval/trusty/trusty.go
+++ b/internal/engine/eval/trusty/trusty.go
@@ -253,14 +253,14 @@ func classifyDependency(
 		}
 	}
 
-	if ecoConfig.Score <= packageScore {
+	if ecoConfig.Score > packageScore {
 		reasons = append(reasons, TRUSTY_LOW_SCORE)
 	}
-	if ecoConfig.Provenance <= descr["provenance"].(float64) {
+	if ecoConfig.Provenance > descr["provenance"].(float64) {
 		reasons = append(reasons, TRUSTY_LOW_PROVENANCE)
 	}
 
-	if ecoConfig.Activity <= descr["activity"].(float64) {
+	if ecoConfig.Activity > descr["activity"].(float64) {
 		reasons = append(reasons, TRUSTY_LOW_ACTIVITY)
 	}
 	if len(reasons) > 0 {

puerco added 10 commits May 14, 2024 09:19
This PR surfaces the trusty malicious data to the comment added
by minder when the trusty evaluator inspects a PR.

Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
This commit adds a few unit tests to some of the new utility functions
handling the trusty evaluator.

Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
This commit drops the configuration getter from the trusty
ecosystem config. As we now have access to the individual components
we can write rules on each of them independent of each other.

Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
This commit implements the new trusty template which exposes all
score components.

Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
This commit adds a simple test to ensure the comment template
parses correctly.

Signed-off-by: Adolfo García Veytia (Puerco) <puerco@stacklok.com>
Signed-off-by: Adolfo García Veytia (puerco) <puerco@stacklok.com>
Signed-off-by: Adolfo García Veytia (puerco) <puerco@stacklok.com>
Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Invisible Unicode Characters Detected.

Copy link
Contributor

@stacklokbot stacklokbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ No Mixed Scripts Detected.

@puerco
Copy link
Contributor Author

puerco commented May 14, 2024

Ah good catch jakub, I've flippped the logic on the constants. Although those were not being used for the output anymore they were still being weighted to classify. I've pushed a change with the flipped comparisons.

@evankanderson
Copy link
Member

it seems like the condition when evaluating the scores are reversed - shouldn't we check that the score of the package is higher than the config?

Should we have a test for this? It feels like the sort of bug that we could easily accidentally introduce again.

@jhrozek
Copy link
Contributor

jhrozek commented May 14, 2024

it seems like the condition when evaluating the scores are reversed - shouldn't we check that the score of the package is higher than the config?

Should we have a test for this? It feels like the sort of bug that we could easily accidentally introduce again.

We do have a smoke test, but alas, no way of running it locally yet. You are right that we should do better on the unit testing front - not having proper tests is mostly my fault, we do have reasonable tests for the OSV evaluator and the current Trusty evaluator code was meant to be a "quick hack" until we can generalize the PR dependency evaluators into common code and built both OSV and trusty evaluators atop them. We just haven't prioritized that work.

Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for being patient with this long review!

@JAORMX
Copy link
Contributor

JAORMX commented May 15, 2024

I think we can merge this now. Let's start adding tests in further PRs

@JAORMX JAORMX merged commit f1e2219 into mindersec:main May 15, 2024
21 checks passed
@puerco
Copy link
Contributor Author

puerco commented May 15, 2024

Should we have a test for this? It feels like the sort of bug that we could easily accidentally introduce again.
@evankanderson as part of this improvement I've been splitting the logic into more atomic functions to make them testable and to be able to mock the whole trusty evaluator. I'll strat sending incremental PRs to add more tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants