Improve artifact support by storing artifact_id in the rule evaluation table and store artifact information during webhook processing #760

jhrozek · 2023-08-25T12:31:18Z

This pull request does two things in fact:

extends the DB schema and the associated Go code to be able to reference
artifacts when storing rule evaluation status. This allows to return policy
results that pertain to artifacts to the clients
extends the executor code that handles artifact published events to also
store artifact versions so that newly published artifacts can be evaluated

There will also be a follow up that will improve the EntityInfo map[string]string attribute of the RuleEvaluationStatus message to send
artifact-specific extended information to the clients, but I didn't want to
delay this PR any longer.

Arguably this PR could have been split into two, one adding just the support
for referencing artifacts in the rule_evaluation_status table and the other
extending the executor code to store artifact information, but I just coded
and tested everything together and it made for easier testing to test with
newly released artifacts.

There's multiple patches which should hopefully make the review easier, I also
tried to write reasonably descriptive commit messages.

jhrozek · 2023-08-25T12:33:21Z

Fixes: #715 (there's one item in the 715 todo list which I'll split into a separate ticket as this PR is already quite big)

lukehinds

code look good to me, but I would like @yrobla to take a look who has history with the implementation

JAORMX

The upsert logic seems fine to me. But, I'm confused by the in-code signature verification, that should be done at the policy level instead AFAIU. Let's catch up on this.

internal/engine/executor.go

jhrozek · 2023-08-28T11:52:44Z

The upsert logic seems fine to me. But, I'm confused by the in-code signature verification, that should be done at the policy level instead AFAIU. Let's catch up on this.

So we did have a rather long discussion with @JAORMX and @yrobla and agreed that since we're storing the data about signatures as part of the artifact version, we can just keep doing that, but we'll make the policy evaluation simpler and more effective by checking the signature and storing the result when we fetch the artifact version and then the policy will make a decision based on that -- currently the builtin policy ValidateSignature also grabs the signatures again during evaluation again after we've grabbed and stored them earlier in the db.

evankanderson

A few comments, I understand there's a little rework going on here at the moment.

evankanderson · 2023-08-28T13:17:31Z

database/migrations/000001_init.up.sql

@@ -240,7 +240,7 @@ CREATE UNIQUE INDEX users_organization_id_username_lower_idx ON users (organizat
 CREATE UNIQUE INDEX repositories_repo_id_idx ON repositories(repo_id);
 CREATE UNIQUE INDEX policies_group_id_policy_name_idx ON policies(provider, group_id, name);
 CREATE UNIQUE INDEX rule_type_idx ON rule_type(provider, group_id, name);
-CREATE UNIQUE INDEX rule_evaluation_status_results_idx ON rule_evaluation_status(policy_id, repository_id, entity, rule_type_id);
+CREATE UNIQUE INDEX rule_evaluation_status_results_idx ON rule_evaluation_status(policy_id, repository_id, COALESCE(artifact_id, -1), entity, rule_type_id);


NOT for this PR, but I'm feeling a little itchy on the schema here -- I'm not sure what to suggest yet, but the Upsert (which I introduced) is starting to feel like the beginnings of a hairball. On the other hand, I think "Upsert" is absolutely the right semantic if we want to know the latest rule evaluation result -- I think it's more more about the possible 4-way ID key with NULLs.

Would using DEFAULT -1 for artifact_id in the database simplify the rest, or just make things worse?

(I think Ozz or Yolanda warned me about this, to be fair)

Looking at the code below, it looks like we should be able to use 0 as the sentinel value as well as -1. I have a slight preference for using 0, as that aligns with Go and Proto's default values. It does mean that we need to start our artifact ID numbering at 1.

Starting at 1 should already be the case (Postgres SERIAL starts as 1 unless you tell it otherwise using CREATE SEQUENCE IIRC). I chose -1 specifically because it's clearly invalid, but so is 0, so I don't really have a preference and can change that.

The upsert does start to become a bit hairy, if it becomes too much as we add more entity types, we'll use just a single entity id instead and use a stored procedure instead of the cascading delete.

What did you think about DEFAULT -1 or DEFAULT 0 instead of needing to use COALESCE in the queries?

Since artifact_id is a foreign key to the artifacts table, would that work transparently w/o requiring a dummy artifact with id of -1? Additionally, would ON DELETE CASCADE still work? (Honest questions, that works with a NULL value, but I'm unsure if we defaulted to -1).

Great question! I'm not sure -- I'm not used to using quite so many relational features (mostly living a NoSQL world), so I'm not sure. Testing, it appears that you're right, and if we set a REFERENCES foreign key constraint, we can't have a DEFAULT value that doesn't exist in the corresponding table.

> CREATE TABLE test_base (id SERIAL PRIMARY KEY); > CREATE TABLE test_derived (id SERIAL PRIMARY KEY, parent_id INTEGER NOT NULL REFERENCES test_base(id) ON DELETE CASCADE); > INSERT INTO test_base VALUES (1); > INSERT INTO test_derived (parent_id) VALUES (1) > INSERT INTO test_derived (parent_id) VALUES (0) insert or update on table "test_derived" violates foreign key constraint "test_derived_parent_id_fkey" > ALTER TABLE test_derived ALTER COLUMN parent_id SET DEFAULT 0 > ALTER TABLE test_derived ADD COLUMN description TEXT > INSERT INTO test_derived (description) VALUES ('Happy days') insert or update on table "test_derived" violates foreign key constraint "test_derived_parent_id_fkey" > ALTER TABLE test_derived ALTER COLUMN parent_id DROP NOT NULL > ALTER TABLE test_derived ALTER COLUMN parent_id DROP DEFAULT > INSERT INTO test_derived (description) VALUES ('Happy days') select * FROM test_derived +----+-----------+-------------+ | id | parent_id | description | |----+-----------+-------------| | 1 | 1 | <null> | | 4 | <null> | Happy days | +----+-----------+-------------+

Which is to say that you're completely right about using COALESCE.

internal/engine/executor.go

In addition to referencing repository entities from the rule_evaluation_status table, we need to reference artifacts as well. To do that, we add a new column artifact_id to the rule_evaluation_table so that we can take advantage of cascaded deletes. Because this column can be nullable, we need to COALESCE() the column to a valid integer value, but one that's not valid for the model because otherwise comparing NULL that's passed in with NULL in the conflicting row never yields true. The same COALESCE() must be applied to the unique index.

…g artifact rules Changes the signature of createOrUpdateEvalStatus() to allow storing artifact rules when artifactID is passed. Because the number of parameters was growing, also changes the signature to accept a struct type so that the parameters are easily named.

In order to list rule evaluation status for both artifacts and repositories, let's add a new parameter to the ListRuleEvaluationStatusByPolicyIdParams method that specifies the entity type. Based on that type, the DB query performs a switch-case, either returning a matching instance of one entity type or all.

When mediator receives an event notification about a new artifact, the payload doesn't contain the artifact visibility and we'll need to retreive it by calling the GH API. In order to avoid passing a string around, let's add a new attribute to the ArtifactEventPayload struct

We'll have to get additional artifact information from github and we need a new API to do that.

…artifacts In order to handle artifact webhook events, we need to reuse several functions that were previously only available in the reconciler. This patch moves the functions to the container module, making them reusable.

…ceived via a webhook event Extends the executor so that artifact version is stored properly upon receiving a webhook message informing mediator about a new package. This allows, among others, the artifact information to be displayed to the clients. Fixes: #715

The ArtifactEventPayload protobuf message was a relic of how the artifacts were initially implemented by just looking at the github API. It had several problems, mainly that it didn't reflect how artifacts are stored in the DB, supported only a single tag. Let's converge on using VersionedArtifact which just contains an artifact and one version. This way we would be able to use the same struct in webhooks, init and reconciler.

database/migrations/000001_init.up.sql

JAORMX · 2023-08-29T11:28:12Z

database/query/policy_status.sql

-    eval_status = $5,
-    details = $6,
+) VALUES ($1, $2, $3, $4, $5, $6, $7, NOW())
+ON CONFLICT(policy_id, repository_id, COALESCE(artifact_id, 0), entity, rule_type_id) DO UPDATE SET


wouldn't we want to coalesce repository_id as well?

There would be no harm in coalescing, but currently repository_id cannot be NULL I think, at least the code doesn't ever pass NULL

Do we want to add a NOT NULL constraint (possibly including a DEFAULT) for these columns, with a comment that NULL doesn't play nicely with unique indexes?

sorry, NOT NULL for what exactly? The artifact_id /is/ NULL is the DB for entities that are not artifacts, the coalesce works only for a) the unique index and b) detecting conflicts.

The schema of the artifact_signature rule_type was built with an artifact having a single tag and was built to be used through the now removed ArtifactEventPayload struct. Changing the schema allows us to specify the tags better, as a subset of container tags. Since evaluating the schema with the builtin ingester would be too complex and would require special-casing the individual keys and values, it also makes sense to switch to a new ingester created specifically for artifacts.

- change coalesce to 0 to play better with default go types - rename cli to client to distinguish from CLI - safer type cast of packageVersionName to string, to be followed up on - move deletion of old artifacts inside a transaction - guard against NULL params - more descriptive comments on how do createOrUpdateEvalStatusParams work

jhrozek · 2023-08-29T11:53:11Z

the latest push should just make the linter happy

jhrozek · 2023-08-29T12:14:15Z

Since @yrobla and @evankanderson also reviewed the PR, I'm going to give them a bit to check if their concerns have been addressed. Thanks for the review and the time spent brainstorming @JAORMX !

evankanderson

A bunch of small comments, but I won't be bent out of shape if they don't get addressed.

evankanderson · 2023-08-29T13:25:36Z

database/query/policy_status.sql

-    eval_status = $5,
-    details = $6,
+) VALUES ($1, $2, $3, $4, $5, $6, $7, NOW())
+ON CONFLICT(policy_id, repository_id, COALESCE(artifact_id, 0), entity, rule_type_id) DO UPDATE SET


Do we want to add a NOT NULL constraint (possibly including a DEFAULT) for these columns, with a comment that NULL doesn't play nicely with unique indexes?

internal/engine/executor.go

evankanderson · 2023-08-29T13:59:10Z

internal/engine/ingester/artifact/artifact.go

+// NewArtifactDataIngest creates a new artifact rule data ingest engine
+func NewArtifactDataIngest(
+	_ *pb.ArtifactType,
+) (*Ingest, error) {
+	return &Ingest{}, nil
+}


Why this signature (just curious -- are we matching some other contract)?

sort of. It's more of a kludge because I found this part of mediator code hard to follow.

If you check NewRuleDataIngest, you'll see that we're instantiating an ingester based on what ingester was configured in the rule_type definition. The ingester configuration (which might be e.g. API endpoint for the REST ingester) is represented as a protobuf message, in this case called Artifact, but because currently the artifact ingester has no configuration, the structure is empty and because it's empty, it's unused in the NewArtifactDataIngest "constructor" so I used the empty assignment name.

But I thought (and this is the kludge because I had a hard time deconstructing the code) that passing the configuration object, even if empty around might make it easier for a future developer to just change the protobuf message by adding a field there and changing the NewArtifactDataIngest function.

Hope it makes sense.

internal/engine/ingester/artifact/artifact.go

evankanderson · 2023-08-29T14:08:48Z

internal/engine/ingester/artifact/artifact.go

+	for _, tag := range cfg.Tags {
+		if slices.Contains(versionedArtifact.Version.Tags, tag) {
+			return true
+		}
+	}


Not required, but consider using the k8s sets library (already in use elsewhere) for semantic clarity on what the loop intends:

haveTags := sets.New(versionedArtifact.Version.Tags) return haveTags.HasAny(cfg.Tags...)

Note that your semantics here are a bit strange -- empty would be a HasAll, but if you have at least one element, then you're using HasAny.

I'm not sure I understand the part about HasAll ? Otherwise suggestion taken. By the way, we're debating the semantics of how do we match all tags/all names in the epic, so this might change.

internal/engine/ingester/ingester.go

pkg/container/container.go

evankanderson

It turns out that I don't like fancy relational features, even when I understand their intended value. 😁

lukehinds previously approved these changes Aug 25, 2023

View reviewed changes

JAORMX reviewed Aug 28, 2023

View reviewed changes

internal/engine/executor.go Show resolved Hide resolved

internal/engine/executor.go Outdated Show resolved Hide resolved

evankanderson reviewed Aug 28, 2023

View reviewed changes

jhrozek added 8 commits August 29, 2023 09:29

providers: Add a GetPackageVersionById call

37ccf3d

We'll have to get additional artifact information from github and we need a new API to do that.

jhrozek dismissed lukehinds’s stale review via 1bb3737 August 29, 2023 11:10

jhrozek force-pushed the artifacts_rule_eval branch from e689009 to 1bb3737 Compare August 29, 2023 11:10

JAORMX reviewed Aug 29, 2023

View reviewed changes

jhrozek added 2 commits August 29, 2023 13:52

jhrozek force-pushed the artifacts_rule_eval branch from 1bb3737 to 7e4c6ce Compare August 29, 2023 11:52

JAORMX previously approved these changes Aug 29, 2023

View reviewed changes

evankanderson previously approved these changes Aug 29, 2023

View reviewed changes

This was referenced Aug 29, 2023

Type-safe artifact attribute casting when parsing JSON input #785

Closed

Consider returning a name.Reference from go-containertools in artifactImageRef #786

Closed

review comments, take 2

2217c4f

jhrozek dismissed stale reviews from evankanderson and JAORMX via 2217c4f August 29, 2023 19:43

evankanderson approved these changes Aug 29, 2023

View reviewed changes

jhrozek merged commit c8c4e92 into mindersec:main Aug 30, 2023
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve artifact support by storing artifact_id in the rule evaluation table and store artifact information during webhook processing #760

Improve artifact support by storing artifact_id in the rule evaluation table and store artifact information during webhook processing #760

jhrozek commented Aug 25, 2023 •

edited

Loading

jhrozek commented Aug 25, 2023

lukehinds left a comment

JAORMX left a comment

jhrozek commented Aug 28, 2023

evankanderson left a comment

evankanderson Aug 28, 2023

evankanderson Aug 28, 2023

evankanderson Aug 28, 2023

jhrozek Aug 28, 2023

evankanderson Aug 29, 2023 •

edited

Loading

jhrozek Aug 29, 2023

evankanderson Aug 29, 2023

evankanderson Aug 29, 2023

JAORMX Aug 29, 2023

jhrozek Aug 29, 2023

evankanderson Aug 29, 2023

jhrozek Aug 29, 2023

jhrozek commented Aug 29, 2023

jhrozek commented Aug 29, 2023

evankanderson left a comment

evankanderson Aug 29, 2023

evankanderson Aug 29, 2023

jhrozek Aug 29, 2023 •

edited

Loading

evankanderson Aug 29, 2023

jhrozek Aug 29, 2023

evankanderson left a comment

Improve artifact support by storing artifact_id in the rule evaluation table and store artifact information during webhook processing #760

Improve artifact support by storing artifact_id in the rule evaluation table and store artifact information during webhook processing #760

Conversation

jhrozek commented Aug 25, 2023 • edited Loading

jhrozek commented Aug 25, 2023

lukehinds left a comment

Choose a reason for hiding this comment

JAORMX left a comment

Choose a reason for hiding this comment

jhrozek commented Aug 28, 2023

evankanderson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson Aug 29, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jhrozek commented Aug 29, 2023

jhrozek commented Aug 29, 2023

evankanderson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jhrozek Aug 29, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson left a comment

Choose a reason for hiding this comment

jhrozek commented Aug 25, 2023 •

edited

Loading

evankanderson Aug 29, 2023 •

edited

Loading

jhrozek Aug 29, 2023 •

edited

Loading