Add the protobuf message and the selector API #3797

jhrozek · 2024-07-05T19:28:50Z

Summary

Define the protobuf API for selectors - Creates a new protobuf message SelectorEntity that represents a set of attributes for a typed entity to be used in profile selectors.

At the moment, only SelectorRepository and SelectorArtifact are used and implemented. Additionally, the key-value store is not implemented yet, as it requires a new set of provider interfaces to fill this store.

Add the selector evaluator - Adds a new module called selectors that initializes CEL environments and evaluates selectors.

Fixes #3757

Change Type

Mark the type of change your PR introduces:

Bug fix (resolves an issue without affecting existing features)
Feature (adds new functionality without breaking changes)
Breaking change (may impact existing functionalities or require documentation updates)
Documentation (updates or additions to documentation)
Refactoring or test improvements (no bug fixes or new functionality)

Testing

There are unit tests included. I also cherry-picked these commits from a larger branch that takes this code into account.

Review Checklist:

Reviewed my own code for quality and clarity.
Added comments to complex or tricky code sections.
Updated any affected documentation.
Included tests that validate the fix or feature.
Checked that related changes are merged.

coveralls · 2024-07-05T19:37:25Z

coverage: 52.505% (+0.3%) from 52.204%
when pulling 2aeb88c on jhrozek:selectors_api
into 6274fe5 on stacklok:main.

proto/minder/v1/minder.proto

internal/engine/selectors/selectors.go

evankanderson · 2024-07-08T17:08:49Z

proto/minder/v1/minder.proto

+    // the name of the entity, same as the name in the entity message
+    string name = 2;


Is the name of the entity globally unique, or do we need a provider to scope the name (or do we not care about uniqueness guarantees here)?

I wanted to add the provider name as a separate attribute alongside name when we add fetching of the key-value pairs from the provider. Since you suggested to just use full_name were you thinking to make the provider name part of the full name as well?

I like how unambiguous that would be, but if we don't provide a very easy way to say "just ignore this repo", especially from the UI, then the grammar gets less useful to the user.

I think we might want one property for the full_name, and one property for the provider. That would allow for full distinguishing, but would default to names that were recognizable to people without some sort of special prefix.

It might also simplify "apply this policy to all billing/fooserver images at all registries under Minder automation"

I added a provider message to the messages. Is that what you were describing? The provider name and class are exposed for a start.

question (non-blocking): is there a particular reason why we're not calculating an exact, possibly unambiguous name and storing it alongside the entity? It really feels like we should have for repositories something like a URI/purl alongside repo name, owner, and provider, something like <provider>:<type>:<owner>:<name>, e.g. github-app-foo:repository:stacklok:minder. On top of that, we can define custom functions to ship with the CEL evaluator that operate on top of that.

We can do this as well and Evan suggested something similar. Good idea, thanks.
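
A minimal sketch of the identifier format suggested above, assuming a plain colon-separated string with exactly four non-empty components. The `EntityID` type and `ParseEntityID` function are hypothetical names for illustration and are not part of this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// EntityID is a hypothetical, fully qualified entity identifier of the form
// <provider>:<type>:<owner>:<name>. It is not part of Minder's API.
type EntityID struct {
	Provider string
	Type     string
	Owner    string
	Name     string
}

// String renders the identifier in the colon-separated form.
func (id EntityID) String() string {
	return strings.Join([]string{id.Provider, id.Type, id.Owner, id.Name}, ":")
}

// ParseEntityID splits a colon-separated identifier into its four parts.
// It rejects inputs that do not have exactly four non-empty components.
func ParseEntityID(s string) (EntityID, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 4 {
		return EntityID{}, fmt.Errorf("expected 4 components, got %d", len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return EntityID{}, fmt.Errorf("empty component in %q", s)
		}
	}
	return EntityID{Provider: parts[0], Type: parts[1], Owner: parts[2], Name: parts[3]}, nil
}

func main() {
	id, err := ParseEntityID("github-app-foo:repository:stacklok:minder")
	if err != nil {
		panic(err)
	}
	fmt.Println(id.Owner, id.Name)
	fmt.Println(id.String())
}
```

A structured type like this would also be a natural input for custom CEL functions, since each component stays addressable on its own.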

proto/minder/v1/minder.proto

internal/engine/selectors/selectors.go

jhrozek · 2024-07-11T20:36:41Z

Thanks for the reviews @evankanderson @blkt. I'm going to resolve the comments that I think are resolved (but feel free to unresolve them if you don't think they are fully addressed), and I'm going to comment on parts of the code where I'm unsure that I'm going in the right direction.

coveralls · 2024-07-11T20:44:27Z

coverage: 54.175% (+0.2%) from 53.955%
when pulling fdf1ff6 on jhrozek:selectors_api
into 079001e on stacklok:main.

jhrozek · 2024-07-11T20:49:22Z

internal/proto/internal.proto

+  string name = 1;
+  // the class of the provider, e.g. github-app
+  string class = 2;


This message is new in this PR version. It's just a way to avoid repeating what attributes we want to expose for provider.

Why is this both in SelectorRepository/SelectorArtifact and in SelectorEntity? It feels like both copies of the message should have the same contents, so maybe we should have a canonical place?

See the example here. Since the individual selectors can target either a specific entity type or any entity (but what is being evaluated is always a specific entity), the canonical place is the entity-specific message (e.g. SelectorRepository). What's stored in the generic message (SelectorEntity) is just a copy of the common data.

Is that confusing as a concept or would it help to make it crisper in the code in the sense that the provider methods that fill in the selector entity struct just fill in the entity-specific message and the generic attributes are copied by a common function?
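
To make the second option concrete, here is a rough sketch in which the provider fills in only the entity-specific message and a common function copies the generic attributes. The structs are simplified stand-ins for the generated protobuf types, and `newRepositoryEntity` is a hypothetical helper name.

```go
package main

import "fmt"

// The structs below are simplified stand-ins for the generated protobuf
// types discussed in this thread; the real messages live in internal.proto.
type SelectorProvider struct {
	Name  string
	Class string
}

type SelectorRepository struct {
	Name     string
	Provider *SelectorProvider
}

type SelectorEntity struct {
	Name       string
	Provider   *SelectorProvider
	Repository *SelectorRepository
}

// newRepositoryEntity is a hypothetical helper: the provider fills in only
// the entity-specific message, and the generic attributes are copied from it.
func newRepositoryEntity(repo *SelectorRepository) *SelectorEntity {
	return &SelectorEntity{
		Name:       repo.Name,     // copy of the canonical repository name
		Provider:   repo.Provider, // same provider data in both places
		Repository: repo,
	}
}

func main() {
	repo := &SelectorRepository{
		Name:     "stacklok/minder",
		Provider: &SelectorProvider{Name: "github-app-foo", Class: "github-app"},
	}
	ent := newRepositoryEntity(repo)
	fmt.Println(ent.Name, ent.Provider.Class)
}
```

With this shape the two copies cannot drift apart, because the generic fields are always derived from the entity-specific message.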

internal/engine/selectors/selectors.go

jhrozek · 2024-07-11T20:54:21Z

internal/providers/selectors/interface.go

+//go:generate go run go.uber.org/mock/mockgen -package mock_$GOPACKAGE -destination=./mock/$GOFILE -source=./$GOFILE
+
+// RepoSelectorConverter is an interface for converting a repository to a repository selector
+type RepoSelectorConverter interface {


PTAL here. This is a provider-private interface, meaning it's not part of the public provider interface in pkg/ but it's in internal - the messages are stored in the private proto file as well.

This hopefully transfers the onus of creating the SelectorEntity message from the selector interface to the providers where the user of the selector module (typically the evaluator engine) just calls the provider instance to retrieve a SelectorEntity for this entity from its provider.

In a follow-up, I'd like to also add the key-value map here, but this PR uses just the attributes in the SelectorEntity message and the per-entity nested messages.

jhrozek · 2024-07-11T20:55:28Z

internal/providers/selectors/selector_entity.go

+)
+
+// entityInfoConverter is an interface for converting an entity from an EntityInfoWrapper to a SelectorEntity
+type entityInfoConverter interface {


Just noticed the interface is misnamed: it's no longer tied to EntityInfoWrapper, but just to a pair of an entity as a proto.Message and an entity type.

internal/providers/selectors/selector_entity.go

jhrozek · 2024-07-11T20:57:07Z

internal/providers/selectors/selector_entity.go

+		converters: map[minderv1.Entity]entityInfoConverter{
+			minderv1.Entity_ENTITY_REPOSITORIES: newRepositoryInfoConverter(provider),
+			minderv1.Entity_ENTITY_ARTIFACTS:    newArtifactInfoConverter(provider),
+		},


@puerco this is one of the things I don't like here, sure the code doesn't use a straight-up switch-case but a factory which is somewhat nicer, but we still hardcode the entities we know about..
I don't know if this is a good direction.

We have a bunch of places where we switch over entity types. We might implement some sort of registration mechanism à la init(), something like this.

	package custom

	import (
		"log"

		"github.com/stacklok/minder/internal/entities"
	)

	// init registers the custom entity when the package is loaded.
	func init() {
		err := entities.Register(customv1.Entity_MY_CUSTOM_ENTITY, &CustomEntity{})
		if err != nil {
			log.Fatalf("failed registering custom entity")
		}
	}

	type CustomEntity struct{}

	func (c *CustomEntity) Converter(...) error { return nil }

The nice thing about this is that this could be dynamically loaded as a plugin, or be otherwise loaded in the system.
Could this be a decent tool for this purpose?

this is pretty much along the lines of what we discussed yesterday on Slack with @puerco . I'm not sure if the implementation will be a plugin (perhaps we should go all the way and use RPC plugins), that's a discussion we should have separately, but I agree that a registry of entities would be nice.
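
For reference, a minimal in-process version of such a registry could look like the sketch below. `EntityType`, `Converter`, `Register` and `Lookup` are hypothetical names standing in for minderv1.Entity and the converter interface from this PR; the plugin or RPC question is deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
)

// EntityType and Converter are hypothetical stand-ins for minderv1.Entity
// and the entityInfoConverter interface from this PR.
type EntityType string

type Converter interface {
	Convert(name string) string
}

var (
	mu       sync.RWMutex
	registry = map[EntityType]Converter{}
)

// Register associates a converter with an entity type and refuses duplicates.
func Register(t EntityType, c Converter) error {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := registry[t]; ok {
		return fmt.Errorf("entity type %q already registered", t)
	}
	registry[t] = c
	return nil
}

// Lookup returns the converter registered for the entity type, if any.
func Lookup(t EntityType) (Converter, bool) {
	mu.RLock()
	defer mu.RUnlock()
	c, ok := registry[t]
	return c, ok
}

// repoConverter is a toy converter used only to exercise the registry.
type repoConverter struct{}

func (repoConverter) Convert(name string) string { return "repository:" + name }

func main() {
	if err := Register("repository", repoConverter{}); err != nil {
		panic(err)
	}
	if c, ok := Lookup("repository"); ok {
		fmt.Println(c.Convert("stacklok/minder"))
	}
}
```

The hardcoded converters map would then become a `Lookup` call, and an unknown entity type would surface as a missing registration rather than a missing switch case.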

jhrozek · 2024-07-11T20:58:41Z

@dmjb I'd appreciate your thoughts here as well, you usually have a good direction on how to split the interfaces nicely.

evankanderson

It would help me a bit to see (in a comment) what this looks like wired in. I'm assuming that we'll have something like the following in executor.EvalEntityEvent:

	// entityProto is a SelectorEntity proto
	entityProto, err := provider.GetEntityObject(inf)
	// ...
	err = e.forProjectsInHierarchy(
		ctx, inf, func(ctx context.Context, profile *pb.Profile, hierarchy []uuid.UUID) error {
			selector := selectors.SelectorForProfile(profile)
			if !selector.Select(entityProto) {
				// record as skipped
				return nil  // or SkippedErr
			}
			// Get only these rules that are relevant for this entity type
			relevant, err := profiles.GetRulesForEntity(profile, inf.Type)
			if err != nil {
				return fmt.Errorf("error getting rules for entity: %w", err)
			}

But, this formulation suggests that the Env and SelectionBuilder interfaces shouldn't need to take a minderv1.Entity as a property, and should simply spit out a selector that works on generic SelectorEntity protobufs.

evankanderson · 2024-07-11T21:03:36Z

internal/proto/internal.proto

+  // one of repository, pull_request, artifact (see oneof entity)
+  minder.v1.Entity entity_type = 1;
+  // the name of the entity, same as the name in the entity message
+  string name = 2;
+  SelectorProvider provider = 3;
+
+  oneof entity {
+    SelectorRepository repository = 4;
+    SelectorArtifact artifact = 5;
+  }
+}


This is an internal implementation detail that currently makes it easier to implement CEL, right?

I'm concerned over time that encoding all the properties into a protocol buffer in core Minder will make it harder for new providers to add properties that make sense, but we can cross that bridge when we get there. (The alternative is that it's easy for providers to add new properties, but little uniformity across providers -- this pushes us on the higher-friction-but-more-commonality path, which might even be right.)

This is an internal implementation detail that currently makes it easier to implement CEL, right?

yes

I'm concerned over time that encoding all the properties into a protocol buffer in core Minder will make it harder for new providers to add properties that make sense, but we can cross that bridge when we get there. (The alternative is that it's easy for providers to add new properties, but little uniformity across providers -- this pushes us on the higher-friction-but-more-commonality path, which might even be right.)

Right, I hear you. The properties that are expressed as attributes are only those that those entities will most commonly have (names, is_fork/is_private and those are optional bool, too). I expect any provider-specific attributes to be just key-value pairs in a map.

We could even go with just a map, but the nice thing about passing proto messages to CEL is that CEL can check whether the attributes exist and warn about a mismatch between entity types and attributes. I think it would be nice to surface these to the user when they create the selectors.

We won't have those for a generic map.

evankanderson · 2024-07-11T21:04:57Z

internal/proto/internal.proto

+  string name = 1;
+  // the class of the provider, e.g. github-app
+  string class = 2;


Why is this both in SelectorRepository/SelectorArtifact and in SelectorEntity? It feels like both copies of the message should have the same contents, so maybe we should have a canonical place?

evankanderson · 2024-07-11T21:06:57Z

internal/proto/internal.proto

+  // not applicable to this provider
+  optional bool is_fork = 3;
+  // is_private is true if the repository is private, nil if "don't know" or rather
+  // not applicable to this provider
+  optional bool is_private = 4;


Do we want to include the "Custom properties" key/values that GitHub supports under a name like "tags", "labels", or "properties"?

yes, I wanted to do that in a follow-up but perhaps it makes sense to include the k-v property already to make the review easier in the sense that we'd see the whole message.

I added the properties to the message, filling them in would be done in a follow-up

evankanderson · 2024-07-11T21:08:08Z

internal/proto/internal.proto

This is to allow for access as both name and repository.name?

Maybe I should have spelled this out more prominently in the design doc. There's some examples in the unit tests

and here's a simple inline example. Given these selectors:

    - comment: This should not apply to bad-go
      entity: repository
      selector: repository.name != 'jakubtestorg/bad-go'
    - comment: Skip any entity that contains demo in the name
      selector: entity.name.contains('demo') == false

The first one applies to repositories only, in this case the selector uses repository.name, but it could use also repository.provider.name/class, repository.is_fork or repository.is_private.

The second one applies to any entity type. It can use only attributes from the SelectorEntity message which currently are entity.name, entity.provider.name and entity.provider.class. (Let's talk about adding the purl or global name separately).

In both the entity-generic message and the entity-specific message we'll also add the k-v map.
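
To illustrate the intended semantics without pulling in CEL, the sketch below mirrors the two selectors as plain Go predicates. It assumes that a selector with an entity type is skipped for other entity types and that an entity is selected only when every applicable selector matches; both are assumptions made for illustration, and the `entity` and `selector` types are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// entity is a simplified stand-in for the SelectorEntity message.
type entity struct {
	Type string // e.g. "repository" or "artifact"
	Name string
}

// selector pairs an optional entity type with a predicate. An empty Type
// means the selector applies to any entity.
type selector struct {
	Type  string
	Match func(e entity) bool
}

var selectors = []selector{
	// mirrors: repository.name != 'jakubtestorg/bad-go'
	{Type: "repository", Match: func(e entity) bool { return e.Name != "jakubtestorg/bad-go" }},
	// mirrors: entity.name.contains('demo') == false
	{Match: func(e entity) bool { return !strings.Contains(e.Name, "demo") }},
}

// selected reports whether every selector that applies to the entity matches.
// Selectors scoped to a different entity type are skipped (assumption).
func selected(e entity) bool {
	for _, s := range selectors {
		if s.Type != "" && s.Type != e.Type {
			continue
		}
		if !s.Match(e) {
			return false
		}
	}
	return true
}

func main() {
	for _, e := range []entity{
		{Type: "repository", Name: "jakubtestorg/bad-go"},
		{Type: "repository", Name: "jakubtestorg/good-go"},
		{Type: "artifact", Name: "jakubtestorg/demo-image"},
	} {
		fmt.Println(e.Type, e.Name, selected(e))
	}
}
```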

evankanderson · 2024-07-11T21:09:16Z

internal/providers/github/selector_entity.go

+
+// ArtifactToSelectorEntity converts an Artifact to a SelectorEntity
+func (_ *GitHub) ArtifactToSelectorEntity(_ context.Context, a *minderv1.Artifact) *internalpb.SelectorEntity {
+	fullName := fmt.Sprintf("%s/%s", a.GetOwner(), a.GetName())


Should this include ghcr.io/? (I'm not sure)

I would say not in the name attribute, but I agree with both you and @blkt (in this comment) that we should have a global name as well.

I think just based on what users would expect in the simplest case we could have:

.name - for repos this would be orgname/reponame

.full_name - this could be a clone name perhaps. For repos, https://github.com/org/repo for artifacts registry_name/namespace/artifact_name

.id (help me come up with a better name :-)) - a minder-specific identifier. I liked @blkt's suggestion

btw note that as long as we expose the individual attributes the user can also combine them with logical operations in the CEL expression so we don't have to go overboard with providing too many one-off attributes.

evankanderson · 2024-07-11T21:28:19Z

internal/proto/internal.proto

Do we want a server component here, e.g. ghcr.io or eu.gcr.io?

yes, good point. I'm not sure if we store them for artifacts (or for repos for that matter, we do have a clone_url for repos), but I like the idea.

Shouldn't that be part of the provider information?

We can add that iteratively in another PR.

internal/providers/selectors/selector_entity.go

jhrozek · 2024-07-12T11:48:42Z

@evankanderson I keep a branch with patches that will be sent later. The wiring looks like this in executor:

			// this could be a private function for now and we could just select an inf
			selectors, err := e.selBuilder.NewSelectionFromProfile(inf.Type, profile.Selection)
			if err != nil {
				return fmt.Errorf("error creating selectors: %w", err)
			}
			selected, err := selectors.Select(provsel.EntityToSelectorEntity(ctx, provider, inf.Type, inf.Entity))
			if err != nil {
				return fmt.Errorf("error selecting entity: %w", err)
			}

evankanderson · 2024-07-12T13:32:10Z

Why does extracting the entity properties need to know the profile? Naively, I'd expect that we could extract the entity properties once per evaluation, not once per profile in the evaluation.

jhrozek · 2024-07-12T13:44:03Z

Why does extracting the entity properties need to know the profile? Naively, I'd expect that we could extract the entity properties once per evaluation, not once per profile in the evaluation.

(not 100% sure I'm answering your question without more context)

It doesn't; we can call the extraction earlier. We should also build the entity properties (the selector message) only once we hit the first profile with non-zero selectors, because extracting the entity properties might involve calling out to the provider (e.g. to get the key-value pairs, which we don't store in the database), which might be expensive.
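
A rough sketch of that lazy extraction, assuming a single-goroutine evaluation loop. The `profile`, `selectorEntity` and `lazyEntity` types and the `countFetches` function are hypothetical stand-ins, and the closure that increments `calls` takes the place of the potentially expensive provider call.

```go
package main

import "fmt"

// profile and selectorEntity are simplified stand-ins, not Minder types.
type profile struct {
	Name      string
	Selectors []string
}

type selectorEntity struct{ Name string }

// lazyEntity caches the result of fetch so it runs at most once.
// It is not safe for concurrent use; sync.Once would be needed for that.
type lazyEntity struct {
	fetch  func() (*selectorEntity, error)
	done   bool
	entity *selectorEntity
	err    error
}

// Get runs fetch on the first call and returns the cached result afterwards.
func (l *lazyEntity) Get() (*selectorEntity, error) {
	if !l.done {
		l.entity, l.err = l.fetch()
		l.done = true
	}
	return l.entity, l.err
}

// countFetches walks the profiles and reports how often the provider was hit.
func countFetches(profiles []profile) (int, error) {
	calls := 0
	lazy := &lazyEntity{fetch: func() (*selectorEntity, error) {
		calls++ // stands in for an expensive provider call
		return &selectorEntity{Name: "stacklok/minder"}, nil
	}}
	for _, p := range profiles {
		if len(p.Selectors) == 0 {
			continue // no selectors, so the entity is never needed
		}
		if _, err := lazy.Get(); err != nil {
			return calls, fmt.Errorf("error building selector entity for %s: %w", p.Name, err)
		}
		// selector evaluation for p would happen here
	}
	return calls, nil
}

func main() {
	n, err := countFetches([]profile{
		{Name: "no-selectors"},
		{Name: "a", Selectors: []string{"entity.name != 'x'"}},
		{Name: "b", Selectors: []string{"repository.is_fork == false"}},
	})
	fmt.Println(n, err)
}
```

An evaluation in which no profile has selectors never touches the provider, and any number of profiles with selectors share one extraction.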

internal/engine/selectors/selectors.go

internal/proto/internal.proto

Adds a new module called selectors that initializes CEL environments and evaluates selectors. Fixes: mindersec#3757

jhrozek requested a review from a team as a code owner July 5, 2024 19:28

blkt reviewed Jul 8, 2024

evankanderson reviewed Jul 8, 2024

jhrozek mentioned this pull request Jul 10, 2024

Move messages only used to generate Go structs out of minder.proto #3830

Merged

10 tasks

jhrozek force-pushed the selectors_api branch from 2aeb88c to 35e207a Compare July 11, 2024 20:35

jhrozek commented Jul 11, 2024

internal/engine/selectors/selectors.go

jhrozek commented Jul 11, 2024

internal/providers/selectors/selector_entity.go

jhrozek commented Jul 11, 2024

evankanderson reviewed Jul 11, 2024

jhrozek force-pushed the selectors_api branch from 35e207a to 14024f2 Compare July 18, 2024 11:09

JAORMX reviewed Jul 18, 2024

internal/engine/selectors/selectors.go

internal/engine/selectors/selectors.go

JAORMX reviewed Jul 18, 2024

internal/proto/internal.proto Outdated

jhrozek force-pushed the selectors_api branch 2 times, most recently from b6e2cfc to fdf1ff6 Compare July 18, 2024 19:55

jhrozek added 5 commits July 19, 2024 09:01

Define the protobuf API for selectors

b6885ef

Add the selector evalautor

5200318

Adds a new module called selectors that initializes CEL environments and evaluates selectors. Fixes: mindersec#3757

Move converting entities to SelectorEntity structs to providers

b24ad32

Support properties in selectors

916fe15

Change the selectors to google.protobuf.Struct

dabb218

jhrozek force-pushed the selectors_api branch from fdf1ff6 to dabb218 Compare July 19, 2024 07:07

JAORMX approved these changes Jul 19, 2024

jhrozek merged commit 8d7f92f into mindersec:main Jul 19, 2024
21 checks passed
