Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Putting it all together: Enables us to run rego rules on git contents #825

Merged
merged 2 commits into from
Sep 1, 2023

Conversation

JAORMX
Copy link
Contributor

@JAORMX JAORMX commented Sep 1, 2023

This adds the pieces together that we need in order to run rego rules on git content.

It proposes a new example rule that verifies that we have codeQL in a repository
to do static analysis.

In order to get this working, we had to create two custom rego rules that execute
on the memoryfs that we got from the git ingester.

There were two bugs that were fixes with this:

  • The clone_url was reset when registering the webhook.
  • We were casting to the wrong type in the git ingester.

This adds the pieces together that we need in order to run rego rules on git content.

It proposes a new example rule that verifies that we have codeQL in a repository
to do static analysis.

In order to get this working, we had to create two custom rego rules that execute
on the memoryfs that we got from the git ingester.

There were two bugs that were fixes with this:
* The `clone_url` was reset when registering the webhook.
* We were casting to the wrong type in the git ingester.
@JAORMX JAORMX requested a review from jhrozek September 1, 2023 12:17
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a bunch of questions because I'm a rego n00b

@@ -60,7 +60,8 @@ webhook_id = $8,
webhook_url = $9,
deploy_url = $10,
provider = $11,
clone_url = $12,
-- set clone_url if the value is not an empty string
clone_url = CASE WHEN sqlc.arg(clone_url)::text = '' THEN clone_url ELSE sqlc.arg(clone_url)::text END,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I'm confused by this case. Is this using the current value of clone_url if the clone_url argument is not empty, otherwise update clone_url with the clone_url argument?
We have existing tests for repositories, could we have a test case for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When registering the webhook, the webhook event returns a bunch of repository data which we update the row with. Unfortunately, the clone_url is missing, which caused the row to be updated with an empty clone_url. This ensures we never update it to an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be honest, I'm not quite sure where to test this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just thinking about a unit test like the ones we have here: https://github.com/stacklok/mediator/blob/main/pkg/db/repositories_test.go#L184

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to split this into a separate ticket, writing a test might be a good first issue for a newcomer

some i
workflowstr := file.read("./.github/workflows/codeql.yml")
workflow := yaml.unmarshal(workflowstr)
steps := workflow.jobs.analyze.steps[i]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with rego, does this mean "make sure that there exists some path workflow.jobs.analyze.steps" that contains github/codeql-action/analyze@ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right.


allow {
some i
workflowstr := file.read("./.github/workflows/codeql.yml")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this calling the function from our library?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is - are we going to have to provide a bunch of utility functions in the go code to make it possible to use rego evaluations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, file.read does not exist in the standard library of rego, I added it and ensured it called our in-memory filesystem where we have the git repo cloned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly we'd need to write things that pertain to the constraints and assumptions we have in our running evaluation environment. We could use most of these out of the box: https://www.openpolicyagent.org/docs/latest/policy-reference/#built-in-functions

// in the filesystem being evaluated (which comes from the ingester).
// It takes one argument, the path to the file to check.
// It's exposed as `file.exists`.
func FileExists(res *engif.Result) func(*rego.Rego) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is unused as of now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it is registered in the mediatorRegoLib and exposed as policy as file.exists

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, if you mean it's unused in a policy, that's right. It's not currently used. At least I didn't provide an example for it yet.

Copy link
Contributor

@lukehinds lukehinds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just echoing @jhrozek , what I know about rego could be written on the back of a matchbox. I love the idea though and can see how this will allow us to quickly spin up policy.

One question I have, although not enough to block this...would we be able to achieve this with other policy langs (jq, cue perhaps). I am guessing not, which is why rego was selected.

@JAORMX
Copy link
Contributor Author

JAORMX commented Sep 1, 2023

@lukehinds right, rego gives us a lot more flexibility with what we can do. We could potentially use jq to process a file in a directory (we'd need some modifications to the JQ evaluator), but it would be quite limited in that we would need to point to a specific file. With rego we get the flexibility that we can expand and traverse all files in a directory.

Cue would need to be explored, but it's more of a data representation language than a policy language. We could potentially build a policy representation on top of Cue, but we'd be in the same position we are now, we'd need to come up with a driver interface and write them.

Potentially we could replace our YAML data structure representations with Cue; that could be explored, but it still wouldn't replace rego in this case.

@JAORMX JAORMX merged commit d8e4171 into main Sep 1, 2023
13 checks passed
@JAORMX JAORMX deleted the git-rego-rule branch September 1, 2023 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants