-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Putting it all together: Enables us to run rego rules on git contents #825
Conversation
This adds the pieces together that we need in order to run rego rules on git content. It proposes a new example rule that verifies that we have codeQL in a repository to do static analysis. In order to get this working, we had to create two custom rego rules that execute on the memoryfs that we got from the git ingester. There were two bugs that were fixes with this: * The `clone_url` was reset when registering the webhook. * We were casting to the wrong type in the git ingester.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a bunch of questions because I'm a rego n00b
@@ -60,7 +60,8 @@ webhook_id = $8, | |||
webhook_url = $9, | |||
deploy_url = $10, | |||
provider = $11, | |||
clone_url = $12, | |||
-- set clone_url if the value is not an empty string | |||
clone_url = CASE WHEN sqlc.arg(clone_url)::text = '' THEN clone_url ELSE sqlc.arg(clone_url)::text END, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I'm confused by this case. Is this using the current value of clone_url if the clone_url argument is not empty, otherwise update clone_url with the clone_url argument?
We have existing tests for repositories, could we have a test case for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When registering the webhook, the webhook event returns a bunch of repository data which we update the row with. Unfortunately, the clone_url is missing, which caused the row to be updated with an empty clone_url. This ensures we never update it to an empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be honest, I'm not quite sure where to test this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just thinking about a unit test like the ones we have here: https://github.com/stacklok/mediator/blob/main/pkg/db/repositories_test.go#L184
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel free to split this into a separate ticket, writing a test might be a good first issue for a newcomer
some i | ||
workflowstr := file.read("./.github/workflows/codeql.yml") | ||
workflow := yaml.unmarshal(workflowstr) | ||
steps := workflow.jobs.analyze.steps[i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with rego, does this mean "make sure that there exists some path workflow.jobs.analyze.steps" that contains github/codeql-action/analyze@
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's right.
|
||
allow { | ||
some i | ||
workflowstr := file.read("./.github/workflows/codeql.yml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this calling the function from our library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is - are we going to have to provide a bunch of utility functions in the go code to make it possible to use rego evaluations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, file.read
does not exist in the standard library of rego, I added it and ensured it called our in-memory filesystem where we have the git repo cloned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly we'd need to write things that pertain to the constraints and assumptions we have in our running evaluation environment. We could use most of these out of the box: https://www.openpolicyagent.org/docs/latest/policy-reference/#built-in-functions
// in the filesystem being evaluated (which comes from the ingester). | ||
// It takes one argument, the path to the file to check. | ||
// It's exposed as `file.exists`. | ||
func FileExists(res *engif.Result) func(*rego.Rego) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is unused as of now right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, it is registered in the mediatorRegoLib
and exposed as policy as file.exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, if you mean it's unused in a policy, that's right. It's not currently used. At least I didn't provide an example for it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just echoing @jhrozek , what I know about rego could be written on the back of a matchbox. I love the idea though and can see how this will allow us to quickly spin up policy.
One question I have, although not enough to block this...would we be able to achieve this with other policy langs (jq, cue perhaps). I am guessing not, which is why rego was selected.
@lukehinds right, rego gives us a lot more flexibility with what we can do. We could potentially use Cue would need to be explored, but it's more of a data representation language than a policy language. We could potentially build a policy representation on top of Cue, but we'd be in the same position we are now, we'd need to come up with a driver interface and write them. Potentially we could replace our YAML data structure representations with Cue; that could be explored, but it still wouldn't replace rego in this case. |
This adds the pieces together that we need in order to run rego rules on git content.
It proposes a new example rule that verifies that we have codeQL in a repository
to do static analysis.
In order to get this working, we had to create two custom rego rules that execute
on the memoryfs that we got from the git ingester.
There were two bugs that were fixes with this:
clone_url
was reset when registering the webhook.